Phoog Newbie
Joined: 11 Dec 2005 Posts: 7
Posted: Tue Apr 18, 2006 3:39 pm
|
|
Hey,
I have run into a big problem, one that you always hit sooner or later...
SPAM!
How are you fighting spam?
Is there any good reading on spam in search engines?
Also, if a site has a calendar with links like example.com/2006/april/01 or example.com?y=2006&m=april&d=01, the pages just go on forever. What do you think is the best way to handle those pages?
|
|
runarb Site Admin
Joined: 29 Oct 2006 Posts: 4
Posted: Tue Apr 18, 2006 6:32 pm
|
|
One way to deal with the calendar problem may be to not crawl pages that have few links pointing to them. If a page has only one link to it, and that link comes from the same domain, it probably isn't worth crawling.
Take a look at "Focused Crawling Using Context Graphs": http://citeseer.ist.psu.edu/diligenti00focused.html
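A minimal Python sketch of the inlink rule above, assuming the crawler keeps, for every discovered URL, the set of source pages it has seen linking to it. The function names and the `min_inlinks` threshold are hypothetical, not taken from any particular crawler. A page is skipped when every known link to it comes from its own host and there are fewer than `min_inlinks` of them.

```python
from urllib.parse import urlsplit


def host(url: str) -> str:
    """Return the host name of a URL ('' if it has none)."""
    return urlsplit(url).hostname or ""


def worth_crawling(url: str, inlinks: set[str], min_inlinks: int = 2) -> bool:
    """Hypothetical inlink heuristic.

    inlinks: distinct source URLs known to link to `url` (assumed to be
    tracked by the crawler). With the default min_inlinks=2, a page whose
    only link is from its own domain is skipped, as suggested in the post.
    """
    target_host = host(url)
    # Any link from another host is treated as a vote worth following.
    if any(host(src) != target_host for src in inlinks):
        return True
    # Only same-domain links: require at least `min_inlinks` distinct pages.
    return len(inlinks) >= min_inlinks


day_one = "http://example.com/2006/april/01"
print(worth_crawling("http://example.com/2006/april/02", {day_one}))
```

Note that inlink counts grow as the crawl proceeds, so a skipped URL could be deferred and re-checked later rather than dropped for good. Also, a calendar day linked from both its "previous" and "next" neighbours already has two same-domain links, so in practice the threshold may need to be higher.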
Also be careful when crawling dynamic pages that are linked from the page itself, i.e. from another version of the same script.
For example, don't crawl example.com/calendar.php?y=2007&m=april&d=01 if the link came from example.com/calendar.php?y=2006&m=april&d=01, because both are versions of calendar.php.
Only crawl it if the link came from another page, such as example.com/todo.php.
_________________
CTO @ Searchdaimon company search.
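The same-script rule from the reply above could be sketched like this in Python. It assumes a "dynamic page" means a URL with a query string, and that two URLs are "versions of the same script" when they share host and path (for example both /calendar.php). The function names are hypothetical. Path-style calendars such as example.com/2006/april/01 are not caught by this rule; the inlink heuristic is meant for those.

```python
from urllib.parse import urlsplit


def same_script(a: str, b: str) -> bool:
    """True if two URLs share host and path, ignoring the query string."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.hostname, pa.path) == (pb.hostname, pb.path)


def follow_dynamic_link(target: str, source: str) -> bool:
    """Hypothetical rule: don't follow a query-string URL that was found
    on another version of the same script; follow it if it was found on
    a different page."""
    if not urlsplit(target).query:
        return True  # not a dynamic URL, so the rule does not apply
    return not same_script(target, source)


cal_2006 = "http://example.com/calendar.php?y=2006&m=april&d=01"
cal_2007 = "http://example.com/calendar.php?y=2007&m=april&d=01"
todo = "http://example.com/todo.php"

print(follow_dynamic_link(cal_2007, cal_2006))  # False: same script
print(follow_dynamic_link(cal_2007, todo))      # True: different page
```

Because the check only looks at the link's source, a calendar page can still enter the crawl once through an outside page like todo.php. It just cannot spawn an endless chain of its own neighbours.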