sirdf.com
Search & Information Retrieval Development Forum
 

Spam

 
Phoog
Newbie


Joined: 11 Dec 2005
Posts: 7

Posted: Tue Apr 18, 2006 3:39 pm

Hey,

I have run into a big problem, one that everybody hits sooner or later...
SPAM!

In what ways are you fighting spam?
Is there any good reading regarding spam on search engines?

Also, take a site with a calendar that links to pages like example.com/2006/april/01 or example.com?y=2006&m=april&d=01: the pages just go on forever. What do you think is the best way to handle those pages?
runarb
Site Admin


Joined: 29 Oct 2006
Posts: 4

Posted: Tue Apr 18, 2006 6:32 pm

One way to deal with the calendar problem may be to skip pages that have few links pointing to them. If a page has only one inbound link, and that link comes from the same domain, it probably isn’t worth crawling.
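
A rough sketch of that rule in Python, just to make it concrete. The function names (`build_inlinks`, `worth_crawling`) are made up for this example, and comparing hostnames is only a crude stand-in for "same domain". It also assumes the crawler has already collected (source, target) link pairs with absolute URLs.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def host(url):
    """Hostname of an absolute URL, used here as a rough stand-in for 'domain'."""
    return urlsplit(url).hostname or ""


def build_inlinks(links):
    """Map each target URL to the set of distinct source URLs linking to it."""
    inlinks = defaultdict(set)
    for source, target in links:
        if source != target:  # assumption: a page linking to itself does not count
            inlinks[target].add(source)
    return inlinks


def worth_crawling(target, inlinks):
    """Skip a page whose only known inbound link comes from its own host."""
    sources = inlinks.get(target, set())
    if len(sources) != 1:
        return True  # seeds (no known links) and pages with several links pass
    (only_source,) = sources
    return host(only_source) != host(target)
```

Note that the link counts are only as good as what has been crawled so far, so a page skipped early could be reconsidered once more links to it turn up.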


Take a look at “Focused Crawling Using Context Graphs”: http://citeseer.ist.psu.edu/diligenti00focused.html


Also be careful when crawling dynamic pages where the link came from the page itself.

For example, don’t crawl example.com/calendar.php?y=2007&m=april&d=01 if the link came from example.com/calendar.php?y=2006&m=april&d=01, because both are versions of calendar.php.

Only crawl it if the link came from another page, like example.com/todo.php.
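
Something like the following could express the dynamic-page check. It is only a sketch: `should_follow` and `script_key` are hypothetical names, "dynamic" is assumed to mean "the URL has a query string", and two URLs count as the same script when host and path match.

```python
from urllib.parse import urlsplit


def script_key(url):
    """Identify the script behind a URL by host and path, ignoring the query."""
    parts = urlsplit(url)
    return (parts.hostname or "", parts.path)


def should_follow(source_url, target_url):
    """Refuse links from a dynamic page to another version of the same script."""
    is_dynamic = bool(urlsplit(target_url).query)
    same_script = script_key(source_url) == script_key(target_url)
    return not (is_dynamic and same_script)


# The calendar example: calendar.php linking to itself with new parameters.
should_follow(
    "http://example.com/calendar.php?y=2006&m=april&d=01",
    "http://example.com/calendar.php?y=2007&m=april&d=01",
)  # not followed

# The same target, reached from a different page.
should_follow(
    "http://example.com/todo.php",
    "http://example.com/calendar.php?y=2007&m=april&d=01",
)  # followed
```

This only catches calendars driven by query parameters. Path-style URLs such as example.com/2006/april/01 have no query string, so they would need a different check, for instance the inbound-link rule above.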
_________________
CTO @ Searchdaimon company search.