sirdf.com Forum Index sirdf.com
Search & Information Retrieval Development Forum
 

Programming Languages

 
Fischerlaender
Member


Joined: 08 May 2004
Posts: 11
Location: Osterhofen, Bavaria, Germany

Posted: Tue Jul 13, 2004 10:59 am

As far as I can see, the big search engines use C++ as their programming language. I remember that Lycos (back when they were a real search engine) used Perl until they reached a size of about 100 million pages.

From my own engine I know that Perl is a great language for the crawler and can be used for building the index, but it is definitely too slow for the front end (aka the query engine). After a lot of experiments with Perl, I switched to C for this particular task.

Now I often hear from people that they are developing a search engine in Java, and I can't understand how Java can be fast enough for the front end.

Any experiences and recommendations for the ideal languages for the different parts of a search engine?
_________________
http://www.neomo.de - the search engine alternative (test version)
runarb
Site Admin


Joined: 29 Oct 2006
Posts: 4

Posted: Wed Jul 14, 2004 9:20 pm

The problem with systems like Java and Perl is that they run in a virtual machine, so you do not have direct access to memory but have to go through the VM. This makes array operations slow, and thus sorting slow.

The Great Computer Language Shootout has an Array Access test where the same program takes 0.11 sec for C, 0.82 sec for Java and 15.27 sec for Perl: http://www.bagley.org/~doug/shootout/bench/ary3/
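
For readers who have not seen that test, the following is a rough sketch of the kind of loop it times, written here in Python rather than in any of the three languages above. The function name `array_access` and the sizes are assumptions for illustration, not the Shootout's actual source, and the time it prints will differ from machine to machine.

```python
import time


def array_access(n, repeats):
    """Fill x with 1..n, then add x into y element by element, `repeats` times."""
    x = [i + 1 for i in range(n)]
    y = [0] * n
    for _ in range(repeats):
        # Walk the arrays from the last index down to the first.
        for i in range(n - 1, -1, -1):
            y[i] += x[i]
    return y[0], y[-1]


if __name__ == "__main__":
    start = time.perf_counter()
    first, last = array_access(10_000, 100)
    elapsed = time.perf_counter() - start
    print(first, last, f"{elapsed:.3f} s")
```

Nearly all of the work is indexed reads and writes, so the measured time is dominated by how cheaply the runtime can reach an array element.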


Quote:
are developing a search engine in Java, and I can't understand how Java can be fast enough for the front end.


Java is slow, but I think it is easier to find inexpensive Java programmers than C or C++ programmers, because so many universities teach only Java. Most search engine programmers are still at university, and can therefore recruit their friends and classmates as their first staff if they use Java.

They can then eventually write a new runtime system for serving the results once they have the rest working.
_________________
CTO @ Searchdaimon company search.
Fischerlaender
Member


Joined: 08 May 2004
Posts: 11
Location: Osterhofen, Bavaria, Germany

Posted: Wed Jul 14, 2004 10:42 pm

Quote:
This makes array operations slow, and thus sorting slow.

Yes, this is indeed true. Therefore I'm using MySQL and the great Unix sort command for such tasks.
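
Handing the sort to the Unix `sort` command can look roughly like the sketch below, written in Python for brevity rather than in the Perl used by the engine discussed here. The tab-separated `term<TAB>doc_id` line format and the function name `sort_postings` are assumptions made up for the example; only the `sort` options themselves (`-t`, `-k`) are standard.

```python
import os
import subprocess


def sort_postings(lines):
    """Sort 'term<TAB>doc_id' lines by term, then numerically by doc_id,
    by piping them through the external Unix `sort` command."""
    env = dict(os.environ, LC_ALL="C")  # byte-wise, locale-independent ordering
    result = subprocess.run(
        ["sort", "-t", "\t", "-k1,1", "-k2,2n"],
        input="".join(line + "\n" for line in lines),
        capture_output=True,
        text=True,
        check=True,
        env=env,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    postings = ["perl\t12", "c\t7", "perl\t3", "java\t5", "c\t2"]
    for line in sort_postings(postings):
        print(line)
```

This toy version passes the data through the calling process. For a real index one would more likely give `sort` the input and output file names directly, so that the scripting language never has to hold the data at all.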

Quote:
The Great Computer Language Shootout has an Array Access test where the same program takes 0.11 sec for C, 0.82 sec for Java and 15.27 sec for Perl.

These kinds of tests always remind me of a great article I read around 1988. It was a performance comparison between assembler, C and Pascal on the Atari ST. The first version of each program yielded the results everybody would have expected: C was about three times faster than Pascal. But then they looked at the programs and found that the C version had been speed-optimized from the start, while the Pascal version looked as if it had been written for a book teaching the fine art of programming. In other words, there was a lot of room for optimization in the Pascal program. When they compared the optimized versions of Pascal and C, they found that Pascal was even faster than C! (Assembler was of course the fastest "language".)
This short story taught me never to believe any speed comparisons of programming languages again. :)
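
The same effect is easy to reproduce inside a single language. The sketch below is only an illustration with invented function names (`sum_squares_textbook`, `sum_squares_tuned`), and it is in Python rather than Pascal or C. Both functions compute the same result, yet their timings can differ noticeably, so a cross-language benchmark says as much about how each version was written as about the language.

```python
import timeit


def sum_squares_textbook(values):
    """Straightforward version, written for readability."""
    total = 0
    index = 0
    while index < len(values):
        total = total + values[index] * values[index]
        index = index + 1
    return total


def sum_squares_tuned(values):
    """Same result, but leaning on built-ins instead of manual indexing."""
    return sum(v * v for v in values)


if __name__ == "__main__":
    data = list(range(100_000))
    assert sum_squares_textbook(data) == sum_squares_tuned(data)
    for fn in (sum_squares_textbook, sum_squares_tuned):
        seconds = timeit.timeit(lambda: fn(data), number=10)
        print(f"{fn.__name__}: {seconds:.3f} s for 10 runs")
```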

Quote:
They can then eventually write a new runtime system for serving the results once they have the rest working.

This is indeed a good strategy. Perl serves me as a nearly perfect RAD (Rapid Application Development) tool. But I'm sure that for people used to Java, Java is the RAD tool of choice. And perhaps it's easier to migrate from Java to C++ than from Perl ...
zootreeves
Newbie


Joined: 10 Dec 2005
Posts: 8

Posted: Sat Dec 10, 2005 4:33 pm

With MySQL you'll be limited to about 5 million pages before things slow down to a crawl.
Phoog
Newbie


Joined: 11 Dec 2005
Posts: 7

Posted: Thu Jan 19, 2006 3:35 am

QUOTE (zootreeves @ Dec 10 2005, 05:33 PM)
With MySQL you'll be limited to about 5 million pages before things slow down to a crawl.

What do you base that on?

I have a friend who has over 8 million posts in his MySQL database at the moment, and nothing is slow there.

And doesn't it depend on how you do the crawling, the indexing and the searching? As far as I have found, there is no known limitation in MySQL, or is there?
INVOISK
Newbie


Joined: 05 Jun 2009
Posts: 2
Location: USA

Posted: Tue Jun 09, 2009 7:20 pm    Post subject: Programming Languages

C, as its creator said, is an elegant programming language. I agree with that. The syntax of C is elegant too. But people make it ugly. Writing hard-to-read programs is often easy in C and C. I believe in writing readable programs.

The language that I like most is without doubt C and C.