Joined: 29 Oct 2006
Posted: Sat Sep 09, 2006 2:34 am
The Boitho search engine now uses 7 independent servers. Each has 4 SATA disks, for a total of 28 disks. The problem is that we are having a lot of disk crashes; we have lost about 6 disks in total so far.
Every time this happens, we have a system that finds out which pages were on that disk and recrawls them.
The recrawl is time-consuming, so we are thinking about switching to RAID 5.
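A minimal sketch of the bookkeeping such a recovery system needs, assuming each page is stored on exactly one disk and a reverse index from disk to URLs is kept alongside the crawl. The class name `DiskPageIndex` and the disk IDs are hypothetical, not Boitho's actual implementation.

```python
from collections import defaultdict


class DiskPageIndex:
    """Hypothetical reverse index: which pages live on which disk."""

    def __init__(self) -> None:
        self._pages_by_disk: dict[str, set[str]] = defaultdict(set)
        self._disk_by_page: dict[str, str] = {}

    def record(self, url: str, disk_id: str) -> None:
        # If the page was stored elsewhere before, drop it from the old disk.
        old = self._disk_by_page.get(url)
        if old is not None:
            self._pages_by_disk[old].discard(url)
        self._pages_by_disk[disk_id].add(url)
        self._disk_by_page[url] = disk_id

    def pages_lost_with(self, disk_id: str) -> list[str]:
        # Everything that must be recrawled after this disk dies.
        return sorted(self._pages_by_disk.get(disk_id, set()))


index = DiskPageIndex()
index.record("http://example.com/a", "server3-disk2")
index.record("http://example.com/b", "server5-disk0")
print(index.pages_lost_with("server3-disk2"))  # ['http://example.com/a']
```

The reverse map is what makes recovery proportional to the size of the lost disk rather than to the whole crawl.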
I have never really tested RAID in a high-performance system. According to http://www.pcguide.com/ref/hdd/perf/raid/l...leLevel5-c.html, "the overhead necessary in dealing with the parity continues to bog down writes".
How bad is this bog-down in practice?
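The write overhead in that quote comes from parity maintenance. In RAID 5 each stripe's parity block is the XOR of its data blocks, so any single lost block can be rebuilt by XOR-ing the survivors. A small write that changes one data block typically costs 4 disk I/Os instead of 1: read the old data, read the old parity, write the new data, and write the new parity (old parity XOR old data XOR new data). The sketch below models blocks as byte strings to show the arithmetic only; it is not how any particular controller or the Linux md driver is implemented, and it keeps parity in the last slot, whereas real RAID 5 rotates the parity disk from stripe to stripe.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of one or more equal-length blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same size")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    # A 4-disk RAID 5 stripe: 3 data blocks followed by 1 parity block.
    return data_blocks + [xor_blocks(*data_blocks)]


def rebuild(stripe: list[bytes], lost: int) -> bytes:
    # Any single missing block (data or parity) is the XOR of all the others.
    survivors = [b for i, b in enumerate(stripe) if i != lost]
    return xor_blocks(*survivors)


def small_write(stripe: list[bytes], index: int, new_data: bytes) -> int:
    """Read-modify-write of one data block; returns the disk I/Os it took."""
    old_data = stripe[index]   # I/O 1: read old data
    old_parity = stripe[-1]    # I/O 2: read old parity
    stripe[index] = new_data   # I/O 3: write new data
    stripe[-1] = xor_blocks(old_parity, old_data, new_data)  # I/O 4: write parity
    return 4
```

As a rough rule of thumb that follows from this, 4 disks doing 4 I/Os per small random write deliver about the random-write throughput of a single disk. Reads, and large writes that cover a whole stripe (where parity is computed from the new data without reading anything back), are hurt much less.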
Disk I/O is a big bottleneck today. To work around this, we run 4 processes in parallel, each indexing the data on one disk, thereby using all 4 disks at once. If we change to RAID 5, this method will no longer work.
Has anyone seen any research on this?
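A minimal sketch of the one-worker-per-disk pattern, assuming each disk is mounted at its own directory (mount paths such as `/mnt/disk0` are hypothetical). It swaps in a thread pool for the separate processes described above: the work here is dominated by file reads, during which Python releases the GIL, so threads are enough to keep every disk busy. For CPU-heavy indexing, `concurrent.futures.ProcessPoolExecutor` can be swapped in. `index_disk` only counts files and bytes as a stand-in for real indexing.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def index_disk(mount_point: str) -> tuple[str, int, int]:
    """Stand-in for indexing: walk one disk, reading and counting files and bytes."""
    files = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                # Read in 1 MiB chunks so large files never have to fit in memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    total_bytes += len(chunk)
            files += 1
    return mount_point, files, total_bytes


def index_all(mount_points: list[str]) -> list[tuple[str, int, int]]:
    if not mount_points:
        return []
    # One worker per disk: all disks stream at once, and no two workers
    # compete for the same disk's head.
    with ThreadPoolExecutor(max_workers=len(mount_points)) as pool:
        return list(pool.map(index_disk, mount_points))


if __name__ == "__main__":
    for result in index_all(["/mnt/disk0", "/mnt/disk1", "/mnt/disk2", "/mnt/disk3"]):
        print(result)
```

The property this relies on is that each worker touches exactly one disk. On a RAID 5 volume every file is striped across all members, so the same four workers would all share all four disks and that isolation disappears.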
When we grow bigger, we will use a "redundant array of inexpensive nodes", where all data resides on at least 3 independent servers. If one fails, we can just add another and copy the data from the two remaining servers. Google uses this approach in the "Google File System".
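A minimal sketch of that scheme, loosely modeled on GFS's default of 3 replicas per chunk: each chunk is placed on 3 distinct servers, and when a server fails it is replaced by a new one that receives a copy of each affected chunk from a surviving replica. `ReplicaMap`, the chunk names, and the node names are hypothetical; real GFS placement also weighs disk utilization and spreads replicas across racks, which this ignores.

```python
import random


class ReplicaMap:
    """Hypothetical chunk -> replica-set bookkeeping with N-way replication."""

    def __init__(self, nodes: list[str], replicas: int = 3, seed: int = 0) -> None:
        if len(nodes) < replicas:
            raise ValueError("need at least as many nodes as replicas")
        self.nodes = set(nodes)
        self.replicas = replicas
        self.locations: dict[str, set[str]] = {}
        self._rng = random.Random(seed)

    def place(self, chunk: str) -> set[str]:
        # Each replica goes to a different node, so one failure costs one copy at most.
        chosen = set(self._rng.sample(sorted(self.nodes), self.replicas))
        self.locations[chunk] = chosen
        return chosen

    def replace(self, dead: str, new_node: str) -> list[tuple[str, str]]:
        """Swap a failed node for a new one; return the (chunk, source) copies to make."""
        self.nodes.discard(dead)
        self.nodes.add(new_node)
        copies = []
        for chunk, holders in sorted(self.locations.items()):
            if dead not in holders:
                continue
            holders.discard(dead)
            if not holders:
                raise RuntimeError(f"all replicas of {chunk} are lost")
            # Any surviving replica can serve as the copy source.
            copies.append((chunk, min(holders)))
            holders.add(new_node)
        return copies


cluster = ReplicaMap([f"node{i}" for i in range(7)])  # 7 servers, as in the post
for name in ["chunk-a", "chunk-b", "chunk-c"]:
    cluster.place(name)
for chunk, source in cluster.replace("node3", "node7"):
    print(f"copy {chunk} from {source} to node7")
```

Because only the chunks that lived on the failed server are copied, and each copy can come from either survivor, recovery load is spread across the cluster instead of requiring a recrawl.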
CTO @ Searchdaimon company search.