Environment
Each hit list is produced from a 2-term query, so the test reflects I/O performance together with the speed of the evaluation formula. The hit list contains about 10% of all documents stored in the index. Nothing was printed to the console or to logs during the runs.
O/S, HW: Sun JDK 1.4.2_04, Gentoo Linux 2.4.22-r5, reiserfs 3.6.11, single CPU (P4), UDMA 100
Tested software: egothor 1.2.5-RC (64-bit engine), Lucene 1.3 (32-bit kernel)
In this test, Lucene did not call its optimize routine during indexing. The routine was called only once, after all documents had been processed. See LuceneIssue.
Document collection
About 17,976 documents (432,411 kB) in HTML, XML, PDF (APIs, guides): css2, docbookXSL, j2sdk-1.3.1, jsguide13, tagref, xerces2, cyberneko, htmlpres, j2sdk-1.4.1, jsguide15, tdg (docbook guide), xml, cs-en dictionary, httpclient-2.0-beta1, java-tutorial, jsref13, uml.
Results
Batch of 1000 queries.
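The harness used for these runs is not shown on this page. As a minimal sketch of how per-query sys, user and real averages could be collected over a batch, assuming a hypothetical `search` callable that stands in for the engine under test (it is not the egothor or Lucene API):

```python
import os
import time


def time_batch(search, queries):
    """Run every query once and return mean sys/user/real time in ms per query.

    `search` is a hypothetical stand-in for the engine call; it is not the
    egothor or Lucene API.
    """
    if not queries:
        raise ValueError("need at least one query")
    cpu_before = os.times()            # process CPU time so far (user, system)
    wall_before = time.perf_counter()  # wall-clock ("real") time
    for query in queries:
        search(query)
    wall_after = time.perf_counter()
    cpu_after = os.times()

    to_ms = 1000.0 / len(queries)      # seconds for the batch -> ms per query
    return {
        "sys": (cpu_after.system - cpu_before.system) * to_ms,
        "user": (cpu_after.user - cpu_before.user) * to_ms,
        "real": (wall_after - wall_before) * to_ms,
    }


if __name__ == "__main__":
    # Toy "index": the search only counts documents containing both terms.
    docs = ["java xml parser", "xml schema guide", "java tutorial"] * 100

    def toy_search(query):
        terms = query.split()
        return sum(all(t in d.split() for t in terms) for d in docs)

    batch = ["java xml"] * 1000  # batch of 1000 two-term queries
    print(time_batch(toy_search, batch))
```

Averaging over the whole batch, rather than timing each query separately, keeps the timer overhead out of the per-query figure.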
| egothor 1.2.5 RC, NORMFACTOR = 1000 |
| indexSeq | simpleBits | Searching sys (ms per query) | Searching user (ms per query) | Searching real (ms per query) | Indexing sys (min'sec") | Indexing user (min'sec") | Indexing real (min'sec") | Index size (kB) |
| true | true | | | | | | | 32,578 |
| egothor 1.2.5 RC, NORMFACTOR = 10000 |
| indexSeq | simpleBits | Searching sys (ms per query) | Searching user (ms per query) | Searching real (ms per query) | Indexing sys (min'sec") | Indexing user (min'sec") | Indexing real (min'sec") | Index size (kB) |
| true | true | 0.07 | 3.02 | 3.13 | 0'11.1" | 6'23.3" | 7'16.7" | 40,857 |
| true | false | 0.06 | 3.59 | 3.90 | 0'11.8" | 6'25.8" | 7'24.1" | 37,237 |
| false | true | 0.08 | 3.02 | 3.13 | 1'2.8" | 7'0.8" | 8'44.9" | 41,337 |
| false | false | 0.08 | 3.85 | 3.95 | 1'3.5" | 7'5.3" | 8'53.4" | 37,717 |
| lucene 1.3 |
| N/A | N/A | 0.10 | 3.70 | 3.79 | 0'23.6" | 9'39.9" | 11'4.1" | 34,893 |
Egothor indexed all XML, PDF and HTML documents in the collection, whereas Lucene indexed only the HTML documents. The difference is fewer than 100 documents, so it cannot affect the results.
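To compare the indexing columns numerically, the min'sec" notation has to be converted to seconds first. The sketch below does this for three of the real-time figures copied from the results and derives a rough throughput from the collection size (432,411 kB). The helper and variable names are illustrative only.

```python
import re

COLLECTION_KB = 432_411  # collection size from the "Document collection" section


def to_seconds(stamp):
    """Convert a time written as M'S.S" (e.g. 7'16.7") into seconds."""
    match = re.fullmatch(r"(\d+)'(\d+(?:\.\d+)?)\"", stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Real (wall-clock) indexing times copied from the results tables.
real_times = {
    "egothor indexSeq=true simpleBits=true": "7'16.7\"",
    "egothor indexSeq=false simpleBits=false": "8'53.4\"",
    "lucene 1.3": "11'4.1\"",
}

if __name__ == "__main__":
    for name, stamp in real_times.items():
        secs = to_seconds(stamp)
        # Throughput is approximate: the full collection size is used for
        # every engine, although Lucene indexed slightly fewer documents.
        print(f"{name}: {secs:.1f} s, {COLLECTION_KB / secs:.0f} kB/s")
```

The throughput figures are only an approximation, because the same collection size is assumed for both engines.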
Tests prepared by LeoGalambos. Comments and questions are welcome.