[Egothor-tech] Questions about Egothor
Leo Galambos
Leo.Galambos at egothor.org
Thu Mar 10 17:07:35 GMT 2005
kenta at UDel.Edu wrote:
>Dear Sir or Madam:
>
>I am currently working on a file management system upgrade,
>and have narrowed the choices for our indexing/searching
>solution down to Lucene and Egothor. Before I commit to
>using one or the other, I have a few questions that I
>couldn't resolve through your website. I would greatly
>appreciate if you could give a few quick answers to the
>following questions:
>
>
>
Hello!
>Does Egothor use incremental indexing?
>
>
>
Yes.
>What are Egothor's RAM/HD requirements per GB of data? Will
>it significantly slow down a system with 30GB-60GB of MS
>Office documents, during indexing or during regular use?
>
>
>
RAM/HD requirements depend on the number of terms in the documents. The
index size is often about (1/10 * text_size + metadata_size), which
implies roughly 4-8GB in your case.
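A minimal sketch of that rule of thumb, for anyone who wants to plug in
their own numbers. The function name and the metadata sizes (1 GB and
2 GB) are assumptions chosen only to illustrate how 30-60GB of text could
land in the 4-8GB range; they are not measured values.

```python
def estimate_index_size_gb(text_size_gb, metadata_size_gb=0.0):
    """Rule of thumb from the reply: index ~ 1/10 * text_size + metadata size."""
    return text_size_gb / 10 + metadata_size_gb


# Hypothetical metadata sizes; the real figure depends on what you store.
for text_gb, meta_gb in ((30, 1.0), (60, 2.0)):
    size = estimate_index_size_gb(text_gb, meta_gb)
    print(f"{text_gb} GB of text + {meta_gb} GB metadata -> about {size} GB of index")
```

The estimate is driven by the extracted text size, so a collection of
Office documents with many embedded images may need less than this.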
>Is there a limitation to the amount of data that Egothor can
>handle?
>
>
>
Your disk capacity is the limit, because Egothor runs on a 64-bit kernel
(it gives you 2^64 docs in a single index, etc.).
>Can the index file automatically update when a file is moved?
>
>
>
No, this must be handled by application logic. For instance,
Egothor's robot/indexer (Capek+Michelangelo) can do this.
>Can the index file be stored on a separate, smaller machine?
>
>
>
Yes.
>Can Egothor search metadata? If so, can pictures be searched
>using metadata?
>
>
>
There are two ways:
1) The index stores the metadata: you must save the metadata in a special
way, and then you can search it (bear in mind that a fulltext engine is
not an RDBMS).
2) The index stores pointers to the metadata (you save the metadata
elsewhere): you run a query on the external metadata module, and Egothor
gets a bitmap selecting the subset of documents to be processed by the
fulltext routines.
The second way has not been tested with Egothor 1.2.x.
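To make the second way more concrete, here is a rough sketch of the idea
in Python. It is not Egothor's API: the function names, the in-memory
metadata store, and the toy inverted index are all hypothetical, and a
plain integer stands in for the bitmap.

```python
def metadata_bitmap(metadata, predicate, num_docs):
    """Query the external metadata store; set bit i if document i matches."""
    bitmap = 0
    for doc_id in range(num_docs):
        if predicate(metadata.get(doc_id, {})):
            bitmap |= 1 << doc_id
    return bitmap


def fulltext_search(inverted_index, term, bitmap):
    """Return documents containing the term, restricted to those in the bitmap."""
    postings = inverted_index.get(term, [])
    return [doc_id for doc_id in postings if (bitmap >> doc_id) & 1]


# Hypothetical data: metadata kept outside the fulltext index.
metadata = {0: {"type": "image"}, 1: {"type": "doc"}, 2: {"type": "image"}}
inverted_index = {"prague": [0, 1], "castle": [1, 2]}

images_only = metadata_bitmap(metadata, lambda m: m.get("type") == "image", 3)
print(fulltext_search(inverted_index, "prague", images_only))
```

Here the metadata query runs first and the fulltext step only considers
documents whose bit is set, which is how pictures could be narrowed down
by metadata before any text matching happens.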
>If a new Office version gets released, will filter files
>need to be updated?
>
>
>
That is a question for the authors of the filters. Egothor does not
develop any "real" filters of its own; it only uses 3rd-party parsers.
>Finally, have you stumbled across any side-by-side
>comparisons of Egothor with its competitors such as Lucene?
>
>
According to a benchmark I published last year, Egothor 1.2 was as fast
as Lucene (+/- 10-20%). The major difference is that Lucene uses a
classic algorithm for index updates, while Egothor uses something more
sophisticated. That is why Egothor may operate on huge document
collections more effectively. Also, Egothor 1.2 uses better algorithms
for compression, but the positive effect of this is inconclusive. Lucene
has optimized memory management (fewer Java objects allocated by the
JVM); Egothor will do this in the upcoming release (1.3). This difference
could be visible under stress conditions, which are not typical for many
applications.
Cheers,
Leo
--
Leo Galambos
Faculty of Mathematics and Physics, DSE
Malostranske namesti 25
Prague 1
CZE
http://kocour.ms.mff.cuni.cz/~galambos/