Full-text Search Engine and Library which are entirely written in JAVA
:: egothor


[Egothor-tech] How to test the Stem class with major European languages

Daniel lingster at lingster.com
Thu Aug 5 23:50:20 BST 2004

Hello Egothor enthusiasts :)

I came across this project through a Google search for "Java based stemmer
software" and am planning to use the Egothor package for stemming and text
search of foreign-language content for an educational non-profit Web
site.
 I know a few things about Java but not much about the Egothor package yet.
My question to everybody is:
 1. I have installed the Egothor package, unpacked the Stemmer-data.tar.gz
package into the /etc/stemmer folder and ran ant comp_stem to generate
compiled stemmer tables in /tmp/dist/stemmer.
 2. I'd like to run a quick test of the org.egothor.parser.misc.Stem class
to produce "normalized" versions of real-life sentences in major European
languages, based on these EXISTING stemming tables. Which classes from the
org.egothor.stemmer package (Trie, MultiTrie, Test etc.) should I
INSTANTIATE, and in what sequence, so that I can finally call:

  public org.egothor.parser.Token action(org.egothor.parser.Token t)

 where I guess t will point to each of the sentence's words?
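
 To make the intended test concrete, here is a minimal sketch of the loop I
have in mind. It does NOT use the Egothor API: WordStemmer,
SentenceNormalizer and ToyStemmer are hypothetical names invented for
illustration, and the toy suffix rule only stands in for the place where a
real call to Stem.action(Token) would presumably go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for a stemmer; NOT part of the Egothor API.
interface WordStemmer {
    String stem(String word);
}

// Splits a sentence into words and passes each word to the stemmer.
class SentenceNormalizer {
    private final WordStemmer stemmer;

    SentenceNormalizer(WordStemmer stemmer) {
        this.stemmer = stemmer;
    }

    List<String> normalize(String sentence) {
        List<String> out = new ArrayList<>();
        // Split on every run of characters that are not Unicode letters,
        // so accented words in European languages stay in one piece.
        for (String word : sentence.split("[^\\p{L}]+")) {
            if (word.isEmpty()) {
                continue; // a leading separator produces one empty string
            }
            // Assumption: a real test would wrap the word in a Token and
            // call Stem.action(Token) here instead of the stand-in.
            out.add(stemmer.stem(word.toLowerCase(Locale.ROOT)));
        }
        return out;
    }
}

// Toy rule for demonstration only: drop a final "s" from longer words.
class ToyStemmer implements WordStemmer {
    @Override
    public String stem(String word) {
        boolean strip = word.length() > 3 && word.endsWith("s");
        return strip ? word.substring(0, word.length() - 1) : word;
    }
}
```

 With the toy rule, new SentenceNormalizer(new ToyStemmer())
.normalize("Les maisons sont grandes.") returns [les, maison, sont, grande].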

 I'm not asking anyone to send me complete code, but could someone who has
worked with the Egothor stemmer class (org.egothor.parser.misc.Stem) describe
the sequence of object initialization and argument passing, so that a test
of the stemmer using REAL foreign words can be done?
  I would also be very interested to find out how to "teach" the stemmer
some "exception" rules, such as not stemming verbs, adjectives etc. in
each particular language.
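
 One simple form of such an "exception" rule would be a per-language list of
words that must pass through unchanged. The sketch below is only an
assumption about how that could look from the outside: ExceptionAwareStemmer
is a hypothetical wrapper, not an Egothor class, and it works from a plain
word list rather than from part-of-speech information. Whether the stemmer
tables offer their own mechanism for this is exactly what I would like to
learn.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical wrapper (not an Egothor class): words on the exception
// list are returned untouched, all others go to the wrapped stemmer.
class ExceptionAwareStemmer implements UnaryOperator<String> {
    private final UnaryOperator<String> delegate;
    private final Set<String> exceptions = new HashSet<>();

    ExceptionAwareStemmer(UnaryOperator<String> delegate,
                          Iterable<String> doNotStem) {
        this.delegate = delegate;
        for (String word : doNotStem) {
            // Store lower-cased so the lookup ignores capitalization.
            exceptions.add(word.toLowerCase(Locale.ROOT));
        }
    }

    @Override
    public String apply(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        return exceptions.contains(key) ? word : delegate.apply(word);
    }
}
```

 A separate exception list per language could then be loaded next to the
stemmer table for that language.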

 Many thanks in advance,
 Daniel Zilberman

=================================================
Daniel Zilberman,
LINGSTER project
www.lingster.com

