[Egothor-tech] How to test the Stem class with major European languages
Daniel
lingster at lingster.com
Thu Aug 5 23:50:20 BST 2004
Hello Egothor enthusiasts :)
I came across this project via a Google search for "Java based stemmer
software" and am planning to use the Egothor package for stemming and
searching foreign-language content on an educational nonprofit Web
site.
I know a few things about Java but not much about the Egothor package yet.
My question to everybody is:
1. I have installed the Egothor package, unpacked the Stemmer-data.tar.gz
package into the /etc/stemmer folder, and ran "ant comp_stem" to generate
compiled stemmer tables in /tmp/dist/stemmer.
2. I'd like to run a quick test of the org.egothor.parser.misc.Stem class
that produces "normalized" versions of real-life sentences in major
European languages, based on these EXISTING stemming tables. Which classes
from the org.egothor.stemmer package (Trie, MultiTrie, Test, etc.) should
I INSTANTIATE, and in what sequence, to finally call:
public org.egothor.parser.Token action(org.egothor.parser.Token t)
where I guess t will point to each word of the sentence?
I'm not asking anyone to send me complete code, but could someone who has
worked with the Egothor stemmer class (org.egothor.parser.misc.Stem)
describe the sequence of object initialization and argument passing, so
that the stemmer can be tested on REAL foreign words?
I would also be very interested to find out how to "teach" the stemmer
some "exception" rules, such as not stemming verbs, adjectives, etc., in
each particular language.
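One simple way to express such exceptions, independent of how the stemmer itself works, is a wrapper that leaves listed words untouched and stems everything else. The sketch below is an assumption, not an Egothor feature: ExceptionAwareStemmer is a hypothetical class, and the protected-word list is hand-made per language. Skipping whole parts of speech (all verbs, all adjectives) would additionally require a lexicon or tagger to produce such lists; that is not shown.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// HYPOTHETICAL wrapper: words in a per-language exception list bypass the stemmer.
public class ExceptionAwareStemmer implements UnaryOperator<String> {
    private final UnaryOperator<String> delegate; // the real stemmer (placeholder here)
    private final Set<String> protectedWords;     // stored lower-cased for lookup
    private final Locale locale;                  // language used for case folding

    public ExceptionAwareStemmer(UnaryOperator<String> delegate,
                                 Set<String> protectedWords, Locale locale) {
        this.delegate = delegate;
        this.locale = locale;
        Set<String> lowered = new HashSet<>();
        for (String w : protectedWords) {
            lowered.add(w.toLowerCase(locale));
        }
        this.protectedWords = Collections.unmodifiableSet(lowered);
    }

    // Returns the word unchanged if it is on the exception list,
    // otherwise delegates to the wrapped stemmer.
    @Override
    public String apply(String word) {
        return protectedWords.contains(word.toLowerCase(locale)) ? word : delegate.apply(word);
    }

    public static void main(String[] args) {
        // Toy placeholder stemmer: strips a final "s" from words longer than three letters.
        UnaryOperator<String> toy =
                w -> w.endsWith("s") && w.length() > 3 ? w.substring(0, w.length() - 1) : w;
        ExceptionAwareStemmer fr = new ExceptionAwareStemmer(
                toy, new HashSet<>(Arrays.asList("Paris", "jamais")), Locale.FRENCH);
        System.out.println(fr.apply("Paris") + " " + fr.apply("enfants"));
        // prints Paris enfant
    }
}
```

Keeping one wrapper instance per language lets each language carry its own exception list. It also leaves the underlying stemming tables untouched.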
Many thanks in advance,
Daniel Zilberman
=================================================
Daniel Zilberman,
LINGSTER project
www.lingster.com