Class | Description |
---|---|
DoubleMetaphone | This module implements a "sounds like" algorithm developed by Lawrence Philips which he published in the June, 2000 issue of C/C++ Users Journal. |
DupWithoutDiacritics | This filter transforms all (Latin) words to non-diacritical (ASCII), but still keeps the original tokens. |
Grammer | Despite its name, this class is really an "N-grammer": it produces N-grams. |
LowerCase | This filter transforms all words to lower case. |
Nysiis | This module implements the New York State Identification and Intelligence System (NYSIIS) Phonetic Code. |
ParagraphFilter | This filter sets the sentence, paragraph and sentenceInParagraph fields in the Token class, just like the ParagraphPunctFilter. |
ParagraphPunctFilter | This filter sets the sentence, paragraph and sentenceInParagraph fields in the Token class. |
Phonetics | |
PunctFilter | |
RemoveDiacritics | This filter transforms all (Latin) words to non-diacritical (ASCII). |
Stemmer | The Stemmer object is a filter which transforms all words to their respective stems. |
StopFilter | This abstract class should be extended by any class wishing to ignore certain tokens while processing all tokens. |
WordNGrammer | This class produces N-grams of words. |
This package defines objects that filter tokens. Use them to transform tokens, for example to reduce words to their stems or to convert them to lower case.
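
To make the idea of a token filter concrete, the following is a minimal, self-contained sketch of three transformations described in the table: lower-casing, diacritic removal (with and without keeping the original token), and word N-grams. It does not use the Egothor API. The class `TokenFilterSketch` and all of its method names are hypothetical and exist only for illustration, and the real filters may behave differently in detail.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: these names are NOT part of the Egothor API.
class TokenFilterSketch {

    // Lower-cases every token, in the spirit of a LowerCase-style filter.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Decomposes accented letters (NFD) and drops the combining marks.
    static String stripDiacritics(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Replaces each token with its diacritic-free form.
    static List<String> removeDiacritics(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(stripDiacritics(token));
        }
        return out;
    }

    // Keeps each original token and adds the diacritic-free form when it differs.
    static List<String> dupWithoutDiacritics(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token);
            String stripped = stripDiacritics(token);
            if (!stripped.equals(token)) {
                out.add(stripped);
            }
        }
        return out;
    }

    // Joins every run of n consecutive tokens with a single space.
    static List<String> wordNGrams(List<String> tokens, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            out.add(String.join(" ", tokens.subList(i, i + n)));
        }
        return out;
    }

    public static void main(String[] args) {
        // "Caf\u00e9" is "Cafe" with an acute accent on the final letter.
        List<String> tokens = Arrays.asList("Caf\u00e9", "Au", "Lait");
        List<String> normalized = removeDiacritics(lowerCase(tokens));
        System.out.println(normalized);
        System.out.println(dupWithoutDiacritics(tokens));
        System.out.println(wordNGrams(normalized, 2));
    }
}
```

Each method takes a token list and returns a new one, so filters can be chained in any order. Note that stripping combining marks only handles letters that decompose under NFD; characters such as "ø" or "ß" would need an explicit mapping, which this sketch omits.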
Copyright © 2016 Egothor. All Rights Reserved.