Package | Description |
---|---|
org.egothor.cache | Provides classes that help implement a cache for the egothor project. |
org.egothor.core | This package concentrates the core data objects and interfaces. |
org.egothor.core.memory | This package contains an in-memory implementation of barrels. |
org.egothor.core.query | This package contains objects that represent the structure of a query in our inner and binary forms, readers of a barrel, and a result queue. |
org.egothor.dir | This package defines the objects of the distributed IR layer. |
org.egothor.duplicity.algorithm | This package contains the top-level classes that implement the duplicity checking algorithm. |
org.egothor.duplicity.visualization | This package contains classes implementing the visualization of the duplicities found in a document by the duplicity checking algorithm. |
org.egothor.html | This package contains a specific implementation of core objects for HTML with "home" and "content" support. |
org.egothor.parser | |
org.egothor.parser.filter | This package defines objects that filter tokens. |
org.egothor.parser.plain | This package defines a JavaCC parser for plain text. |
org.egothor.query | This package contains objects that represent the structure of a query in our inner form. |
org.egothor.query.parser | This package defines a JavaCC parser for the user's query. |
org.egothor.query.runner | This package defines the machinery that navigates the rider during query execution in the Vector model. |
org.egothor.query.runner.enhanced | This package defines the machinery that navigates the rider during query execution in any model. |
org.egothor.text | This package contains support for various textual routines and processes. |
Modifier and Type | Method and Description |
---|---|
Token[] | CachedQuery.getTokens() Gets the tokens that form the query. |
Constructor and Description |
---|
CachedQuery(Query query, Token[] tokens, int offset, int length, long max_hits2scan, double pg_rerank) Constructor for the CachedQuery object. |
Modifier and Type | Field and Description |
---|---|
Sequence<Token> | Filter.prev The Tokenizer used by the Filter. |
Modifier and Type | Method and Description |
---|---|
Token | Filter.action(Token t) Used for changing tokens of the input tokenizer. |
Token | Token.newText(String name, String text) Clones this object with a new name and text. |
Token | Filter.next() The next token of the input tokenizer is modified by Filter.action(org.egothor.core.Token) and the product is also the product of this method. |
Token[] | QueryResponse.queryTokens() Returns an array of all tokens recognized in a query. |
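The table above describes the Filter contract: next() pulls a token from the previous tokenizer, passes it through action(), and returns the result. A minimal self-contained sketch of that decorator pattern follows; the Token, Sequence, and UpperCaseFilter types here are simplified stand-ins for illustration, not the real egothor classes:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Locale;

public class FilterSketch {
    // Simplified stand-in for org.egothor.core.Token: just a text payload.
    record Token(String text) {}

    // Simplified stand-in for Sequence<Token>: anything that yields tokens, null at end of stream.
    interface Sequence { Token next(); }

    // The decorator shape: next() reads from the previous tokenizer and applies action().
    static abstract class Filter implements Sequence {
        protected Sequence prev;                // the tokenizer this filter reads from
        Filter(Sequence prev) { this.prev = prev; }
        abstract Token action(Token t);         // transform one token
        public Token next() {
            Token t = prev.next();
            return t == null ? null : action(t);
        }
    }

    // A concrete filter, analogous in shape (not behavior) to the filters listed here.
    static class UpperCaseFilter extends Filter {
        UpperCaseFilter(Sequence prev) { super(prev); }
        Token action(Token t) { return new Token(t.text().toUpperCase(Locale.ROOT)); }
    }

    public static void main(String[] args) {
        Iterator<Token> it = Arrays.asList(new Token("foo"), new Token("bar")).iterator();
        Sequence source = () -> it.hasNext() ? it.next() : null;  // null signals end of stream
        Sequence filtered = new UpperCaseFilter(source);
        System.out.println(filtered.next().text()); // FOO
        System.out.println(filtered.next().text()); // BAR
        System.out.println(filtered.next());        // null
    }
}
```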
Modifier and Type | Method and Description |
---|---|
Sequence<Token> | Filter.getPrevTokenizer() Returns the tokenizer this filter reads from. |
Sequence<Token> | DocumentData.words(boolean readlinx, boolean readilinx, boolean lowercase, boolean phonetics, HTMLField.Diacritics diacritics, boolean paragraphs, boolean paragraphsKeepPunctuation, String encoding) |
Modifier and Type | Method and Description |
---|---|
Token | Filter.action(Token t) Used for changing tokens of the input tokenizer. |
Modifier and Type | Method and Description |
---|---|
void | Filter.setPrevTokenizer(Sequence<Token> prev) Sets the tokenizer this filter reads from. |
Constructor and Description |
---|
QueryResponse(int offset, long wouldBe, long positives, Sequence<Hit> e, Token[] queryTokens, int enum_len, Query adaptedQuery) Constructor for the QueryResponse object. |
Constructor and Description |
---|
Filter(Sequence<Token> prev) Constructor for the Filter object. |
Modifier and Type | Method and Description |
---|---|
Sequence<Token> | FTField.filteredWords() Returns an enumeration of the terms in the field, filtered by the filters. |
Sequence<Token> | FTField.words() Returns an enumeration of the terms in the field. |
Modifier and Type | Method and Description |
---|---|
void | Query.addTerms(HashSet<Token> to) Adds all terms in this query into the given HashSet. |
Modifier and Type | Method and Description |
---|---|
abstract DocumentData | Group.expandDocMetadata(DocumentData ofBarrel, Token[] interest) Retrieves the document data, but restricts the data block to the part relevant to the tokens of interest. |
DocumentData | TankerImpl.expandDocMetadata(DocumentData ofBarrel, Token[] interest) |
abstract DocumentData | Tanker.expandDocMetadata(DocumentData ofBarrel, Token[] interest) |
DocumentData | TankerImplSecure.expandDocMetadata(DocumentData ofBarrel, Token[] interest) Deprecated. |
DocumentData | TankerImplSecure.expandDocMetadataSecure(DocumentData ofBarrel, Token[] interest) Standard expanding of doc metadata, but done in a multithread-safe way. |
Modifier and Type | Method and Description |
---|---|
CWI | Group.getCWI(HashSet<Token> terms) Returns the subset of CWI for the given subset of terms. |
CWI | TankerImplSecure.getCWI(HashSet<Token> terms) |
Modifier and Type | Method and Description |
---|---|
void | PermutatedMinsFiller.computeDocumentMins(DocumentPermutatedMins result, Sequence<Token> terms, long documentUID, int documentDBRevision) Computes the permutated mins values for the given sequence of tokens of a document and fills them into the result under the identifier documentID. |
Modifier and Type | Method and Description |
---|---|
static List<List<Token>> | DocumentDuplicities.getDocumentUnits(Sequence<Token> words) Takes the sequence of document words and, depending on Constants.CHECK_DUPLICITY_LEVEL, splits it into the appropriate text units: documents, paragraphs, or sentences. |
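getDocumentUnits splits a flat word sequence into text units (documents, paragraphs, or sentences) depending on a configured duplicity-checking level. A hedged, self-contained sketch of that kind of delimiter-driven splitting follows; the delimiter choice, String tokens, and class name are illustrative assumptions, not egothor's actual Constants or Token type:

```java
import java.util.ArrayList;
import java.util.List;

public class UnitSplitter {
    // Illustrative splitter: cut a flat word list into units at a delimiter token,
    // mirroring the shape of DocumentDuplicities.getDocumentUnits.
    static List<List<String>> getUnits(List<String> words, String delimiter) {
        List<List<String>> units = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String w : words) {
            if (w.equals(delimiter)) {
                // Close the current unit at each delimiter; skip empty units.
                if (!current.isEmpty()) { units.add(current); current = new ArrayList<>(); }
            } else {
                current.add(w);
            }
        }
        if (!current.isEmpty()) units.add(current);  // flush the trailing unit
        return units;
    }

    public static void main(String[] args) {
        // Sentence-level splitting on "." as an assumed delimiter.
        List<List<String>> units = getUnits(List.of("a", "b", ".", "c", "."), ".");
        System.out.println(units.size());  // 2
        System.out.println(units.get(0));  // [a, b]
        System.out.println(units.get(1));  // [c]
    }
}
```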
Modifier and Type | Method and Description |
---|---|
Sequence<Token> | HTMLField.words() |
Modifier and Type | Method and Description |
---|---|
void | HTMLField.setAppendix(Sequence<Token> appendix) |
Modifier and Type | Method and Description |
---|---|
Token | Strings2Tokens.next() |
Modifier and Type | Method and Description |
---|---|
Token | LowerCase.action(Token t) If the name/type of the token is not <EMAIL/PUNCT/NUM>, transforms the text of the token to lower case. |
Token | Stemmer.action(Token t) A simple stemming algorithm. |
Token | DupWithoutDiacritics.action(Token t) If the name/type of the token is <WORD>, transforms the text of the token to lower case. |
Token | WordNGrammer.next() Returns the next token. |
Token | Grammer.next() Returns the next token. |
Token | ParagraphPunctFilter.next() Returns the next token. |
Token | ParagraphFilter.next() Returns the next token. |
Token | DupWithoutDiacritics.next() Returns the next token. |
Token | RemoveDiacritics.next() If the name/type of the token is <WORD>, transforms the text of the token to lower case. |
Token | Phonetics.next() Returns the next token. |
Token | PunctFilter.next() Returns the next token. |
Token | StopFilter.next() Returns the next token. |
Modifier and Type | Method and Description |
---|---|
Token | LowerCase.action(Token t) If the name/type of the token is not <EMAIL/PUNCT/NUM>, transforms the text of the token to lower case. |
Token | Stemmer.action(Token t) A simple stemming algorithm. |
Token | DupWithoutDiacritics.action(Token t) If the name/type of the token is <WORD>, transforms the text of the token to lower case. |
static boolean | ParagraphPunctFilter.isParagraphDelimiter(Token t) Tests whether a token is a paragraph delimiter. |
static boolean | ParagraphPunctFilter.isPunctuation(Token t) Tests whether a token is a punctuation mark. |
boolean | PunctFilter.isPunctuation(Token t) Tests whether a token is a punctuation mark or can be ignored. |
abstract boolean | StopFilter.isStoppedToken(Token t) Tests whether a token should be processed or ignored. |
Constructor and Description |
---|
DupWithoutDiacritics(Sequence<Token> prev) Constructor for the DupWithoutDiacritics object. |
Grammer(Sequence<Token> arg0) Constructor for the Grammer object. |
LowerCase(Sequence<Token> prev) Constructor for the LowerCase object. |
LowerCase(Sequence<Token> prev, Locale locale) Constructs a LowerCase object using the given localization setting. |
ParagraphFilter(Sequence<Token> prev) Constructor for the ParagraphFilter object. |
ParagraphPunctFilter(Sequence<Token> prev) Constructor for the ParagraphPunctFilter object. |
Phonetics(Sequence<Token> arg0) Constructor for the Phonetics object. |
PunctFilter(Sequence<Token> arg0) Constructor for the PunctFilter object. |
RemoveDiacritics(Sequence<Token> prev) Constructor for the RemoveDiacritics object. |
Stemmer(Sequence<Token> prev, Trie stemmer) Constructs a Stemmer object using the given stemmer table. |
StopFilter(Sequence<Token> arg0) Constructor for the StopFilter object. |
WordNGrammer(Sequence<Token> prev) Constructor for the WordNGrammer object. |
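Every filter constructor above takes a Sequence<Token> prev, so filters compose into a pipeline by wrapping one around another, e.g. a parser wrapped by LowerCase wrapped by StopFilter. A self-contained sketch of that chaining style follows, using simplified String-based stand-ins rather than the real egothor types; the stop-word set and lambda-based filters are illustrative assumptions:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Locale;
import java.util.Set;

public class ChainSketch {
    // Stand-in for Sequence<Token>, text only; null signals end of stream.
    interface Sequence { String next(); }

    // Lowercases each token, analogous to the LowerCase filter.
    static Sequence lowerCase(Sequence prev) {
        return () -> {
            String t = prev.next();
            return t == null ? null : t.toLowerCase(Locale.ROOT);
        };
    }

    // Drops stop words, analogous to a StopFilter subclass whose
    // isStoppedToken is defined by membership in a set.
    static Sequence stopFilter(Sequence prev, Set<String> stops) {
        return () -> {
            String t;
            while ((t = prev.next()) != null && stops.contains(t)) { /* skip stopped tokens */ }
            return t;
        };
    }

    public static void main(String[] args) {
        Iterator<String> it = Arrays.asList("The", "Quick", "Fox").iterator();
        Sequence source = () -> it.hasNext() ? it.next() : null;
        // Chain: source -> lowerCase -> stopFilter,
        // analogous in shape to new StopFilter(new LowerCase(parser)).
        Sequence chain = stopFilter(lowerCase(source), Set.of("the"));
        System.out.println(chain.next()); // quick
        System.out.println(chain.next()); // fox
        System.out.println(chain.next()); // null
    }
}
```

Because each filter only sees its predecessor through the Sequence interface, any combination and order of filters is possible without changing any filter's code.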
Modifier and Type | Method and Description |
---|---|
Token | Plain.next() Returns the next Token. |
Token | Simple.next() Returns the next token in the stream, or null at EOS. |
Modifier and Type | Method and Description |
---|---|
static int | Configuration.defaultBoost(Token tok) Description of the Method |
static boolean | Configuration.isControlToken(Token tok) Is this a control token which is not excluded when it has a low idf? |
Modifier and Type | Method and Description |
---|---|
void | QProx.addTerms(HashSet<Token> to) |
void | QGroup.addTerms(HashSet<Token> to) |
void | QTerm.addTerms(HashSet<Token> to) |
void | QNot.addTerms(HashSet<Token> to) |
void | QAnd.addTerms(HashSet<Token> to) |
void | QPhrase.addTerms(HashSet<Token> to) |
void | QOr.addTerms(HashSet<Token> to) |
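The addTerms implementations above suggest a composite query tree: each node type (QAnd, QOr, QNot, QPhrase, ...) pours its terms into one shared HashSet, with leaves contributing their token and inner nodes recursing into their children. A simplified, self-contained sketch of that pattern follows; the Term and And classes and String tokens are illustrative stand-ins, not the real egothor API:

```java
import java.util.HashSet;
import java.util.List;

public class AddTermsSketch {
    // Stand-in for Query: every node can pour its terms into a shared set.
    interface Query { void addTerms(HashSet<String> to); }

    // Leaf node, analogous to QTerm: contributes its single token.
    record Term(String token) implements Query {
        public void addTerms(HashSet<String> to) { to.add(token); }
    }

    // Inner node, analogous to QAnd/QOr: recurses into its children.
    record And(List<Query> children) implements Query {
        public void addTerms(HashSet<String> to) {
            for (Query q : children) q.addTerms(to);
        }
    }

    public static void main(String[] args) {
        // (egothor AND (search AND egothor)) as a small query tree.
        Query q = new And(List.of(
                new Term("egothor"),
                new And(List.of(new Term("search"), new Term("egothor")))));
        HashSet<String> terms = new HashSet<>();
        q.addTerms(terms);
        System.out.println(terms.size());             // 2 (the set deduplicates "egothor")
        System.out.println(terms.contains("search")); // true
    }
}
```

Collecting into a HashSet means repeated terms across the tree are gathered only once, which is exactly what a ranking stage needs when it looks up per-term statistics.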
Constructor and Description |
---|
QTerm(Token token, String field) Constructor for the QTerm object. |
QTerm(Token token, String field, double idf, boolean req, boolean proh, int boost, int lowerBound, int upperBound) Constructor for the QTerm object. |
Modifier and Type | Method and Description |
---|---|
Token | Parser.next() Returns the next token in the stream, or null at EOS. |
Constructor and Description |
---|
TermRunner(double idf, Rider r, String field, Token token, int boost, boolean req, boolean proh) Constructor for the TermRunner object. |
TermRunner(IListWasher washer, double idf, Rider r, String field, Token token, int boost, boolean req, boolean proh) Constructor for the TermRunner object. |
Constructor and Description |
---|
TermRunner(int model, double idf, Rider r, String field, Token token, int boost, boolean req, boolean proh) Constructor for the TermRunner object. |
TermRunner(int model, IListWasher washer, double idf, Rider r, String field, Token token, int boost, boolean req, boolean proh) Constructor for the TermRunner object. |
Modifier and Type | Method and Description |
---|---|
Token | Generator.next() |
Modifier and Type | Method and Description |
---|---|
boolean | SnipperOfTokens.matches(Token w) Description of the Method |
boolean | SnipperOfStrings.matches(Token w) Description of the Method |
abstract boolean | Snipper.matches(Token w) Description of the Method |
boolean | SnipperOfStringsDiacritics.matches(Token w) Description of the Method |
Modifier and Type | Method and Description |
---|---|
String | Snipper.filter(Sequence<Token> tokens) Description of the Method |
Constructor and Description |
---|
SnipperOfTokens(Token[] word, boolean htmlAware, String startHit, String endHit) Constructor for the SnipperOfTokens object. |
Copyright © 2016 Egothor. All Rights Reserved.