[Egothor-tech] Indexing DOC, XLS, XML files with GUI
Andre
pasquinigalde at virgilio.it
Mon Oct 11 10:15:32 BST 2004
Hello,
I have a problem indexing DOC, XLS, and XML files. This is part of the output:
Input charset set to iso-8859-1
GuiIndexerLocal
unknown format
build.xml
Input
java.lang.String
Flags: <FILENAME><XML>
Output
org.egothor.data.Document
Flags: <HOME><PUNCTUATION><LOWERCASE><SNIPPET><CSASCII>
Mission impossible :-)
unknown format
dmca.pdf
Input
java.lang.String
Flags: <FILENAME><PDF>
Output
org.egothor.data.Document
Flags: <HOME><PUNCTUATION><LOWERCASE><SNIPPET><CSASCII>
Filtering system found:
--$0--> via org.egothor.crusher.IniPath:java.lang.String<PDF><FILENAME>
--$1--> via org.egothor.crusher.connectors.InputStreamPath:java.io.InputStream<BUFFERED><PDF><FILENAME>
--$16--> via org.egothor.crusher.connectors.PDFPath:java.io.Reader<BUFFERED><PDF><FILENAME><NOHTMLTAGS>
--$18--> via org.egothor.crusher.connectors.CSASCIIPath:java.io.Reader<BUFFERED><PDF><FILENAME><NOHTMLTAGS><CSASCII>
--$28--> via org.egothor.crusher.connectors.TokenizerPath:org.egothor.parser.Tokenizer<BUFFERED><PDF><TAGGED><FILENAME><NOHTMLTAGS><CSASCII>
--$30--> via org.egothor.crusher.connectors.LowerCasePath:org.egothor.parser.Tokenizer<BUFFERED><PDF><TAGGED><FILENAME><NOHTMLTAGS><CSASCII><LOWERCASE>
--$35--> via org.egothor.crusher.connectors.PunctPath:org.egothor.parser.Tokenizer<BUFFERED><PDF><PUNCTUATION><TAGGED><FILENAME><NOHTMLTAGS><CSASCII><LOWERCASE>
--$50--> via org.egothor.crusher.connectors.BHTML2Path:org.egothor.data.Document<BUFFERED><PDF><PUNCTUATION><HOME><TAGGED><FILENAME><NOHTMLTAGS><SNIPPET><CSASCII><LOWERCASE>
log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly.
mesi.xls
Input
java.lang.String
Flags: <FILENAME><XLS>
Output
org.egothor.data.Document
Flags: <HOME><PUNCTUATION><LOWERCASE><SNIPPET><CSASCII>
Mission impossible :-)
unknown format
scheda1.doc
Input
java.lang.String
Flags: <FILENAME><DOC>
Output
org.egothor.data.Document
Flags: <HOME><PUNCTUATION><LOWERCASE><SNIPPET><CSASCII>
Mission impossible :-)
unknown format
test.html
Input
java.lang.String
Flags: <FILENAME><HTML>
Output
org.egothor.data.Document
Flags: <HOME><PUNCTUATION><LOWERCASE><SNIPPET><CSASCII><STREAMDECODE>
Filtering system found:
--$0--> via org.egothor.crusher.IniPath:java.lang.String<HTML><FILENAME>
--$2--> via org.egothor.crusher.connectors.EncReaderPath:java.io.Reader<BUFFERED><HTML><FILENAME><STREAMDECODE>
--$4--> via org.egothor.crusher.connectors.CSASCIIPath:java.io.Reader<BUFFERED><HTML><FILENAME><CSASCII><STREAMDECODE>
--$9--> via org.egothor.crusher.connectors.HTML3Path:java.io.Reader<BUFFERED><HTML><SEMANTICS><FILENAME><NOHTMLTAGS><CSASCII><STREAMDECODE>
--$19--> via org.egothor.crusher.connectors.TokenizerPath:org.egothor.parser.Tokenizer<BUFFERED><HTML><SEMANTICS><TAGGED><FILENAME><NOHTMLTAGS><CSASCII><STREAMDECODE>
--$21--> via org.egothor.crusher.connectors.LowerCasePath:org.egothor.parser.Tokenizer<BUFFERED><HTML><SEMANTICS><TAGGED><FILENAME><NOHTMLTAGS><CSASCII><LOWERCASE><STREAMDECODE>
--$26--> via org.egothor.crusher.connectors.PunctPath:org.egothor.parser.Tokenizer<BUFFERED><PUNCTUATION><HTML><SEMANTICS><TAGGED><FILENAME><NOHTMLTAGS><CSASCII><STREAMDECODE><LOWERCASE>
--$41--> via org.egothor.crusher.connectors.BHTML2Path:org.egothor.data.Document<BUFFERED><HTML><PUNCTUATION><HOME><SEMANTICS><TAGGED><FILENAME><NOHTMLTAGS><SNIPPET><CSASCII><LOWERCASE><STREAMDECODE>
Commit...
...optimize()
...commit()
Oct 11, 2004 10:08:58 AM org.egothor.dir.TankerImpl commit
INFO: Saving state
Done
Thanks in advance
Andrea