[Egothor-tech] Indexing DOC, XLS, XML files with GUI

Andre pasquinigalde at virgilio.it
Mon Oct 11 10:15:32 BST 2004


Hello,
I have a problem indexing DOC, XLS and XML files. This is part of the output:

Input charset set to iso-8859-1
  GuiIndexerLocal
    unknown format
  build.xml
Input
  java.lang.String
  Flags: <FILENAME><XML>
Output
  org.egothor.data.Document
  Flags: <HOME><PUNCTUATION><LOWERCASE><SNIPPET><CSASCII>
Mission impossible :-)
    unknown format
  dmca.pdf
Input
  java.lang.String
  Flags: <FILENAME><PDF>
Output
  org.egothor.data.Document
  Flags: <HOME><PUNCTUATION><LOWERCASE><SNIPPET><CSASCII>
Filtering system found:
 --$0--> via org.egothor.crusher.IniPath:java.lang.String<PDF><FILENAME>
--$1--> via org.egothor.crusher.connectors.InputStreamPath:java.io.InputStream<BUFFERED><PDF><FILENAME>
--$16--> via org.egothor.crusher.connectors.PDFPath:java.io.Reader<BUFFERED><PDF><FILENAME><NOHTMLTAGS>
--$18--> via org.egothor.crusher.connectors.CSASCIIPath:java.io.Reader<BUFFERED><PDF><FILENAME><NOHTMLTAGS><CSASCII>
--$28--> via org.egothor.crusher.connectors.TokenizerPath:org.egothor.parser.Tokenizer<BUFFERED><PDF><TAGGED><FILENAME><NOHTMLTAGS><CSASCII>
--$30--> via org.egothor.crusher.connectors.LowerCasePath:org.egothor.parser.Tokenizer<BUFFERED><PDF><TAGGED><FILENAME><NOHTMLTAGS><CSASCII><LOWERCASE>
--$35--> via org.egothor.crusher.connectors.PunctPath:org.egothor.parser.Tokenizer<BUFFERED><PDF><PUNCTUATION><TAGGED><FILENAME><NOHTMLTAGS><CSASCII><LOWERCASE>
--$50--> via org.egothor.crusher.connectors.BHTML2Path:org.egothor.data.Document<BUFFERED><PDF><PUNCTUATION><HOME><TAGGED><FILENAME><NOHTMLTAGS><SNIPPET><CSASCII><LOWERCASE>
log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly.
  mesi.xls
Input
  java.lang.String
  Flags: <FILENAME><XLS>
Output
  org.egothor.data.Document
  Flags: <HOME><PUNCTUATION><LOWERCASE><SNIPPET><CSASCII>
Mission impossible :-)
    unknown format
  scheda1.doc
Input
  java.lang.String
  Flags: <FILENAME><DOC>
Output
  org.egothor.data.Document
  Flags: <HOME><PUNCTUATION><LOWERCASE><SNIPPET><CSASCII>
Mission impossible :-)
    unknown format
  test.html
Input
  java.lang.String
  Flags: <FILENAME><HTML>
Output
  org.egothor.data.Document
  Flags: <HOME><PUNCTUATION><LOWERCASE><SNIPPET><CSASCII><STREAMDECODE>
Filtering system found:
 --$0--> via org.egothor.crusher.IniPath:java.lang.String<HTML><FILENAME>
--$2--> via org.egothor.crusher.connectors.EncReaderPath:java.io.Reader<BUFFERED><HTML><FILENAME><STREAMDECODE>
--$4--> via org.egothor.crusher.connectors.CSASCIIPath:java.io.Reader<BUFFERED><HTML><FILENAME><CSASCII><STREAMDECODE>
--$9--> via org.egothor.crusher.connectors.HTML3Path:java.io.Reader<BUFFERED><HTML><SEMANTICS><FILENAME><NOHTMLTAGS><CSASCII><STREAMDECODE>
--$19--> via org.egothor.crusher.connectors.TokenizerPath:org.egothor.parser.Tokenizer<BUFFERED><HTML><SEMANTICS><TAGGED><FILENAME><NOHTMLTAGS><CSASCII><STREAMDECODE>
--$21--> via org.egothor.crusher.connectors.LowerCasePath:org.egothor.parser.Tokenizer<BUFFERED><HTML><SEMANTICS><TAGGED><FILENAME><NOHTMLTAGS><CSASCII><LOWERCASE><STREAMDECODE>
--$26--> via org.egothor.crusher.connectors.PunctPath:org.egothor.parser.Tokenizer<BUFFERED><PUNCTUATION><HTML><SEMANTICS><TAGGED><FILENAME><NOHTMLTAGS><CSASCII><STREAMDECODE><LOWERCASE>
--$41--> via org.egothor.crusher.connectors.BHTML2Path:org.egothor.data.Document<BUFFERED><HTML><PUNCTUATION><HOME><SEMANTICS><TAGGED><FILENAME><NOHTMLTAGS><SNIPPET><CSASCII><LOWERCASE><STREAMDECODE>
Commit...
...optimize()
...commit()
Oct 11, 2004 10:08:58 AM org.egothor.dir.TankerImpl commit
INFO: Saving state
Done
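
For reference, the files that fail can be pulled out of output like the above with a short script. This is only a sketch in Python and is not part of Egothor. It assumes the layout seen in this log: the file name is the line just before "Input", the first "Flags:" line after it holds the input format, "Mission impossible" means no filter path was found, and "Filtering system found:" means one was. The function name summarize and the embedded sample are made up for illustration.

```python
import re

def summarize(log_text):
    """Return (file name, input flags, status) for each entry in the log text."""
    results = []
    prev = ""        # last non-empty line seen (the file name precedes "Input")
    current = None   # file name of the entry being read
    flags = None     # input flags of that entry, without FILENAME
    for raw in log_text.splitlines():
        line = raw.strip()
        if line == "Input":
            current, flags = prev, None
        elif line.startswith("Flags:") and current is not None and flags is None:
            # Only the first Flags line (the Input one) is recorded.
            flags = [f for f in re.findall(r"<([A-Z0-9]+)>", line) if f != "FILENAME"]
        elif line.startswith("Mission impossible") and current is not None:
            results.append((current, flags, "no filter path"))
            current = None
        elif line.startswith("Filtering system found") and current is not None:
            results.append((current, flags, "filter path found"))
            current = None
        if line:
            prev = line
    return results

# Shortened sample copied from the output above (assumed representative).
sample = """\
  build.xml
Input
  java.lang.String
  Flags: <FILENAME><XML>
Output
  org.egothor.data.Document
  Flags: <HOME><PUNCTUATION><LOWERCASE><SNIPPET><CSASCII>
Mission impossible :-)
    unknown format
  dmca.pdf
Input
  java.lang.String
  Flags: <FILENAME><PDF>
Output
  org.egothor.data.Document
  Flags: <HOME><PUNCTUATION><LOWERCASE><SNIPPET><CSASCII>
Filtering system found:
 --$0--> via org.egothor.crusher.IniPath:java.lang.String<PDF><FILENAME>
"""

for name, fmt, status in summarize(sample):
    print(name, fmt, status)
```

Run over the full output, a script like this would list build.xml, mesi.xls and scheda1.doc as having no filter path, and dmca.pdf and test.html as handled.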

Thanks in advance 

Andrea 




