<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Message</TITLE>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1">
<META content="MSHTML 5.50.4937.800" name=GENERATOR></HEAD>
<BODY>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=288094115-05082004>!!!</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=288094115-05082004>It
seems to work when we use the robot indexer instead of local indexing.
</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=288094115-05082004>Still
to be fully tested; I will provide more information soon.</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=288094115-05082004>Question: why would that be?</SPAN></FONT></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV></DIV>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left><FONT
face=Tahoma size=2>-----Original Message-----<BR><B>From:</B> SPIELMANN
Christophe <BR><B>Sent:</B> 05 August 2004 17:31<BR><B>To:</B>
'Egothor-tech@egothor.org'<BR><B>Subject:</B> RE: Egothor with Pdf parsing:
unable to find out a word despite it seems to be into the
barrel<BR><BR></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN class=226092915-05082004>I
forgot to specify which commands I used (on NT4, run from Eclipse):</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=226092915-05082004></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=226092915-05082004>org.egothor.apps.Directory C:\\Dgpe\\Egothor_barrel
-lowercase -snippet C:\\Dgpe\\Egothor_from as <A
href="http://winold/manual/">http://winold/manual/</A></SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=226092915-05082004></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=226092915-05082004>org.egothor.test.TankerQuery C:\\Dgpe\\Egothor_barrel
krav</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=226092915-05082004></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=226092915-05082004>org.egothor.test.Dumper -DLWP
C:\\Dgpe\\Egothor_barrel\\1\\</SPAN></FONT></DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=226092915-05082004></SPAN></FONT> </DIV>
<DIV><FONT face=Arial color=#0000ff size=2><SPAN
class=226092915-05082004>Thanks</SPAN></FONT></DIV>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV></DIV>
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left><FONT
face=Tahoma size=2>-----Original Message-----<BR><B>From:</B> SPIELMANN
Christophe <BR><B>Sent:</B> 05 August 2004 17:21<BR><B>To:</B>
'Egothor-tech@egothor.org'<BR><B>Cc:</B> CLAUS Pascal<BR><B>Subject:</B>
Egothor with Pdf parsing: unable to find out a word despite it seems to be
into the barrel<BR><BR></FONT></DIV>
<DIV><FONT face=Arial><FONT size=2>We are facing a problem with
the <SPAN class=835400315-05082004>result of </SPAN>PDF
parsing.<BR>Here is our setup:<SPAN class=835400315-05082004> we use Egothor 1.2.5rc6 with JDK 1.4.1_02.</SPAN></FONT></FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial><FONT size=2><SPAN class=835400315-05082004>Although a
word ("krav") is present in a PDF,</SPAN> <SPAN class=835400315-05082004>we
are not able to retrieve it with a basic query.</SPAN></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT size=2><SPAN class=835400315-05082004>The
strange thing is that we are able to find it using the Dumper or the Expand
command.</SPAN></FONT></FONT></DIV>
<DIV><FONT face=Arial><FONT size=2><SPAN class=835400315-05082004>Any help
would be welcome.</SPAN></FONT></FONT></DIV>
<DIV><SPAN class=835400315-05082004><FONT face=Arial size=2>We provide the
logs below:</FONT></SPAN></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>We parsed one directory containing these
files:<BR>-------------------------------------------------------</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>- danish.pdf (Danish PDF)<BR>- site.pdf
(English PDF)<BR>- index.html (English HTML)</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>After parsing, we got this state file:
<BR>------------------------------------------------</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>#Tanker state<BR>#Thu Aug 05 16:59:03 CEST
2004<BR>slotter.last=1<BR>egothor.capacity=32<BR>slotter.flat=false<BR>egothor.slot.2=1</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>The log of the Directory command
is:<BR>-----------------------------------------------------<BR>...<SPAN
class=835400315-05082004>/..</SPAN><BR>Aug 5, 2004 4:58:44 PM
org.egothor.crusher.Finder scanPackages<BR>INFO:
<java.io.InputStream;15;java.io.Reader><BR>Switching lowercase to
true<BR>Switching Snippet support to true<BR>C:\DGPE\egothor_from as
</FONT><A href="http://winold/manual/"><FONT face=Arial
size=2>http://winold/manual/</FONT></A><BR><FONT face=Arial
size=2>danish.pdf<BR>Input<BR>java.lang.String<BR>Flags:
<FILENAME><PDF><BR>Output<BR>org.egothor.data.Document<BR>Flags:
<HOME><PUNCTUATION><LOWERCASE><SNIPPET><BR>Filtering
system found:<BR>--$0--> via
org.egothor.crusher.IniPath:java.lang.String<PDF><FILENAME>
--$1--> via
org.egothor.crusher.connectors.InputStreamPath:java.io.InputStream<BUFFERED><PDF><FILENAME>
--$21--> via
org.egothor.crusher.connectors.PDFPath:java.io.Reader<BUFFERED><PDF><FILENAME><NOHTMLTAGS>
--$31--> via
org.egothor.crusher.connectors.TokenizerPath:org.egothor.parser.Tokenizer<BUFFERED><PDF><TAGGED><FILENAME><NOHTMLTAGS>
--$36--> via
org.egothor.crusher.connectors.PunctPath:org.egothor.parser.Tokenizer<BUFFERED><PDF><PUNCTUATION><TAGGED><FILENAME><NOHTMLTAGS>
--$38--> via
org.egothor.crusher.connectors.LowerCasePath:org.egothor.parser.Tokenizer<BUFFERED><PDF><PUNCTUATION><TAGGED><FILENAME><NOHTMLTAGS><LOWERCASE>
--$53--> via
org.egothor.crusher.connectors.BHTML2Path:org.egothor.data.Document<BUFFERED><PDF><PUNCTUATION><HOME><TAGGED><FILENAME><NOHTMLTAGS><SNIPPET><LOWERCASE><BR>log4j:WARN
No appenders could be found for logger
(org.pdfbox.pdfparser.PDFParser).<BR>log4j:WARN Please initialize the log4j
system properly.<BR>index.html<BR>Input<BR>java.lang.String<BR>Flags:
<FILENAME><HTML><BR>Output<BR>org.egothor.data.Document<BR>Flags:
<HOME><PUNCTUATION><LOWERCASE><SNIPPET><BR>Filtering
system found:<BR>--$0--> via
org.egothor.crusher.IniPath:java.lang.String<HTML><FILENAME>
--$1--> via
org.egothor.crusher.connectors.ReaderPath:java.io.Reader<BUFFERED><HTML><FILENAME>
--$6--> via
org.egothor.crusher.connectors.HTML3Path:java.io.Reader<BUFFERED><HTML><SEMANTICS><FILENAME><NOHTMLTAGS>
--$16--> via
org.egothor.crusher.connectors.TokenizerPath:org.egothor.parser.Tokenizer<BUFFERED><HTML><SEMANTICS><TAGGED><FILENAME><NOHTMLTAGS>
--$21--> via
org.egothor.crusher.connectors.PunctPath:org.egothor.parser.Tokenizer<BUFFERED><PUNCTUATION><HTML><SEMANTICS><TAGGED><FILENAME><NOHTMLTAGS>
--$23--> via
org.egothor.crusher.connectors.LowerCasePath:org.egothor.parser.Tokenizer<BUFFERED><HTML><PUNCTUATION><SEMANTICS><TAGGED><FILENAME><NOHTMLTAGS><LOWERCASE>
--$38--> via
org.egothor.crusher.connectors.BHTML2Path:org.egothor.data.Document<BUFFERED><PUNCTUATION><HTML><HOME><SEMANTICS><TAGGED><FILENAME><NOHTMLTAGS><SNIPPET><LOWERCASE><BR>site.pdf<BR>Input<BR>java.lang.String<BR>Flags:
<FILENAME><PDF><BR>Output<BR>org.egothor.data.Document<BR>Flags:
<HOME><PUNCTUATION><LOWERCASE><SNIPPET><BR>Filtering
system found:<BR>--$0--> via
org.egothor.crusher.IniPath:java.lang.String<PDF><FILENAME>
--$1--> via
org.egothor.crusher.connectors.InputStreamPath:java.io.InputStream<BUFFERED><PDF><FILENAME>
--$21--> via
org.egothor.crusher.connectors.PDFPath:java.io.Reader<BUFFERED><PDF><FILENAME><NOHTMLTAGS>
--$31--> via
org.egothor.crusher.connectors.TokenizerPath:org.egothor.parser.Tokenizer<BUFFERED><PDF><TAGGED><FILENAME><NOHTMLTAGS>
--$36--> via
org.egothor.crusher.connectors.PunctPath:org.egothor.parser.Tokenizer<BUFFERED><PDF><PUNCTUATION><TAGGED><FILENAME><NOHTMLTAGS>
--$38--> via
org.egothor.crusher.connectors.LowerCasePath:org.egothor.parser.Tokenizer<BUFFERED><PDF><PUNCTUATION><TAGGED><FILENAME><NOHTMLTAGS><LOWERCASE>
--$53--> via
org.egothor.crusher.connectors.BHTML2Path:org.egothor.data.Document<BUFFERED><PDF><PUNCTUATION><HOME><TAGGED><FILENAME><NOHTMLTAGS><SNIPPET><LOWERCASE><BR>Commit...<BR>...optimize()<BR>...commit()<BR>Done.<BR>Aug
5, 2004 4:58:55 PM org.egothor.dir.TankerImpl commit<BR>INFO: Saving
state</FONT></DIV>
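<DIV><FONT face=Arial size=2>For readers unfamiliar with the filter chain printed above (TokenizerPath, PunctPath, LowerCasePath), here is a toy Java sketch of what those three text stages do in spirit: split the extracted text into tokens, strip surrounding punctuation, and lowercase each token because of the -lowercase switch. It is not Egothor code: the class FilterChainSketch, the method normalize and the punctuation rule are hypothetical simplifications (the real tokenizer also classifies tokens such as acronyms and apostrophes, as the Dumper output shows).</FONT></DIV>
<PRE>
```java
import java.util.Locale;

public class FilterChainSketch {
    // Toy stand-in for the tokenizer, punctuation and lowercase stages named in the
    // Directory log. Hypothetical helper, not the org.egothor.crusher connectors.
    static String normalize(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : text.trim().split("\\s+")) {                     // tokenizer stage
            String word = token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", ""); // punctuation stage
            if (word.isEmpty()) {
                continue; // the token consisted of punctuation only
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(word.toLowerCase(Locale.ROOT));                       // lowercase stage
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Krav, kriterier og Kraft.")); // prints "krav kriterier og kraft"
    }
}
```
</PRE>
<DIV><FONT face=Arial size=2>Under this model, "Krav," in the PDF text would be indexed as krav, which is the form both the Dumper and the query log show.</FONT></DIV>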
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>The result of the query is:
<BR>---------------------------------------<BR>Aug 5, 2004 4:59:31 PM
org.egothor.dir.TankerImpl loadState<BR>INFO: Loading state<BR>Query:
krav<BR>Aug 5, 2004 4:59:31 PM org.egothor.query.Executor query<BR>INFO:
[null:<WORD>krav r,p true,false]<BR>Aug 5, 2004 4:59:31 PM
org.egothor.dir.TankerImpl elements<BR>INFO: Dynamizer is dirty<BR>Aug 5,
2004 4:59:31 PM org.egothor.dir.TankerImpl elements<BR>INFO: Dynamizer is
dirty<BR>Aug 5, 2004 4:59:32 PM TermRunner constructor<BR>INFO:
setup<BR>0<BR><?xml version="1.0"
encoding="UTF-8"?><BR><query><group required="no"
prohibited="no" unknown="no" excluded="no"><term required="yes"
prohibited="no" unknown="no" excluded="no" value="&lt;WORD&gt;krav"
control="no" idf="1.001"
boost="1"/></group></query></FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>The result of the Expand command is:
<BR>---------------------------------------<BR>C:/Dgpe/Egothor_barrel expand
of <WORD>kr*<BR>Aug 5, 2004 4:59:03 PM org.egothor.dir.TankerImpl
loadState<BR>INFO: Loading state<BR>Aug 5, 2004 4:59:03 PM
org.egothor.dir.TankerImpl elements<BR>INFO: Dynamizer is
dirty<BR><WORD>kraft<BR><WORD>krav<BR><WORD>kriterier<BR><WORD>kræver<BR>Aug
5, 2004 4:59:03 PM org.egothor.dir.TankerImpl commit<BR>INFO: Saving
state</FONT></DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>The result of the Dumper is:
<BR>---------------------------------<BR>0 [PDF/PS] :
[http://winold/manual//danish.pdf] :CM\531576DA.doc PE 344.027 Or. EN DA DA
EUROPA-PARLAMENTET BUDGETUDVALGET Meddelelse til medlemmerne Om: Håndbog for
nye udvalgsmedlemmer GENERALDIREKTORATET FOR INTERNE POLITIKKER 3. juni 2004
PE 344.027 2/9 CM\531576DA.doc DA Indledning Europ<BR>1 Struts for
Transforming XML with XSL (stxx) [http://winold/manual//index.html] :the
stxx site stxx Home Getting Started About Index License Download Who we are
FAQ Changes Todo Site as PDF Getting Involved Contributing...<BR>2 [PDF/PS]
: [http://winold/manual//site.pdf] :stxx Documentation Table of contents 1.
About....................................................................................................................................
1 1.1. Struts for Transforming XML with XSL
(stxx).........................<BR><!VOLATILE>depthrank 3
org.egothor.store.disc.RankFileIn<BR>0 w=9 : <BR>1 w=9 : <BR>2 w=9 :
<BR><ACRONYM>e.g. 1 org.egothor.store.disc.IListFileIn<BR>2 w=1 :
3220<BR><APOSTROPHE>action's 1 org.egothor.store.disc.IListFileIn<BR>2
w=1 : 6005<BR><APOSTROPHE>apache's 1
org.egothor.store.disc.IListFileIn<BR>2 w=1 :
3934<BR>.../...<BR><WORD>korrekt 1
org.egothor.store.disc.IListFileIn<BR>0 w=4 : 2168<BR><WORD>kort 1
org.egothor.store.disc.IListFileIn<BR>0 w=14 : 338 1634
2342<BR><WORD>kraft 1 org.egothor.store.disc.IListFileIn<BR>0 w=4 :
110<BR><WORD>krav 1 org.egothor.store.disc.IListFileIn<BR>0 w=4 :
2590<BR><WORD>kriterier 1 org.egothor.store.disc.IListFileIn<BR>0 w=4
: 668<BR><WORD>kræver 1 org.egothor.store.disc.IListFileIn<BR>0 w=4 :
401<BR><WORD>kun 1 org.egothor.store.disc.IListFileIn<BR>0 w=42 : 373
421 694 1115 1369 1891 1932 2193 2426<BR><WORD>kunne 1
org.egothor.store.disc.IListFileIn<BR>../...</FONT></DIV>
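<DIV><FONT face=Arial size=2>The Expand output above lists kraft, krav, kriterier and kræver in the same order as the Dumper dictionary, which is consistent with a prefix scan over a sorted term dictionary. The toy Java sketch below reproduces that listing from the terms in the Dumper excerpt. It is only an assumption about how such an expansion can work, not Egothor's Expand implementation; the class PrefixExpandSketch and the method expand are hypothetical, and terms are ordered with String.compareTo.</FONT></DIV>
<PRE>
```java
import java.util.Arrays;

public class PrefixExpandSketch {
    // Toy model of prefix expansion: returns the terms of a sorted dictionary that
    // start with the given prefix, space-separated. Not Egothor's Expand code.
    static String expand(String[] sortedTerms, String prefix) {
        int pos = Arrays.binarySearch(sortedTerms, prefix);
        // A negative result encodes the insertion point as -(point) - 1.
        int start = pos >= 0 ? pos : -pos - 1;
        StringBuilder out = new StringBuilder();
        for (String term : Arrays.copyOfRange(sortedTerms, start, sortedTerms.length)) {
            if (!term.startsWith(prefix)) {
                break; // in sorted order all matches are contiguous, so stop at the first miss
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(term);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Terms copied from the Dumper excerpt, already in String.compareTo order.
        String[] dict = {"korrekt", "kort", "kraft", "krav", "kriterier", "kr\u00e6ver", "kun", "kunne"};
        System.out.println(expand(dict, "kr")); // prints the four kr- terms listed by Expand
    }
}
```
</PRE>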
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2></FONT> </DIV>
<DIV><FONT face=Arial size=2>Christophe Spielmann</FONT></DIV>
<DIV> </DIV></BLOCKQUOTE></BLOCKQUOTE></BODY></HTML>