[Egothor-tech] PDF & DOC indexing, Egothor 1.3.003

Filip Koczorowski filipk at man.poznan.pl
Thu Sep 22 14:28:15 BST 2005


Egothor 1.3.003, a clean download from sourceforge.net, seems to have 
trouble indexing PDF and DOC documents. I prepared a simple page with 
one line of text, "This is a testing page for Egothor", and two links: 
one of the form "a href=test.pdf" and the other "a href=test.doc". Both 
files contain the heading "This is a testing page for Egothor" and a 
single paragraph of text (the files were created with OpenOffice 
2.0beta).

After I run Capek, I get a "corpus" folder that contains the HTML page 
and the PDF and DOC files. However, when I run Michelangelo, the 
resulting index has no information from the PDF or the DOC. I looked 
into the "doc.dta" file in the "index" folder: it contains the HTML 
page content and the file name "test.pdf", but nothing else (no PDF 
content, and no trace of the DOC file at all).
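
For anyone who wants to repeat the check on "doc.dta", here is a rough 
sketch in Python. It only tests whether known strings appear as raw 
bytes in the file, and it assumes the index stores text uncompressed 
as ASCII or UTF-16, which I have not confirmed. The helper name 
find_phrases and the path "index/doc.dta" are my own placeholders, not 
part of Egothor.

```python
from pathlib import Path


def find_phrases(data: bytes, phrases):
    """Return {phrase: bool} saying whether each phrase occurs in data.

    Matching is case-insensitive for ASCII letters and tries two byte
    encodings (ASCII and UTF-16-LE). This is an assumption about how
    the text might be stored, not a description of the real format.
    """
    lowered = data.lower()  # lowercases ASCII letters only
    result = {}
    for phrase in phrases:
        needle = phrase.lower()
        ascii_needle = needle.encode("ascii")
        utf16_needle = needle.encode("utf-16-le")
        result[phrase] = ascii_needle in lowered or utf16_needle in lowered
    return result


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the index was written.
    index_file = Path("index/doc.dta")
    if index_file.exists():
        hits = find_phrases(
            index_file.read_bytes(),
            ["This is a testing page for Egothor", "test.pdf", "test.doc"],
        )
        for phrase, found in hits.items():
            print(f"{phrase!r}: {'found' if found else 'missing'}")
    else:
        print(f"{index_file} not found")
```

A "missing" result is only a hint: if the index compresses or 
tokenizes its content, a phrase could be stored without ever appearing 
as a contiguous byte string.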

I would appreciate any suggestions - perhaps I am doing something wrong...

-- 
Filip Koczorowski



