[Egothor-tech] PDF & DOC indexing, Egothor 1.3.003
Filip Koczorowski
filipk at man.poznan.pl
Thu Sep 22 14:28:15 BST 2005
Egothor 1.3.003, a clean download from sourceforge.net, seems to have
trouble indexing PDF and DOC documents. I prepared a simple page with
one line of text, "This is a testing page for Egothor", and two links:
one in the form of "a href=test.pdf" and the other as "a href=test.doc".
Both files contain the heading "This is a testing page for Egothor" and
a single paragraph of text (the files were created with OpenOffice
2.0beta).
After I run Capek, I get a "corpus" folder that contains the HTML page
and the PDF and DOC files. However, when I run Michelangelo, the
resulting index has no information from the PDF or the DOC file. I
looked into the "doc.dta" file in the "index" folder: it contains the
HTML page content and the file name "test.pdf", but nothing else (no
PDF content and no sign of the DOC file at all).
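The manual look into "doc.dta" described above could be scripted roughly as follows. This is only a sketch, not part of Egothor: the function name count_phrase and the list of encodings are made up for illustration, and it assumes the index stores document text in a directly readable form, which the visible HTML content suggests but which is not confirmed for PDF or DOC text.

```python
from pathlib import Path

# The test phrase used in the HTML page and in both documents.
PHRASE = "This is a testing page for Egothor"

# Encodings to try; this list is a guess, not taken from Egothor's docs.
ENCODINGS = ("utf-8", "utf-16-le", "utf-16-be")


def count_phrase(path, phrase=PHRASE, encodings=ENCODINGS):
    """Return {encoding: number of raw byte matches of phrase in the file}."""
    data = Path(path).read_bytes()
    return {enc: data.count(phrase.encode(enc)) for enc in encodings}


if __name__ == "__main__":
    # Assumed location: the "index" folder relative to the current directory.
    index_file = Path("index") / "doc.dta"
    if index_file.exists():
        for enc, hits in count_phrase(index_file).items():
            print(f"{enc}: {hits} occurrence(s)")
    else:
        print(f"{index_file} not found; run from the directory that holds 'index'")
```

A total no higher than what the HTML page alone accounts for would match the behaviour reported here. A higher total would suggest that text from the PDF or DOC file did reach the index after all.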
I would appreciate any suggestions - perhaps I am doing something wrong...
--
Filip Koczorowski