[Egothor-tech] PDF & DOC indexing, Egothor 1.3.003
Leo Galambos
leo.galambos at mff.cuni.cz
Wed Oct 5 13:12:00 BST 2005
Filip Koczorowski wrote:
| Egothor 1.3.003, a clean download from sourceforge.net, seems to
| have some trouble with indexing PDF and DOC documents. I prepared a
| simple page with one line of text "This is a testing page for
| Egothor" and two links - one in the form of "a href=test.pdf" and
| another one as "a href=test.doc". Both of these files contain a
| header "This is a testing page for Egothor" and a single paragraph
| of text (files created with OpenOffice 2.0beta).
|
| After I run Capek, I get a "corpus" folder that contains the HTML
| page and the PDF & DOC files. However, when I run Michelangelo, the
| resulting index has no information from the PDF & DOC files. I
| looked into the "doc.dta" file in the "index" folder and it contains
| the HTML page content as well as the "test.pdf" file name, but
| nothing else (no PDF content, nor any sign of the DOC file).
|
| I would appreciate any suggestions - perhaps I am doing something
| wrong...
Hi,
could you send the data files to my private email address, please? I will
have to take a closer look at them ;)
THX
Leo
--
Leo Galambos
Faculty of Mathematics and Physics, DSE
Malostranske namesti 25
Prague 1
CZE
http://kocour.ms.mff.cuni.cz/~galambos/