Egothor 2.x Still under heavy development Project homepage Robot data files structure (Czech) Transmitter protocol Robot testbed Egothor 1.x original ...
Robot UDP Transmitter Protocol The protocol sends all SAX events of HTML parser. Protocol UDP server port port s 1024 Initialization and run 1. Client ...
Robot Data File Formats All integers are stored in Pack7 format. Strings are stored in Pascal notation (the length followed by the corresponding number of bytes ...
Capek When Capek must connect to a remote Manager at uri , use this: java org.egothor.robot.Capek remote m uri inject uri ... The default is a local run ...
Egothor 1.x Egothor 1.x is still actively developed at SourceForge's project page with some members of the original team. You can access the former TWiki topics which ...
Robot The robot consists of 5 major components: Corpus, Scheduler, Link DB, Manager, Client. Michelangelo is a delta indexer (it is not a part of ...
Category of a TWiki.Egothor topic The bottom part of a TWiki.Egothor topic has a category table. It is used to categorize a topic: UseCategory: Tell if the category ...
Directory To index data which is on your local disks, issue java org.egothor.apps.Directory index directory csdia charset CHARSET lowercase phonetics ...
Tanker Query ExecuteXML To solve a query in a tanker, use java org.egothor.query.ExecuteXML tanker dir query If you set egothor.query.repeats to a value greater ...
Environment The hit list is produced from a 2-term query; it shows I/O performance together with the speed of the evaluation formula. The total length of the list is about ...
Benchmark of the robot on .cz This test failed due to insufficient disk space. Therefore, no detailed view of the configuration and statistics is available. Fortunately ...
Known Identifications of Capek 1 capek/3.0b Egothor Developers, available in 1.2.5 RC V, 1 xpider/0.1a xDefine.com, based on capek/3.0b, http://www.xdefine ...
Fsck To fix corrupted data structures of a robot, change your current directory to the one that contains the "scheduler" and "corpus" directories and issue the following ...
Who We Are This page lists all of the people who have gone the extra mile and are committers or members of the project. If you would like to get involved, the first ...
Egothor's Model and Algorithms The model and algorithms are based on many papers published in the past decade. RobotModel EngineModel describes the core elements ...
Problem How can I search an index via API? Solution import org.egothor.dir.TankerImpl; import org.egothor.query.Executor; import org.egothor.data.*; public class ...
Problem How can I index via API? Solution import org.egothor.dir.TankerImpl; import org.egothor.parser.Tokenizer; import org.egothor.data.*; import org.egothor.parser ...
Problem How can I verify the structure of metadata in a barrel? How can I fix it when it is corrupted? Solution Use the tool DiscIndexData: $ java org.egothor.db.disc ...
Problem How can I remove (update) anything in the index? Solution import org.egothor.dir.TankerImpl; import org.egothor.parser.Tokenizer; import org.egothor.data ...
Core Model of the Engine The core data structure of EGOTHOR is a barrel that is denoted as B for the following thoughts. A barrel is an autonomous full text index ...
Names of the Core Objects We explain how the properties and names of barrels were arrived at. We discuss the three most used barrels: a classic barrel, tanker and ...
Dumper utility If you want to dump out the content of a barrel, you can use the org.egothor.test.Dumper program. $ java org.egothor.test.Dumper /tmp/txt/index/1/ 0 {L ...
Expand utility The Expand utility prints out a list of terms of a tanker. For instance, the tanker is stored in /tmp/index and you want to print out all words (terms ...
Virtual (web) spaces Example: See Demo Webspace combo. Egothor's Virtspace module allows creating sections of pages on your web servers and searching in them. Virtspace ...
Lucene's Issue The issue is demonstrated by the following test: document base is organized in groups of 2000 documents each group is indexed added to the ...
Crusher To print out all available connectors, issue java org.egothor.crusher.Finder The output will look like this Apr 11, 2004 3:41:52 PM org.egothor.crusher.Finder ...
Version info utility To print out the version info, issue this command java org.egothor.util.Service The output will look like this Egothor Developers/1.2.5 RC III ...
Tanker Query To solve a query in a tanker, use java org.egothor.test.TankerQuery tanker dir query If you set egothor.query.repeats to a value greater than 1, the ...
Scheduler Remote Implementation To install the object into your RMI registry java org.egothor.robot.remote.SchedulerImpl i a uri l dir uninstallation is achieved ...
Manager Remote Implementation To install the object into your RMI registry java org.egothor.robot.remote.ManagerImpl i a uri l dir c uri s uri uninstallation ...
Corpus Remote Implementation To install the object into your RMI registry java org.egothor.robot.remote.CorpusImpl i a uri l dir uninstallation is achieved ...
Michelangelo Michelangelo creates or updates an index for pages crawled by Capek. The tool reads these parameters: nocsdia which switches off transformation ...
Append utility The Append utility inserts one tanker into another. For instance, you have one index in /tmp/indexA, and the second one in /tmp/indexB/: java org.egothor ...
Problem How can I exclude session cookies from URL in the robot? Solution Use this "rules" file: user agent http://www.egothor.org/nonexistentpage.html loop ...