Anti SPAM block In org.egothor.robot.apps.Oracul you can apply an anti SPAM table using the parameter spam : org.egothor.robot.apps.Oracul index/ spam my table ...
Append utility The Append utility appends one tanker to another. For instance, you have one index in /tmp/indexA, and the second one in /tmp/indexB/: java org.egothor ...
Capek When Capek must connect to a remote Manager at uri , use this: java org.egothor.robot.Capek remote m uri inject uri ... The default is a local run ...
Egothor's Model and Algorithms The model and algorithms are based on many papers published in the past decade. RobotModel EngineModel describes the core elements ...
Corpus Remote Implementation To install the object into your RMI registry java org.egothor.robot.remote.CorpusImpl i a uri l dir uninstallation is achieved ...
Crusher To print out all available connectors, issue java org.egothor.crusher.Finder The output will look like this Apr 11, 2004 3:41:52 PM org.egothor.crusher.Finder ...
Directory To index data which is on your local disks, issue java org.egothor.apps.Directory index directory csdia charset CHARSET lowercase phonetics ...
Problem How can I verify the structure of metadata in a barrel? How can I fix it when it is corrupted? Solution Use the tool DiscIndexData: $java org.egothor.db.disc ...
Distributed Robot The robot Capek can run in two modes 1 local (switch local ) and 2 distributed (switch remote ). The first mode is effective when you ...
Dumper utility If you want to dump out the content of a barrel, you can use org.egothor.test.Dumper program. $ java org.egothor.test.Dumper /tmp/txt/index/1/ 0 {L ...
Egothor 1.x Egothor 1.x is still actively developed at SourceForge's project page with some members of the original team. You can access the former TWiki topics which ...
Environment The hit list is produced from a 2-term query; it shows I/O performance together with the speed of the evaluation formula. The total length of the list is about ...
Core Model of the Engine The core data structure of EGOTHOR is a barrel that is denoted as B for the following thoughts. A barrel is an autonomous full text index ...
Tanker Query ExecuteXML To solve a query in a tanker, use java org.egothor.query.ExecuteXML tanker dir query If you set egothor.query.repeats to a value greater ...
Expand utility The Expand utility prints out a list of terms of a tanker. For instance, the tanker is stored in /tmp/index and you want to print out all words (terms ...
Fsck To fix corrupted data structures of a robot, change your current directory to the directory containing the "scheduler" and "corpus" directories and issue the following ...
Problem How can I index via API? Solution import org.egothor.dir.TankerImpl; import org.egothor.parser.Tokenizer; import org.egothor.data. ; import org.egothor.parser ...
Known Identifications of Capek 1 capek/3.0b Egothor Developers, available in 1.2.5 RC V, 1 xpider/0.1a xDefine.com, based on capek/3.0b, http://www.xdefine ...
Lucene's Issue The issue is demonstrated by the following test: document base is organized in groups of 2000 documents each group is indexed added to the ...
Manager Remote Implementation To install the object into your RMI registry java org.egothor.robot.remote.ManagerImpl i a uri l dir c uri s uri uninstallation ...
Michelangelo Michelangelo creates or updates an index for pages crawled by Capek. The tool reads these parameters: nocsdia which switches off transformation ...
Names of the Core Objects We mention how the properties and names of barrels were arrived at. We discuss the three most used barrels: a classic barrel, tanker and ...
The two most widely used retrieval effectiveness (quality) measures are precision (P) and recall (R) as defined below: P = #relevant documents returned / #documents ...
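As a minimal sketch of the two measures named in this entry, the Java class below computes precision and recall for a single query. It assumes the standard definitions (precision is the number of relevant documents returned divided by the number of documents returned; recall is the same numerator divided by the number of relevant documents in the collection), since the entry itself is cut off. The class name `RetrievalMeasures`, its method names, and the document identifiers are hypothetical and are not part of Egothor.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not an Egothor class: precision and recall for one query.
public class RetrievalMeasures {

    // Number of distinct returned documents that are also relevant.
    public static int relevantReturned(List<String> returned, Set<String> relevant) {
        Set<String> hits = new HashSet<>(returned);
        hits.retainAll(relevant);
        return hits.size();
    }

    // P = #relevant documents returned / #documents returned (0 if nothing was returned).
    public static double precision(List<String> returned, Set<String> relevant) {
        int distinctReturned = new HashSet<>(returned).size();
        if (distinctReturned == 0) {
            return 0.0;
        }
        return (double) relevantReturned(returned, relevant) / distinctReturned;
    }

    // R = #relevant documents returned / #relevant documents (0 if none are relevant).
    public static double recall(List<String> returned, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        return (double) relevantReturned(returned, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // Made-up document identifiers, for illustration only.
        List<String> returned = Arrays.asList("d1", "d2", "d3", "d4");
        Set<String> relevant = new HashSet<>(Arrays.asList("d1", "d3", "d7"));

        System.out.println("precision = " + precision(returned, relevant));
        System.out.println("recall    = " + recall(returned, relevant));
    }
}
```

In this made-up example, two of the four returned documents are relevant and two of the three relevant documents were found, so precision is 2/4 and recall is 2/3.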
If you are not familiar with the TWiki collaboration tool, please visit WelcomeGuest in the TWiki web first. The TWiki.Know knowledge base has the following functions ...
Benchmark of the robot on .cz This test failed due to insufficient disk space. Therefore, no detailed view of the configuration and statistics is available. Fortunately ...
Robot data file formats All integers are stored in the Pack7 format. Strings are stored in Pascal notation (length followed by the corresponding number of byt ...
Robot The robot consists of 5 major components: Corpus, Scheduler, Link DB, Manager, and Client. Michelangelo is a delta indexer (it is not a part of ...
Scheduler Remote Implementation To install the object into your RMI registry java org.egothor.robot.remote.SchedulerImpl i a uri l dir uninstallation is achieved ...
Problem How can I search an index via API? Solution import org.egothor.dir.TankerImpl; import org.egothor.query.Executor; import org.egothor.data. ; public class ...
Version info utility To print out the version info, issue this command java org.egothor.util.Service The output will look like this Egothor Developers/1.2.5 RC III ...
Problem How can I exclude session cookies from URL in the robot? Solution Use this "rules" file: user agent http://www.egothor.org/nonexistentpage.html loop ...
Category of a TWiki.Egothor topic The bottom part of a TWiki.Egothor topic has a category table. It is used to categorize a topic: UseCategory: Tell if the category ...
Tanker Query To solve a query in a tanker, use java org.egothor.test.TankerQuery tanker dir query If you set egothor.query.repeats to a value greater than 1, the ...
TopicClassification Used to classify a topic in TWiki.Egothor. Select one of: NoDisclosure: Confidential No disclosure outside the company PublicFAQ: ...
Robot UDP Transmitter Protocol The protocol sends all SAX events of HTML parser. Protocol UDP server port port s 1024 Initialization and run 1. Client ...
Under stemming ...counts words that should be transformed to the same group but are not. For instance, running is transformed to run , but ran , instead ...
The person who runs the robot did not configure the robot properly. The Egothor team is not responsible for such crawling sessions. Robot specification robots ...
Problem How can I remove (update) anything in the index? Solution import org.egothor.dir.TankerImpl; import org.egothor.parser.Tokenizer; import org.egothor.data ...
Virtual (web) spaces Example: See Demo Webspace combo. Egothor's Virtspace module allows creating sections of pages on your web servers and search in them. Virtspace ...
Egothor 2.x Still under heavy development Project homepage Robot data files structure (Czech) Transmitter protocol Robot testbed Egothor 1.x original ...
Egothor Web Preferences The following settings are web preferences of the Egothor web. These preferences overwrite the site level preferences in and , and ...
Who We Are This page lists all of the people who have gone the extra mile and are committers or members of the project. If you would like to get involved, the first ...