The next version (v2) was partially developed by students of the Faculty of Mathematics and Physics, Charles University in Prague. They implemented a couple of interesting components and tested the system in several specific configurations.

  • New dynamization algorithm for fast index updating
  • Transactions (ACID)
  • Plagiarism detection
  • Incremental updates
  • Recognizes the most common file formats: HTML, PDF, PS, and Microsoft's DOC and XLS
  • Based on the extended Boolean model, which can operate as either the vector model or the Boolean model
  • Universal stemmer that processes any language
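
To illustrate how a single model can behave as both the vector and the Boolean model, here is a minimal sketch of the p-norm formulation of the extended Boolean model. Whether Egothor2 uses exactly this formulation is an assumption, and the class and method names are hypothetical rather than part of Egothor's API. Term weights are taken to lie in [0, 1].

```java
/** Hypothetical sketch of p-norm scoring; not Egothor code. */
final class PNormSketch {

    /** OR query: p = 1 averages the weights; a large p approaches max (strict Boolean OR). */
    static double or(double p, double... weights) {
        double sum = 0.0;
        for (double w : weights) {
            sum += Math.pow(w, p);
        }
        return Math.pow(sum / weights.length, 1.0 / p);
    }

    /** AND query: p = 1 averages the weights; a large p approaches min (strict Boolean AND). */
    static double and(double p, double... weights) {
        double sum = 0.0;
        for (double w : weights) {
            sum += Math.pow(1.0 - w, p);
        }
        return 1.0 - Math.pow(sum / weights.length, 1.0 / p);
    }

    public static void main(String[] args) {
        // A document that contains the first query term but not the second.
        double[] w = {1.0, 0.0};
        System.out.println("p=1   OR:  " + or(1, w));    // vector-like: partial match scores 0.5
        System.out.println("p=1   AND: " + and(1, w));   // vector-like: partial match scores 0.5
        System.out.println("p=100 OR:  " + or(100, w));  // close to 1, as Boolean OR would give
        System.out.println("p=100 AND: " + and(100, w)); // close to 0, as Boolean AND would give
    }
}
```

With p = 1 both operators reduce to an average of the term weights, so ranking is graded as in the vector model; as p grows, OR and AND move toward the strict Boolean results.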


Egothor1 tried to solve the issue of fragments that arises in index maintenance. It also introduced a universal stemmer that can process any language.

This project continued as Egothor2...


After more than ten years, the project is leaving the area of full-text search engines. The next version, Egothor3, will be a novel universal database system: the Anamorphic Database.

The software is already under heavy development, but the work is frequently interrupted by other interesting projects.


Have you already tried to locate an IP address?

Geolocation is cool. Anyone would like to know the postal address of attacking hackers or of interesting visitors to their web site. There are some solutions, but the complexity of routing makes this problem attractive, if you like hard problems.

Galeo locates a target IP address using a novel algorithm. The output is the set of most probable areas where the target may be deployed.

Galeo is unique

  • Assumes that the network is not homogeneous, whereas other algorithms assume a constant data-flow speed of approx. 100 km/ms
  • You can follow the complete calculation process, so you know what the algorithm does and why
  • You can correct the intermediate results to improve the final result
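
For comparison, the constant-speed assumption that Galeo departs from can be sketched in a few lines: it converts a measured round-trip time into an upper bound on the distance to the target. The class and method names are hypothetical, only the 100 km/ms figure comes from the list above, and this is not Galeo's algorithm.

```java
/** Hypothetical sketch of the constant-speed baseline; not Galeo code. */
final class ConstantSpeedBaseline {

    /** Assumed uniform data-flow speed in kilometres per millisecond. */
    static final double SPEED_KM_PER_MS = 100.0;

    /** Upper bound on the one-way distance implied by a round-trip time. */
    static double maxDistanceKm(double rttMs) {
        if (rttMs < 0) {
            throw new IllegalArgumentException("RTT must be non-negative");
        }
        // Half of the round trip is the one-way delay.
        return (rttMs / 2.0) * SPEED_KM_PER_MS;
    }

    public static void main(String[] args) {
        System.out.println(maxDistanceKm(20.0)); // prints 1000.0
    }
}
```

A 20 ms round trip therefore places the target within 1000 km of the probe under this model, regardless of how the route actually runs. That uniformity is the assumption Galeo drops.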

Galeo reads more input

  • Direct GPS coordinates of some routers
  • Probable positions of some routers: you may specify an unlimited number of towns where you suspect the routers are located
  • List of areas annotated by population density, etc.

Stop crawling, deploy virtual entities!

There are dozens of crawlers. Some of them are high-performance systems or have interesting features for deep-web crawling. Bobo is something completely different. Bobo is a universal crawling architecture that supports any crawler you can think of. Deep-web crawlers, classic web crawlers, worms, virtual entities ... all of them are hosted inside Bobo. A virtual army in action.

Have you ever thought about your own army of web surfers to collect or disseminate anything you need?

Virtual army

The Bobo project started as a classic distributed crawler for Egothor2. The novel implementation approach produced devilish software. Bobo can support an unlimited number of virtual (id)entities to do anything for you...


Build distributed applications effectively

J5m is an easy-to-use replacement for JINI. It was tested as the underlying platform for a distributed crawler, and it offers production-ready stability and features. It guarantees the following operations:

  • Location of remote objects.
  • Communication with remote objects: RMI is used as the primary protocol for client-server communication. The protocol may be replaced by anything else (JMS, for instance) if required.
  • Optimization of remote calls: the client's connections are profiled, so that a client preferably communicates with the faster server objects.
  • Server object loading and initialization: a programmer may define a tree of dependencies among the deployed server objects; the middleware starts the server objects in the right order and passes them their initialization parameters.
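
The dependency-ordered startup described in the last item can be pictured as a topological sort over the declared dependencies. The sketch below is only an illustration under assumed names (`StartupOrder`, `order`, and the service names are hypothetical); it does not show J5m's actual API or configuration format.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of dependency-ordered startup; not J5m code. */
final class StartupOrder {

    /** deps maps each server object to the objects it depends on. */
    static List<String> order(Map<String, List<String>> deps) {
        List<String> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String name : deps.keySet()) {
            visit(name, deps, done, inProgress, result);
        }
        return result;
    }

    private static void visit(String name, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress,
                              List<String> result) {
        if (done.contains(name)) {
            return;
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("Dependency cycle at " + name);
        }
        // Start everything this object depends on before the object itself.
        for (String dep : deps.getOrDefault(name, List.of())) {
            visit(dep, deps, done, inProgress, result);
        }
        inProgress.remove(name);
        done.add(name);
        result.add(name);
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("crawler", List.of("queue", "storage"));
        deps.put("queue", List.of("storage"));
        deps.put("storage", List.of());
        System.out.println(order(deps)); // prints [storage, queue, crawler]
    }
}
```

A depth-first traversal is used here because it also detects cycles, which a declared dependency tree should never contain.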

J5m is still a thin layer for distributed applications. It is not (yet?) a complex middleware with all the bells and whistles.

Why another middleware?

The Bobo crawler is based on a new approach of ``co-operating services''. It allows us to exchange any part of the system without a shutdown, restart, or reload. Any compatible part of the system can also process the requests of another component if that component fails or is being overhauled. Not all applications need such features. The point is that we did not try to develop ``another middleware''; we are trying to develop ``something completely different''.
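
One way to picture co-operating services is a group in which any compatible service takes over a request when another one fails, and in which services can be added or removed while the system runs. The sketch below is hypothetical and far simpler than Bobo; the names `ServiceGroup`, `register`, `unregister`, and `handle` are invented for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Function;

/** Hypothetical sketch of co-operating services with failover; not Bobo code. */
final class ServiceGroup<Q, R> {

    // Copy-on-write list, so services can be exchanged while requests are in flight.
    private final List<Function<Q, R>> services = new CopyOnWriteArrayList<>();

    void register(Function<Q, R> service) {
        services.add(service);
    }

    void unregister(Function<Q, R> service) {
        services.remove(service);
    }

    /** Tries each compatible service in turn until one succeeds. */
    R handle(Q request) {
        RuntimeException last = null;
        for (Function<Q, R> service : services) {
            try {
                return service.apply(request);
            } catch (RuntimeException e) {
                last = e; // this service failed; let the next one take over
            }
        }
        throw new IllegalStateException("No service could handle the request", last);
    }

    public static void main(String[] args) {
        ServiceGroup<String, String> fetchers = new ServiceGroup<>();
        fetchers.register(url -> { throw new RuntimeException("fetcher A is down"); });
        fetchers.register(url -> "fetched " + url);
        System.out.println(fetchers.handle("http://example.org/"));
    }
}
```

In the example the first fetcher fails and the second one answers the request, so the caller never notices the outage.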