About

Bobo is a platform for running and supporting distributed crawlers or virtual entities. It lets you start an unlimited number of independent crawlers or manage an unlimited number of virtual entities.

How is the system orchestrated? A user passes a request to the entry interface of the base system. The task has the form: "I would like to gather part of the web space starting at URIs X, Y, Z, as a bot named N, during the time interval FROM-TO; the gathered documents will be stored in cabinet C." As soon as the request is accepted and started, the fetched web content is routed into the respective cabinets (each task can specify a different cabinet C), and the links discovered in that content are sent to the planner as new implicit goals. The cabinet can then be processed either by a user's program that iterates over its content, or by a tool that saves the documents to a document server using the user's credentials.
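The request and routing flow described above can be sketched in a single process as follows. This is only an in-memory illustration, not Bobo's actual interface: the names `CrawlTask`, `Router`, `accept` and `on_fetched` are hypothetical, the planner is modelled as a plain queue, and cabinets are modelled as lists keyed by name. In the real system these parts are distributed over the j5m middleware.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CrawlTask:
    """Hypothetical model of a request: seeds X, Y, Z; bot N; window FROM-TO; cabinet C."""
    seeds: list[str]
    bot_name: str
    start: datetime
    end: datetime
    cabinet: str

    def is_active(self, now: datetime) -> bool:
        # The task may only run inside its FROM-TO interval.
        return self.start <= now <= self.end


@dataclass
class Router:
    """Hypothetical router: stores fetched content and feeds new goals to the planner."""
    cabinets: dict[str, list[tuple[str, str]]] = field(default_factory=dict)
    planner: deque = field(default_factory=deque)  # queue of (task, uri) goals

    def accept(self, task: CrawlTask, now: datetime) -> bool:
        # Reject requests outside their time window; otherwise seed the planner.
        if not task.is_active(now):
            return False
        for uri in task.seeds:
            self.planner.append((task, uri))
        return True

    def on_fetched(self, task: CrawlTask, uri: str, content: str, links: list[str]) -> None:
        # Route the document into the cabinet named by its own task.
        self.cabinets.setdefault(task.cabinet, []).append((uri, content))
        # Links found in the document become new implicit goals for the planner.
        for link in links:
            self.planner.append((task, link))


# Usage: one task, one fetched page that yields one new link.
task = CrawlTask(["http://x.example/", "http://y.example/"], "N",
                 datetime(2024, 1, 1), datetime(2024, 1, 2), "C")
router = Router()
accepted = router.accept(task, datetime(2024, 1, 1, 12))
_, first_uri = router.planner.popleft()
router.on_fetched(task, first_uri, "<html>...</html>", ["http://x.example/a"])
```

Keeping the cabinet name on each task, rather than on the router, is what lets different tasks route into different cabinets at the same time.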

Schema of processes and entities

The number of cabinets is unlimited by default. Any user can run their own cabinet, even on a personal PC at their desk, provided the PC has the j5m middleware installed and running. If the required cabinet is not running or is inaccessible, the documents are not stored anywhere. As soon as the cabinet comes back online, it starts accepting documents immediately; no administrative action is necessary. This dynamic behaviour is a native feature of the underlying middleware, and you may encounter it in other components as well.
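The cabinet availability behaviour described above can be illustrated with a small registry sketch. It assumes, for illustration only, that live cabinets are tracked in a name-keyed map; `Cabinet`, `CabinetRegistry`, `register`, `unregister` and `deliver` are hypothetical names, not the j5m API. A delivery to a cabinet that is not registered is simply dropped, and a cabinet that registers again receives documents straight away.

```python
class Cabinet:
    """Hypothetical cabinet: an append-only store of (uri, content) pairs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.documents: list[tuple[str, str]] = []

    def store(self, uri: str, content: str) -> None:
        self.documents.append((uri, content))


class CabinetRegistry:
    """Hypothetical registry of cabinets that are currently running and reachable."""

    def __init__(self) -> None:
        self._live: dict[str, Cabinet] = {}
        self.dropped = 0  # documents that found no live cabinet

    def register(self, cabinet: Cabinet) -> None:
        # A cabinet coming (back) online becomes deliverable immediately.
        self._live[cabinet.name] = cabinet

    def unregister(self, name: str) -> None:
        # A stopped or unreachable cabinet is removed from the live set.
        self._live.pop(name, None)

    def deliver(self, name: str, uri: str, content: str) -> bool:
        cabinet = self._live.get(name)
        if cabinet is None:
            # No live cabinet: the document is not stored anywhere.
            self.dropped += 1
            return False
        cabinet.store(uri, content)
        return True


# Usage: the cabinet goes offline and comes back without any admin step.
registry = CabinetRegistry()
cabinet_c = Cabinet("C")
registry.register(cabinet_c)
first = registry.deliver("C", "http://x.example/", "doc1")   # stored
registry.unregister("C")                                      # cabinet goes offline
second = registry.deliver("C", "http://y.example/", "doc2")  # dropped
registry.register(cabinet_c)                                  # back online
third = registry.deliver("C", "http://z.example/", "doc3")   # stored again
```

The sketch does not buffer documents while a cabinet is offline, which matches the text: anything delivered during the outage is lost rather than replayed later.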