r2 - 17 Dec 2006 - 02:41:56 - LeoGalambos

Robot

[Figure: robot.png, Robot Architecture]

The robot consists of 5 major components:

  1. Corpus
  2. Scheduler
  3. Link DB
  4. Manager
  5. Client

Michelangelo is the delta indexer; it is not part of the robot.

Corpus

This component is able to

  • load and save gathered documents,
  • allocate a new identifier to a URI,
  • find the identifier assigned to a given URI,
  • find the URI associated with an ID.

When a new URI enters the system, it is assigned a new unique ID. Other components use this identifier in place of the long string representation.
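
The ID allocation and the two lookups described above can be sketched as a pair of in-memory tables. This is only an illustration: the class name CorpusSketch, its method names, the dense integer IDs starting at 0, and the in-memory document store are all assumptions, not the actual Egothor API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and storage layout are not taken from Egothor.
class CorpusSketch {
    private final Map<String, Integer> idByUri = new HashMap<>();
    private final List<String> uriById = new ArrayList<>();
    private final Map<Integer, byte[]> documents = new HashMap<>();

    /** Returns the ID of the URI, allocating a new one if the URI is unknown. */
    int allocate(String uri) {
        Integer known = idByUri.get(uri);
        if (known != null) {
            return known;
        }
        int id = uriById.size(); // assumption: IDs are dense and start at 0
        idByUri.put(uri, id);
        uriById.add(uri);
        return id;
    }

    /** Returns the ID assigned to the URI, or -1 if it has none. */
    int find(String uri) {
        return idByUri.getOrDefault(uri, -1);
    }

    /** Returns the URI for an ID, or null if the ID was never allocated. */
    String uriOf(int id) {
        return (id >= 0 && id < uriById.size()) ? uriById.get(id) : null;
    }

    /** Stores a gathered document under the ID of its URI. */
    void save(int id, byte[] document) {
        documents.put(id, document);
    }

    /** Loads a previously saved document, or null if none was saved. */
    byte[] load(int id) {
        return documents.get(id);
    }
}
```

Keeping both directions of the mapping makes each lookup a single table access, at the cost of storing every URI string once in each table.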

Scheduler

This component is able to

  • schedule IDs on a time axis,
  • retrieve the ID that should be processed.
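
One plausible way to place IDs on a time axis is a priority queue ordered by due time. The sketch below assumes this design; the page does not say how Scheduler is implemented. SchedulerSketch, its method names, and the -1 return value for "nothing due" are hypothetical.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical sketch; the real Scheduler may use a different structure.
class SchedulerSketch {
    private static final class Entry {
        final int id;
        final long dueAt;

        Entry(int id, long dueAt) {
            this.id = id;
            this.dueAt = dueAt;
        }
    }

    // Entries with the earliest due time sit at the head of the queue.
    private final PriorityQueue<Entry> queue =
            new PriorityQueue<>(Comparator.comparingLong((Entry e) -> e.dueAt));

    /** Places an ID on the time axis at the given due time. */
    void schedule(int id, long dueAt) {
        queue.add(new Entry(id, dueAt));
    }

    /** Returns the earliest ID that is due at or before now, or -1 if none is due. */
    int next(long now) {
        Entry head = queue.peek();
        if (head == null || head.dueAt > now) {
            return -1;
        }
        return queue.poll().id;
    }
}
```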

Link DB

Link DB only keeps track of the links among pages. It is maintained so that various semantic ranks can be computed.
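
As a rough illustration, a link store can be kept as two adjacency maps keyed by page ID. The sketch assumes that Link DB works with the internal IDs from Corpus, and it uses in-degree only as a stand-in for a rank input; neither is stated on this page. LinkDbSketch and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the actual Link DB schema.
class LinkDbSketch {
    private final Map<Integer, Set<Integer>> outLinks = new HashMap<>();
    private final Map<Integer, Set<Integer>> inLinks = new HashMap<>();

    /** Records a link from one page ID to another; duplicates are ignored. */
    void addLink(int from, int to) {
        outLinks.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        inLinks.computeIfAbsent(to, k -> new HashSet<>()).add(from);
    }

    /** Returns the IDs that the given page links to. */
    Set<Integer> linksFrom(int id) {
        return outLinks.getOrDefault(id, Collections.emptySet());
    }

    /** Returns the number of distinct pages that link to the given page. */
    int inDegree(int id) {
        return inLinks.getOrDefault(id, Collections.emptySet()).size();
    }
}
```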

Manager

This is the central component, which communicates with Corpus and Scheduler. It is able to

  • accept URIs discovered by crawling processes: the URIs are translated to internal IDs in Corpus, and if a URI is new, its ID is sent to Scheduler.
  • retrieve URIs which should be processed by a crawling process: IDs are popped from Scheduler, translated back to URI strings via Corpus, and sent to the crawling processes.
  • save Responses (documents): they are simply routed to Corpus.
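
The three Manager operations above can be sketched in a few lines. To keep the example self-contained, Corpus is reduced to in-memory maps and Scheduler to a FIFO queue; both are simplifications, and ManagerSketch and its method names are hypothetical rather than Egothor's API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the Manager's routing logic.
class ManagerSketch {
    // Stand-ins for Corpus: URI <-> ID tables and a document store.
    private final Map<String, Integer> idByUri = new HashMap<>();
    private final List<String> uriById = new ArrayList<>();
    private final Map<Integer, String> documents = new HashMap<>();
    // Stand-in for Scheduler: a plain FIFO queue (assumption, no time axis here).
    private final Deque<Integer> scheduled = new ArrayDeque<>();

    /** Accepts discovered URIs; only URIs not seen before are scheduled. */
    void accept(List<String> discovered) {
        for (String uri : discovered) {
            if (idByUri.containsKey(uri)) {
                continue; // known URI: it already has an ID and was scheduled
            }
            int id = uriById.size();
            idByUri.put(uri, id);
            uriById.add(uri);
            scheduled.addLast(id);
        }
    }

    /** Pops up to max IDs from the schedule and translates them back to URIs. */
    List<String> retrieve(int max) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < max && !scheduled.isEmpty()) {
            batch.add(uriById.get(scheduled.pollFirst()));
        }
        return batch;
    }

    /** Routes a response to the document store; unknown URIs are ignored. */
    void save(String uri, String response) {
        Integer id = idByUri.get(uri);
        if (id != null) {
            documents.put(id, response);
        }
    }

    /** Returns the stored response for a URI, or null if there is none. */
    String stored(String uri) {
        Integer id = idByUri.get(uri);
        return id == null ? null : documents.get(id);
    }
}
```

Because a URI is scheduled only when its ID is first allocated, reporting the same URI twice does not enqueue it twice.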

Client

This component connects to Manager and retrieves the URIs that should be gathered. It sends the Responses back to Manager.
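
A single round of the Client's work might look like the sketch below. The ManagerApi interface, the batch size parameter, and the fetcher function are assumptions introduced for illustration; the page does not describe the actual protocol between Client and Manager.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of one Client round: retrieve, gather, send back.
class ClientSketch {
    /** Hypothetical view of Manager as seen by a Client. */
    interface ManagerApi {
        List<String> retrieve(int max);

        void save(String uri, String response);
    }

    private final ManagerApi manager;
    private final Function<String, String> fetcher; // maps a URI to its response

    ClientSketch(ManagerApi manager, Function<String, String> fetcher) {
        this.manager = manager;
        this.fetcher = fetcher;
    }

    /** Gathers one batch of URIs and returns how many were processed. */
    int runOnce(int batchSize) {
        List<String> uris = manager.retrieve(batchSize);
        for (String uri : uris) {
            manager.save(uri, fetcher.apply(uri));
        }
        return uris.size();
    }
}
```

Since a Client in this sketch holds no state beyond its connection to Manager, any number of such Clients could run side by side.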

Summary

The system can run an unlimited number of Clients. With some rework in Manager, it would also be possible to run an unlimited number of Schedulers, Corpuses and Managers, which would allow the system to scale well on large domains.
