r1 - 12 Apr 2004 - 01:46:45 - LeoGalambos

Distributed Robot

The robot Capek can run in two modes:

  1. local (switch -local) and
  2. distributed (switch -remote).

The first mode is effective when you want to crawl a small network (fewer than 200 hosts). On a larger network, you may find that the robot runs slower and slower. This happens because the robot's caches fill up and become less effective. In addition, the robot must keep its data on the hard disk and, as a result, it may close and reopen the respective files too often, which further degrades overall speed.

In such a case the solution is the distributed robot, which starts each of the robot's components in a separate process. Each process can then use more file handles, more memory, etc., so the speed should be better.
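
The size rule of thumb above can be turned into a small helper. This is a minimal sketch, assuming you already know roughly how many hosts you intend to crawl. The function name capek_mode_switch is hypothetical and not part of Egothor, and the 200-host boundary is the guideline from the text, not a hard limit.

```shell
# Hypothetical helper (not part of Egothor): print the Capek mode switch
# suggested by the rule of thumb (fewer than 200 hosts -> local mode).
capek_mode_switch() {
    hosts=$1
    if [ "$hosts" -lt 200 ]; then
        printf '%s\n' "-local"
    else
        printf '%s\n' "-remote"
    fi
}
```

For example, `capek_mode_switch 50` prints `-local`, while `capek_mode_switch 5000` prints `-remote`.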

Release 1.2.6 allows you to run several Clients, but every other component must run as a single instance. Support for several Schedulers or Managers is planned, but the respective RFE has not been filed yet.

To start the robot in distributed mode, change your current directory to an empty directory, copy your RulesFile there, and enter these commands.

On Linux:

# start Corpus component
java org.egothor.robot.remote.CorpusImpl -i &
# start Scheduler component
java org.egothor.robot.remote.SchedulerImpl -i &
# start Manager component
java org.egothor.robot.remote.ManagerImpl -i &
# start (at least one) client
java org.egothor.robot.Capek -remote -inject http://my.starting.url/ &

On Windows:

rem start Corpus component
start java org.egothor.robot.remote.CorpusImpl -i
rem start Scheduler component
start java org.egothor.robot.remote.SchedulerImpl -i
rem start Manager component
start java org.egothor.robot.remote.ManagerImpl -i
rem start (at least one) client
start java org.egothor.robot.Capek -remote -inject http://my.starting.url/
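
On Linux, the four start commands can be collected into a single launcher. The following is a minimal sketch, not an official Egothor script: the function names run_bg and start_distributed_robot and the DRY_RUN variable are assumptions made for illustration, and only the class names and switches are taken from the commands above. It assumes that the Egothor classes are on the CLASSPATH and that the current directory already contains your RulesFile.

```shell
#!/bin/sh
# Hypothetical launcher (not shipped with Egothor).
# Set DRY_RUN=1 to print the commands instead of running them.

run_bg() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        # Run the command in the background, as the manual commands do with '&'.
        "$@" &
    fi
}

start_distributed_robot() {
    start_url=$1
    # Same order as the manual commands: Corpus, Scheduler, Manager, client.
    run_bg java org.egothor.robot.remote.CorpusImpl -i
    run_bg java org.egothor.robot.remote.SchedulerImpl -i
    run_bg java org.egothor.robot.remote.ManagerImpl -i
    run_bg java org.egothor.robot.Capek -remote -inject "$start_url"
}
```

After sourcing the file, `start_distributed_robot http://my.starting.url/` starts all four processes. Running it first with DRY_RUN=1 lets you check the command lines without launching anything.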

Later, you can still use the data structures prepared by the distributed robot with the local robot. The same holds in the reverse direction: if you are not satisfied with the performance of the local robot, you can start up the distributed robot.

To stop the robot, first stop the client(s):

telnet 127.0.0.1 9713
> shutdown

Now, you can stop the components:

java org.egothor.robot.remote.ManagerImpl -u
java org.egothor.robot.remote.SchedulerImpl -u
java org.egothor.robot.remote.CorpusImpl -u
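
The component shutdown can be scripted in the same spirit. The sketch below is a convenience under stated assumptions, not an Egothor tool: the names run_fg and stop_components and the DRY_RUN variable are hypothetical. It covers only the three -u commands, because the client shutdown is shown above as an interactive telnet session and the source does not say whether it can be automated.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Egothor).
# Set DRY_RUN=1 to print the commands instead of running them.

run_fg() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        # Run in the foreground so each component is handled before the next.
        "$@"
    fi
}

stop_components() {
    # Reverse of the start order: Manager, Scheduler, Corpus.
    run_fg java org.egothor.robot.remote.ManagerImpl -u
    run_fg java org.egothor.robot.remote.SchedulerImpl -u
    run_fg java org.egothor.robot.remote.CorpusImpl -u
}
```

Call `stop_components` only after every client has been shut down through telnet, as described above.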

For more details, see ManagerImpl, SchedulerImpl, CorpusImpl.
