Distributed Robot
The robot Capek can run in two modes:
- local (switch -local) and
- distributed (switch -remote).
The first mode works well when you want to crawl a small network (fewer than 200 hosts). On a larger network, you may find that the robot runs slower and slower. This happens because the robot's caches fill up and become less effective. In addition, the robot must keep its data on the hard disk and, as a result, it may close and reopen the respective files too often, which further degrades the overall speed.
In such a case, the solution is the distributed robot, which starts each of the robot's components in a separate process. The processes can use more file handles, more memory, etc., so the speed should improve.
Release 1.2.6 allows you to run several Clients, but every other component must run as a single instance. Support for several Schedulers or Managers is planned, but the respective RFE has not been implemented yet.
To start the robot in distributed mode, change your current directory to an empty directory, copy your RulesFile there, and enter the following commands.
On Linux:
# start Corpus component
java org.egothor.robot.remote.CorpusImpl -i &
# start Scheduler component
java org.egothor.robot.remote.SchedulerImpl -i &
# start Manager component
java org.egothor.robot.remote.ManagerImpl -i &
# start (at least one) client
java org.egothor.robot.Capek -remote -inject http://my.starting.url/ &
On Windows:
REM start Corpus component
start java org.egothor.robot.remote.CorpusImpl -i
REM start Scheduler component
start java org.egothor.robot.remote.SchedulerImpl -i
REM start Manager component
start java org.egothor.robot.remote.ManagerImpl -i
REM start (at least one) client
start java org.egothor.robot.Capek -remote -inject http://my.starting.url/
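If you start the distributed robot often, the Linux start sequence above can be wrapped in a small shell function. The following is only a sketch: the names JAVA_CMD, DRY_RUN, run_bg and start_robot are illustrative and are not part of the robot. It assumes that the Egothor classes are on your CLASSPATH and that you call it from the directory holding your RulesFile.

```shell
#!/bin/sh
# Sketch only: wraps the Linux start sequence shown above.
# JAVA_CMD and DRY_RUN are hypothetical helper variables, not robot options.
JAVA_CMD="${JAVA_CMD:-java}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really launch them

# Print a command (dry run) or launch it in the background.
run_bg() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@" &
    fi
}

# Start Corpus, Scheduler, Manager and one client, in that order.
# $1 is the starting URL handed to the client via -inject.
start_robot() {
    start_url="$1"
    run_bg "$JAVA_CMD" org.egothor.robot.remote.CorpusImpl -i
    run_bg "$JAVA_CMD" org.egothor.robot.remote.SchedulerImpl -i
    run_bg "$JAVA_CMD" org.egothor.robot.remote.ManagerImpl -i
    run_bg "$JAVA_CMD" org.egothor.robot.Capek -remote -inject "$start_url"
}
```

With the default DRY_RUN=1, calling "start_robot http://my.starting.url/" only prints the four commands so that you can check them first. Set DRY_RUN=0 to launch the processes.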
Later, you can still use the data structures prepared by the distributed robot with the local robot. The reverse also holds: if you are not satisfied with the performance of the local robot, you can start the distributed robot on the same data.
To stop the robot, first stop the client(s):
telnet 127.0.0.1 9713
> shutdown
Now, you can stop the components:
java org.egothor.robot.remote.ManagerImpl -u
java org.egothor.robot.remote.SchedulerImpl -u
java org.egothor.robot.remote.CorpusImpl -u
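The component shutdown above can be scripted in the same way. This is a hedged sketch, not a tool shipped with the robot: JAVA_CMD, DRY_RUN and stop_components are illustrative names. It assumes that all clients have already been stopped with the telnet "shutdown" command, and it keeps the order shown above (Manager, Scheduler, Corpus).

```shell
#!/bin/sh
# Sketch only: stops the components in the order shown above.
# The clients must already be shut down; this function does not contact them.
# JAVA_CMD and DRY_RUN are hypothetical helper variables, not robot options.
JAVA_CMD="${JAVA_CMD:-java}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = really run them

stop_components() {
    for component in ManagerImpl SchedulerImpl CorpusImpl; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "$JAVA_CMD org.egothor.robot.remote.$component -u"
        else
            # Run in the foreground so each component is handled before the next.
            "$JAVA_CMD" "org.egothor.robot.remote.$component" -u
        fi
    done
}
```

Calling "stop_components" with the default DRY_RUN=1 prints the three commands. Set DRY_RUN=0 to run them.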
For more details, see ManagerImpl, SchedulerImpl, and CorpusImpl.