
Package org.egothor.robot

Robot Capek


Package org.egothor.robot Description


Fully asynchronous robot.

List of additions

11.0c
FastBlindAppender is the default in T0
11.0a
BlockingArray introduced to replace the poorly coded pipes between the T* tasks and to allow contra-processing of Responses between T5 and the TX tasks; MAXFAILURESINBATCH added to eliminate targets with a poor network connection; 5 helpers in T5; TX process() methods are now synchronized; T0 uses a fine-grained RW lock
10.0d
New URIs are saved at longer intervals, several synchronized sections reworked, the HTTP connection enters the WRITE state once the socket is connected, statistics chunk added
10.0a
Berkeley DB repository for the URI-to-id map
9.0g
RTP updated with ABORT
9.0e
RTP updated, attributes sent along, IP limit for RTP clients
9.0a
Transmitter of HTML parser SAX events
8.0h
Direct ByteBuffer in HTTPAnalyzer, Content-type backup table, external Escape holes
8.0g
User-agent that will not be denied by the stupid anti-bot servers out there
8.0f
seed (keyword) in a configuration file
8.0c
The Table of IMG-SRC URIs has separate configuration parameters (increasing the previously shared parameters of the Table used for A-HREF URIs could cause an OutOfMemoryError)
8.0b
Alternative text of IMG-SRC/ALT is written into the structure file of images
8.0a
Structure and list of URIs are saved
7.0n
Scan for img-src links
7.0l
New URIs are no longer written to a log; Reporter is used instead
Content-type is no longer logged
7.0i
Logging changed (new URIs and their uids are logged, while running URIs are not logged)
7.0h
Bucket size 8*1024B, 3-level prefix in HashURI directory, URIsCache 50000, NodesCache 1000
7.0g
Content-type is logged
5.2j
addMoreURIs flag via PROC/stopAdd flag
5.2i
TurnUpWheel added
5.2h
automatic brake added, both TXs parse documents in parallel, fixes in SequentialQueue, MAX_SORTED lowered
5.2g
Server need not allocate a Page for pushing, HashURI cache issue solved (My2Longs instead of conversion to a single Long)
5.2f
Config.SAVEINTERVAL added, number of Pages/buffers limited by Config.FETCHQSIZE, cnt_oknew counter added, MAXURILEN added (default: 512), DNSNAMELEN also checked when validating URIs
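The MAXURILEN and DNSNAMELEN checks listed for 5.2f can be pictured as a pre-filter applied before a URI is queued. The sketch below is a hypothetical illustration, not the robot's code: the class and method names are invented, MAXURILEN uses the documented default of 512, and DNSNAMELEN is assumed to be the usual 253-character limit for a dotted DNS name.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical URI pre-filter; the class name and the DNSNAMELEN value are assumptions.
final class UriLimits {
    static final int MAXURILEN = 512;  // default documented for 5.2f
    static final int DNSNAMELEN = 253; // assumption: maximum length of a dotted DNS name

    // Accepts a URI only if the whole string and its host name fit within the limits.
    static boolean isAcceptable(String uri) {
        if (uri.length() > MAXURILEN) {
            return false;
        }
        try {
            String host = new URI(uri).getHost();
            return host != null && host.length() <= DNSNAMELEN;
        } catch (URISyntaxException e) {
            return false; // unparsable URIs are dropped
        }
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("http://example.com/index.html")); // true
        System.out.println(isAcceptable("http://exa mple.com/"));          // false: a space is illegal
    }
}
```

Checking the cheap string length first means over-long URIs are rejected before any parsing work is done.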
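The 11.0a entry replaces the pipes between the T* tasks with a BlockingArray. Its API is not documented on this page, so the sketch below uses java.util.concurrent.ArrayBlockingQueue as a stand-in to show the general idea of a bounded hand-off between a producing stage (such as T5) and a consuming stage (such as a TX task). The StageHandoff class and its methods are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical bounded hand-off between two pipeline stages; ArrayBlockingQueue stands in for BlockingArray.
final class StageHandoff<T> {
    private final BlockingQueue<T> slots;

    StageHandoff(int capacity) {
        this.slots = new ArrayBlockingQueue<>(capacity);
    }

    // Producer side: waits while every slot is taken, which throttles a fast producer.
    void hand(T item) throws InterruptedException {
        slots.put(item);
    }

    // Producer side, non-blocking: returns false when the consumer has fallen behind.
    boolean tryHand(T item) {
        return slots.offer(item);
    }

    // Consumer side: waits until an item is available.
    T take() throws InterruptedException {
        return slots.take();
    }

    // Consumer side, non-blocking: returns null when nothing is pending.
    T poll() {
        return slots.poll();
    }

    public static void main(String[] args) throws InterruptedException {
        StageHandoff<String> t5ToTx = new StageHandoff<>(2);
        Thread tx = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    System.out.println("TX processed " + t5ToTx.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        tx.start();
        for (int i = 0; i < 3; i++) {
            t5ToTx.hand("response-" + i); // blocks whenever both slots are full
        }
        tx.join();
    }
}
```

A bounded structure gives back-pressure for free: when the consumer is slow, the producer waits instead of buffering without limit.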

List of bugs

11.0c
Servers stayed in T0 inMem and caused the robot to get stuck with "waiting for an empty server slot".
11.0a
Base href, e.g. http://a.a.a, was not read correctly as http://a.a.a/, which then produced wrong links. T0.saveNewURIs was almost never called while waiting for an empty server slot.
10.0e
The robot could not stop correctly under load (TX tasks still tried to add more URIs, but the DB had already been closed by T0)
10.0d
all.tpr value in status file was computed incorrectly
8.0h
java.lang.IndexOutOfBoundsException
        at java.nio.Buffer.checkIndex(Buffer.java:438)
        at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:209)
        at org.egothor.robot.HTTPByteBufferAnalyzer.parse(HTTPByteBufferAnalyzer.java:254)
        at org.egothor.robot.module.Server.processCursor(Server.java:183)
        at org.egothor.robot.module.T5.run(T5.java:218)
            
8.0f
T0 still loads new Servers into the robot
8.0e
Charsets recognized incorrectly (Response object)
8.0d
Some servers stay in a queue on shutdown.
8.0c
java.lang.NullPointerException
        at java.net.URI.encode(URI.java:2693)
        at java.net.URI.toASCIIString(URI.java:1614)
        at org.egothor.robot.module.T0$MyReporter.print(T0.java:50)
        at org.egothor.robot.module.T0.push(T0.java:151)
        at org.egothor.robot.module.LinksExtractor.process(LinksExtractor.java:48)
        at org.egothor.robot.module.T5.process(T5.java:310)
        at org.egothor.robot.module.T5.run(T5.java:240)
            
7.0n
Port numbers could fall outside the valid range 1-65535; T3$T4::submit then threw an IllegalArgumentException
7.0l
ArrayIndexOutOfBoundsException not handled in HTTPParser::parse.
7.0k
Older index chunks were overwritten on restart - fixed.
7.0j
2.5.2006 - [Neal Thomsen] robots.txt is misinterpreted for a block of user-agent(s)
7.0h
4.4.2006 - HTTPAnalyzer threw a SEVERE ArrayIndexOutOfBounds exception at readCRLF:153, called from parse:250
7.0e
5.2.2006 - rotate failed (it was called with a null parameter, which was read as an empty robots.txt, although it could also mean "robots.txt has not changed since last time")
7.0b
27.1.2006 - the data length declared in the HTTP header is longer than the data actually sent - status is now set to -1
7.0a
24.1.2006 - !mayQueue items/Servers were added to the swamp, so isEmpty might not be right (check together with rotate)
5.2i
URIs with userInfo were saved onto the hostname queue, which produced collisions
5.2g
1.8.2005 - Server rotate failed
5.2f
29.7.2005 - buffer limit set incorrectly
5.2e
28.7.2005 - hostname is not always in lowercase; bug in the cache sub-module of HashURI
5.2d
28.7.2005 - the robot cannot be stopped
5.2c
28.7.2005 - html.head.meta.name was not converted to lowercase, so the robots specification was not handled properly, example: <meta name="RoBoTs" ...>
5.2b
27.7.2005 - nextConn was initialized to 0 instead of the current time or lastIO
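The 11.0a base href bug concerns base URIs whose path is empty. A minimal sketch of the fix, assuming links are resolved with java.net.URI (an assumption; the robot's own resolver is not shown), is to give an authority-only base such as http://a.a.a the root path "/" before resolving relative links against it. The BaseHref class is hypothetical.

```java
import java.net.URI;

// Hypothetical helper: normalizes an authority-only base href before links are resolved against it.
final class BaseHref {
    static URI normalize(URI base) {
        String path = base.getRawPath();
        if (base.isOpaque() || base.getRawAuthority() == null || (path != null && !path.isEmpty())) {
            return base; // already has a path, or has no authority to attach "/" to
        }
        StringBuilder sb = new StringBuilder();
        if (base.getScheme() != null) {
            sb.append(base.getScheme()).append(':');
        }
        sb.append("//").append(base.getRawAuthority()).append('/'); // empty path becomes "/"
        if (base.getRawQuery() != null) {
            sb.append('?').append(base.getRawQuery());
        }
        if (base.getRawFragment() != null) {
            sb.append('#').append(base.getRawFragment());
        }
        return URI.create(sb.toString());
    }

    public static void main(String[] args) {
        URI base = normalize(URI.create("http://a.a.a"));
        System.out.println(base);                   // http://a.a.a/
        System.out.println(base.resolve("x.html")); // http://a.a.a/x.html
    }
}
```

Raw components are copied unchanged, so percent-encoded characters in the authority, query or fragment are not decoded and re-encoded.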
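For the 7.0n port bug, a guard in the spirit of the fix rejects URIs whose effective port lies outside 1-65535 before they reach socket code. java.net.InetSocketAddress, for example, throws IllegalArgumentException for such ports. The PortCheck class and its default ports (80 for http, 443 for https) are assumptions for illustration.

```java
import java.net.URI;

// Hypothetical port guard applied before a URI is submitted for fetching.
final class PortCheck {
    // Explicit port if present, otherwise an assumed scheme default.
    static int effectivePort(URI uri) {
        int port = uri.getPort(); // -1 when the URI has no explicit port
        if (port != -1) {
            return port;
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    // True only for ports a socket can actually connect to.
    static boolean hasValidPort(URI uri) {
        int port = effectivePort(uri);
        return port >= 1 && port <= 65535;
    }

    public static void main(String[] args) {
        System.out.println(hasValidPort(URI.create("http://example.com:8080/"))); // true
        System.out.println(hasValidPort(URI.create("http://example.com:99999/"))); // false
    }
}
```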
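The 7.0j robots.txt entry concerns a block that names several user agents. Under the usual reading of robots.txt, consecutive User-agent lines share the rules that follow them, and a new block starts only at a User-agent line that comes after a rule. The simplified parser below sketches that grouping. It is not the robot's parser, it collects only Disallow rules, and its class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified robots.txt grouping; only Disallow rules are collected.
final class RobotsGroups {
    // Maps each lower-cased user-agent token to the Disallow prefixes of its block.
    static Map<String, List<String>> parse(String robotsTxt) {
        Map<String, List<String>> rules = new LinkedHashMap<>();
        List<String> agents = new ArrayList<>(); // user agents of the current block
        boolean blockHasRules = false;
        for (String raw : robotsTxt.split("\r?\n")) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (blockHasRules) { // a User-agent line after rules opens a new block
                    agents = new ArrayList<>();
                    blockHasRules = false;
                }
                String agent = value.toLowerCase(Locale.ROOT);
                agents.add(agent);
                rules.computeIfAbsent(agent, k -> new ArrayList<>());
            } else {
                blockHasRules = true;
                if (field.equals("disallow") && !value.isEmpty()) {
                    for (String agent : agents) {
                        rules.get(agent).add(value); // every agent in the block shares the rule
                    }
                }
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        String txt = "User-agent: a\nUser-agent: b\nDisallow: /x\n\nUser-agent: c\nDisallow: /y\n";
        System.out.println(parse(txt)); // {a=[/x], b=[/x], c=[/y]}
    }
}
```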
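The 5.2i and 5.2e entries both come down to how a URI is mapped to its host queue. One way to build such a key, sketched here as an assumption rather than the robot's actual scheme, uses only the scheme, the lower-cased host and the port. That way, user:password@ prefixes and letter case cannot split one server into several queues. HostKey is a hypothetical name, and the default ports (80 for http, 443 for https) are assumed.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical host-queue key: scheme, lower-cased host and port; userInfo is deliberately ignored.
final class HostKey {
    static String of(URI uri) {
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("no host in " + uri);
        }
        String scheme = uri.getScheme() == null ? "http" : uri.getScheme().toLowerCase(Locale.ROOT);
        int port = uri.getPort();
        if (port == -1) {
            port = scheme.equals("https") ? 443 : 80; // assumed defaults
        }
        return scheme + "://" + host.toLowerCase(Locale.ROOT) + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(of(URI.create("http://user:pw@Example.COM/a"))); // http://example.com:80
    }
}
```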

Copyright © 2016 Egothor. All Rights Reserved.