Anti-SPAM block
In org.egothor.robot.apps.Oracul you can apply an anti-SPAM table using the parameter -spam:
org.egothor.robot.apps.Oracul index/ -spam my_table -linksrank linksrank/
This applies the linksrank/ values to index/. In addition, my_table is read and may modify those values. The format of the table is as follows:
# this is a comment
# all documents from this server have their value set to 0
# (spaces are required)
domain www.badboyz.example.com:80 = 0
# all documents from this server have their value decreased by 2
domain www.badboyz.example.com:80 - 2
# or you can also increase the values
domain www.goodboys.example.com:80 + 2
#
# the same can be applied on specific URLs
url http://www.badboyz.example.com:80/stupidpage.html = 0
url http://www.goodboys.example.com:80/greatresource.html + 5
If a domain rule matches, the url rules are not scanned.
In case of a collision, the last rule is applied. In the example, only www.badboyz.example.com:80 - 2 is applied; www.badboyz.example.com:80 = 0 is skipped.
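The precedence rules above can be illustrated with a minimal sketch. This is not Egothor's actual code: the class SpamTableSketch, the nested Rule class, and the methods load and adjust are hypothetical names chosen for illustration. It also assumes that values are doubles and that the caller supplies the document's host:port key. Only the table format, the "last rule wins" behaviour, and the domain-before-url lookup come from the description above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an anti-SPAM table; not Egothor's implementation.
class SpamTableSketch {

    // One rule: an operator ('=', '-' or '+') and its operand.
    static final class Rule {
        final char op;
        final double amount;

        Rule(char op, double amount) {
            this.op = op;
            this.amount = amount;
        }

        double apply(double value) {
            switch (op) {
                case '=': return amount;
                case '-': return value - amount;
                case '+': return value + amount;
                default: throw new IllegalStateException("unknown operator " + op);
            }
        }
    }

    private final Map<String, Rule> domainRules = new HashMap<>();
    private final Map<String, Rule> urlRules = new HashMap<>();

    // Reads table lines. A later rule for the same key overwrites the
    // earlier one, which gives the "last rule is applied" behaviour.
    void load(Iterable<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // blank line or comment
            }
            // Fields are separated by spaces: kind, key, operator, operand.
            String[] f = line.split("\\s+");
            if (f.length != 4 || f[2].length() != 1 || "=-+".indexOf(f[2]) < 0) {
                throw new IllegalArgumentException("malformed rule: " + line);
            }
            Rule rule = new Rule(f[2].charAt(0), Double.parseDouble(f[3]));
            if (f[0].equals("domain")) {
                domainRules.put(f[1], rule);
            } else if (f[0].equals("url")) {
                urlRules.put(f[1], rule);
            } else {
                throw new IllegalArgumentException("unknown rule kind: " + f[0]);
            }
        }
    }

    // Domain rules are checked first; url rules are consulted only when
    // no domain rule matches. Unmatched documents keep their value.
    double adjust(String hostPort, String url, double value) {
        Rule rule = domainRules.get(hostPort);
        if (rule == null) {
            rule = urlRules.get(url);
        }
        return rule == null ? value : rule.apply(value);
    }

    public static void main(String[] args) {
        SpamTableSketch table = new SpamTableSketch();
        table.load(List.of(
            "# this is a comment",
            "domain www.badboyz.example.com:80 = 0",
            "domain www.badboyz.example.com:80 - 2",
            "url http://www.badboyz.example.com:80/stupidpage.html = 0"));
        // The "- 2" domain rule replaced "= 0", and the url rule is never reached.
        System.out.println(table.adjust(
            "www.badboyz.example.com:80",
            "http://www.badboyz.example.com:80/stupidpage.html",
            10.0)); // prints 8.0
    }
}
```

Each lookup is at most two hash probes per document, so the cost does not grow with the number of rules in the table.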
The table is implemented using java.util.HashMap, so it can be as complex as you like and should not become a performance bottleneck. Note that the table is only applied when the document metadata are of type indexer.html2.HTMLMetadata.