[Egothor-tech] Duplicate default pages
Leo Galambos
Leo.Galambos at egothor.org
Fri Apr 1 12:48:13 BST 2005
Stuart David Lewis [sdl] wrote:
>>>http://www.aber.ac.uk/en/student/travel/
>>>http://www.aber.ac.uk/en/student/travel/index.php
>>>
>>>Could Egothor be improved to know that these are the same?
>>>
>>Unfortunately, you could only exclude the identical pages manually (in
>>the robot's rules file).
>
>How would I go about doing this? I wouldn't want to exclude pages ending
>in '/' or one of the many default pages (default.asp, default.shtml,
>default.shtml, index.php, index.html, index.asp, index.aspx, index.wml
>to name but a few) as some pages may only be referred to by one of
>these.
>
>Could I use a rewrite rule at all, or would this be tricky because of
>the number of possible index pages?
>
>
Hi,
I think you could use the AntiSpam feature, which is implemented in Oracul
(see the AntiSpam topic in the twiki:
http://www.egothor.org/download/twiki.tar.gz).
Cheers,
Leo
==org.egothor.robot.apps.Oracul index/ -spam my_table -linksrank linksrank/==
This applies the =linksrank/= values to =index/=. In addition, =my_table= is
read and may modify those values. The format of the table is as follows:
<verbatim>
# rank is set to 0
domain www.badboyz.example.com:80 = 0
# rank of all documents from the server: decreased by 2
domain www.badboyz.example.com:80 - 2
# you can also increase the values
domain www.goodboys.example.com:80 + 2
# the same can be applied on specific URLs
url http://www.badboyz.example.com:80/stupidpage.html = 0
url http://www.goodboys.example.com:80/greatresource.html + 5
</verbatim>
If a =domain= rule matches, the =url= rules are not scanned.
When rules collide, the last one is applied. In the example above, only
=www.badboyz.example.com:80 - 2= is applied;
=www.badboyz.example.com:80 = 0= is skipped.
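The precedence described above (domain rules before url rules, last rule wins on a collision) can be illustrated with a small Python sketch. This is not Egothor's code: the function names =parse_table= and =adjust= are made up for illustration, and the sketch assumes that url rules match the exact URL string, that domains are compared as =host:port=, and that port 80 is implied when the URL omits the port.

```python
from urllib.parse import urlsplit

# How each operator in the table changes a rank value.
OPS = {
    "=": lambda rank, n: n,
    "-": lambda rank, n: rank - n,
    "+": lambda rank, n: rank + n,
}


def parse_table(text):
    """Parse the table into (domain_rules, url_rules).

    A later rule for the same target overwrites an earlier one,
    which models "the last rule is applied".
    """
    domain_rules, url_rules = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        kind, target, op, value = line.split()
        if kind not in ("domain", "url") or op not in OPS:
            raise ValueError(f"unrecognised rule: {line!r}")
        rules = domain_rules if kind == "domain" else url_rules
        rules[target] = (op, float(value))
    return domain_rules, url_rules


def adjust(url, rank, domain_rules, url_rules):
    """Return the rank of `url` after applying at most one rule."""
    parts = urlsplit(url)
    host = f"{parts.hostname}:{parts.port or 80}"  # assumption: default port 80
    rule = domain_rules.get(host)
    if rule is None:
        # url rules are consulted only when no domain rule matched
        rule = url_rules.get(url)
    if rule is None:
        return rank
    op, value = rule
    return OPS[op](rank, value)


TABLE = """
domain www.badboyz.example.com:80 = 0
domain www.badboyz.example.com:80 - 2
domain www.goodboys.example.com:80 + 2
url http://www.badboyz.example.com:80/stupidpage.html = 0
url http://www.goodboys.example.com:80/greatresource.html + 5
"""

domains, urls = parse_table(TABLE)
print(adjust("http://www.badboyz.example.com:80/stupidpage.html", 10, domains, urls))
```

With a starting rank of 10, the call prints 8.0: the =- 2= domain rule replaces the earlier == 0= rule, and the url rule for =stupidpage.html= is never consulted because a domain rule already matched.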
--
Leo Galambos
Faculty of Mathematics and Physics, DSE
Malostranske namesti 25
Prague 1
CZE
http://kocour.ms.mff.cuni.cz/~galambos/