r1 - 04 Apr 2004 - 14:35:06 - LeoGalambos

Problem

How can I make the robot strip session cookies (session IDs) from the URLs it crawls?

Solution

Use this "rules" file:

user-agent      +http://www.egothor.org/nonexistentpage.html
loop            2

valid           http://www\.egothor\.org(:\d+)?/.*
replace         (\?|&)\w+=[^&\?]{8,}($|&)      <EMPTY>
replace         ;\w+=[^&\?]{8,}(\?|$)          <EMPTY>
replace         &&                             <EMPTY>
replace         \?&                            \?
replace         \?$                            <EMPTY>
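
These rules remove all session cookies from the URLs, allowing the robot to crawl www.egothor.org. The first two replace rules target any query parameter (?name=value or &name=value) or path parameter (;name=value) whose value is at least 8 characters long; the last three clean up leftover "&&", "?&" and trailing "?" separators. Note that an ordinary parameter with a value that long is removed as well.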
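
To try the heuristic on sample URLs before running a crawl, here is a minimal Python sketch of the same idea. It is not Egothor's rule engine and does not reproduce how the robot applies "replace" or "loop"; it only reuses the two parameter patterns from the rules file. The names strip_session_params and SESSION_PARAM are hypothetical, chosen for this illustration.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Same shape as in the rules file: name=value with a value of 8+ characters.
# (Assumption: such long values are session IDs.)
SESSION_PARAM = re.compile(r"\w+=[^&?]{8,}")


def strip_session_params(url: str) -> str:
    """Drop long-valued query parameters and a trailing ;name=value path parameter."""
    parts = urlsplit(url)

    # Path parameter such as ";jsessionid=..." at the end of the path.
    path = re.sub(r";\w+=[^&?]{8,}$", "", parts.path)

    # Keep only the query parameters that do not look like session IDs.
    kept = [p for p in parts.query.split("&")
            if p and not SESSION_PARAM.fullmatch(p)]

    # urlunsplit omits the "?" when the query is empty, so no cleanup is needed.
    return urlunsplit((parts.scheme, parts.netloc, path, "&".join(kept), parts.fragment))


if __name__ == "__main__":
    for u in (
        "http://www.egothor.org/index.php?PHPSESSID=0123456789abcdef",
        "http://www.egothor.org/page.jsp;jsessionid=ABCDEF0123456789?id=5",
        "http://www.egothor.org/a?x=1&sid=abcdefgh&y=2",
    ):
        print(strip_session_params(u))
```

Because the URL is split into components first, no separator cleanup pass is needed here; the rules file works on the raw URL string and therefore needs its extra "&&", "?&" and trailing "?" rules.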

-- LeoGalambos - 04 Apr 2004