[Egothor-tech] RE: Q: Egothor on Win98SE with PHP
Stuart David Lewis [sdl]
sdl at aber.ac.uk
Thu Jun 23 14:15:22 BST 2005
Hi Henry,
At UWA we have two installations of Egothor:
- http://www.aber.ac.uk/en/search/
  (Linux/Apache/PHP -> Tomcat)
- http://www.inf.aber.ac.uk/is-search/
  (Windows Server/IIS/Tomcat connector)
So I can help you on both counts, but can't show you the whole thing
working as you want!
> Now it seems that I'm unable to set it all to the right values in the
> batch files. I get error messages... Especially how to set the
> CLASSPATH right.
To rebuild, we have two copies of Egothor in directories called:
- www.inf.aber.ac.uk (the live copy)
- www.inf.aber.ac.uk-build (the copy that gets rebuilt)
We then have a scheduled task that calls rebuild.bat:
REM
REM This file rebuilds the index
REM
echo Step 1: Delete the old index files
echo ==================================
cd www.inf.aber.ac.uk-build
rmdir /S /Q corpus
rmdir /S /Q delta
rmdir /S /Q index
rmdir /S /Q linkdb
rmdir /S /Q newbits
rmdir /S /Q ranks
rmdir /S /Q scheduler
echo Done! (Step 1 of 5)
echo.
echo Step 2: Re-crawl the site
echo =========================
call INIT.bat
set WWW=http://www.inf.aber.ac.uk/
%JAVA_CMD% -Djava.util.logging.config.file=log.prop -Degothor.server.pause=50ms org.egothor.robot.Capek %WWW%
echo Done! (Step 2 of 5)
time /T
echo.
echo Step 3: Re-index the site
echo =========================
%JAVA_CMD% -Djava.util.logging.config.file=log.prop -Dvirtspace.table=webspaces org.egothor.apps.Michelangelo
echo Done! (Step 3 of 5)
time /T
echo.
echo Step 4: Re-calculate the page ranks for the site
echo ================================================
%JAVA_CMD% -Djava.util.logging.config.file=log.prop org.egothor.apps.Compile linkdb/db.txt linkdb/db.xxx
%JAVA_CMD% -Djava.util.logging.config.file=log.prop org.egothor.oracle.LinksFileReader linkdb/db.xxx ranks/
%JAVA_CMD% -Djava.util.logging.config.file=log.prop org.egothor.apps.Michelangelo
%JAVA_CMD% -Djava.util.logging.config.file=log.prop org.egothor.apps.Oracul index/ -linksrank ranks/
cd ..
echo Done! (Step 4 of 5)
time /T
echo.
echo Step 5: Swap to the new index
echo =============================
move www.inf.aber.ac.uk www.inf.aber.ac.uk-old
move www.inf.aber.ac.uk-build www.inf.aber.ac.uk
move www.inf.aber.ac.uk-old www.inf.aber.ac.uk-build
echo Done! (Step 5 of 5)
echo.
INIT.bat looks like this:
@echo off
REM *********************************************************
REM (C) 2003 Leo Galambos
REM *** INITIAL CHECK ***
set EGOTHOR_HOME=.
REM *********************
set JAVA_HOME=C:\Program Files\Java\jdk1.5.0
set JAVA_CMD="%JAVA_HOME%\bin\java.exe"
if exist %JAVA_CMD% goto ok
echo Java executable is not found (%JAVA_CMD%), I will try java.exe.
echo If it does not help, please set the environment variable
echo JAVA_HOME to point to your Java SDK, or set the correct filename in bin\INIT.bat
set JAVA_CMD="java.exe"
:ok
REM *** GET JAVA ARCHIVES WE NEED ***
for %%i in ("%EGOTHOR_HOME%\tmp\dist\*.jar") do call "%EGOTHOR_HOME%\bin\win\lcp.bat" "%%i"
for %%i in ("%EGOTHOR_HOME%\lib\*.jar") do call "%EGOTHOR_HOME%\bin\win\lcp.bat" "%%i"
set LOCALPC=%CLASSPATH%;%LCP%
set LCP=
REM echo using classpath %LOCALPC%
if not "%LOCALPC%"=="" goto finish
echo "Please, run 'ant jar' first or download and copy Egothor JAR files to"
echo " %EGOTHOR_HOME%\tmp\dist directory"
exit 1
:finish
set JAVA_CMD=%JAVA_CMD% -classpath %LOCALPC%
REM *** SET THE REMAINING VARIABLES FOR GEEKS ***
REM comment out if you want to remove czech diacritics
set CSDIA=-csdia
REM comment out if you want to store semantics information for Seneca
REM (obsolete)
REM set SEM="-semantics sem.log"
REM comment out if you want to store data for snippet support in your index
set SNIPPET=-snippet
REM comment out if you want to use a stemmer table as default
REM set STEMMER=-Ddefault.stem=../var/us_uk.comp
> Is there anyone who successfull use it with Windows and PHP? I want to
> implement a full-text search within my website.
Now for the PHP bit...
You still need to be running Tomcat. We run Tomcat on the same
machine, but behind the machine's firewall, so only the Apache web
server can talk to it. We then strip out all of the presentation code
from find.jsp, so that it returns just the results (e.g. remove the text
saying "Results:" etc.). You also need to rewrite the links in the
results so that they point back at your PHP page:
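// Forward the query to Egothor's find.jsp running under Tomcat.
// stripslashes() undoes magic_quotes_gpc escaping, if it is enabled.
$q = urlencode(stripslashes($_GET['q']));
$f = isset($_GET['f']) ? urlencode($_GET['f']) : '';
$s = isset($_GET['s']) ? urlencode($_GET['s']) : '';
$search_url = "http://www.aber.ac.uk:8080/aber-search/find.jsp?q=" . $q .
              "&l=en&f=" . $f .
              "&s=" . $s;
// file_get_contents() does not take an fopen-style mode argument.
$results = @file_get_contents($search_url);
// Point the result links at this PHP page instead of find.jsp.
$results = preg_replace("/find\.jsp\?f=/", $_SERVER['PHP_SELF'] . "?f=",
                        $results);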
This is all a bit of a hack really.
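
If you want it slightly less hacky, the two steps (build the find.jsp
URL, then rewrite the links in what comes back) can be wrapped in small
functions. This is only a sketch: the function names build_search_url()
and rewrite_links() are made up for illustration, and the default base
URL is simply the one we use here, so change it to wherever your Tomcat
lives.

```php
<?php
// Hypothetical helper (name is illustrative): builds the find.jsp URL
// from the request parameters. http_build_query() URL-encodes each value.
function build_search_url(array $get, $base = 'http://www.aber.ac.uk:8080/aber-search/find.jsp')
{
    $params = array(
        'q' => isset($get['q']) ? $get['q'] : '',
        'l' => 'en',
        'f' => isset($get['f']) ? $get['f'] : '',
        's' => isset($get['s']) ? $get['s'] : '',
    );
    return $base . '?' . http_build_query($params);
}

// Hypothetical helper: makes result links that point at find.jsp
// point at the given PHP page instead.
function rewrite_links($html, $self)
{
    return str_replace('find.jsp?f=', $self . '?f=', $html);
}
```

You would then fetch build_search_url($_GET) with file_get_contents(),
check that the result is not false (Tomcat may be down), and echo
rewrite_links($results, $_SERVER['PHP_SELF']).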
The best way to do it on Windows is to use the IIS->Tomcat connector
(ask Google) as this will then handle the JSPs natively without the
overhead of PHP.
Hope this helps,
Stuart
_________________________________________________________________
Datblygydd Cymwysiadau'r We Web Applications Developer
Gwasanaethau Gwybodaeth Information Services
Prifysgol Cymru Aberystwyth University of Wales Aberystwyth
E-bost / E-mail: Stuart.Lewis at aber.ac.uk
Ffon / Tel: (01970) 622860
_________________________________________________________________