[Egothor-tech] Relevance value's scale

Leo Galambos leo.galambos at mff.cuni.cz
Fri Apr 7 11:46:29 BST 2006


Ivanov, I (Ivo) wrote:

> Hi All,
>
> What is the scale of the relevance value shown in the results? Is it 
> the number of occurrences, or is some magic formula used?
>

Hello,

The formula is as follows (see the TermRunner and VectorRunner objects; the
normalization is hidden in FTField::invertize):

sim(q,d) = w*(1+idf)*boost*factor / rank(q)
where:
w=(tf/||tf||) * 10000

(tf) is the vector of term frequencies
(w) is the normalized tf-vector (length=10000)
(idf) is the inverse document frequency (see the CWI object). Its exact form
depends on the Egothor version you are using; it may be the classic formula
MINIDF+log(N/n), where N is the total number of documents and n is the
number of documents containing the term
(boost) is a constant you can set for each term via "^^" in queries
(factor) is a special constant: 1 for ordinary document terms, 10 or 100 for
special tokens such as <!VOLATILE>, <SRC> or <VALUE>
(rank(q)) is the number of non-zero values in the vector q
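To make the formula concrete, here is a minimal Java sketch. It is not Egothor's code, and several details are assumptions: the MINIDF value, the natural logarithm in the idf, the Euclidean norm for ||tf||, and summing the per-term products w*(1+idf)*boost*factor over the query terms before dividing by rank(q). The class RelevanceSketch and its methods are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the relevance formula; NOT Egothor's actual implementation. */
final class RelevanceSketch {

    // Assumed value: the real MINIDF constant depends on the Egothor version.
    static final double MINIDF = 1.0;

    /** w = (tf / ||tf||) * 10000, assuming ||tf|| is the Euclidean norm. */
    static Map<String, Double> normalize(Map<String, Integer> tf) {
        double sumSq = 0.0;
        for (int f : tf.values()) {
            sumSq += (double) f * f;
        }
        double norm = Math.sqrt(sumSq);
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            // Scale each component so the whole vector has length 10000.
            w.put(e.getKey(), norm == 0.0 ? 0.0 : e.getValue() / norm * 10000.0);
        }
        return w;
    }

    /** Classic idf = MINIDF + log(N/n); natural log is an assumption. */
    static double idf(long totalDocs, long docsWithTerm) {
        return MINIDF + Math.log((double) totalDocs / docsWithTerm);
    }

    /**
     * sim(q,d): for each query term with a non-zero boost, add
     * w * (1 + idf) * boost * factor (summing is an assumption),
     * then divide by rank(q), the number of non-zero entries in q.
     */
    static double sim(Map<String, Double> boost, Map<String, Double> factor,
                      Map<String, Double> w, Map<String, Double> idf) {
        int rank = 0;
        double sum = 0.0;
        for (Map.Entry<String, Double> e : boost.entrySet()) {
            if (e.getValue() == 0.0) {
                continue; // zero entries do not count towards rank(q)
            }
            rank++;
            String term = e.getKey();
            double wt = w.getOrDefault(term, 0.0);     // 0 if the term is not in d
            double f = factor.getOrDefault(term, 1.0); // 1 for ordinary document terms
            double termIdf = idf.getOrDefault(term, 0.0);
            sum += wt * (1.0 + termIdf) * e.getValue() * f;
        }
        return rank == 0 ? 0.0 : sum / rank;
    }
}
```

For example, a document with tf = {a:3, b:4} gets w(a) = 6000 and w(b) = 8000; a query for "a" alone with boost 1 and idf 1 then scores 6000*(1+1)*1*1/1 = 12000. Dividing by rank(q) keeps scores of long and short queries roughly comparable, which is why the value is not a simple occurrence count.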

Hope this helps

L.G.

-- 
Leo Galambos
Faculty of Mathematics and Physics, DSE
Malostranske namesti 25
Prague 1
CZE

http://kocour.ms.mff.cuni.cz/~galambos/