[Egothor-tech] Relevance value's scale
Leo Galambos
leo.galambos at mff.cuni.cz
Fri Apr 7 11:46:29 BST 2006
Ivanov, I (Ivo) wrote:
> Hi All,
>
> What is the scale of the relevance value shown in the result? Is it
> the number of occurrences, or is some magic formula used?
>
Hello,
the formula is (see TermRunner and VectorRunner objects, normalization
is hidden in FTField::invertize):
sim(q,d) = w*(1+idf)*boost*factor / rank(q)
where:
w=(tf/||tf||) * 10000
(tf) is a vector with term frequencies
(w) is a normalized tf-vector (length=10000)
(idf) is the inverse document frequency (see the CWI object); its exact form
depends on the Egothor version you are using. It could be the classic formula
MINIDF+log(N/n)
(boost) is a constant you can set for each term via "^^" in queries
(factor) is a special constant: 1 for document terms, 10 or 100 for
special tokens like <!VOLATILE> or <SRC> or <VALUE>
(rank(q)) is the number of nonzero values in the vector q
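
To make the arithmetic concrete, here is a rough Python sketch of the formula above. It is not Egothor code: the function names are made up for illustration, ||tf|| is assumed to be the Euclidean norm, the logarithm is assumed to be natural, and the default MINIDF of 0 is a placeholder rather than the value Egothor uses. It computes the value for a single term only, since how per-term values are combined is not covered here.

```python
import math

SCALE = 10000  # length of the normalized tf-vector, as in the formula above


def normalized_weights(tf):
    """Return w = (tf / ||tf||) * SCALE for a {term: frequency} mapping.

    Assumption: ||tf|| is the Euclidean norm of the frequency vector.
    """
    norm = math.sqrt(sum(f * f for f in tf.values()))
    if norm == 0:
        return {term: 0.0 for term in tf}
    return {term: f / norm * SCALE for term, f in tf.items()}


def classic_idf(n_docs, doc_freq, min_idf=0.0):
    """Classic form MINIDF + log(N/n).

    Assumptions: natural logarithm; min_idf=0.0 is a placeholder value.
    """
    return min_idf + math.log(n_docs / doc_freq)


def term_score(w, idf, boost=1.0, factor=1.0, query_rank=1):
    """w * (1 + idf) * boost * factor / rank(q) for one term."""
    return w * (1 + idf) * boost * factor / query_rank


# Hypothetical document with two terms and a two-term query.
weights = normalized_weights({"egothor": 3, "search": 4})
score = term_score(
    weights["egothor"],
    classic_idf(n_docs=1000, doc_freq=10),
    boost=1.0,
    factor=1.0,
    query_rank=2,
)
print(score)
```

Because w is scaled to length 10000 before idf, boost and factor are applied, the resulting value is not bounded by 1 or by 100; it grows with idf, boost and factor and shrinks as the query gets more nonzero terms.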
Hope this helps
L.G.
--
Leo Galambos
Faculty of Mathematics and Physics, DSE
Malostranske namesti 25
Prague 1
CZE
http://kocour.ms.mff.cuni.cz/~galambos/