IDF in BM25Okapi is not version from Atire

In Trotman, Jia & Crane the idf measure is given as log(N/df_t) (see top of page 5), where N is the corpus size and df_t is the number of docs containing term t. This is always non-negative. In the implementation of BM25Okapi, you have used an earlier version of the idf computation (Robertson-Spark Jones, see eq (4) bottom of page 2 in Trotman, Puurula & Burgess), which can become negative and which you handle by setting negative values to eps.

The balance of the score calculation in BM25Okapi follows Atire exactly, and the implementations for BM25L and BM25+ align perfectly with the descriptions in Trotman, Puurula & Burgess. I wonder why the idf implementation for BM25Okapi deviates from the Atire specification.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

IDF in BM25Okapi is not version from Atire #35

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

IDF in BM25Okapi is not version from Atire #35

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions