Add log-odds conjunction fusion for BB25 hybrid search#1041
davidmezzetti merged 3 commits into neuml:master
Conversation
BB25 normalization outputs calibrated probabilities, but the existing hybrid fusion uses a convex combination, which discards the Bayesian probability semantics. This causes BB25 to regress on 4/5 BEIR datasets. This PR adds log-odds conjunction fusion (from "From Bayesian Inference to Neural Computation") that correctly combines probability signals in logit space, with per-query dynamic calibration for dense cosine scores.

- scoring/normalize.py: Extract Bayesian method check into isbayes()
- scoring/base.py: Add default isbayes() returning False
- scoring/tfidf.py: Add isbayes() delegating to normalizer
- search/base.py: Add logodds(), convex(), rrf() fusion methods; dispatch based on isbayes()

BEIR nDCG@10 results (BB25+LogOdds vs Default): arguana +2.23, fiqa +2.03, scidocs +0.62, scifact +1.33, nfcorpus -1.96
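The isbayes() dispatch described above might be sketched roughly like this. This is an illustrative outline only -- the class shapes and the `fuse()` helper are assumptions for the sketch, not txtai's exact code:

```python
class Scoring:
    """Base scoring class: scores are not Bayesian probabilities by default."""

    def isbayes(self):
        return False


class Normalize:
    """Score normalizer; only Bayesian methods output calibrated probabilities."""

    def __init__(self, method):
        self.method = method

    def isbayes(self):
        # Assumed check: bb25 is the Bayesian normalization method
        return self.method == "bb25"


class TFIDF(Scoring):
    """Term scoring that delegates the Bayesian check to its normalizer."""

    def __init__(self, normalizer=None):
        self.normalizer = normalizer

    def isbayes(self):
        return self.normalizer.isbayes() if self.normalizer else False


def fuse(scoring):
    # Search-side dispatch: pick the fusion strategy from the scoring config
    return "logodds" if scoring.isbayes() else "convex"
```

With this layout, enabling `"normalize": "bb25"` automatically routes hybrid queries to logit-space fusion, while all other scoring configurations keep the existing convex path.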
Thank you for this and the detailed explanation. If someone just enabled scoring-only indexing and enabled bayes, do the scores work? Or do they need the logic in logodds to work even in standalone? i.e.

```python
embeddings = Embeddings(
    keyword=True,
    scoring={"method": "bm25", "terms": True, "normalize": "bb25"}
)
```

I like the new methods to combine scores. I think it would be good to split that up into a separate subclass to containerize it (something like hybrid and similar to what we did with the scoring normalizer).
Thanks for the review!
Sparse-only BB25: Yes, it works standalone. When there's no ANN index, the hybrid flag is False and the fusion logic (logodds/convex/rrf) is never reached -- sparse results are returned directly with BB25 probabilities from the normalizer.
Subclass extraction: Good idea. I'll refactor the fusion methods into a separate Hybrid class (similar to Normalize) so Search delegates to it based on the scoring configuration.
Move logodds, convex, and rrf fusion methods from Search into a dedicated Hybrid class, following the same pattern as Normalize.
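A minimal sketch of what such a Hybrid container might look like. The class and method shapes below are illustrative assumptions (inputs as ranked lists of (uid, score) tuples), not txtai's actual implementation:

```python
import math


class Hybrid:
    """Illustrative fusion container: combines dense and sparse result lists."""

    def __call__(self, dense, sparse, bayes, weight=0.5):
        # Dispatch: Bayesian probabilities fuse in logit space
        method = self.logodds if bayes else self.convex
        return method(dense, sparse, weight)

    def convex(self, dense, sparse, weight):
        # Weighted linear combination of raw scores
        scores = {uid: weight * score for uid, score in dense}
        for uid, score in sparse:
            scores[uid] = scores.get(uid, 0.0) + (1.0 - weight) * score
        return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

    def logodds(self, dense, sparse, weight):
        # Weighted sum in logit space, mapped back to a probability
        scores = {uid: weight * self.logit(score) for uid, score in dense}
        for uid, score in sparse:
            scores[uid] = scores.get(uid, 0.0) + (1.0 - weight) * self.logit(score)
        fused = ((uid, 1.0 / (1.0 + math.exp(-value))) for uid, value in scores.items())
        return sorted(fused, key=lambda pair: pair[1], reverse=True)

    def rrf(self, dense, sparse, k=60):
        # Reciprocal rank fusion ignores scores, using only rank positions
        scores = {}
        for results in (dense, sparse):
            for rank, result in enumerate(results):
                scores[result[0]] = scores.get(result[0], 0.0) + 1.0 / (k + rank + 1)
        return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

    def logit(self, p, eps=1e-6):
        # Clamp away from 0/1 to keep the transform finite
        p = min(max(p, eps), 1.0 - eps)
        return math.log(p / (1.0 - p))
```

Keeping the three strategies behind one callable mirrors the Normalize pattern: Search only needs to hold a Hybrid instance and pass the isbayes() flag.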
Running the tests now. Once complete, I'll merge and re-run the benchmarks and report back. Thank you for this!
@jaepil Just ran the build. It's failing on the coding convention checks. If you wanted to fix this, you'd just need to install pre-commit to see the issues: https://github.com/neuml/.github/blob/master/CONTRIBUTING.md#set-up-a-development-environment If you don't have time for that, I can merge and fix after that.
@davidmezzetti I'm going to fix it now. |
- Fix black formatting: remove unnecessary parentheses, remove spaces around **
- Fix pylint too-many-branches: extract calibrate() method from logodds()
- Fix pylint unused-variable: rename score to _ in rrf()
@davidmezzetti Fixed the coding convention issues (black formatting + pylint warnings). The CI should pass now.
The other minor coding convention thing is that the repo doesn't use the "_" variable notation. But I can modify that after the merge too.
@jaepil Merged! I just ran the tests locally and they match. Thank you once again for adding this algorithm in!
@davidmezzetti Thank you for the thorough review and for merging! I'll keep the |
I just added tests for this code and added a couple code standardizations. Task complete! Thanks again. |
Summary

- Add logodds(), convex(), and rrf() fusion methods
- Add isbayes() on scoring classes to select the appropriate fusion strategy

Motivation
BB25 normalization outputs calibrated probabilities in [0, 1]. The existing convex combination fusion (w * dense + (1-w) * sparse) treats these as raw weights, discarding the Bayesian probability semantics. Log-odds conjunction fuses evidence in logit space, where independent probability signals combine additively -- the mathematically correct way to accumulate Bayesian evidence.

BEIR Benchmark Results (nDCG@10)
| Dataset | Δ nDCG@10 |
|----------|-----------|
| arguana | +2.23 |
| fiqa | +2.03 |
| scidocs | +0.62 |
| scifact | +1.33 |
| nfcorpus | -1.96 |

4/5 datasets improved. Average delta: +0.85 across all 5 datasets.
The nfcorpus regression is an inherent property of logit-space fusion on a corpus with very short queries (median 2 words), many relevant documents per query (38.2 avg), and graded relevance levels. The nonlinear logit transform reorders documents whose scores are very close, which slightly hurts fine-grained ordering among many high-scoring relevant documents.
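A small numeric sketch of that reordering effect (the scores below are made up for illustration, not benchmark values): two documents with near-identical sparse probabilities can be ranked differently by convex and logit-space fusion, because the logit transform magnifies differences near the extremes.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def convex(sparse, dense, weight=0.5):
    # Linear combination in probability space
    return weight * dense + (1 - weight) * sparse

def logodds(sparse, dense):
    # Additive evidence combination in logit space
    return sigmoid(logit(sparse) + logit(dense))

# Hypothetical (sparse, dense) probabilities for two close documents
doc1 = (0.98, 0.60)
doc2 = (0.97, 0.65)

# Convex prefers doc2 (0.79 vs 0.81); log-odds prefers doc1,
# because logit(0.98) - logit(0.97) outweighs the dense gap
print(convex(*doc1), convex(*doc2))
print(logodds(*doc1), logodds(*doc2))
```

When many relevant documents cluster this tightly, as in nfcorpus, such flips slightly perturb fine-grained ordering even though overall ranking quality improves on most corpora.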
For nfcorpus-like corpora, the reference BB25 implementation's parameter learning feature (BayesianProbabilityTransform.fit()) can recover the regression by fitting a stable global beta from relevance judgments, which prevents the per-query median from being thrown off by many near-identical BM25 scores from short queries. In our experiments, learned parameters brought nfcorpus from 32.83% back to 34.88%, surpassing the default baseline.
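The per-query median calibration discussed above could be sketched roughly as below. The function name and the fixed slope beta=5.0 are assumptions for illustration, not the PR's exact calibrate() implementation:

```python
import math
import statistics

def calibrate(scores, beta=5.0):
    """Map raw cosine similarities to (0, 1) probabilities by centering
    at the query's median score, then squashing through a sigmoid.
    beta (assumed value) controls how sharply scores separate."""
    med = statistics.median(scores)
    return [1 / (1 + math.exp(-beta * (score - med))) for score in scores]
```

A learned global beta, as with the reference implementation's fit(), would replace this fixed slope and per-query median with parameters estimated from relevance judgments, which is what stabilizes nfcorpus-style queries whose BM25 scores are nearly identical.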