Add log-odds conjunction fusion for BB25 hybrid search #1041

Merged
davidmezzetti merged 3 commits into neuml:master from jaepil:master
Feb 18, 2026

Conversation

@jaepil (Contributor) commented Feb 17, 2026

Summary

  • Add log-odds conjunction fusion strategy for BB25 (Bayesian BM25) normalized hybrid search
  • Calibrate dense cosine scores via per-query dynamic sigmoid (beta=median, alpha=1/std) to produce logits centered at 0, then fuse with sparse BB25 logits using weighted mean log-odds with confidence scaling
  • Refactor hybrid fusion dispatch into separate logodds(), convex(), and rrf() methods
  • Expose isbayes() on scoring classes to select the appropriate fusion strategy
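The per-query calibration described above can be sketched as follows. This is an illustrative reconstruction, not the actual txtai code; the function name `calibrate` and the zero-variance fallback are assumptions.

```python
import numpy as np

def calibrate(scores):
    # Per-query dynamic sigmoid parameters: beta is the median of the
    # query's dense cosine scores, alpha is 1 / standard deviation.
    scores = np.asarray(scores, dtype=float)
    beta = np.median(scores)
    std = scores.std()
    alpha = 1.0 / std if std > 0 else 1.0

    # The logit of sigmoid(alpha * (x - beta)) is alpha * (x - beta),
    # so the calibrated logits are centered at 0 for the median document
    return alpha * (scores - beta)
```

Because beta is the median, roughly half the documents receive a positive logit and half a negative one, keeping the dense signal balanced before fusion.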

Motivation

BB25 normalization outputs calibrated probabilities in [0, 1]. The existing convex combination fusion (w * dense + (1-w) * sparse) treats these as raw weights, discarding the Bayesian probability semantics. Log-odds conjunction fuses evidence in logit space where independent probability signals combine additively -- the mathematically correct way to accumulate Bayesian evidence.
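A minimal sketch of the fusion idea (function names and the 50/50 default weight are illustrative, not the txtai API):

```python
import math

def logit(p, eps=1e-6):
    # Map a probability to log-odds, clamping away from 0 and 1
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def fuse(sparse_prob, dense_logit, weight=0.5):
    # Weighted mean in log-odds space: independent evidence adds
    fused = weight * dense_logit + (1 - weight) * logit(sparse_prob)
    # Map back to a probability with the logistic function
    return 1.0 / (1.0 + math.exp(-fused))
```

Agreement between the signals pushes the fused probability toward the extremes, while a neutral dense logit of 0 leaves the BB25 probability essentially unchanged.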

BEIR Benchmark Results (nDCG@10)

| Dataset | Default+Convex | BB25+LogOdds | Delta |
| --- | --- | --- | --- |
| arguana | 49.47% | 51.70% | +2.23 |
| fiqa | 35.41% | 37.44% | +2.03 |
| nfcorpus | 34.79% | 32.83% | -1.96 |
| scidocs | 19.71% | 20.33% | +0.62 |
| scifact | 71.08% | 72.41% | +1.33 |

4/5 datasets improved. Average delta: +0.85 across all 5 datasets.

The nfcorpus regression is an inherent property of logit-space fusion on a corpus with very short queries (median 2 words), many relevant documents per query (38.2 avg), and graded relevance levels. The nonlinear logit transform reorders documents whose scores are very close, which slightly hurts fine-grained ordering among many high-scoring relevant documents.

For nfcorpus-like corpora, the reference BB25 implementation's parameter learning feature (BayesianProbabilityTransform.fit()) can recover the regression by fitting a stable global beta from relevance judgments, which prevents the per-query median from being thrown off by many near-identical BM25 scores from short queries. In our experiments, learned parameters brought nfcorpus from 32.83% back to 34.88%, surpassing the default baseline.
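The idea of a stable global beta can be illustrated with a hypothetical helper. This is NOT the reference `BayesianProbabilityTransform.fit()` API, only a sketch of the principle: pool BM25 scores across queries and place beta at the boundary between relevant and non-relevant scores, so no single short query can skew it.

```python
import numpy as np

def fit_global_beta(query_scores, query_labels):
    # Pool scores and binary relevance labels across all queries
    scores = np.concatenate([np.asarray(s, dtype=float) for s in query_scores])
    labels = np.concatenate([np.asarray(l, dtype=int) for l in query_labels])

    # Place beta midway between the mean relevant and mean
    # non-relevant score: a stable global sigmoid center
    relevant = scores[labels > 0].mean()
    nonrelevant = scores[labels == 0].mean()
    return (relevant + nonrelevant) / 2.0
```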

BB25 normalization outputs calibrated probabilities, but the existing
hybrid fusion uses convex combination which discards the Bayesian
probability semantics. This causes BB25 to regress on 4/5 BEIR datasets.

Add log-odds conjunction fusion (from "From Bayesian Inference to Neural
Computation") that correctly combines probability signals in logit space
with per-query dynamic calibration for dense cosine scores.

- scoring/normalize.py: Extract Bayesian method check into isbayes()
- scoring/base.py: Add default isbayes() returning False
- scoring/tfidf.py: Add isbayes() delegating to normalizer
- search/base.py: Add logodds(), convex(), rrf() fusion methods;
  dispatch based on isbayes()

BEIR nDCG@10 results (BB25+LogOdds vs Default):
  arguana +2.23, fiqa +2.03, scidocs +0.62, scifact +1.33, nfcorpus -1.96
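For reference, the rrf() strategy listed above follows standard Reciprocal Rank Fusion; a generic sketch (k=60 is the conventional constant from the RRF literature, not necessarily txtai's default):

```python
def rrf(rankings, k=60):
    # Each result contributes 1 / (k + rank); summing across rankings
    # rewards documents that place well in multiple result lists
    scores = {}
    for ranking in rankings:
        for rank, uid in enumerate(ranking, start=1):
            scores[uid] = scores.get(uid, 0.0) + 1.0 / (k + rank)

    # Return (id, score) pairs sorted by fused score, best first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```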
@davidmezzetti (Member)

Thank you for this and the detailed explanation.

If someone just enabled scoring-only indexing and enabled bayes, do the scores work? Or do they need the logic in logodds to work even standalone? i.e.

```python
embeddings = Embeddings(
    keyword=True,
    scoring={"method": "bm25", "terms": True, "normalize": "bb25"}
)
```

I like the new methods to combine scores. I think it would be good to split that up into a separate subclass to containerize it (something like hybrid and similar to what we did with scoring normalizer).

@jaepil (Contributor, Author) commented Feb 18, 2026

Thanks for the review!

> Thank you for this and the detailed explanation.
>
> If someone just enabled scoring-only indexing and enabled bayes, do the scores work? Or do they need the logic in logodds to work even standalone? i.e.
>
> ```python
> embeddings = Embeddings(
>     keyword=True,
>     scoring={"method": "bm25", "terms": True, "normalize": "bb25"}
> )
> ```

Sparse-only BB25: Yes, it works standalone. When there's no ANN index, the hybrid flag is False and the fusion logic (logodds/convex/rrf) is never reached -- sparse results are returned directly with BB25 probabilities from the normalizer.

> I like the new methods to combine scores. I think it would be good to split that up into a separate subclass to containerize it (something like hybrid and similar to what we did with scoring normalizer).

Subclass extraction: Good idea. I'll refactor the fusion methods into a separate Hybrid class (similar to Normalize) so Search delegates to it based on the scoring configuration.

Move logodds, convex, and rrf fusion methods from Search into
a dedicated Hybrid class, following the same pattern as Normalize.
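A rough shape for that refactor, as a sketch only: the class and method names follow the discussion, but the signatures and internals are assumptions, not the merged code.

```python
import math

class Hybrid:
    """Containerizes score fusion, mirroring the Normalize pattern."""

    def __init__(self, weight=0.5):
        self.weight = weight

    def __call__(self, dense, sparse, isbayes):
        # Dispatch on whether sparse scores are BB25 probabilities
        method = self.logodds if isbayes else self.convex
        return method(dense, sparse)

    def convex(self, dense, sparse):
        # Simple weighted average of the two scores
        return self.weight * dense + (1 - self.weight) * sparse

    def logodds(self, dense, sparse, eps=1e-6):
        # Fuse a calibrated dense logit with the BB25 probability in
        # log-odds space, then map back to a probability
        sparse = min(max(sparse, eps), 1 - eps)
        fused = self.weight * dense + (1 - self.weight) * math.log(sparse / (1 - sparse))
        return 1.0 / (1.0 + math.exp(-fused))
```

Search would then hold a single Hybrid instance and call it per result pair, keeping fusion logic out of the search path itself.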
davidmezzetti added this to the v9.6.0 milestone Feb 18, 2026
@davidmezzetti (Member)

Running the tests now. Once complete, I'll merge and re-run the benchmarks and report back. Thank you for this!

@davidmezzetti (Member)

@jaepil Just ran the build. It's failing due to coding convention checks. If you want to fix this, you'd just need to install pre-commit to see the issues: https://github.com/neuml/.github/blob/master/CONTRIBUTING.md#set-up-a-development-environment

If you don't have time for that, I can merge and fix after that.

@jaepil (Contributor, Author) commented Feb 18, 2026

> @jaepil Just ran the build. It's failing due to coding convention checks. If you want to fix this, you'd just need to install pre-commit to see the issues: https://github.com/neuml/.github/blob/master/CONTRIBUTING.md#set-up-a-development-environment
>
> If you don't have time for that, I can merge and fix after that.

@davidmezzetti I'm going to fix it now.

- Fix black formatting: remove unnecessary parentheses, remove spaces around **
- Fix pylint too-many-branches: extract calibrate() method from logodds()
- Fix pylint unused-variable: rename score to _ in rrf()
@jaepil (Contributor, Author) commented Feb 18, 2026

@davidmezzetti Fixed the coding convention issues (black formatting + pylint warnings). The CI should pass now.

@davidmezzetti (Member)

The other minor coding convention item is that the repo doesn't use the "_" variable notation. But I can modify that after the merge too.

davidmezzetti merged commit 48f0962 into neuml:master Feb 18, 2026
4 checks passed
@davidmezzetti (Member)

@jaepil Merged! I just ran the tests locally and they match. Thank you once again for adding this algorithm in!

@jaepil (Contributor, Author) commented Feb 18, 2026

@davidmezzetti Thank you for the thorough review and for merging! I'll keep the _ variable naming convention in mind for future contributions.

davidmezzetti added a commit that referenced this pull request Feb 18, 2026
@davidmezzetti (Member)

I just added tests for this code and added a couple code standardizations. Task complete! Thanks again.
