
Fix disease search ranking to surface exact-match canonical hits #188

Merged
imaurer merged 1 commit into main from fix-disease-exact-query-ranking on Mar 10, 2026


imaurer commented Mar 10, 2026

Summary

Exact disease-name queries (e.g. biomcp search disease "colorectal cancer" --limit 10) were returning subtype-heavy upstream results that buried the canonical disease node. The canonical exact match could fall outside the returned page entirely.

This PR adds an overscan-and-rerank pass to disease search:

  • Multiple resolver queries feed a wider candidate pool (exact name, normalized text, MONDO prefix)
  • Candidates are deduplicated by MONDO ID across all upstream result pages
  • Results are scored against all candidate labels (exact match → prefix → substring) and sorted so the canonical exact-match node surfaces first
  • The returned slice is then taken from the reranked pool, not from raw upstream order
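
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in src/entities/disease.rs: `DiseaseHit`, `normalize`, `match_tier`, and `rerank` are hypothetical names, the resolver queries are represented simply as already-fetched result pages, and "all candidate labels" is assumed to mean each hit's primary label plus its synonyms.

```rust
use std::collections::HashSet;

/// Hypothetical shape of one upstream hit; the real struct may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct DiseaseHit {
    pub mondo_id: String,
    pub label: String,
    pub synonyms: Vec<String>,
}

/// Lowercase and collapse whitespace so matching ignores case and spacing.
fn normalize(text: &str) -> String {
    text.split_whitespace()
        .map(|word| word.to_lowercase())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Best tier over the hit's label and synonyms (assumed label set).
/// Lower is better: 0 = exact, 1 = prefix, 2 = substring, 3 = no match.
fn match_tier(query: &str, hit: &DiseaseHit) -> u8 {
    let q = normalize(query);
    std::iter::once(&hit.label)
        .chain(hit.synonyms.iter())
        .map(|label| {
            let l = normalize(label);
            if l == q {
                0
            } else if l.starts_with(&q) {
                1
            } else if l.contains(&q) {
                2
            } else {
                3
            }
        })
        .min()
        .unwrap_or(3)
}

/// Merge all result pages, keep the first hit per MONDO ID, sort by match
/// tier, and only then cut the pool down to the requested page size.
pub fn rerank(query: &str, pages: Vec<Vec<DiseaseHit>>, limit: usize) -> Vec<DiseaseHit> {
    let mut seen = HashSet::new();
    let mut pool: Vec<DiseaseHit> = pages
        .into_iter()
        .flatten()
        // `insert` returns false for an ID that was already seen.
        .filter(|hit| seen.insert(hit.mondo_id.clone()))
        .collect();
    // The sort is stable, so upstream order breaks ties within a tier.
    pool.sort_by_cached_key(|hit| match_tier(query, hit));
    pool.truncate(limit);
    pool
}
```

The key design point is that truncation happens last: the caller would fetch more candidates than `limit` (the overscan) so that an exact match ranked low upstream is still in the pool when the sort runs.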

Changes

  • src/entities/disease.rs: rerank_disease_search_hits function + integration into the search path with overscan
  • spec/07-disease.md: spec assertion tightened to pin the canonical MONDO row (MONDO:0024331 | colorectal carcinoma)

Testing

  • cargo test — 570 passed, 0 failed
  • make spec — 95 passed, 5 skipped, 0 failed
  • Manual spot checks: colorectal cancer, breast cancer, lung cancer all surface the canonical node first

Exact user queries like "colorectal cancer" were being buried by
subtype-heavy upstream ordering. This adds an overscan-and-rerank pass
in disease search: multiple resolver queries feed a wider candidate pool,
deduplicated by MONDO ID, then scored by exact/prefix/substring match
against all candidate labels. The canonical exact-match node is now
guaranteed to surface within the returned page.

Adds regression unit test covering the colorectal cancer ranking case
and updates the spec assertion to pin the canonical MONDO row.

imaurer merged commit e5ad03f into main Mar 10, 2026
1 check passed
imaurer deleted the fix-disease-exact-query-ranking branch March 10, 2026 21:18