
Commit fb02e77

fix(hybrid-query): use ADDSCORES to enable proper BM25 text scoring
Previously, HybridQuery hardcoded text_score to 1.0, making the hybrid formula effectively ignore text relevance. This was an incomplete port from the Python RedisVL implementation.

Changes:
- Add aggregation.addScores() to enable the @__score field in FT.AGGREGATE
- Use @__score (the actual BM25/text search score) instead of hardcoded 1.0
- Update Javadoc to document Redis 7.4.0+ requirement for ADDSCORES
- Update formula documentation to clarify the scoring mechanism

The hybrid scoring formula now correctly applies:

hybrid_score = (1 - alpha) * text_score + alpha * vector_similarity

where text_score is the actual BM25 score from Redis, not a constant.

Requires: Redis 7.4.0+
Reference: Python redisvl/query/aggregate.py lines 168-173
1 parent 5d1befa commit fb02e77
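The effect of the fix can be checked with plain arithmetic. The Java sketch below is illustrative only: the class name `HybridScoreSketch`, the two documents, their text scores and cosine distances, and alpha = 0.7 are all made-up values, not taken from RedisVL or from real Redis output. It applies the commit's formula, with vector similarity computed as (2 - distance) / 2, once with text_score fixed at 1.0 (the old behaviour) and once with per-document text scores (the new behaviour).

```java
import java.util.Locale;

/** Illustrative sketch of the hybrid scoring arithmetic; all inputs are hypothetical. */
public class HybridScoreSketch {

  /** Maps a cosine distance in [0, 2] to a similarity in [0, 1], i.e. (2 - d) / 2. */
  static double vectorSimilarity(double cosineDistance) {
    return (2.0 - cosineDistance) / 2.0;
  }

  /** hybrid_score = (1 - alpha) * text_score + alpha * vector_similarity */
  static double hybridScore(double textScore, double cosineDistance, double alpha) {
    return (1.0 - alpha) * textScore + alpha * vectorSimilarity(cosineDistance);
  }

  public static void main(String[] args) {
    double alpha = 0.7; // arbitrary weight chosen for illustration

    // Hypothetical documents: A is a strong text match, B is slightly closer in vector space.
    double textA = 4.0, distA = 0.4;
    double textB = 0.5, distB = 0.2;

    // Old behaviour: text_score is the constant 1.0, so only the vector term differs.
    double oldA = hybridScore(1.0, distA, alpha);
    double oldB = hybridScore(1.0, distB, alpha);

    // New behaviour: each document's own text score feeds the formula.
    double newA = hybridScore(textA, distA, alpha);
    double newB = hybridScore(textB, distB, alpha);

    System.out.printf(Locale.ROOT, "constant text_score: A=%.2f B=%.2f%n", oldA, oldB);
    System.out.printf(Locale.ROOT, "real text_score:     A=%.2f B=%.2f%n", newA, newB);
  }
}
```

With these numbers, document B ranks first while text_score is the constant 1.0, because only the vector term differs between documents; once the per-document text score is used, A's strong text match moves it ahead. BM25-style scores are not bounded to [0, 1] the way vector_similarity is, so the balance between the two terms depends on the score range as well as on alpha.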


1 file changed (+25, -14 lines)

core/src/main/java/com/redis/vl/query/HybridQuery.java

Lines changed: 25 additions & 14 deletions
@@ -14,10 +14,21 @@
 /**
  * HybridQuery combines text and vector search in Redis using aggregation.
  *
- * <p>Ported from Python: redisvl/query/aggregate.py:23-230 (HybridQuery class)
+ * <p>Ported from Python: redisvl/query/aggregate.py:57-329 (AggregateHybridQuery class)
  *
  * <p>It allows you to perform a hybrid search using both text and vector similarity. It scores
- * documents based on a weighted combination of text and vector similarity.
+ * documents based on a weighted combination of text and vector similarity using the formula:
+ *
+ * <pre>
+ * hybrid_score = (1 - alpha) * text_score + alpha * vector_similarity
+ * </pre>
+ *
+ * <p>Where {@code text_score} is the BM25 score from the text search and {@code vector_similarity}
+ * is the normalized cosine similarity from the vector search.
+ *
+ * <p><strong>Redis Version Requirements:</strong> This query uses the ADDSCORES option in
+ * FT.AGGREGATE to expose the internal text search score (@__score). This feature requires
+ * <strong>Redis 7.4.0 or later</strong>. On older Redis versions, the query will fail.
  *
  * <p><strong>Note on Runtime Parameters:</strong> HybridQuery uses Redis FT.AGGREGATE for
  * aggregation-based hybrid search. As of Redis Stack 7.2+, runtime parameters (efRuntime, epsilon,
@@ -598,30 +609,30 @@ public AggregationBuilder buildRedisAggregation() {
     // Set dialect
     aggregation.dialect(dialect);
 
-    // Set text scorer (Python: self.scorer(text_scorer))
-    // Note: In Jedis, we need to use WITHSCORE to get the text score
-    // For now, we'll use vector similarity only and calculate text score differently
+    // Enable ADDSCORES to expose @__score field containing the text search score
+    // (Python: self.add_scores() - line 169)
+    // Note: Requires Redis 7.4.0+. Uses default BM25 scorer.
+    aggregation.addScores();
 
-    // Apply vector similarity calculation (Python: line 122-123)
+    // Apply vector similarity calculation (Python: line 170-172)
     // vector_similarity = (2 - @vector_distance) / 2
+    // Normalizes cosine distance [0,2] to similarity [0,1]
     aggregation.apply("(2 - @" + DISTANCE_ID + ")/2", "vector_similarity");
 
-    // Apply text score - for hybrid queries, the text matching score is implicit
-    // Since we can't easily access __score in aggregations, we'll use a constant of 1.0
-    // This means the hybrid score will be based primarily on vector similarity
-    // TODO: Investigate using WITHSCORE or custom scoring
-    aggregation.apply("1.0", "text_score");
+    // Apply text score from @__score (the BM25/text search score exposed by ADDSCORES)
+    // (Python: text_score="@__score" - line 171)
+    aggregation.apply("@__score", "text_score");
 
-    // Apply hybrid score calculation (Python: line 125)
+    // Apply hybrid score calculation (Python: line 173)
     // hybrid_score = (1-alpha) * text_score + alpha * vector_similarity
     String hybridScoreFormula =
         String.format("%f*@text_score + %f*@vector_similarity", (1 - alpha), alpha);
     aggregation.apply(hybridScoreFormula, "hybrid_score");
 
-    // Sort by hybrid score descending (Python: line 126)
+    // Sort by hybrid score descending (Python: line 174)
     aggregation.sortBy(numResults, SortedField.desc("@hybrid_score"));
 
-    // Load return fields (Python: line 129)
+    // Load return fields (Python: line 176-177)
     if (!returnFields.isEmpty()) {
       aggregation.load(returnFields.toArray(String[]::new));
     }
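The order of operations in the second hunk can be sketched as a list of pipeline steps. This is a hypothetical, dependency-free illustration: `HybridPipelineSketch`, `buildPipeline`, and the step strings are made up, the sketch does not call Jedis, and it does not claim to reproduce the exact FT.AGGREGATE arguments Jedis sends. `DISTANCE_ID` is assumed to be `vector_distance`, based on the `(2 - @vector_distance) / 2` comment in the diff. One deliberate difference from the diff: the sketch passes `Locale.ROOT` to `String.format`. The two-argument `String.format` call in the diff uses the JVM's default locale, and under a locale such as `Locale.GERMANY` the `%f` conversion prints a decimal comma (`0,300000`), which would likely not be read as a number inside the APPLY expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch: the steps added by buildRedisAggregation() rendered as plain strings.
 * Not the Jedis API and not the exact FT.AGGREGATE wire format.
 */
public class HybridPipelineSketch {

  // Assumed alias, inferred from the diff comment "(2 - @vector_distance) / 2".
  static final String DISTANCE_ID = "vector_distance";

  /** Builds the hybrid-score expression; Locale.ROOT keeps '.' as the decimal separator. */
  static String hybridScoreFormula(double alpha) {
    return String.format(Locale.ROOT, "%f*@text_score + %f*@vector_similarity", 1 - alpha, alpha);
  }

  /** Returns the steps in the order the patched method adds them to the aggregation. */
  static List<String> buildPipeline(double alpha, int numResults) {
    List<String> steps = new ArrayList<>();
    steps.add("ADDSCORES"); // exposes @__score to the later APPLY steps
    steps.add("APPLY (2 - @" + DISTANCE_ID + ")/2 AS vector_similarity");
    steps.add("APPLY @__score AS text_score");
    steps.add("APPLY " + hybridScoreFormula(alpha) + " AS hybrid_score");
    steps.add("SORTBY 2 @hybrid_score DESC MAX " + numResults);
    return steps;
  }

  public static void main(String[] args) {
    // Switch to a comma-decimal default locale to show the formula is unaffected.
    Locale.setDefault(Locale.GERMANY);
    for (String step : buildPipeline(0.7, 10)) {
      System.out.println(step);
    }
  }
}
```

Running `main` prints the five steps, and the hybrid expression comes out as `0.300000*@text_score + 0.700000*@vector_similarity` even though the default locale has been switched to German.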
