
FormulaQuery with multiple prefetches does not surface all filter matches #1072

@MotzWanted

Qdrant version: 1.15.1
Client: qdrant-client Python, AsyncQdrantClient
Data: one vector per point; payload fields (e.g., collections: [..], tags: [..]) are indexed; distance: cosine

Summary

I am trying to boost points that match the query_filter (example: collections == "OEO") on top of a hybrid search. The prefetch limit is set to support infinite scroll, so it increases dynamically as the user scrolls down in the frontend. With a single prefetch everything works as expected: all 16 matching points get boosted and appear at the top. As soon as I add more prefetches, only 6 of the 16 matching points appear. I have tried a larger prefetch limit, Fusion.RRF, and Fusion.DBSF, none of which makes a difference. This happens even though:

  • count(filter=...) returns 16
  • a direct query_points with query_filter returns those 16

This looks like a candidate-set construction issue when multiple prefetches are present.
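
For reference, the limit sizing that infinite scroll implies can be sketched as follows. The helper name `prefetch_limit_for_page` and its parameters are hypothetical and not part of qdrant-client; the sketch only encodes the arithmetic that a prefetch has to return at least `offset + limit` candidates, and at least as many as the filter matches, before the final page can be filled from it.

```python
def prefetch_limit_for_page(page: int, page_size: int, min_candidates: int = 0) -> int:
    """Return a prefetch limit large enough to serve `page` (1-based).

    Hypothetical helper: `min_candidates` would be the exact filter count
    (16 in this report) so that every matching point can enter the pool.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    needed_for_page = page * page_size  # equals offset + limit for this page
    return max(needed_for_page, min_candidates)
```

With `page_size=10` and 16 matching points, page 1 would need a limit of 16 rather than 10, and page 3 a limit of 30.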

What I am doing

Goal: keep the base vector score, then boost score if the filter matches.

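from qdrant_client import models as qdrm

# `qdrant` is an AsyncQdrantClient instance; `idx` is the collection name
# Build the payload filter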
query_filter = qdrm.Filter(
    must=[qdrm.FieldCondition(key="collections", match=qdrm.MatchAny(any=["OEO"]))]
)

# Ground truth
filtered_count = (await qdrant.count(
    collection_name=idx, count_filter=query_filter, exact=True
)).count  # -> 16

# Prefetch 1: filtered vector search - intended as the base stream
prefetch_filtered_vec = qdrm.Prefetch(
    query=vectors,
    filter=query_filter,
    limit=page_size,
)

# Prefetch 2: extra signal (can be a recommend, discover, or unfiltered vector query)
# Kept minimal here to reproduce the issue; prefetch_discover, used below,
# is a further prefetch whose definition is not shown
prefetch_recommend = qdrm.Prefetch(
    query=qdrm.RecommendQuery(
        recommend=qdrm.RecommendInput(
            positive=pos_points,
            negative=neg_points,
            strategy=strategy,
        )
    ),
    limit=page_size,
)

# Formula: base score from the first prefetch, plus constant boost on filter match
boost = 5.0
results = await qdrant.query_points(
    collection_name=idx,
    prefetch=[prefetch_filtered_vec, prefetch_recommend, prefetch_discover],
    query=qdrm.FormulaQuery(
        formula=qdrm.SumExpression(sum=[
            "$score",  # base score from the prefetch stage
            qdrm.MultExpression(mult=[boost, query_filter]),  # adds boost when the filter matches
        ])
    ),
    limit=page_size,
    offset=(page - 1) * page_size,
    with_payload=True,
)

Expected

  • All 16 points that match collections == "OEO" should be in the result set and appear above non-matching items due to the constant boost, since they are in prefetch_filtered_vec.
  • Scores for those 16 should be roughly base + 5.
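
The expectation above can be illustrated with a toy model in plain Python. This is a sketch of the behavior I expect, not of Qdrant's internals: the function `rescore`, the rule that the first prefetch containing a point supplies its base score, and the sample scores are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set, Tuple


def rescore(
    prefetches: List[Dict[Hashable, float]],
    matching: Set[Hashable],
    boost: float = 5.0,
) -> List[Tuple[Hashable, float]]:
    """Union the prefetch candidates, then apply score + boost * match."""
    base: Dict[Hashable, float] = {}
    for stream in prefetches:
        for point_id, score in stream.items():
            # Assumption: the first prefetch that contains a point sets its base score.
            base.setdefault(point_id, score)
    scored = [
        (point_id, score + (boost if point_id in matching else 0.0))
        for point_id, score in base.items()
    ]
    # Highest combined score first, as in the final query result.
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up cosine-like scores: 16 filter matches and 16 unrelated recommendations.
filtered = {i: 0.9 - 0.05 * i for i in range(16)}
recommend = {100 + i: 1.0 - 0.01 * i for i in range(16)}

ranked = rescore([filtered, recommend], matching=set(range(16)))
top_16_ids = [point_id for point_id, _ in ranked[:16]]
```

Since cosine scores stay within [-1, 1], a boost of 5 puts every matching point above every non-matching one in this model, so `top_16_ids` holds exactly the 16 filter matches regardless of how strong the second stream is.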

Actual

  • Only 6 of the 16 matching points appear boosted in the final results.
  • The remaining top rows are other items from prefetch_recommend or prefetch_discover.
  • If I remove prefetch_recommend and prefetch_discover, all 16 appear and are boosted as expected.

Observations

  • Order of prefetches: I put the filtered vector prefetch first so $score refers to it. The behavior persists.
  • Limits: I set prefetch_filtered_vec.limit >= filtered_count and also tried higher values. Behavior persists.
  • Using FusionQuery for multiple vectors produces the same symptom.
  • Using Sum rather than Mult in the formula avoids negative-score issues but does not bring back the 10 missing candidates.
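
To narrow down where the 10 candidates are lost, a small diagnostic along these lines may help. It is a sketch under assumptions: `locate_missing` is a hypothetical helper, and the ID lists are expected to be collected by the caller, for example by running each prefetch on its own through query_points and reading `[p.id for p in response.points]`.

```python
from typing import Dict, Hashable, Iterable, List


def locate_missing(
    expected_ids: Iterable[Hashable],
    final_ids: Iterable[Hashable],
    prefetch_ids: Dict[str, Iterable[Hashable]],
) -> Dict[Hashable, List[str]]:
    """Map each expected ID absent from the final results to the prefetches that returned it."""
    missing = set(expected_ids) - set(final_ids)
    streams = {name: set(ids) for name, ids in prefetch_ids.items()}
    return {
        point_id: [name for name, ids in streams.items() if point_id in ids]
        # key=str keeps the ordering stable for both integer and UUID string IDs
        for point_id in sorted(missing, key=str)
    }
```

An ID that maps to a non-empty list was returned by at least one prefetch and dropped afterwards; an ID that maps to an empty list never entered any prefetch result.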

Questions

  • How is the candidate pool built when multiple prefetches are present, including a filtered vector prefetch?
  • Is the union clipped by a global internal cap before the formula runs?
  • Do earlier prefetches dominate the pool in a way that can starve later ones even if their limits are higher?
  • Are there any additional knobs to ensure that all points from a filter match enter the candidate pool when other prefetches are also present - for example a global candidate cap, or a way to merge prefetches where one is guaranteed not to be clipped by others?
