[Bug]: MinHash (MHJACCARD) missing explicit validation for range search #47747

@zhuwenxing

Is there an existing issue for this?

  • I have searched the existing issues

Environment

  • Milvus version: master
  • Deployment mode: standalone
  • MQ type: rocksmq
  • SDK version: pymilvus 2.7.0rc136
  • OS: Linux (K8s deployment)

Current Behavior

When performing a range search on a MinHash (MHJACCARD) collection with radius and range_filter parameters, the server returns a late-stage error:

MilvusException: (code=65535, message=fail to search on QueryNode 1: worker(1) query failed:
minhash not support range search)

The error is thrown deep in the execution layer (SearchBruteForce.cpp:220) rather than being caught early at the proxy validation layer. This means the request goes through query planning, routing, and execution before failing.

Expected Behavior

Range search with the MHJACCARD metric type should be rejected early at the proxy layer with a clear error message, in the same way other unsupported operations are validated. For example, group_by on binary vectors is validated early at proxy/task_search.go:821-822.

Steps To Reproduce

from pymilvus import MilvusClient, DataType, Function, FunctionType

client = MilvusClient(uri="http://<host>:19530")

schema = client.create_schema(enable_dynamic_field=False)
schema.add_field("id", DataType.INT64, is_primary=True, auto_id=False)
schema.add_field("text", DataType.VARCHAR, max_length=65535)
schema.add_field("minhash_sig", DataType.BINARY_VECTOR, dim=512)

schema.add_function(Function(
    name="text_to_minhash",
    function_type=FunctionType.MINHASH,
    input_field_names=["text"],
    output_field_names=["minhash_sig"],
    params={"num_hashes": 16, "shingle_size": 3},
))

index_params = client.prepare_index_params()
index_params.add_index(
    field_name="minhash_sig",
    index_type="MINHASH_LSH",
    metric_type="MHJACCARD",
    params={"mh_lsh_band": 8},
)
client.create_collection("test_range", schema=schema, index_params=index_params)

# Insert data, flush, load...

results = client.search(
    "test_range",
    data=["some query text"],
    anns_field="minhash_sig",
    search_params={
        "metric_type": "MHJACCARD",
        "params": {"radius": 0.8, "range_filter": 0.0}
    },
    limit=10,
    output_fields=["id"],
)
# Error: minhash not support range search

Anything else?

Root cause analysis:

The error originates at internal/core/src/query/SearchBruteForce.cpp:220, where the "minhash not support range search" error is thrown during execution. There is no earlier validation at the proxy layer that rejects range search for the MHJACCARD metric type, so the request is planned, routed, and executed before it fails.

By comparison, other unsupported operations have early proxy-level validation:

  • group_by on binary vectors: validated at proxy/task_search.go:821-822
  • VECTOR_ARRAY range search: has AssertInfo check

The fix should add early validation in the proxy layer (e.g., in task_search.go) to reject range search requests when metric_type == MHJACCARD, returning a clear error before the request reaches QueryNode execution.
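
The proposed check can be sketched as follows. This is a minimal Python illustration of the logic only: the real fix would live in the Go proxy code, and the names validate_range_search, RANGE_SEARCH_KEYS, and RANGE_SEARCH_UNSUPPORTED_METRICS are hypothetical, not existing Milvus or pymilvus APIs. It assumes the range parameters are nested under "params", as in the reproduction above.

```python
# Hypothetical sketch of the early-validation logic; not part of Milvus or pymilvus.
RANGE_SEARCH_KEYS = ("radius", "range_filter")
RANGE_SEARCH_UNSUPPORTED_METRICS = {"MHJACCARD"}


def validate_range_search(search_params: dict) -> None:
    """Raise ValueError if a range search is requested for an unsupported metric."""
    metric = str(search_params.get("metric_type", "")).upper()
    params = search_params.get("params") or {}
    # A request counts as a range search if it carries either range parameter.
    used = [key for key in RANGE_SEARCH_KEYS if key in params]
    if metric in RANGE_SEARCH_UNSUPPORTED_METRICS and used:
        raise ValueError(
            f"range search is not supported for metric type {metric} "
            f"(got {', '.join(used)})"
        )
```

Until the proxy performs this validation, a guard like this could be called on search_params before client.search(...) so the request fails fast on the client instead of on the QueryNode.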
