Description
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: master
- Deployment mode: standalone
- MQ type: rocksmq
- SDK version: pymilvus 2.7.0rc136
- OS: Linux (K8s deployment)
Current Behavior
When creating a MINHASH_LSH index, the mh_lsh_band parameter accepts clearly invalid values without any error or warning:
- mh_lsh_band=0: the server accepts it, the index is created, and search returns results (likely using a fallback/default)
- mh_lsh_band=-1: the server accepts it, the index is created, and search returns results
- mh_lsh_band > num_hashes (e.g., band=26 with num_hashes=16): the server accepts it, the index is created, and search returns results
All three cases should be rejected with clear validation errors.
Expected Behavior
The server should validate mh_lsh_band at index creation time:
- mh_lsh_band must be a positive integer (> 0)
- mh_lsh_band must not exceed num_hashes (from the MinHash function params)
- mh_lsh_band should ideally be a divisor of num_hashes for optimal LSH behavior
Invalid values should be rejected with a clear error message at index creation time.
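Why these bounds matter can be sketched with the standard LSH banding scheme, assuming MINHASH_LSH follows it (the issue does not confirm Milvus's internal layout): a signature of num_hashes values is split into b bands of r = num_hashes // b rows each, and a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b. With b = 0 or b < 0 there are no bands at all, with b > num_hashes each band would hold less than one row, and with a non-divisor num_hashes % b values fall outside every full band. The helpers `banding_layout` and `candidate_probability` below are hypothetical names for illustration, not Milvus or Knowhere APIs.

```python
def banding_layout(num_hashes: int, band: int) -> tuple[int, int]:
    """Return (rows_per_band, unused_hashes) for a standard LSH banding split.

    Hypothetical helper: assumes the signature is cut into `band` equal bands
    and that any leftover hash values are simply not used.
    """
    if band <= 0:
        raise ValueError(f"mh_lsh_band must be > 0, got {band}")
    if band > num_hashes:
        raise ValueError(
            f"mh_lsh_band ({band}) must not exceed num_hashes ({num_hashes})"
        )
    rows_per_band = num_hashes // band
    unused = num_hashes % band  # non-zero when band does not divide num_hashes
    return rows_per_band, unused


def candidate_probability(similarity: float, num_hashes: int, band: int) -> float:
    """Probability that a pair with Jaccard similarity `similarity` matches in
    at least one band: 1 - (1 - s**r) ** b."""
    rows_per_band, _ = banding_layout(num_hashes, band)
    return 1.0 - (1.0 - similarity ** rows_per_band) ** band


if __name__ == "__main__":
    # num_hashes=16 as in the reproduction; 5 is included as a non-divisor.
    for band in (1, 2, 4, 8, 16, 5):
        r, unused = banding_layout(16, band)
        p = candidate_probability(0.5, 16, band)
        print(f"band={band:2d} rows/band={r:2d} unused={unused} P(s=0.5)={p:.4f}")
```

Raising b (fewer rows per band) makes the scheme more permissive, which is why b is a tuning knob, while values outside 1..num_hashes have no meaningful interpretation.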
Steps To Reproduce
from pymilvus import MilvusClient, DataType, Function, FunctionType
client = MilvusClient(uri="http://<host>:19530")
schema = client.create_schema(enable_dynamic_field=False)
schema.add_field("id", DataType.INT64, is_primary=True, auto_id=False)
schema.add_field("text", DataType.VARCHAR, max_length=65535)
schema.add_field("minhash_sig", DataType.BINARY_VECTOR, dim=512)
schema.add_function(Function(
    name="text_to_minhash",
    function_type=FunctionType.MINHASH,
    input_field_names=["text"],
    output_field_names=["minhash_sig"],
    params={"num_hashes": 16, "shingle_size": 3},
))
# Case 1: mh_lsh_band = 0
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="minhash_sig",
    index_type="MINHASH_LSH",
    metric_type="MHJACCARD",
    params={"mh_lsh_band": 0},  # Should be rejected
)
client.create_collection("test_band_0", schema=schema, index_params=index_params)
# No error! Index created successfully.
# Case 2: mh_lsh_band = -1
index_params2 = client.prepare_index_params()
index_params2.add_index(
    field_name="minhash_sig",
    index_type="MINHASH_LSH",
    metric_type="MHJACCARD",
    params={"mh_lsh_band": -1},  # Should be rejected
)
client.create_collection("test_band_neg", schema=schema, index_params=index_params2)
# No error! Index created successfully.
# Case 3: mh_lsh_band = 26 > num_hashes = 16
index_params3 = client.prepare_index_params()
index_params3.add_index(
    field_name="minhash_sig",
    index_type="MINHASH_LSH",
    metric_type="MHJACCARD",
    params={"mh_lsh_band": 26},  # Should be rejected (> num_hashes=16)
)
client.create_collection("test_band_exceed", schema=schema, index_params=index_params3)
# No error! Index created successfully.
Anything else?
Root cause analysis:
There is no validation for mh_lsh_band at any layer:
- Proxy layer: No parameter range check when processing CreateIndex request
- Index node: No validation when building the MINHASH_LSH index
- Knowhere: The underlying index library silently accepts invalid band values
The fix should add validation at the proxy layer (index creation path) or in the Knowhere MINHASH_LSH index config to ensure:
- mh_lsh_band > 0
- mh_lsh_band <= num_hashes (requires cross-referencing the collection schema's MinHash function params)
- Optionally, warn if num_hashes % mh_lsh_band != 0 (non-divisor bands may cause suboptimal LSH behavior)
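A minimal sketch of the proposed check, written in Python for readability rather than in the proxy's Go code or Knowhere's C++. It assumes the schema's functions are available as plain dicts shaped like the `Function(...)` arguments in the reproduction (`function_type`, `output_field_names`, `params`), and it assumes an omitted mh_lsh_band falls back to a server default. `find_num_hashes` and `validate_minhash_lsh_index` are hypothetical names, not existing Milvus APIs. The two required rules raise errors; a non-divisor band only emits a warning, matching the optional item above.

```python
import warnings
from typing import Any, Mapping, Optional, Sequence


def find_num_hashes(functions: Sequence[Mapping[str, Any]], field_name: str) -> int:
    """Return num_hashes of the MINHASH function whose output is `field_name`.

    Hypothetical helper: `functions` are plain dicts mirroring the Function(...)
    arguments used in the reproduction script.
    """
    for fn in functions:
        if fn.get("function_type") == "MINHASH" and field_name in fn.get(
            "output_field_names", []
        ):
            return int(fn["params"]["num_hashes"])
    raise ValueError(f"no MINHASH function outputs field {field_name!r}")


def validate_minhash_lsh_index(
    functions: Sequence[Mapping[str, Any]],
    field_name: str,
    index_params: Mapping[str, Any],
) -> Optional[int]:
    """Validate mh_lsh_band for a MINHASH_LSH index; return the accepted value."""
    band = index_params.get("mh_lsh_band")
    if band is None:
        return None  # parameter omitted; assumed to use the server default
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(band, int) or isinstance(band, bool):
        raise ValueError(f"mh_lsh_band must be an integer, got {band!r}")
    if band <= 0:
        raise ValueError(f"mh_lsh_band must be > 0, got {band}")
    # Cross-reference the schema's MinHash function params.
    num_hashes = find_num_hashes(functions, field_name)
    if band > num_hashes:
        raise ValueError(
            f"mh_lsh_band ({band}) must not exceed num_hashes ({num_hashes})"
        )
    if num_hashes % band != 0:
        # Allowed, but flagged: some hash values will not fill a complete band.
        warnings.warn(
            f"num_hashes ({num_hashes}) is not divisible by mh_lsh_band ({band}); "
            "LSH banding may be suboptimal",
            stacklevel=2,
        )
    return band


if __name__ == "__main__":
    functions = [{
        "function_type": "MINHASH",
        "output_field_names": ["minhash_sig"],
        "params": {"num_hashes": 16, "shingle_size": 3},
    }]
    # The three cases from the report, plus a valid divisor (4).
    for band in (4, 0, -1, 26):
        try:
            validate_minhash_lsh_index(functions, "minhash_sig", {"mh_lsh_band": band})
            print(band, "accepted")
        except ValueError as exc:
            print(band, "rejected:", exc)
```

Doing the num_hashes lookup only after the cheap sign check keeps the schema cross-reference off the path for obviously invalid values.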