fix: support non-float vectors in struct array in search #3277

Open
SpadeA-Tang wants to merge 1 commit into milvus-io:master from SpadeA-Tang:fix-search-non-float

@SpadeA-Tang
Contributor

issue: #3269

Signed-off-by: SpadeA <tangchenjie1210@gmail.com>
@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SpadeA-Tang
To complete the pull request process, please assign tedxu after the PR has been reviewed.
You can assign the PR to them by writing /assign @tedxu in a comment when ready.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov

codecov bot commented Feb 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.57%. Comparing base (eb7868f) to head (112242a).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3277      +/-   ##
==========================================
+ Coverage   76.36%   76.57%   +0.21%     
==========================================
  Files          63       63              
  Lines       13321    13330       +9     
==========================================
+ Hits        10173    10208      +35     
+ Misses       3148     3122      -26     

mergify bot added the ci-passed label Feb 11, 2026
@zhuwenxing
Contributor

Verified against Milvus master-20260210-f354d39a. The EmbeddingList + MAX_SIM search path works correctly for float16/int8/binary vectors. ✅

However, the single-vector element_filter search path still fails for float16/bfloat16/int8. Raw bytes passed to _prepare_placeholder_str are always tagged as BinaryVector (line 1348), which causes a type mismatch on the server side:

pymilvus.exceptions.MilvusException: <MilvusException: (code=65535, message=fail to search on QueryNode 1: 
worker(1) query failed: parser searchRequest failed: Assert "check_data_type(field_meta, ph)" => 
vector type must be the same, field items[embedding] - type VECTOR_ARRAY, search ph type VECTOR_BINARY 
at /workspace/source/internal/core/src/query/Plan.cpp:125)>
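
To illustrate why tagging raw bytes by default is fragile, here is a minimal standalone sketch (standard library only, no Milvus connection). It packs a 32-dimensional float16 vector and lists the vector types the same buffer could plausibly encode. The helper candidate_interpretations is hypothetical and not part of pymilvus; the type labels only mirror the placeholder names mentioned in this thread.

```python
import struct

DIM = 32  # same dimension as the reproduction script

# Pack 32 half-precision floats ('e' is the IEEE 754 binary16 format code).
values = [i / DIM for i in range(DIM)]
raw = struct.pack(f"<{DIM}e", *values)


def candidate_interpretations(buf: bytes) -> dict:
    """Hypothetical helper: map each vector type to the dimension
    that a raw buffer of this length would imply for it."""
    n = len(buf)
    out = {
        "BinaryVector": n * 8,  # 1 bit per dimension
        "Int8Vector": n,        # 1 byte per dimension
    }
    if n % 2 == 0:
        # 2 bytes per dimension for both 16-bit float formats
        out["Float16Vector"] = n // 2
        out["BFloat16Vector"] = n // 2
    return out


interpretations = candidate_interpretations(raw)
print(len(raw), interpretations)
```

The byte length alone cannot tell these types apart, so the client presumably needs the field's declared type (or a typed array) to choose the right placeholder instead of assuming BinaryVector.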
Reproduction script
import numpy as np
from pymilvus import MilvusClient, DataType

URI = "http://localhost:19530"
DIM = 32
NB = 100
client = MilvusClient(uri=URI)

col = "repro_ef_float16"
if col in client.list_collections():
    client.drop_collection(col)

schema = client.create_schema(auto_id=False)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("top_vec", DataType.FLOAT_VECTOR, dim=DIM)

struct_schema = client.create_struct_field_schema()
struct_schema.add_field("embedding", DataType.FLOAT16_VECTOR, dim=DIM)
struct_schema.add_field("val", DataType.INT64)
schema.add_field("items", DataType.ARRAY, element_type=DataType.STRUCT,
                 struct_schema=struct_schema, max_capacity=5)

index_params = client.prepare_index_params()
index_params.add_index("top_vec", index_type="HNSW", metric_type="COSINE",
                       params={"M": 16, "efConstruction": 200})
index_params.add_index("items[embedding]", index_type="HNSW",
                       metric_type="MAX_SIM_L2",
                       params={"M": 16, "efConstruction": 200})
client.create_collection(collection_name=col, schema=schema, index_params=index_params)

data = []
for i in range(NB):
    elems = [{"embedding": np.random.rand(DIM).astype(np.float16).tobytes(),
              "val": i * 10 + j} for j in range(3)]
    data.append({"id": i, "top_vec": np.random.rand(DIM).astype(np.float32).tolist(),
                 "items": elems})
client.insert(col, data)
client.flush(col)

query_vec = data[0]["items"][0]["embedding"]  # raw bytes
client.search(
    collection_name=col, data=[query_vec],
    anns_field="items[embedding]",
    search_params={"metric_type": "L2"},
    filter="element_filter(items, $[val] >= 0)", limit=5,
)
