Description
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: v2.6.10
- Deployment mode(standalone or cluster): Standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2): PyMilvus v2.6.10
- OS(Ubuntu or CentOS): Windows (Client) / Linux (Server)
- CPU/Memory: N/A
- GPU: N/A
- Others: Docker Compose deployment
Current Behavior
Milvus v2.6.10 accepts integer values for a dynamic field that already holds string values, which conflicts with the data type consistency described in the official documentation.
Key Issue: When a dynamic field is first established as a string type (VARCHAR), subsequent insertions with integer values should be rejected but are instead accepted, causing data type inconsistency within the same field.
Expected Behavior
Milvus should validate data types when inserting into dynamic fields. According to the official documentation, entities within the same collection should have the same data types.
Once a field is established with a specific type (VARCHAR for strings), subsequent insertions with incompatible types (INT64 for integers) should be rejected.
Example expected error:
<ParamError: (code=1100, message=Invalid data type for field 'text_field': expected VARCHAR, got INT64)>
Steps To Reproduce
1. Create a collection with dynamic field enabled (default behavior)
2. Insert first record with a string value (e.g., `"text_field": "hello"`)
3. Insert second record with an integer value for the same field (e.g., `"text_field": 123`)
4. Observe that the second insertion succeeds instead of failing with a data type validation error
5. Query the data to see type inconsistency
**Complete Reproduction Script:**
from pymilvus import MilvusClient
client = MilvusClient("http://localhost:19530")
# Create collection with dynamic field enabled
collection_name = "test_data_type_string"
if client.has_collection(collection_name):
    client.drop_collection(collection_name)
client.create_collection(
    collection_name=collection_name,
    dimension=128,
    metric_type="L2"
)
# Step 1: First insert with STRING value - establishes field as VARCHAR type
print("Step 1: Insert with string value (establishes text_field as VARCHAR)")
data1 = [
    {"vector": [0.1] * 128, "id": 1, "text_field": "hello_world"}
]
client.insert(collection_name=collection_name, data=data1)
print("Result: SUCCESS - text_field is now VARCHAR type")
# Step 2: Second insert with INTEGER value - SHOULD BE REJECTED
print("\nStep 2: Insert with integer value (should be rejected - type mismatch)")
data2 = [
    {"vector": [0.2] * 128, "id": 2, "text_field": 12345}  # Integer into VARCHAR field
]
try:
    client.insert(collection_name=collection_name, data=data2)
    print("Result: ACCEPTED - BUG CONFIRMED!")
    print("Expected: REJECTED (data type mismatch)")
except Exception as e:
    print(f"Result: REJECTED - {e}")
    print("This is the expected behavior")
# Step 3: Query to verify data type inconsistency
print("\nStep 3: Query results showing type inconsistency")
results = client.query(
    collection_name=collection_name,
    filter="id in [1, 2]",
    output_fields=["id", "text_field"]
)
for r in results:
    val = r.get("text_field")
    print(f"  id={r.get('id')}, text_field={val}, type={type(val).__name__}")
# Clean up
if client.has_collection(collection_name):
    client.drop_collection(collection_name)
**Test Results:**
Step 1: Insert with string value (establishes text_field as VARCHAR)
Result: SUCCESS - text_field is now VARCHAR type
Step 2: Insert with integer value (should be rejected - type mismatch)
Result: ACCEPTED - BUG CONFIRMED!
Expected: REJECTED (data type mismatch)
Step 3: Query results showing type inconsistency
id=1, text_field=hello_world, type=str
id=2, text_field=12345, type=int <-- Same field, different type!
Milvus Log
N/A - Issue is reproducible via SDK behavior observation.
Anything else?
Documentation Evidence
According to the official Milvus Insert Entities Documentation:
Entity Schema Consistency:
"In Milvus, an Entity refers to data records in a Collection that share the same Schema, with the data in each field of a row constituting an Entity. Therefore, the Entities within the same Collection have the same attributes (such as field names, data types, and other constraints)."
Source URL: https://milvus.io/docs/insert-update-delete.md
According to the official Milvus Schema Design Documentation:
Schema Design:
"For schema design, Milvus supports flexible schema design, where you can define the fields and their data types, including vector fields."
Source URL: https://milvus.io/docs/manage-collections.md
According to the official Milvus Scalar Field Types Documentation:
Scalar Field Types:
"Milvus supports multiple scalar field types, including VarChar, Boolean, Int, Float, and Double."
"In Milvus, you can use VarChar fields to store strings."
Source URL: https://milvus.io/docs/schema.md
Evidence Analysis:
- Schema Consistency Requirement: Documentation explicitly states that entities within the same collection should have the same data types
- Data Type Definition: Fields must have defined data types (VarChar for strings, Int for integers, etc.)
- Type Distinction: VarChar and Int are distinct types that should not be interchangeable
- Validation Expectation: The documentation implies data type validation should occur during insertion
Impact
- Data Type Inconsistency: The same field may contain both strings and integers
- Application Code Confusion: Code cannot reliably handle data types
- Query/Filter Unpredictability: Filtering on mixed-type fields may produce unexpected results
- Schema Integrity Violation: Violates documented requirement for consistent data types
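To check whether an existing collection is already affected, an application could scan query results for fields whose values come back with more than one Python type. The following is a minimal client-side sketch: `find_mixed_type_fields` and its `ignore` parameter are hypothetical names, not part of PyMilvus, and the sample rows simply mirror the query output shown in this report.

```python
from collections import defaultdict


def find_mixed_type_fields(rows, ignore=("id", "vector")):
    """Return {field: sorted type names} for fields seen with more than one type.

    `rows` is a list of dicts, e.g. the list returned by a query call.
    Hypothetical helper for illustration only.
    """
    seen = defaultdict(set)
    for row in rows:
        for key, value in row.items():
            if key in ignore:
                continue
            seen[key].add(type(value).__name__)
    # Keep only the fields that were observed with two or more distinct types.
    return {key: sorted(names) for key, names in seen.items() if len(names) > 1}


# Sample rows shaped like the Step 3 query output in this report.
rows = [
    {"id": 1, "text_field": "hello_world"},
    {"id": 2, "text_field": 12345},
]
print(find_mixed_type_fields(rows))  # {'text_field': ['int', 'str']}
```

An empty result means every scanned field had a single type in the rows that were checked; it says nothing about rows outside the query.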
Severity
P2 (Medium) - This is a data integrity issue that affects schema consistency and application reliability.
Root Cause
The dynamic field functionality ($meta field) stores additional fields as JSON key-value pairs without validating data type consistency. When a field is first created through dynamic insertion, Milvus infers its type from the first value, but does not enforce type consistency for subsequent insertions.
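The effect of per-row JSON storage can be illustrated with the standard `json` module alone. This sketch assumes, as described above, that each row's dynamic fields are serialized independently; the variable names are hypothetical stand-ins and the code does not touch Milvus internals.

```python
import json

# Hypothetical stand-ins for the dynamic-field payloads of the two inserted rows.
row1_meta = json.dumps({"text_field": "hello_world"})
row2_meta = json.dumps({"text_field": 12345})

# Each payload is valid JSON on its own, so nothing ties the two rows
# to a common type for the "text_field" key.
decoded = [json.loads(row1_meta), json.loads(row2_meta)]
types = [type(d["text_field"]).__name__ for d in decoded]
print(types)  # ['str', 'int']
```

Because JSON carries the type with each value, any cross-row consistency would have to be enforced by a separate check at insert time.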
Suggested Fix
- Track Field Types: Maintain a type registry for dynamically created fields
- Validate on Insert: Check that inserted values match the established field type
- Clear Error Messages: Provide specific error messages when type mismatch occurs
Example Validation Logic:
# When inserting into dynamic field
# (dynamic_field_types and infer_type are illustrative names, not existing Milvus APIs)
if field_name in dynamic_field_types:
    expected_type = dynamic_field_types[field_name]
    actual_type = infer_type(value)
    if expected_type != actual_type:
        raise ParamError(
            f"Invalid data type for field '{field_name}': "
            f"expected {expected_type}, got {actual_type}"
        )
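Until validation exists on the server side, the same idea could be applied on the client before calling insert. The following is a self-contained sketch under stated assumptions: `DynamicFieldTypeRegistry` and `DynamicFieldTypeError` are hypothetical names, the registry lives only in process memory, and Python types stand in for Milvus data types.

```python
class DynamicFieldTypeError(TypeError):
    """Raised when a dynamic field value does not match the first-seen type."""


class DynamicFieldTypeRegistry:
    """Remember the first-seen Python type of each dynamic field (hypothetical helper)."""

    def __init__(self, skip=("id", "vector")):
        self._skip = set(skip)  # schema-defined fields, not tracked here
        self._types = {}        # field name -> first-seen Python type

    def check(self, row):
        """Validate one row; register new fields only if the whole row passes."""
        new_types = {}
        for name, value in row.items():
            if name in self._skip:
                continue
            actual = type(value)
            expected = self._types.get(name)
            if expected is None:
                new_types[name] = actual
            elif actual is not expected:
                raise DynamicFieldTypeError(
                    f"Invalid data type for field '{name}': "
                    f"expected {expected.__name__}, got {actual.__name__}"
                )
        self._types.update(new_types)


registry = DynamicFieldTypeRegistry()
registry.check({"id": 1, "vector": [0.1] * 4, "text_field": "hello_world"})
try:
    registry.check({"id": 2, "vector": [0.2] * 4, "text_field": 12345})
except DynamicFieldTypeError as exc:
    print(exc)  # Invalid data type for field 'text_field': expected str, got int
```

The sketch compares exact Python types, so it treats `bool` and `int` as different and would record `NoneType` for a first value of `None`; a real implementation would need rules for nulls, numeric widening, and persistence across processes.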