[Bug]: Data type validation missing - accepts integer into string field #47766

@yihui504

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.6.10
- Deployment mode(standalone or cluster): Standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2): PyMilvus v2.6.10
- OS(Ubuntu or CentOS): Windows (Client) / Linux (Server)
- CPU/Memory: N/A
- GPU: N/A
- Others: Docker Compose deployment

Current Behavior

Milvus v2.6.10 accepts integer values for a dynamic field that already holds string values, which violates the data type consistency described in the official documentation.

Key Issue: When a dynamic field is first established as a string type (VARCHAR), subsequent insertions with integer values should be rejected but are instead accepted, causing data type inconsistency within the same field.

Expected Behavior

Milvus should validate data types when inserting into dynamic fields. According to the official documentation, entities within the same collection should have the same data types.

Once a field is established with a specific type (VARCHAR for strings), subsequent insertions with incompatible types (INT64 for integers) should be rejected.

Example expected error:

<ParamError: (code=1100, message=Invalid data type for field 'text_field': expected VARCHAR, got INT64)>

Steps To Reproduce

1. Create a collection with the dynamic field enabled (the default behavior).
2. Insert a first record with a string value (e.g., `"text_field": "hello"`).
3. Insert a second record with an integer value for the same field (e.g., `"text_field": 123`).
4. Observe that the second insertion succeeds instead of failing with a data type validation error.
5. Query the data to see the type inconsistency.

**Complete Reproduction Script:**

from pymilvus import MilvusClient

client = MilvusClient("http://localhost:19530")

# Create collection with dynamic field enabled
collection_name = "test_data_type_string"
if client.has_collection(collection_name):
    client.drop_collection(collection_name)
client.create_collection(
    collection_name=collection_name,
    dimension=128,
    metric_type="L2"
)

# Step 1: First insert with STRING value - establishes field as VARCHAR type
print("Step 1: Insert with string value (establishes text_field as VARCHAR)")
data1 = [
    {"vector": [0.1] * 128, "id": 1, "text_field": "hello_world"}
]
client.insert(collection_name=collection_name, data=data1)
print("Result: SUCCESS - text_field is now VARCHAR type")

# Step 2: Second insert with INTEGER value - SHOULD BE REJECTED
print("\nStep 2: Insert with integer value (should be rejected - type mismatch)")
data2 = [
    {"vector": [0.2] * 128, "id": 2, "text_field": 12345}  # Integer into VARCHAR field
]
try:
    client.insert(collection_name=collection_name, data=data2)
    print("Result: ACCEPTED - BUG CONFIRMED!")
    print("Expected: REJECTED (data type mismatch)")
except Exception as e:
    print(f"Result: REJECTED - {e}")
    print("This is the expected behavior")

# Step 3: Query to verify data type inconsistency
print("\nStep 3: Query results showing type inconsistency")
results = client.query(
    collection_name=collection_name,
    filter="id in [1, 2]",
    output_fields=["id", "text_field"]
)
for r in results:
    val = r.get("text_field")
    print(f"  id={r.get('id')}, text_field={val}, type={type(val).__name__}")

# Clean up
if client.has_collection(collection_name):
    client.drop_collection(collection_name)


**Test Results:**

Step 1: Insert with string value (establishes text_field as VARCHAR)
Result: SUCCESS - text_field is now VARCHAR type

Step 2: Insert with integer value (should be rejected - type mismatch)
Result: ACCEPTED - BUG CONFIRMED!
Expected: REJECTED (data type mismatch)

Step 3: Query results showing type inconsistency
  id=1, text_field=hello_world, type=str
  id=2, text_field=12345, type=int   <-- Same field, different type!
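
The mixed types in the Step 3 output can also be detected programmatically instead of by reading the printout. The following is a minimal sketch that works on plain dictionaries shaped like the query results; the helper names `field_types` and `mixed_type_fields` are hypothetical and not part of PyMilvus.

```python
from collections import defaultdict


def field_types(rows):
    """Map each field name to the set of Python type names seen across rows."""
    seen = defaultdict(set)
    for row in rows:
        for name, value in row.items():
            seen[name].add(type(value).__name__)
    return dict(seen)


def mixed_type_fields(rows):
    """Return only the fields that hold more than one type, with sorted type names."""
    return {
        name: sorted(types)
        for name, types in field_types(rows).items()
        if len(types) > 1
    }


# Rows shaped like the Step 3 query output.
rows = [
    {"id": 1, "text_field": "hello_world"},
    {"id": 2, "text_field": 12345},
]
print(mixed_type_fields(rows))  # {'text_field': ['int', 'str']}
```

Passing the list returned by `client.query(...)` in the reproduction script should work the same way, assuming each result behaves like a dictionary.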

Milvus Log

N/A - Issue is reproducible via SDK behavior observation.

Anything else?

Documentation Evidence

According to the official Milvus Insert Entities Documentation:

Entity Schema Consistency:

"In Milvus, an Entity refers to data records in a Collection that share the same Schema, with the data in each field of a row constituting an Entity. Therefore, the Entities within the same Collection have the same attributes (such as field names, data types, and other constraints)."

Source URL: https://milvus.io/docs/insert-update-delete.md

According to the official Milvus Schema Design Documentation:

Schema Design:

"For schema design, Milvus supports flexible schema design, where you can define the fields and their data types, including vector fields."

Source URL: https://milvus.io/docs/manage-collections.md

According to the official Milvus Scalar Field Types Documentation:

Scalar Field Types:

"Milvus supports multiple scalar field types, including VarChar, Boolean, Int, Float, and Double."

"In Milvus, you can use VarChar fields to store strings."

Source URL: https://milvus.io/docs/schema.md

Evidence Analysis:

  1. Schema Consistency Requirement: Documentation explicitly states that entities within the same collection should have the same data types
  2. Data Type Definition: Fields must have defined data types (VarChar for strings, Int for integers, etc.)
  3. Type Distinction: VarChar and Int are distinct types that should not be interchangeable
  4. Validation Expectation: The documentation implies data type validation should occur during insertion

Impact

  • Data Type Inconsistency: The same field may contain both strings and integers
  • Application Code Confusion: Code cannot reliably handle data types
  • Query/Filter Unpredictability: Filtering on mixed-type fields may produce unexpected results
  • Schema Integrity Violation: Violates documented requirement for consistent data types
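
Until server-side validation exists, an application can limit this impact by checking rows itself before calling `insert`. This is only a client-side sketch under the assumption that the caller knows which type each dynamic field should hold; `check_dynamic_types` and the `expected` mapping are hypothetical names, not PyMilvus features.

```python
def check_dynamic_types(rows, expected):
    """Raise TypeError if a row holds a value of the wrong type for a guarded field.

    `expected` maps field names to Python types, e.g. {"text_field": str}.
    Fields missing from a row are skipped, since dynamic fields are optional.
    """
    for index, row in enumerate(rows):
        for name, wanted in expected.items():
            if name not in row:
                continue
            value = row[name]
            # bool is a subclass of int in Python, so compare exact types.
            if type(value) is not wanted:
                raise TypeError(
                    f"row {index}: field '{name}' expected {wanted.__name__}, "
                    f"got {type(value).__name__}"
                )
    return rows


good = [{"id": 1, "text_field": "hello_world"}]
bad = [{"id": 2, "text_field": 12345}]

check_dynamic_types(good, {"text_field": str})  # passes silently
try:
    check_dynamic_types(bad, {"text_field": str})
except TypeError as exc:
    print(f"rejected: {exc}")
```

In the reproduction script, the guarded call would look like `client.insert(collection_name=collection_name, data=check_dynamic_types(data2, {"text_field": str}))`, which raises before anything reaches the server.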

Severity

P2 (Medium) - This is a data integrity issue that affects schema consistency and application reliability.

Root Cause

The dynamic field functionality (the `$meta` field) stores undeclared fields as JSON key-value pairs without validating data type consistency. The type of the first value written under a key (a string for `text_field` in the reproduction) is not enforced on subsequent insertions, so later rows can store a different JSON type under the same key.
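
The JSON storage described here can be pictured with a toy model. This is a simplified illustration, not Milvus's actual implementation: `split_row` and `declared_fields` are hypothetical names, and the only assumption is that undeclared keys end up in a per-row JSON document.

```python
import json


def split_row(row, declared_fields):
    """Separate declared fields from undeclared ones and pack the latter as JSON."""
    declared = {k: v for k, v in row.items() if k in declared_fields}
    dynamic = {k: v for k, v in row.items() if k not in declared_fields}
    # Each row carries its own JSON document, so nothing ties the type of
    # "text_field" in one row to its type in another row.
    declared["$meta"] = json.dumps(dynamic, sort_keys=True)
    return declared


declared_fields = {"id", "vector"}
row1 = split_row({"id": 1, "vector": [0.1, 0.2], "text_field": "hello_world"}, declared_fields)
row2 = split_row({"id": 2, "vector": [0.3, 0.4], "text_field": 12345}, declared_fields)

print(row1["$meta"])  # {"text_field": "hello_world"}
print(row2["$meta"])  # {"text_field": 12345}
```

Under this model both rows are valid JSON, which matches the observed behavior: the string and the integer are each stored as-is.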

Suggested Fix

  1. Track Field Types: Maintain a type registry for dynamically created fields
  2. Validate on Insert: Check that inserted values match the established field type
  3. Clear Error Messages: Provide specific error messages when type mismatch occurs

Example Validation Logic:

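from pymilvus.exceptions import ParamError

# Hypothetical registry: dynamic field name -> type name of the first value seen
dynamic_field_types = {}

def infer_type(value):
    # Illustrative mapping from Python types to Milvus-style type names
    return {str: "VARCHAR", int: "INT64", float: "DOUBLE", bool: "BOOL"}.get(type(value), "JSON")

# When inserting into a dynamic field
def validate_dynamic_value(field_name, value):
    actual_type = infer_type(value)
    # setdefault records the type on first use and returns the recorded type afterwards
    expected_type = dynamic_field_types.setdefault(field_name, actual_type)
    if expected_type != actual_type:
        raise ParamError(
            message=f"Invalid data type for field '{field_name}': "
            f"expected {expected_type}, got {actual_type}"
        )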

Labels: kind/bug, triage/accepted