Skip to content

Conversation

@marcelo-cjl
Copy link

@marcelo-cjl marcelo-cjl commented Dec 24, 2025

related: #1729

  • Add null value handling in genFieldData() for all vector types
  • Add validData bitmap support in getRowCount() and getVectorData() for nullable vectors
  • Add client-side validation requiring nullable=true when adding vector fields
  • Add comprehensive tests for all 6 vector types
  • Add NullableVectorExample
  • Core invariant: vector fields must be explicitly nullable when added/used; the PR enforces this client-side (SchemaUtils.convertToGrpcFieldSchema(..., forAddField=true)) and treats fieldData.getValidDataList() as the authoritative per-row presence mask when assembling/returning vector arrays so vectors are aligned to the validity bitmap rather than inferred from array lengths or dimensions.
  • Logic simplified/removed: heuristics that inferred nullness or row counts from vector array sizes/dimensions are removed — genFieldData now uniformly builds and applies a validity mask for nullable fields and genVectorField returns empty vector placeholders for empty inputs, eliminating multiple ad‑hoc filtering paths and special-casing across vector types.
  • Why this does NOT introduce data loss or regressions: packing logic filters null entries only for compact storage but preserves and returns original row alignment by re-expanding results using the same validity bitmap (FieldDataWrapper.getVectorData now accepts validData and re-inserts nulls after unpacking). Non-nullable fields remain on the old code paths (no validData present), so existing non-null behavior, search semantics and stored non-null bytes are unchanged.
  • New capability and scope: adds client-side nullable vector support across all six vector types (FloatVector, BinaryVector, Float16Vector, BFloat16Vector, Int8Vector, SparseFloatVector) — implemented in ParamUtils (genFieldData/genVectorField), FieldDataWrapper (getVectorData/getRowCount), SchemaUtils (nullable enforcement for add-field), CollectionService (use for addCollectionField), with comprehensive tests (NullableVectorTest) and an example (NullableVectorExample).

@sre-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: marcelo-cjl
To complete the pull request process, please assign yelusion2 after the PR has been reviewed.
You can assign the PR to them by writing /assign @yelusion2 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link

coderabbitai bot commented Dec 24, 2025

Walkthrough

Adds end-to-end nullable vector support: client-side packing now records and transmits per-element validity masks; packing/unpacking logic was extended to preserve and re-insert nulls when returning vector fields; schema conversion and runtime validation enforce that vector fields added to existing collections must be nullable; a Java example and integration tests exercise nullable vectors across multiple vector types, including dynamic addition of nullable fields and search/query behavior with mixed null/non-null data.

Changes

Cohort / File(s) Summary
Example
examples/src/main/java/io/milvus/v2/NullableVectorExample.java
New example demonstrating insertion, querying, searching, indexing, and dynamic addition of nullable vector fields.
Param generation
sdk-core/src/main/java/io/milvus/param/ParamUtils.java
Record validity masks for nullable fields, filter null entries before vector construction, and return empty-dimension-safe placeholders for empty inputs across vector types.
Field packing / retrieval
sdk-core/src/main/java/io/milvus/response/FieldDataWrapper.java
Changed getVectorData(...) signature to accept validData; row count honors validity masks; vector packing now maps invalid positions to null when validData provided; nested struct/vector handling updated accordingly; guards added for zero-dimension vectors.
Schema conversion / validation
sdk-core/src/main/java/io/milvus/v2/utils/SchemaUtils.java
Added overloaded convertToGrpcFieldSchema(fieldSchema, boolean forAddField) and updated the default method to delegate to it. When forAddField is true, runtime validation rejects adding non-nullable vector fields (throws INVALID_PARAMS).
Collection field addition
sdk-core/src/main/java/io/milvus/v2/service/collection/CollectionService.java
Use the new SchemaUtils overload when converting an added field: convertToGrpcFieldSchema(fieldSchema, true) (plus a minor formatting change).
Integration tests
sdk-core/src/test/java/io/milvus/v2/service/vector/NullableVectorTest.java
New JUnit 5 test class exercising nullable vectors across six vector types, insertion/query/search flows, and validation when adding non-nullable vector fields.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant MilvusClient as Milvus Client
    participant ParamUtils
    participant FieldWrapper as FieldDataWrapper
    participant CollService as CollectionService

    rect rgba(200,220,255,0.15)
    Note over Client,MilvusClient: Insert with nullable vectors
    Client->>MilvusClient: insert(payload with nullable vectors)
    MilvusClient->>ParamUtils: genVectorField(fieldValues)
    ParamUtils->>ParamUtils: build validity mask\nfilter null entries
    ParamUtils-->>MilvusClient: packed vectors + validity mask
    MilvusClient->>MilvusClient: send to Milvus
    end

    rect rgba(200,255,220,0.12)
    Note over Client,FieldWrapper: Query/Search and unpack
    Client->>MilvusClient: query()/search()
    MilvusClient->>FieldWrapper: getVectorData(dt, vector, validData)
    FieldWrapper->>FieldWrapper: assemble packed vectors
    FieldWrapper->>FieldWrapper: map invalid positions -> null
    FieldWrapper-->>MilvusClient: results with nulls restored
    MilvusClient-->>Client: response
    end

    rect rgba(255,230,210,0.12)
    Note over Client,CollService: Add field validation
    Client->>CollService: addCollectionField(newField)
    CollService->>CollService: convertToGrpcFieldSchema(newField, true)
    CollService->>CollService: SchemaUtils validates nullable constraint
    alt non-nullable vector
        CollService-->>Client: INVALID_PARAMS error
    else nullable vector
        CollService-->>Client: field added
    end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I nibble masks of truth and null,
Vectors hop through clovered rows,
Some are empty, some are full,
Now data knows when each one goes.
Hooray—nullable fields in bloom! 🌿🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.90% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Support nullable vectors in Java SDK' accurately summarizes the primary objective of the changeset, which adds comprehensive support for nullable vector fields across the Milvus Java SDK.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
sdk-core/src/main/java/io/milvus/v2/service/collection/CollectionService.java (1)

657-664: Consider reusing ParamUtils.isVectorDataType() to avoid duplication.

This helper duplicates logic from ParamUtils.isVectorDataType(). While there's a type mismatch (io.milvus.v2.common.DataType vs io.milvus.grpc.DataType), consider either:

  1. Converting the type before calling ParamUtils.isVectorDataType()
  2. Adding a similar method to a shared utility class

This would reduce maintenance burden if new vector types are added.

sdk-core/src/test/java/io/milvus/v2/service/vector/NullableVectorTest.java (3)

79-85: Consider externalizing the Milvus connection URI.

The hardcoded localhost:19530 may not work in CI environments or when Milvus runs in a different location. Consider using an environment variable with a fallback:

String uri = System.getenv().getOrDefault("MILVUS_URI", "http://localhost:19530");

316-334: Random null distribution may cause test flakiness.

Using RANDOM.nextInt(100) < nullPercent without a seed could theoretically produce edge cases (e.g., all nulls or no nulls) that break assertions on lines 383-384. Consider:

  1. Using a fixed seed: new Random(42) for reproducibility
  2. Or ensuring at least one null and one non-null entry explicitly
🔎 Suggested fix
-    private static final Random RANDOM = new Random();
+    private static final Random RANDOM = new Random(42);  // Fixed seed for reproducibility

358-359: Avoid fixed Thread.sleep() for data availability.

Using Thread.sleep(1000) is fragile and may cause flakiness under load. Consider polling with a timeout or using Milvus's flush/sync mechanisms if available to ensure data is queryable.

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eb10051 and afbe9f3.

📒 Files selected for processing (5)
  • examples/src/main/java/io/milvus/v2/NullableVectorExample.java
  • sdk-core/src/main/java/io/milvus/param/ParamUtils.java
  • sdk-core/src/main/java/io/milvus/response/FieldDataWrapper.java
  • sdk-core/src/main/java/io/milvus/v2/service/collection/CollectionService.java
  • sdk-core/src/test/java/io/milvus/v2/service/vector/NullableVectorTest.java
🧰 Additional context used
🧬 Code graph analysis (2)
sdk-core/src/test/java/io/milvus/v2/service/vector/NullableVectorTest.java (11)
tests/milvustestv2/src/main/java/com/zilliz/milvustestv2/utils/Float16Utils.java (1)
  • Float16Utils (9-245)
sdk-core/src/main/java/io/milvus/v2/client/ConnectConfig.java (1)
  • ConnectConfig (31-455)
sdk-core/src/main/java/io/milvus/v2/common/IndexParam.java (1)
  • IndexParam (24-245)
sdk-core/src/main/java/io/milvus/v2/service/index/request/CreateIndexReq.java (1)
  • CreateIndexReq (26-144)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/DropCollectionReq.java (1)
  • DropCollectionReq (22-120)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/CreateCollectionReq.java (2)
  • CreateCollectionReq (36-1112)
  • CollectionSchema (404-529)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/AddCollectionFieldReq.java (1)
  • AddCollectionFieldReq (22-83)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/LoadCollectionReq.java (1)
  • LoadCollectionReq (25-227)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/InsertReq.java (1)
  • InsertReq (26-141)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/QueryReq.java (1)
  • QueryReq (26-277)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/SearchReq.java (1)
  • SearchReq (32-482)
examples/src/main/java/io/milvus/v2/NullableVectorExample.java (8)
sdk-core/src/main/java/io/milvus/v2/common/IndexParam.java (1)
  • IndexParam (24-245)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/AddCollectionFieldReq.java (1)
  • AddCollectionFieldReq (22-83)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/DropCollectionReq.java (1)
  • DropCollectionReq (22-120)
sdk-core/src/main/java/io/milvus/v2/service/collection/request/ReleaseCollectionReq.java (1)
  • ReleaseCollectionReq (22-118)
sdk-core/src/main/java/io/milvus/v2/service/index/request/CreateIndexReq.java (1)
  • CreateIndexReq (26-144)
sdk-core/src/main/java/io/milvus/v2/service/vector/request/InsertReq.java (1)
  • InsertReq (26-141)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/InsertResp.java (1)
  • InsertResp (25-81)
sdk-core/src/main/java/io/milvus/v2/service/vector/response/SearchResp.java (1)
  • SearchResp (27-185)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Deploy milvus server,build and test
  • GitHub Check: Build and test
  • GitHub Check: Summary
🔇 Additional comments (7)
sdk-core/src/main/java/io/milvus/v2/service/collection/CollectionService.java (1)

265-271: LGTM - Proper validation for nullable vector fields.

The validation correctly enforces that vector fields added to existing collections must be nullable. This aligns with Milvus server constraints where existing rows need null values for new vector fields.

sdk-core/src/main/java/io/milvus/param/ParamUtils.java (2)

1207-1216: LGTM - Correct nullable vector handling.

The implementation properly:

  1. Records validity mask per element
  2. Filters out null values before vector construction
  3. Maintains alignment between validity mask and data

This mirrors the scalar field handling pattern on lines 1220-1228.


1241-1255: No issues found. The code correctly handles empty nullable vector fields by returning dim=0, which the Milvus server accepts. The schema dimension is preserved separately at the collection schema level and is not affected by the vector field data's dim value. The existing test coverage (NullableVectorTest.java) validates this behavior across all vector types.

sdk-core/src/main/java/io/milvus/response/FieldDataWrapper.java (3)

141-144: LGTM - Correct row count handling for nullable vectors.

When a validity mask is present, using validData.size() correctly returns the total row count including nulls, rather than computing from the (filtered) vector data.

Also applies to: 159-162, 170-173


291-369: LGTM - Well-structured nullable vector data handling.

The refactored implementation:

  1. Packs vector data into appropriate format (List<List>, List, etc.)
  2. Correctly maps invalid positions to null using the validity mask
  3. Uses dataIdx counter to track position in packed (non-null) data

The logic correctly handles the case where packData contains only valid entries while validData indicates which positions should be null.


442-443: Verify null validData is intentional for struct vector elements.

Passing null for validData when processing ArrayOfVector elements assumes inner vectors within structs don't support nullable entries. Confirm this aligns with Milvus server behavior for nested vector fields.

examples/src/main/java/io/milvus/v2/NullableVectorExample.java (1)

46-318: Good example demonstrating nullable vector usage.

The example clearly shows:

  1. Creating collections with nullable vector fields
  2. Inserting data with null and non-null vectors
  3. Adding nullable vector fields to existing collections
  4. Querying and searching behavior with null vectors

The code is well-documented and serves as a practical reference for users.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
sdk-core/src/test/java/io/milvus/v2/service/vector/NullableVectorTest.java (2)

319-328: Consider using a fixed seed for test reproducibility.

Using RANDOM.nextInt(100) < nullPercent makes test outcomes non-deterministic. If a test fails, it may be difficult to reproduce the exact data distribution.

🔎 Suggested improvement for deterministic tests
-    private static final Random RANDOM = new Random();
+    private static final Random RANDOM = new Random(42); // Fixed seed for reproducibility

Alternatively, use a predictable pattern:

boolean isNull = (i % 2 == 0); // Every other row is null

361-362: Consider replacing sleep with a polling mechanism for more reliable tests.

Thread.sleep(1000) can be flaky—data might not be available in time on slower systems, or it wastes time on faster ones. A retry loop with a timeout would be more robust, though acceptable for integration tests.

sdk-core/src/main/java/io/milvus/response/FieldDataWrapper.java (1)

354-366: Add defensive bounds check to prevent potential IndexOutOfBoundsException.

If the server sends inconsistent data where validData has more true values than elements in packData, line 360 will throw an IndexOutOfBoundsException without a helpful message. A defensive check would aid debugging.

🔎 Suggested defensive check
 if (validData != null && !validData.isEmpty()) {
     List<Object> result = new ArrayList<>();
     int dataIdx = 0;
     for (Boolean valid : validData) {
         if (valid) {
+            if (dataIdx >= packData.size()) {
+                throw new IllegalResponseException(
+                    String.format("ValidData/packData size mismatch: expected more data at index %d, packData size is %d",
+                        dataIdx, packData.size()));
+            }
             result.add(packData.get(dataIdx++));
         } else {
             result.add(null);
         }
     }
     return result;
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between afbe9f3 and 2c92a7a.

📒 Files selected for processing (5)
  • examples/src/main/java/io/milvus/v2/NullableVectorExample.java
  • sdk-core/src/main/java/io/milvus/param/ParamUtils.java
  • sdk-core/src/main/java/io/milvus/response/FieldDataWrapper.java
  • sdk-core/src/main/java/io/milvus/v2/service/collection/CollectionService.java
  • sdk-core/src/test/java/io/milvus/v2/service/vector/NullableVectorTest.java
🚧 Files skipped from review as they are similar to previous changes (3)
  • examples/src/main/java/io/milvus/v2/NullableVectorExample.java
  • sdk-core/src/main/java/io/milvus/v2/service/collection/CollectionService.java
  • sdk-core/src/main/java/io/milvus/param/ParamUtils.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Deploy milvus server,build and test
  • GitHub Check: Build and test
  • GitHub Check: Summary
🔇 Additional comments (9)
sdk-core/src/test/java/io/milvus/v2/service/vector/NullableVectorTest.java (4)

43-64: LGTM - Well-structured test setup with comprehensive vector type coverage.

The test class is well-organized with clear configuration for all six vector types. The use of Testcontainers ensures isolated integration testing.


110-165: LGTM - Vector generation covers all types correctly.

The implementation handles each vector type appropriately. The rewind() calls after populating ByteBuffers ensure they're ready for reading.


90-108: LGTM - Cleanup logic handles failures gracefully.

Swallowing exceptions during test teardown is appropriate to ensure cleanup attempts don't mask actual test failures.


206-259: LGTM - Good negative test for validation logic.

The test correctly verifies that adding a non-nullable vector field to an existing collection fails with an appropriate error message.

sdk-core/src/main/java/io/milvus/response/FieldDataWrapper.java (5)

140-144: LGTM - Correct precedence for validData in row count calculation.

When a validity mask exists, its size correctly represents the total row count, including null values.


294-311: LGTM - Good guard clause for zero-dimension edge case.

The if (dim > 0) check prevents potential issues with division and unnecessary processing for edge cases.


278-289: LGTM - Correctly handles immutability of gRPC lists.

The copy is necessary since gRPC-generated lists are immutable. The size equality check ensures data consistency before modification.


439-446: Verify: Nullable vectors not supported in ArrayOfStruct fields.

Passing null for validData means vectors nested within struct arrays won't have null handling applied. Confirm this is intentional behavior.


291-292: LGTM - Method signature properly extended for nullable support.

The new validData parameter is correctly passed from all call sites, enabling proper null handling throughout the vector data retrieval path.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
sdk-core/src/main/java/io/milvus/response/FieldDataWrapper.java (1)

354-368: Nullable vector reconstruction logic is correct.

The reconstruction correctly reverses the null-filtering done in ParamUtils.genFieldData:

  1. Iterate through validData bitmap
  2. For true, take next element from packData (line 360)
  3. For false, insert null (line 362)

The logic assumes packData.size() equals the count of true values in validData, which should hold if the write path (ParamUtils) is consistent.

Optional: Consider adding a defensive bounds check before packData.get(dataIdx++) to catch inconsistencies between write and read paths, though this is likely unnecessary given the controlled flow.

Optional: Add defensive bounds check
 if (validData != null && !validData.isEmpty()) {
     List<Object> result = new ArrayList<>();
     int dataIdx = 0;
     for (Boolean valid : validData) {
         if (valid) {
+            if (dataIdx >= packData.size()) {
+                throw new IllegalResponseException("Inconsistent validData and packData sizes");
+            }
             result.add(packData.get(dataIdx++));
         } else {
             result.add(null);
         }
     }
     return result;
 }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c92a7a and a2c0994.

📒 Files selected for processing (5)
  • examples/src/main/java/io/milvus/v2/NullableVectorExample.java
  • sdk-core/src/main/java/io/milvus/param/ParamUtils.java
  • sdk-core/src/main/java/io/milvus/response/FieldDataWrapper.java
  • sdk-core/src/main/java/io/milvus/v2/service/collection/CollectionService.java
  • sdk-core/src/test/java/io/milvus/v2/service/vector/NullableVectorTest.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • sdk-core/src/main/java/io/milvus/v2/service/collection/CollectionService.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and test
  • GitHub Check: Summary
🔇 Additional comments (8)
sdk-core/src/main/java/io/milvus/param/ParamUtils.java (2)

1207-1216: LGTM! Clean nullable vector handling.

The validity bitmap construction and null-filtering logic is correct. For each vector in the input list, you record whether it's non-null in validData, then pass only the non-null vectors to genVectorField. This approach cleanly separates the validity mask from the packed vector data and mirrors the existing pattern for nullable scalar fields (lines 1220-1229).


1241-1255: Early return for empty vectors is correct.

When all vectors are null (after filtering in genFieldData), the objects list becomes empty. Returning a VectorField with dim=0 and empty data is appropriate since there's no data to infer the dimension from.

However, verify that the server correctly handles dim=0 FieldData with all validData=false entries. Note that DataUtils.java explicitly rejects dim=0 vectors in struct field contexts, suggesting potential inconsistency in how the server handles this edge case across different code paths.

sdk-core/src/test/java/io/milvus/v2/service/vector/NullableVectorTest.java (2)

56-80: Excellent test coverage across all vector types.

The VectorTypeConfig setup covers all six vector types with appropriate index configurations. This parameterized approach makes the tests maintainable and ensures consistent coverage.


206-259: Good negative test for non-nullable vector field validation.

This test correctly verifies that attempting to add a non-nullable vector field to an existing collection fails with an appropriate error message. This aligns with the PR objective of "client-side validation requiring nullable=true when adding vector fields."

examples/src/main/java/io/milvus/v2/NullableVectorExample.java (2)

53-176: Clear demonstration of nullable vector insertion and querying.

Demo 1 effectively illustrates the key features:

  • Creating a collection with isNullable=true for the vector field
  • Inserting mixed null/non-null vectors using JsonNull.INSTANCE
  • Querying to verify null preservation
  • Searching to confirm only non-null vectors are returned

The example is well-structured and educational.


178-298: Excellent demonstration of adding nullable vector fields to existing collections.

Demo 2 showcases a practical schema evolution scenario:

  • Adding a new nullable vector field to an existing collection
  • Properly releasing the collection before schema modification
  • Creating an index for the new field
  • Verifying that existing rows have null for the new field

This addresses the PR objective of "Add client-side validation requiring nullable=true when adding vector fields" and provides clear guidance for users.

sdk-core/src/main/java/io/milvus/response/FieldDataWrapper.java (2)

141-173: Correct nullable vector row count handling.

The getRowCount logic properly distinguishes between nullable and non-nullable vectors:

  • When validData is present, returns validData.size() (total count including nulls)
  • When absent, falls back to dimension-based calculation from packed data

This is consistent across FloatVector (lines 141-144), Binary/Float16/BFloat16/Int8Vector (lines 159-162), and SparseFloatVector (lines 170-173).


291-353: Good addition of dimension guards in vector packing.

The dim > 0 guards (lines 298, 319) prevent division-by-zero errors when all vectors are null (resulting in empty data). This safely handles the edge case introduced by nullable vector support.

The packing logic correctly processes only non-null vectors (filtered earlier by ParamUtils.genFieldData).

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
sdk-core/src/main/java/io/milvus/response/FieldDataWrapper.java (1)

354-366: Nullable vector handling is correctly implemented.

The algorithm properly reconstructs the original vector list by:

  1. Iterating through the validData mask
  2. Taking the next vector from packData when valid=true
  3. Inserting null when valid=false

This maintains the original row order while preserving null positions.

Optional: Add defensive validation

Consider adding a validation check to catch potential bugs in the packing logic:

 // Handle nullable vectors - insert null values at positions where validData is false
 if (validData != null && !validData.isEmpty()) {
+    long expectedValid = validData.stream().filter(b -> b).count();
+    if (packData.size() != expectedValid) {
+        throw new IllegalResponseException(String.format(
+            "Mismatch between validData (%d valid) and packed data size (%d)",
+            expectedValid, packData.size()));
+    }
     List<Object> result = new ArrayList<>();
     int dataIdx = 0;

This catches invariant violations early with a clear error message rather than throwing IndexOutOfBoundsException.

sdk-core/src/main/java/io/milvus/param/ParamUtils.java (1)

1241-1255: Early return for empty vector lists handles the all-nulls case.

When all vectors in a nullable field are null, the filtered list becomes empty and this early return is triggered. Setting dim=0 is acceptable because the actual dimension is defined in the schema, not in the data.

Optional: Clarify with a comment

Consider adding a comment to explain this edge case:

+// Handle empty vector list (occurs when inserting 0 rows or when all vectors are null in a nullable field)
+// Set dim=0; the actual dimension is defined in the collection schema
 if (objects.isEmpty()) {
     if (dataType == DataType.FloatVector) {
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a2c0994 and ae53b54.

📒 Files selected for processing (6)
  • examples/src/main/java/io/milvus/v2/NullableVectorExample.java
  • sdk-core/src/main/java/io/milvus/param/ParamUtils.java
  • sdk-core/src/main/java/io/milvus/response/FieldDataWrapper.java
  • sdk-core/src/main/java/io/milvus/v2/service/collection/CollectionService.java
  • sdk-core/src/main/java/io/milvus/v2/utils/SchemaUtils.java
  • sdk-core/src/test/java/io/milvus/v2/service/vector/NullableVectorTest.java
🚧 Files skipped from review as they are similar to previous changes (1)
  • sdk-core/src/main/java/io/milvus/v2/service/collection/CollectionService.java
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and test
  • GitHub Check: Summary
🔇 Additional comments (8)
sdk-core/src/main/java/io/milvus/v2/utils/SchemaUtils.java (1)

52-65: LGTM! Well-designed validation for nullable vector fields.

The approach is sound:

  • Existing collections require nullable=true for added vector fields (existing rows lack values for the new field)
  • New collections allow non-nullable vector fields (all rows must provide values)
  • Clear error message guides users to the correct usage

The backward-compatible overload preserves existing behavior while enabling the new validation path.

sdk-core/src/main/java/io/milvus/response/FieldDataWrapper.java (1)

141-173: Row count calculation correctly handles nullable vectors.

For all vector types, the method properly prioritizes validData.size() over calculating from the data array. This is essential because:

  • validData.size() reflects the true row count (including null vectors)
  • Data array size reflects only non-null vectors

The early return prevents divide-by-zero errors when all vectors are null.

sdk-core/src/main/java/io/milvus/param/ParamUtils.java (1)

1207-1218: Nullable vector packing is correctly implemented.

The logic properly:

  1. Records which elements are null in the validData bitmap
  2. Filters out nulls before passing to genVectorField
  3. Preserves the validity information for server-side reconstruction

This approach ensures the data array contains only non-null vectors while the validData bitmap preserves the original row positions.

sdk-core/src/test/java/io/milvus/v2/service/vector/NullableVectorTest.java (3)

57-80: Excellent test coverage across all vector types.

The parameterized approach with VectorTypeConfig ensures consistent testing for:

  • FloatVector, BinaryVector, Float16Vector, BFloat16Vector, SparseFloatVector, Int8Vector
  • Each with appropriate metric types and index types

This validates that nullable support works uniformly across all vector types.


261-429: Comprehensive test validates the complete nullable vector workflow.

The test thoroughly exercises:

  1. Schema creation with nullable vector field ✓
  2. Data insertion with ~50% null vectors ✓
  3. Query validation confirms correct null/valid counts ✓
  4. Search behavior verifies only non-null vectors are returned ✓

The assertion at line 421 is particularly important—it ensures that nullable vectors don't break the search/retrieval path, as null vectors should not participate in similarity search.


208-259: Validation constraint is properly tested.

The test confirms that adding a non-nullable vector field to an existing collection fails as expected. This prevents users from creating an inconsistent state where existing rows would lack values for a required field.

examples/src/main/java/io/milvus/v2/NullableVectorExample.java (2)

53-176: Clear demonstration of nullable vector insert and retrieval.

The example effectively shows:

  • Creating a collection with nullable=true on the vector field
  • Inserting mixed null and non-null vectors using JsonNull.INSTANCE
  • Querying returns both null and non-null vectors
  • Searching returns only non-null vectors

The explanatory print statements make this a useful learning resource.


178-298: Demonstrates the key use case of adding fields to existing collections.

This example illustrates an important workflow:

  1. Create collection with one vector field
  2. Insert data
  3. Add a second nullable vector field (must be nullable!)
  4. Verify existing rows have null for the new field

This is a common schema evolution pattern, and the example clearly shows why the nullable constraint exists.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants