@loserwang1024 (Contributor) commented Jan 7, 2026

Purpose

Linked issue: closes #2310. Adds field IDs to nested rows.

Brief change log

All fields are assigned unique, sequential IDs in a flattened order, regardless of nesting level. For example:

struct< 
  a: tinyint, 
  b: struct< 
    c: tinyint, 
    d: struct< 
      e: tinyint, 
      f: tinyint
    >, 
    g: string 
  > 
>

The resulting field ID for each field is:

| Field Name | Field ID |
| --- | --- |
| a | 0 |
| b | 1 |
| b.c | 2 |
| b.d | 3 |
| b.d.e | 4 |
| b.d.f | 5 |
| b.g | 6 |
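
The numbering above corresponds to a pre-order (depth-first) walk in which each field receives its ID before any of its children. The following is a minimal sketch of that idea only; `FlatFieldIdSketch`, `Node`, `assign`, and `exampleIds` are hypothetical names for illustration and are not the classes touched by this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of flattened field ID assignment; not the Fluss implementation. */
public class FlatFieldIdSketch {

    /** Minimal stand-in for a named field; children is empty for non-row types. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    /**
     * Pre-order walk: a field takes the next ID, then its children are numbered.
     * Returns the next unused ID so siblings continue the same sequence.
     */
    static int assign(List<Node> fields, String prefix, int nextId, Map<String, Integer> out) {
        for (Node field : fields) {
            String path = prefix.isEmpty() ? field.name : prefix + "." + field.name;
            out.put(path, nextId++);
            nextId = assign(field.children, path, nextId, out);
        }
        return nextId;
    }

    /** Builds the struct from the example above and returns path -> field ID. */
    static Map<String, Integer> exampleIds() {
        Node d = new Node("d", new Node("e"), new Node("f"));
        Node b = new Node("b", new Node("c"), d, new Node("g"));
        List<Node> topLevel = new ArrayList<>();
        topLevel.add(new Node("a"));
        topLevel.add(b);

        Map<String, Integer> ids = new LinkedHashMap<>();
        assign(topLevel, "", 0, ids);
        return ids;
    }

    public static void main(String[] args) {
        // Prints one "path id" pair per line, in assignment order.
        exampleIds().forEach((path, id) -> System.out.println(path + " " + id));
    }
}
```

Because `assign` returns the next unused ID, no per-level offset has to be computed: the counter is simply threaded through the recursion.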

Why a Flattened Numerical Order?

  • Simplifies ID management: there is no need to compute hierarchical offsets (e.g., parent_id * depth + child_index).
  • Compatibility: works seamlessly with flat data structures such as Arrow's RecordBatch. Our FileLogProjection already follows the same approach, so later projection pushdown will be easier.

API and Format

Add field_id to org.apache.fluss.types.DataField.

Documentation

Copilot AI left a comment

Pull request overview

This pull request adds field_id support for nested Row types to enable proper schema evolution in Fluss. The changes ensure that nested fields within Row, Array, and Map types maintain globally unique identifiers, allowing the system to correctly track and handle schema changes over time.

Key Changes

  • Added field_id tracking to DataField class and extended support to nested structures (RowType, ArrayType, MapType)
  • Implemented ReassignFieldId visitor pattern to automatically assign unique field IDs to nested types during schema creation
  • Enhanced JSON serialization/deserialization with backward compatibility for schemas without field IDs
  • Simplified Projection class to work with field positions instead of IDs, reducing complexity
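
As an illustration of the backward-compatibility point above, one way to read an optional field ID is to fall back to a sentinel when the key is absent, so that schemas written before this change still parse. This is a sketch under assumptions: the parsed JSON object is modeled as a plain `Map`, the key name `field_id` is taken from the PR description, and `FieldIdCompatSketch` and `readFieldId` are hypothetical names rather than the actual `DataTypeJsonSerde` code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of tolerant field ID reading; not the Fluss serde. */
public class FieldIdCompatSketch {

    /** Sentinel for "no ID yet"; the review summary mentions -1 as the default. */
    static final int UNASSIGNED = -1;

    /**
     * Reads the field ID from an already-parsed JSON object.
     * Older schemas have no "field_id" key, so absence is not an error.
     */
    static int readFieldId(Map<String, Object> fieldJson) {
        Object raw = fieldJson.get("field_id");
        if (raw == null) {
            return UNASSIGNED;
        }
        return ((Number) raw).intValue();
    }

    public static void main(String[] args) {
        Map<String, Object> oldFormat = new HashMap<>();
        oldFormat.put("name", "c");

        Map<String, Object> newFormat = new HashMap<>();
        newFormat.put("name", "c");
        newFormat.put("field_id", 2);

        System.out.println(readFieldId(oldFormat)); // -1
        System.out.println(readFieldId(newFormat)); // 2
    }
}
```

A field that comes back as unassigned can then be given a real ID in a later pass, which is the role the summary attributes to the auto-assignment logic.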

Reviewed changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| fluss-common/src/main/java/org/apache/fluss/types/DataField.java | Added fieldId parameter to constructors with default value -1 |
| fluss-common/src/main/java/org/apache/fluss/types/DataTypes.java | Added overloaded FIELD methods accepting fieldId parameter |
| fluss-common/src/main/java/org/apache/fluss/types/RowType.java | Added field ID validation, auto-assignment for -1 values, and equalsIgnoreFieldId method |
| fluss-common/src/main/java/org/apache/fluss/types/ReassignFieldId.java | New visitor to recursively reassign field IDs in nested structures |
| fluss-common/src/main/java/org/apache/fluss/metadata/Schema.java | Enhanced validation to check field ID uniqueness across nested fields and updated column() method to reassign nested field IDs |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/SchemaUpdate.java | Updated addColumn to reassign field IDs for nested types |
| fluss-common/src/main/java/org/apache/fluss/utils/json/DataTypeJsonSerde.java | Added field_id serialization/deserialization with backward compatibility |
| fluss-common/src/main/java/org/apache/fluss/utils/Projection.java | Simplified to work with positions instead of IDs, removing complexity |
| fluss-common/src/main/java/org/apache/fluss/record/FileLogProjection.java | Updated to use field positions for projection |
| fluss-flink/fluss-flink-common/src/main/java/org/apache/fluss/flink/utils/FlinkConversions.java | Modified to use column() method which now assigns field IDs |
| fluss-flink/fluss-flink-common/src/main/java/org/apache/fluss/flink/source/reader/FlinkSourceSplitReader.java | Added recursive comparison methods to ignore field IDs when comparing schemas |
| fluss-server/src/test/java/org/apache/fluss/server/coordinator/TableManagerITCase.java | Added test for nested row field IDs in schema evolution |
| fluss-flink/fluss-flink-common/src/test/java/org/apache/fluss/flink/utils/FlinkConversionsTest.java | Added comprehensive tests for nested row field ID assignment |
| fluss-flink/fluss-flink-common/src/test/java/org/apache/fluss/flink/sink/FlinkComplexTypeITCase.java | Added integration test for projection and adding columns with nested rows |
| fluss-common/src/test/java/org/apache/fluss/utils/json/DataTypeJsonSerdeTest.java | Added backward compatibility tests and field ID validation tests |
| fluss-common/src/test/java/org/apache/fluss/metadata/TableSchemaTest.java | Added tests for field ID reassignment behavior |
| fluss-common/src/test/java/org/apache/fluss/record/FileLogProjectionTest.java | Updated tests for new projection error messages |
| fluss-client/src/test/java/org/apache/fluss/client/admin/FlussAdminITCase.java | Updated to test nested row column addition with field IDs |
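
The uniqueness check described for Schema.java can be pictured as a recursive walk that records every ID in one shared set and rejects the first repeat, whatever its nesting level. The sketch below only illustrates that idea; `FieldIdUniquenessSketch`, `Field`, `checkUnique`, and `isValid` are hypothetical names, and the actual validation in the PR may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of nested field ID uniqueness validation; not the Fluss code. */
public class FieldIdUniquenessSketch {

    /** Minimal stand-in for a field with an ID and optional nested fields. */
    static final class Field {
        final String name;
        final int id;
        final List<Field> children;

        Field(String name, int id, Field... children) {
            this.name = name;
            this.id = id;
            this.children = Arrays.asList(children);
        }
    }

    /** Walks all levels with one shared set, so IDs must be unique globally. */
    static void checkUnique(List<Field> fields, Set<Integer> seen) {
        for (Field field : fields) {
            if (!seen.add(field.id)) {
                throw new IllegalArgumentException(
                        "Duplicate field ID " + field.id + " at field '" + field.name + "'");
            }
            checkUnique(field.children, seen);
        }
    }

    /** Convenience wrapper that reports validity instead of throwing. */
    static boolean isValid(List<Field> fields) {
        try {
            checkUnique(fields, new HashSet<>());
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<Field> ok = Arrays.asList(
                new Field("a", 0),
                new Field("b", 1, new Field("c", 2)));
        // The nested field "c" reuses ID 0, which belongs to top-level field "a".
        List<Field> clash = Arrays.asList(
                new Field("a", 0),
                new Field("b", 1, new Field("c", 0)));

        System.out.println(isValid(ok));    // true
        System.out.println(isValid(clash)); // false
    }
}
```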

@loserwang1024 loserwang1024 force-pushed the row-field-id branch 2 times, most recently from f3506ae to 45ee54e Compare January 7, 2026 10:20
@loserwang1024
Contributor Author

@wuchong, I have updated the PR based on your comments. Please take another look. In particular, Schema.Builder#column, Schema.Builder#fromFields, and Schema.Builder#fromRowType currently reassign field IDs, while Schema.Builder#fromColumns does not. The latter is used to build schemas from RPC and JSON values.

@wuchong wuchong merged commit 256bd7f into apache:main Jan 8, 2026
6 checks passed
Development

Successfully merging this pull request may close these issues.

Add Nested Row 's field_id