
feat(rust): propagate SEA manifest metadata through Arrow schema #353

Merged
vikrantpuppala merged 3 commits into main from design/result-metadata-propagation
Mar 20, 2026

Conversation

@vikrantpuppala (Collaborator) commented Mar 17, 2026

Summary

  • Propagate SEA manifest column metadata (type_name, type_text, type_precision, type_scale, type_interval_type) through the Arrow C Data Interface FFI boundary as databricks.* field-level key-value metadata
  • Enables the C++ ODBC driver to read server-provided metadata instead of reverse-engineering it from Arrow type IDs — fixing systematic diffs in nullable, precision, scale, display_size, octet_length, and type_name
  • All paths that export Arrow streams (Statement::execute, metadata FFI functions) now carry the manifest through

Changes

| File | Change |
| --- | --- |
| `src/types/sea.rs` | Add `type_precision`, `type_scale`, `type_interval_type` to `ColumnInfo` |
| `src/reader/mod.rs` | Add `metadata_keys` constants + `augment_schema_with_manifest()` |
| `src/reader/mod.rs` | Update `ResultReaderAdapter::new()` to accept optional manifest |
| `src/client/mod.rs` | Add `manifest: Option<ResultManifest>` to `ExecuteResult` |
| `src/client/sea.rs` | Pass `response.manifest` through to `ExecuteResult` |
| `src/statement.rs` | Pass manifest to `ResultReaderAdapter::new()` |
| `src/ffi/metadata.rs` | Pass manifest through `export_reader()` for all metadata FFI functions |
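The augmentation step can be sketched as follows. This is a minimal stand-alone model, not the PR's actual code: the real `augment_schema_with_manifest()` in `src/reader/mod.rs` operates on arrow-rs `Schema`/`Field` types and the full SEA `ColumnInfo`; the struct definitions and the function name `augment_fields_with_manifest` below are simplified stand-ins for illustration.

```rust
use std::collections::HashMap;

// Minimal stand-ins for the SEA manifest column and an Arrow field.
// (Hypothetical; the real code uses arrow-rs types and the ColumnInfo
// from src/types/sea.rs, which also carries type_interval_type.)
#[derive(Debug)]
struct ColumnInfo {
    position: i32,
    type_name: String,
    type_text: String,
    type_precision: Option<i32>,
    type_scale: Option<i32>,
}

#[derive(Debug)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

/// Attach `databricks.*` key-value metadata from manifest columns to
/// fields, matching manifest columns to fields by position.
fn augment_fields_with_manifest(fields: &mut [Field], columns: &[ColumnInfo]) {
    // Index manifest columns by position once, so lookup is O(1) per field.
    // Negative positions fail try_from and are dropped rather than wrapping.
    let by_pos: HashMap<usize, &ColumnInfo> = columns
        .iter()
        .filter_map(|c| usize::try_from(c.position).ok().map(|p| (p, c)))
        .collect();

    for (i, field) in fields.iter_mut().enumerate() {
        if let Some(col) = by_pos.get(&i) {
            field
                .metadata
                .insert("databricks.type_name".into(), col.type_name.clone());
            field
                .metadata
                .insert("databricks.type_text".into(), col.type_text.clone());
            // Precision/scale are only present for types like DECIMAL.
            if let Some(p) = col.type_precision {
                field
                    .metadata
                    .insert("databricks.type_precision".into(), p.to_string());
            }
            if let Some(s) = col.type_scale {
                field
                    .metadata
                    .insert("databricks.type_scale".into(), s.to_string());
            }
        }
    }
}

fn main() {
    let cols = vec![ColumnInfo {
        position: 0,
        type_name: "DECIMAL".into(),
        type_text: "DECIMAL(10,2)".into(),
        type_precision: Some(10),
        type_scale: Some(2),
    }];
    let mut fields = vec![Field {
        name: "price".into(),
        metadata: HashMap::new(),
    }];
    augment_fields_with_manifest(&mut fields, &cols);
    println!("{}", fields[0].metadata["databricks.type_text"]);
}
```

Because the metadata rides on the fields themselves, it survives the Arrow C Data Interface export unchanged, which is what lets the C++ ODBC driver read it on the other side of the FFI boundary.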

Test plan

  • Unit tests for ColumnInfo deserialization with/without optional fields
  • Unit tests for augment_schema_with_manifest (basic types, DECIMAL precision/scale, INTERVAL, missing ColumnInfo, preserves existing metadata)
  • Unit tests for ResultReaderAdapter with and without manifest
  • E2E test (metadata_propagation_test example) verifying metadata flows through for INT, LONG, STRING, BOOLEAN, DOUBLE, FLOAT, SHORT, BYTE, DECIMAL(10,2), DECIMAL(18,5), DECIMAL(38,0), DATE, TIMESTAMP, ARRAY, MAP, STRUCT, BINARY
  • ODBC cross-driver comparator (after C++ consumer PR lands)

This pull request was AI-assisted by Isaac.

…schema

Add design document for fixing result metadata diffs between the OSS ODBC
driver and the reference driver. The root cause is information loss at the
ADBC Rust layer — the SEA API provides rich column metadata (type_name,
type_text, type_precision, type_scale) in the result manifest, but only
the Arrow schema crosses the FFI boundary to the C++ ODBC driver.

The solution encodes manifest metadata as Arrow field-level key-value
metadata (databricks.* keys), applied in ResultReaderAdapter before FFI
export. This matches the JDBC driver's approach of using server-provided
metadata as the primary source.

Co-authored-by: Isaac
Attach databricks.* field-level metadata (type_name, type_text,
type_precision, type_scale, type_interval_type) from the SEA result
manifest to Arrow fields in ResultReaderAdapter. This preserves
server-provided column metadata through the FFI boundary so the
ODBC driver can use it instead of reverse-engineering from Arrow
type IDs.

Co-authored-by: Isaac
@vikrantpuppala vikrantpuppala force-pushed the design/result-metadata-propagation branch from 9612712 to c1ba98e Compare March 17, 2026 19:14
@vikrantpuppala vikrantpuppala changed the title [PECOBLR-2085] Design: propagate SEA manifest metadata through Arrow schema feat(rust): propagate SEA manifest metadata through Arrow schema Mar 17, 2026
@vikrantpuppala vikrantpuppala force-pushed the design/result-metadata-propagation branch from c1ba98e to 4c2e9ed Compare March 17, 2026 19:49
@vikrantpuppala vikrantpuppala requested a review from gopalldb March 20, 2026 10:31
```rust
.iter()
.enumerate()
.map(|(i, field)| {
    let col_info = columns.iter().find(|c| c.position as usize == i);
```

O(n*m) column lookup in augment_schema_with_manifest

```rust
let col_info = columns.iter().find(|c| c.position as usize == i);
```

For each of the n schema fields, this scans all m manifest columns. For typical queries (<100 columns) this is fine, but SELECT * on wide tables (1000+ columns) would be O(n²). Simple fix:

```rust
let col_by_pos: HashMap<usize, &ColumnInfo> = columns
    .iter()
    .map(|c| (c.position as usize, c))
    .collect();
```

Collaborator Author

Good catch — switched to a HashMap<usize, &ColumnInfo> built upfront. O(n+m) now.

```rust
.iter()
.enumerate()
.map(|(i, field)| {
    let col_info = columns.iter().find(|c| c.position as usize == i);
```

position is i32 but used as usize without bounds check

```rust
c.position as usize == i
```

If the server ever returns a negative position, `as usize` wraps to a huge value silently. Add a guard:

```rust
let col_info = columns.iter().find(|c| c.position >= 0 && c.position as usize == i);
```

Or better, use `usize::try_from(c.position).ok()`.

Collaborator Author

Fixed — using usize::try_from(c.position).ok() in the HashMap construction, so negative positions are silently filtered out instead of wrapping.
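The two fixes from this thread (building a position index once, and rejecting negative positions via `usize::try_from` instead of letting `as usize` wrap) combine into one small helper. This is an illustrative sketch with a hypothetical one-field `ColumnInfo` stand-in, not the PR's actual code:

```rust
use std::collections::HashMap;

// Hypothetical minimal stand-in; the real ColumnInfo lives in src/types/sea.rs.
struct ColumnInfo {
    position: i32,
}

/// Index manifest columns by position in one O(m) pass. A negative
/// position fails `usize::try_from` and is filtered out, whereas
/// `c.position as usize` would silently wrap -1 to usize::MAX.
fn index_by_position(columns: &[ColumnInfo]) -> HashMap<usize, &ColumnInfo> {
    columns
        .iter()
        .filter_map(|c| usize::try_from(c.position).ok().map(|p| (p, c)))
        .collect()
}

fn main() {
    let cols = [
        ColumnInfo { position: 0 },
        ColumnInfo { position: -1 }, // dropped, not wrapped
        ColumnInfo { position: 2 },
    ];
    let idx = index_by_position(&cols);
    assert_eq!(idx.len(), 2);
    assert!(idx.contains_key(&0) && idx.contains_key(&2));
}
```

Each schema field then does an O(1) `idx.get(&i)` lookup, making the whole augmentation pass O(n+m) rather than O(n*m).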

@gopalldb (Collaborator) left a comment

LG, some minor comments

The metadata FFI functions (get_catalogs, get_schemas, get_tables, etc.)
were discarding the manifest from ExecuteResult. Now they use
export_execute_result() which passes the manifest to ResultReaderAdapter,
ensuring databricks.* field metadata is attached to the Arrow schema
for all query paths, not just Statement::execute().

Co-authored-by: Isaac
@vikrantpuppala vikrantpuppala force-pushed the design/result-metadata-propagation branch from 4c2e9ed to 0d56d31 Compare March 20, 2026 11:39
@vikrantpuppala vikrantpuppala added this pull request to the merge queue Mar 20, 2026
Merged via the queue into main with commit cd544ce Mar 20, 2026
23 checks passed
@vikrantpuppala vikrantpuppala deleted the design/result-metadata-propagation branch March 20, 2026 11:51