feat(rust): propagate SEA manifest metadata through Arrow schema#353
vikrantpuppala merged 3 commits into main from
Conversation
…schema Add design document for fixing result metadata diffs between the OSS ODBC driver and the reference driver. The root cause is information loss at the ADBC Rust layer — the SEA API provides rich column metadata (type_name, type_text, type_precision, type_scale) in the result manifest, but only the Arrow schema crosses the FFI boundary to the C++ ODBC driver. The solution encodes manifest metadata as Arrow field-level key-value metadata (databricks.* keys), applied in ResultReaderAdapter before FFI export. This matches the JDBC driver's approach of using server-provided metadata as the primary source. Co-authored-by: Isaac
Attach databricks.* field-level metadata (type_name, type_text, type_precision, type_scale, type_interval_type) from the SEA result manifest to Arrow fields in ResultReaderAdapter. This preserves server-provided column metadata through the FFI boundary so the ODBC driver can use it instead of reverse-engineering from Arrow type IDs. Co-authored-by: Isaac
force-pushed from 9612712 to c1ba98e
force-pushed from c1ba98e to 4c2e9ed
rust/src/reader/mod.rs
Outdated
    .iter()
    .enumerate()
    .map(|(i, field)| {
        let col_info = columns.iter().find(|c| c.position as usize == i);
O(n*m) column lookup in augment_schema_with_manifest
let col_info = columns.iter().find(|c| c.position as usize == i);
For each of the n schema fields, this scans all m manifest columns. For typical queries (<100 columns) this is fine, but SELECT * on wide tables (1000+ columns) would be O(n²). Simple fix:
let col_by_pos: HashMap<usize, &ColumnInfo> = columns
.iter()
.map(|c| (c.position as usize, c))
.collect();
Good catch — switched to a HashMap<usize, &ColumnInfo> built upfront. O(n+m) now.
rust/src/reader/mod.rs
Outdated
    .iter()
    .enumerate()
    .map(|(i, field)| {
        let col_info = columns.iter().find(|c| c.position as usize == i);
position is i32 but used as usize without bounds check
c.position as usize == i
If the server ever returns a negative position, as usize wraps to a huge value silently. Add a guard:
let col_info = columns.iter().find(|c| c.position >= 0 && c.position as usize == i);
Or better, use usize::try_from(c.position).ok().
Fixed — using usize::try_from(c.position).ok() in the HashMap construction, so negative positions are silently filtered out instead of wrapping.
gopalldb left a comment
LG, some minor comments
The metadata FFI functions (get_catalogs, get_schemas, get_tables, etc.) were discarding the manifest from ExecuteResult. Now they use export_execute_result() which passes the manifest to ResultReaderAdapter, ensuring databricks.* field metadata is attached to the Arrow schema for all query paths, not just Statement::execute(). Co-authored-by: Isaac
force-pushed from 4c2e9ed to 0d56d31
Summary

- Attach `databricks.*` field-level key-value metadata from the SEA result manifest to Arrow schema fields.

Changes

- `src/types/sea.rs`: add `type_precision`, `type_scale`, `type_interval_type` to `ColumnInfo`
- `src/reader/mod.rs`: `metadata_keys` constants + `augment_schema_with_manifest()`
- `src/reader/mod.rs`: `ResultReaderAdapter::new()` to accept an optional manifest
- `src/client/mod.rs`: `manifest: Option<ResultManifest>` to `ExecuteResult`
- `src/client/sea.rs`: `response.manifest` through to `ExecuteResult`
- `src/statement.rs`: `ResultReaderAdapter::new()`
- `src/ffi/metadata.rs`: `export_reader()` for all metadata FFI functions

Test plan

- `ColumnInfo` deserialization with/without optional fields
- `augment_schema_with_manifest` (basic types, DECIMAL precision/scale, INTERVAL, missing `ColumnInfo`, preserves existing metadata)
- `ResultReaderAdapter` with and without manifest
- e2e (`metadata_propagation_test` example) verifying metadata flows through for INT, LONG, STRING, BOOLEAN, DOUBLE, FLOAT, SHORT, BYTE, DECIMAL(10,2), DECIMAL(18,5), DECIMAL(38,0), DATE, TIMESTAMP, ARRAY, MAP, STRUCT, BINARY

This pull request was AI-assisted by Isaac.