
feat(rust): add metadata FFI layer with streaming Arrow pass-through #281

Merged — gopalldb merged 14 commits into main from metadata on Mar 12, 2026

Conversation

@gopalldb (Collaborator) commented Mar 2, 2026

Summary

  • Adds a metadata-ffi feature flag exposing extern "C" functions for catalog metadata operations (GetCatalogs, GetSchemas, GetTables, GetColumns, GetPrimaryKeys, GetForeignKeys)
  • Results stream as Arrow RecordBatchReader via the Arrow C Data Interface (FFI_ArrowArrayStream), using ResultReaderAdapter to bridge ResultReader → RecordBatchReader without buffering
  • Includes ConnectionMetadataService (thin pass-through to DatabricksClient), C FFI handle/error management with catch_unwind panic safety, and thread-local error buffer
  • list_columns accepts optional catalog — uses SHOW COLUMNS IN ALL CATALOGS server-side when catalog is None/wildcard (no client-side multi-catalog orchestration)

Files

New

  • ffi/mod.rs, ffi/catalog.rs, ffi/error.rs, ffi/handle.rs — C FFI surface (9 extern "C" functions)
  • metadata/service.rs — ConnectionMetadataService returning streaming Box<dyn ResultReader>
  • spec/odbc-metadata-ffi-design.md — Design specification

Modified

  • client/mod.rs — list_columns catalog: &str → Option<&str>
  • client/sea.rs — Handle optional catalog in list_columns
  • metadata/sql.rs — build_show_columns supports ALL CATALOGS, returns String (not Result)
  • connection.rs — Updated list_columns call sites
  • reader/mod.rs — Added ResultReaderAdapter unit tests

Data Flow

DatabricksClient.list_*() → ExecuteResult.reader → ConnectionMetadataService (pass-through)
  → export_reader() → ResultReaderAdapter → FFI_ArrowArrayStream → C caller

No intermediate collect_batches or concat_batches — batches stream lazily from network to caller.
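The lazy pass-through described above can be sketched with minimal stand-in types — `Batch`, `ResultReader`, `MockReader`, and `ReaderAdapter` here are illustrative, not the crate's real Arrow types. The point is that the adapter pulls exactly one batch per `next()` call, so nothing is ever buffered between the network and the caller:

```rust
// Illustrative sketch of the streaming pass-through; not the crate's real types.
type Batch = Vec<i64>;

trait ResultReader {
    /// Pull the next batch (e.g. from the network), or None when exhausted.
    fn next_batch(&mut self) -> Option<Batch>;
}

struct MockReader {
    remaining: Vec<Batch>,
}

impl ResultReader for MockReader {
    fn next_batch(&mut self) -> Option<Batch> {
        if self.remaining.is_empty() {
            None
        } else {
            Some(self.remaining.remove(0))
        }
    }
}

/// Adapter exposing the reader through a standard Iterator — the role
/// RecordBatchReader plays in the real code. Each `next()` forwards one
/// pull to the inner reader, so batches stream lazily.
struct ReaderAdapter {
    inner: Box<dyn ResultReader>,
}

impl Iterator for ReaderAdapter {
    type Item = Batch;
    fn next(&mut self) -> Option<Batch> {
        self.inner.next_batch()
    }
}

fn main() {
    let reader = MockReader { remaining: vec![vec![1, 2], vec![3]] };
    let adapter = ReaderAdapter { inner: Box::new(reader) };
    let batches: Vec<Batch> = adapter.collect();
    assert_eq!(batches, vec![vec![1, 2], vec![3]]);
}
```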

Test plan

  • All 125 unit tests pass (cargo test)
  • cargo clippy -- -D warnings clean
  • cargo fmt clean
  • Integration test with ODBC wrapper consuming FFI functions

🤖 Generated with Claude Code

Add a new `odbc-ffi` feature that exposes extern "C" functions for ODBC
catalog operations (SQLTables, SQLColumns, SQLGetTypeInfo, SQLPrimaryKeys,
etc.) returning flat Arrow result sets via the Arrow C Data Interface.

New files:
- metadata/schemas.rs: JDBC/ODBC result set schema definitions
- metadata/type_info.rs: Static type info catalog for SQLGetTypeInfo
- metadata/service.rs: MetadataService trait + ConnectionMetadataService
- ffi/mod.rs, error.rs, handle.rs, odbc.rs: C FFI surface
- spec/odbc-metadata-ffi-design.md: Design specification

Also adds PK/FK parsers to parse.rs, public accessors on Connection,
and INTERVAL type mapping aligned with the Databricks JDBC driver.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@gopalldb gopalldb requested review from vikrantpuppala and removed request for vikrantpuppala March 2, 2026 08:45
gopalldb and others added 9 commits March 2, 2026 09:21
Soundness fixes:
- Remove unsound 'static lifetime from handle_to_service(), use
  explicit lifetime parameter 'a instead
- Remove unsound 'static lifetime from c_str_to_option/c_str_to_str
- Remove dead odbc_connection_create() that always failed
- Gate client()/runtime_handle() accessors behind odbc-ffi feature

Correctness fixes:
- Fix FK update_rule/delete_rule: use SQL_NO_ACTION (3) instead of
  SQL_RESTRICT (1) — Databricks FKs are informational
- Fix TYPE_INFO_SCHEMA: use Int16 instead of Boolean for
  CASE_SENSITIVE, UNSIGNED_ATTRIBUTE, FIXED_PREC_SCALE,
  AUTO_UNIQUE_VALUE per ODBC spec
- Document SQL_ALL_TYPES == 0 ambiguity in odbc_get_type_info

Code quality:
- Extract build_type_info_batch() as shared function, removing
  duplicated test helper
- Remove let _ = n; anti-pattern in get_tables/get_primary_keys/
  get_foreign_keys
- Add design notes documenting INTERVAL/ARRAY/MAP/STRUCT type code
  decisions and TIMESTAMP/TIMESTAMP_NTZ dual-entry behavior
- Add test for get_type_info(93) returning both TIMESTAMP entries

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
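The `'static` soundness fix above can be sketched as follows (`Service` is an illustrative stand-in). Returning `&'static Service` from a raw pointer lets callers keep the reference after the handle is freed; an explicit lifetime parameter ties the borrow to a scope the caller must justify:

```rust
struct Service {
    name: String,
}

// Unsound version (removed): promised a forever-living reference.
//
//     unsafe fn handle_to_service(h: *const Service) -> &'static Service { &*h }

/// Sound version: the caller chooses `'a`, so the returned reference is no
/// longer unconditionally `'static`.
///
/// # Safety
/// `h` must be non-null and point to a `Service` that outlives `'a`.
unsafe fn handle_to_service<'a>(h: *const Service) -> &'a Service {
    &*h
}

fn main() {
    let svc = Service { name: "metadata".to_string() };
    let handle: *const Service = &svc;
    // Safe here: `svc` outlives the borrow.
    let r = unsafe { handle_to_service(handle) };
    assert_eq!(r.name, "metadata");
}
```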
The static type info catalog (SQLGetTypeInfo data) is ODBC-specific and
belongs in the ODBC wrapper layer, not in the Arrow-native ADBC driver.
The caller is expected to handle type conversions.

Removed:
- src/metadata/type_info.rs (static TYPE_INFO_ENTRIES catalog)
- TYPE_INFO_SCHEMA from schemas.rs
- get_type_info from MetadataService trait and implementation
- odbc_get_type_info FFI function
- build_type_info_batch and append_opt_bool helpers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The FFI layer is not ODBC-specific — any library or application can call
these functions to get flat Arrow RecordBatches of catalog metadata.

Renames:
- Feature: odbc-ffi → metadata-ffi
- File: ffi/odbc.rs → ffi/catalog.rs
- Types: OdbcConnectionHandle → FfiConnectionHandle,
         OdbcFfiStatus → FfiStatus, OdbcFfiError → FfiError
- Functions: odbc_* → metadata_*

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove intermediate Arrow → Rust structs → Arrow conversion in the FFI
path. The metadata service now returns raw RecordBatch data directly from
Databricks, letting the caller handle column renaming and type mapping.

- Rewrite service.rs: remove MetadataService trait, stub methods, and
  all parsing/rebuilding logic; add collect_batches() using concat_batches
- Delete schemas.rs (only used by old service.rs for schema definitions)
- Trim ffi/catalog.rs from 16 to 9 FFI functions (remove table_types,
  statistics, special_columns, procedures, procedure_columns,
  table_privileges, column_privileges)
- Clean up parse.rs: remove PrimaryKeyInfo, ForeignKeyInfo structs and
  parse_primary_keys, parse_foreign_keys (only used by old service.rs)
- ADBC get_objects path (connection.rs → builder.rs) unchanged

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tation

Rewrite the spec to reflect the raw Arrow pass-through architecture:
- No MetadataService trait, just ConnectionMetadataService
- No schemas.rs or type_info.rs (deleted)
- 9 FFI functions (down from 16), 6 catalog methods
- Feature renamed from odbc-ffi to metadata-ffi
- Caller handles column renaming, type mapping, reshaping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…error

Add panic safety (catch_unwind) to handle management functions and
clear_last_error at FFI entry points so callers never see stale errors.
Add unit tests for error buffer behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
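The entry-point pattern this commit describes — clear the thread-local error buffer first, then `catch_unwind` so panics never cross the FFI boundary — might look roughly like this (status codes and names are illustrative, not the crate's actual API):

```rust
use std::cell::RefCell;
use std::panic::{catch_unwind, AssertUnwindSafe};

thread_local! {
    // Thread-local error buffer, as described in the PR summary.
    static LAST_ERROR: RefCell<Option<String>> = RefCell::new(None);
}

fn clear_last_error() {
    LAST_ERROR.with(|e| *e.borrow_mut() = None);
}

fn set_last_error(msg: String) {
    LAST_ERROR.with(|e| *e.borrow_mut() = Some(msg));
}

fn last_error() -> Option<String> {
    LAST_ERROR.with(|e| e.borrow().clone())
}

/// FFI entry-point wrapper: clear stale errors, run the body, and convert
/// both Err results and panics into status codes.
fn ffi_entry<F: FnOnce() -> Result<(), String>>(body: F) -> i32 {
    clear_last_error();
    match catch_unwind(AssertUnwindSafe(body)) {
        Ok(Ok(())) => 0,
        Ok(Err(msg)) => {
            set_last_error(msg);
            1
        }
        Err(_) => {
            set_last_error("panic in FFI call".to_string());
            2
        }
    }
}

fn main() {
    std::panic::set_hook(Box::new(|_| {})); // keep test output quiet
    assert_eq!(ffi_entry(|| Ok(())), 0);
    assert_eq!(ffi_entry(|| Err("bad input".into())), 1);
    assert_eq!(last_error().as_deref(), Some("bad input"));
    assert_eq!(ffi_entry(|| panic!("boom")), 2);
    // A later successful call clears the stale error, so callers never see it.
    assert_eq!(ffi_entry(|| Ok(())), 0);
    assert_eq!(last_error(), None);
}
```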
…tches

Replace the collect-all-then-re-wrap pattern with direct streaming:
- service.rs returns Box<dyn ResultReader> instead of RecordBatch
- Remove collect_batches() and multi-catalog orchestration in get_columns
- catalog.rs uses ResultReaderAdapter to stream through FFI
- list_columns now takes Option<&str> catalog, uses SHOW COLUMNS IN ALL CATALOGS
- build_show_columns handles optional catalog (no longer returns Result)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d FFI handle

- ResultReaderAdapter: schema, iteration, empty, error propagation, schema error
- export_reader: empty reader, multiple batches, schema error
- handle_panic: &str, String, and unknown panic types
- handle.rs: null connection sets proper error message

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
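The `handle_panic` test cases listed above (`&str`, `String`, and unknown payload types) suggest a helper along these lines — `panic_message` is an illustrative name, not necessarily the crate's:

```rust
use std::any::Any;
use std::panic::{catch_unwind, panic_any};

/// Extract a readable message from a panic payload, covering the three
/// cases the tests exercise: &str, String, and anything else.
fn panic_message(payload: Box<dyn Any + Send>) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "unknown panic".to_string()
    }
}

fn main() {
    std::panic::set_hook(Box::new(|_| {})); // keep test output quiet

    // panic!("literal") carries a &'static str payload.
    let p = catch_unwind(|| panic!("static message")).unwrap_err();
    assert_eq!(panic_message(p), "static message");

    // panic!("{}", x) formats into a String payload.
    let owned = String::from("owned message");
    let p = catch_unwind(move || panic!("{}", owned)).unwrap_err();
    assert_eq!(panic_message(p), "owned message");

    // panic_any with an arbitrary type falls through to the default.
    let p = catch_unwind(|| panic_any(42_i32)).unwrap_err();
    assert_eq!(panic_message(p), "unknown panic");
}
```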
@gopalldb gopalldb changed the title feat(rust): add ODBC metadata FFI layer for catalog functions feat(rust): add metadata FFI layer with streaming Arrow pass-through Mar 5, 2026
@gopalldb gopalldb requested a review from vikrantpuppala March 6, 2026 00:33
// See the License for the specific language governing permissions and
// limitations under the License.

//! `extern "C"` catalog metadata functions.
Collaborator:

this file is not only for catalog?

Collaborator:

file name is misleading

Collaborator Author:

changed to metadata.rs


/// Export a RecordBatch as an FFI_ArrowArrayStream.
///
/// The caller is responsible for releasing the stream.
Collaborator:

is this safe? why do we need this function?

Collaborator Author:

yes, needed for required params (PK/FK)

let reader: Box<dyn RecordBatchReader + Send> = Box::new(RecordBatchIterator::new(
    vec![Ok(batch)].into_iter(),
    schema,
));
Collaborator:

we are collecting all batches first so this is not really streaming, can we use ResultReaderAdapter that we already have?

Collaborator Author:

done

/// - `conn` must be a valid handle from `metadata_connection_from_ref()`
/// - `out` must point to a valid, writable `FFI_ArrowArrayStream`
#[no_mangle]
pub unsafe extern "C" fn metadata_get_catalogs(
Collaborator:

none of the extern "C" functions in catalog.rs use std::panic::catch_unwind.

Collaborator Author:

done

rust/src/lib.rs Outdated
// Metadata FFI — additional extern "C" functions for catalog metadata
// when built with `cargo build --features metadata-ffi`
#[cfg(feature = "metadata-ffi")]
pub mod ffi;
Collaborator:

should this be pub(crate) mod ffi

Collaborator Author:

done

use std::sync::Arc;

/// Collect all batches from an `ExecuteResult` into a single `RecordBatch`.
fn collect_batches(result: ExecuteResult) -> Result<RecordBatch> {
Collaborator:

why? let's stream directly, see existing result set adapter pattern

Collaborator Author:

Done, returns Box directly

let mut all_batches = Vec::new();
let mut schema = None;

for cat in &catalogs_to_query {
Collaborator:

this is sequential, any reason why? eventually we should move to all catalogs.

Collaborator Author:

removed

///
/// Executes metadata SQL queries and returns raw Arrow `RecordBatch` results
/// directly from the server, without intermediate parsing or schema reshaping.
pub struct ConnectionMetadataService {
Collaborator:

should we gate this with #[cfg(feature = "metadata-ffi")]

Collaborator Author:

done

gopalldb and others added 2 commits March 9, 2026 10:50
- Rename ffi/catalog.rs → ffi/metadata.rs (file handles all metadata, not just catalogs)
- Change `pub mod ffi` to `pub(crate) mod ffi` (no need to expose FFI internals)
- Gate `metadata/service.rs` with `#[cfg(feature = "metadata-ffi")]`

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…for None

- Remove wildcard/ALL CATALOGS handling from build_show_columns
- Catalog is now required; service returns empty stream when None
- SeaClient validates catalog is present before building SQL

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vikrantpuppala (Collaborator):

Overall: Good progress from v1 — streaming via ResultReaderAdapter, catch_unwind on all entry points, and pub(crate) visibility are all solid improvements.

My main concern is the layering. ConnectionMetadataService + the custom handle lifecycle (metadata_connection_from_ref / metadata_connection_free) add ~300 lines of indirection for what is essentially forwarding to Connection's existing fields. Every service method is block_on(client.method()) → .reader with no additional logic (except get_columns's 3-line empty-catalog guard).

I'd recommend removing service.rs and handle.rs entirely and having the FFI functions take *const Connection directly. The ODBC C++ side already has adbc_connection_.private_data which is the Rust Connection* — no new lifecycle management needed. This cuts the PR by ~300 lines and eliminates a layer that doesn't carry its weight.

Detailed comments on individual files below.


This comment was generated with GitHub MCP.

@@ -0,0 +1,149 @@
// Copyright (c) 2025 ADBC Drivers Contributors
Collaborator:

This entire file can be removed. Every method follows the same pattern:

pub fn get_X(&self, ...) -> Result<Box<dyn ResultReader + Send>> {
    let result = self.runtime.block_on(self.client.list_X(&self.session_id, ...))?;
    Ok(result.reader)
}

The FFI functions in metadata.rs can call Connection directly — it already holds client, session_id, and runtime. The only non-trivial logic is get_columns's empty-catalog guard (3 lines), which can live in the FFI function itself.

Collaborator Author:

Removed

@@ -0,0 +1,154 @@
// Copyright (c) 2025 ADBC Drivers Contributors
Collaborator:

With the service removed, this file can also go. Instead of:

metadata_connection_from_ref(conn_ptr) → Box<Service> → opaque handle
metadata_get_tables(handle, ...) → handle_to_service → svc.get_tables()
metadata_connection_free(handle)

The FFI functions can just take *const Connection:

pub unsafe extern "C" fn metadata_get_tables(
    conn: *const Connection,
    ...,
    out: *mut FFI_ArrowArrayStream,
) -> FfiStatus {
    let conn = &*conn;
    let result = conn.runtime_handle().block_on(
        conn.client().list_tables(conn.session_id(), ...)
    )?;
    export_reader(result.reader, out)
}

This eliminates the create/free lifecycle entirely. The ODBC side already has the Connection* via adbc_connection_.private_data.

Collaborator Author:

Additional service layer removed

.message("catalog is required for SHOW COLUMNS (ALL CATALOGS not yet supported)")
})?;

/// Build SHOW COLUMNS command.
Collaborator:

build_show_columns changed from returning Result<String> to panicking via expect(). This is a regression — library code shouldn't panic on invalid input. While catch_unwind in the FFI layer would catch it, unwinding through Rust code has edge cases and shouldn't be relied on for normal error handling.

Two options:

  1. Revert to Result<String> (preferred)
  2. Take catalog: &str (not Option) since it's always required — then the type system enforces it and there's nothing to panic on

Collaborator Author:

changed to taking catalog: &str
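The agreed-upon shape — a required `catalog: &str`, so there is nothing to panic on and no `Result` needed — might look roughly like this. The exact SQL keywords, clause order, and quoting here are assumptions for illustration, not the crate's actual output:

```rust
/// Sketch of build_show_columns with a required catalog. The type system
/// now enforces the requirement, so the function is infallible.
/// SQL shape shown here is illustrative only.
fn build_show_columns(catalog: &str, schema: Option<&str>, table: Option<&str>) -> String {
    let mut sql = String::from("SHOW COLUMNS IN CATALOG ");
    sql.push_str(catalog);
    if let Some(s) = schema {
        sql.push_str(&format!(" SCHEMA LIKE '{}'", s));
    }
    if let Some(t) = table {
        sql.push_str(&format!(" TABLE LIKE '{}'", t));
    }
    sql
}

fn main() {
    let sql = build_show_columns("main", Some("default"), None);
    assert_eq!(sql, "SHOW COLUMNS IN CATALOG main SCHEMA LIKE 'default'");
}
```

With this signature, the "None catalog → empty result" decision lives at the FFI call site, as the review suggested.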

/// Catalog is required — callers must pass `Some(catalog)`.
async fn list_columns(
&self,
session_id: &str,
Collaborator:

list_columns changed from catalog: &str to catalog: Option<&str>, but SeaClient immediately errors on None. If catalog is always required at the client level, the trait should enforce it with &str.

The None → empty result case belongs at the call site (the FFI function), not pushed into the trait.

Collaborator Author:

Done

@@ -0,0 +1,579 @@
// Copyright (c) 2025 ADBC Drivers Contributors
Collaborator:

The ODBC C++ side will need declarations for FfiStatus, FfiError, and all extern "C" functions. Can you add a databricks_metadata_ffi.h header? Or set up cbindgen to generate one. Without it the ODBC side has to manually mirror these declarations, which is error-prone as the surface evolves.

Collaborator Author (@gopalldb, Mar 11, 2026):

Good point, added
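For cbindgen to generate such a header, the FFI types need C-compatible layouts. A sketch of what that looks like on the Rust side — field names and variants here are illustrative, not the crate's actual definitions:

```rust
use std::os::raw::c_char;

/// Status code returned by every FFI function. `#[repr(C)]` fixes the
/// discriminants so the generated header and the Rust side agree.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FfiStatus {
    Ok = 0,
    Error = 1,
    Panic = 2,
}

/// Error details exposed to C callers. Illustrative layout only.
#[repr(C)]
#[allow(dead_code)]
pub struct FfiError {
    /// NUL-terminated UTF-8 message owned by the library (thread-local buffer).
    pub message: *const c_char,
}

fn main() {
    // The C side sees these as plain ints, so the values must stay stable.
    assert_eq!(FfiStatus::Ok as i32, 0);
    assert_eq!(FfiStatus::Error as i32, 1);
    assert_eq!(FfiStatus::Panic as i32, 2);
}
```

Running cbindgen over items like these produces the matching `enum`/`struct` declarations, so the C++ side never mirrors them by hand.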

pub unsafe extern "C" fn metadata_get_foreign_keys(
conn: FfiConnectionHandle,
catalog: *const c_char,
schema: *const c_char,
Collaborator:

MockReader and SchemaErrorReader are duplicated between here and rust/src/reader/mod.rs tests. Consider extracting to a shared #[cfg(test)] module.

Collaborator Author:

Done

"ARRAY" => 2003, // JDBC ARRAY
"MAP" => 2000, // JDBC JAVA_OBJECT
"STRUCT" => 2002, // JDBC STRUCT
"INTERVAL" => 12, // JDBC VARCHAR (matches JDBC driver)
Collaborator:

The INTERVAL → VARCHAR mapping is a good fix but unrelated to the metadata FFI feature. Should this be a separate commit to keep the diff focused?

Collaborator Author:

I think this is a minor change; we can keep it.
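The mapping quoted above uses JDBC `java.sql.Types` codes: `ARRAY` = 2003, `JAVA_OBJECT` = 2000, `STRUCT` = 2002, and `VARCHAR` = 12 (so `INTERVAL` renders as a string, matching the Databricks JDBC driver). As a standalone sketch (the function name is illustrative):

```rust
/// Map a Databricks type name to its JDBC java.sql.Types code, per the
/// match arms quoted in the diff. Returns None for names not covered here.
fn jdbc_type_code(type_name: &str) -> Option<i32> {
    match type_name {
        "ARRAY" => Some(2003),  // JDBC ARRAY
        "MAP" => Some(2000),    // JDBC JAVA_OBJECT
        "STRUCT" => Some(2002), // JDBC STRUCT
        "INTERVAL" => Some(12), // JDBC VARCHAR (matches JDBC driver)
        _ => None,
    }
}

fn main() {
    assert_eq!(jdbc_type_code("INTERVAL"), Some(12));
    assert_eq!(jdbc_type_code("MAP"), Some(2000));
    assert_eq!(jdbc_type_code("DECIMAL"), None);
}
```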

gopalldb and others added 2 commits March 10, 2026 11:26
…, add C header

- Remove `service.rs` and `handle.rs`: FFI functions now take `*const Connection`
  directly instead of going through ConnectionMetadataService + opaque handle.
  Eliminates ~300 lines of indirection that didn't carry its weight.
- Change `list_columns` trait: `catalog: Option<&str>` → `catalog: &str` since
  the client always requires it. Empty-catalog guard moves to FFI layer.
- Fix `build_show_columns`: takes `catalog: &str` parameter instead of panicking
  via `expect()` on missing builder state. Type system enforces the requirement.
- Add `databricks_metadata_ffi.h`: C header with declarations for all FFI types
  and functions, so the ODBC C++ side doesn't manually mirror Rust declarations.
- Deduplicate test mocks: extract MockReader, ErrorReader, SchemaErrorReader to
  shared `reader/test_utils.rs` module, used by both reader and FFI tests.
- Update design spec to reflect the simplified architecture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vikrantpuppala (Collaborator) left a review comment:

thank you!

@gopalldb gopalldb added this pull request to the merge queue Mar 12, 2026
Merged via the queue into main with commit 8490f1d Mar 12, 2026
22 checks passed
@gopalldb gopalldb deleted the metadata branch March 12, 2026 08:05