feat(csharp): SEA connection metadata — GetObjects, GetTableSchema, GetTableTypes, GetInfo, SQL commands #257
msrathore-db wants to merge 1 commit into main
Conversation
Pull request overview
Implements core SEA metadata support for the C# Databricks driver when using the Statement Execution API, adding SQL-command-based metadata retrieval and refactoring schema/result construction to shared helpers.
Changes:
- Implement `GetObjects`, `GetTableSchema`, `GetTableTypes`, and `GetInfo` for `StatementExecutionConnection` using `SHOW ...` SQL commands.
- Add metadata SQL command builders (`MetadataCommandBase`, `ShowCatalogs`/`Schemas`/`Tables`/`Columns`/`Keys`/`ForeignKeys`).
- Refactor/centralize metadata-related utilities and schema/result construction; update column metadata handling and related tests.
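The commit message states that the builders convert ADBC-style LIKE patterns to the Databricks SHOW wildcard syntax (ADBC `%` → Databricks `*`). A minimal sketch of that conversion; the method name mirrors `ConvertPattern` from `MetadataCommandBase`, but the body and the handling of non-`%` characters are assumptions, not taken from the PR:

```csharp
using System.Text;

// Illustrative sketch of ADBC-pattern -> Databricks SHOW-pattern conversion.
// Only the '%' -> '*' mapping is stated in this PR; treatment of all other
// characters here is an assumption.
internal static class PatternSketch
{
    internal static string? ConvertPattern(string? adbcPattern)
    {
        if (string.IsNullOrEmpty(adbcPattern)) return adbcPattern;

        var sb = new StringBuilder(adbcPattern!.Length);
        foreach (char c in adbcPattern)
        {
            // ADBC multi-character wildcard '%' becomes Databricks '*'.
            sb.Append(c == '%' ? '*' : c);
        }
        return sb.ToString();
    }
}
```

Under these assumptions, `ConvertPattern("main%")` yields `"main*"`, suitable for a `SHOW ... LIKE` clause.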
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| csharp/test/Unit/Result/DescTableExtendedResultTest.cs | Updates expected column-size behavior for BINARY to align with new metadata computation. |
| csharp/src/StatementExecution/StatementExecutionConnection.cs | Implements SEA metadata APIs and IGetObjectsDataProvider via SQL execution + tracing. |
| csharp/src/StatementExecution/ShowTablesCommand.cs | Adds SHOW TABLES SQL builder with catalog scope + pattern filtering. |
| csharp/src/StatementExecution/ShowSchemasCommand.cs | Adds SHOW SCHEMAS SQL builder with optional catalog scoping + pattern filtering. |
| csharp/src/StatementExecution/ShowKeysCommand.cs | Adds SHOW KEYS and SHOW FOREIGN KEYS SQL builders for PK/FK metadata. |
| csharp/src/StatementExecution/ShowColumnsCommand.cs | Adds SHOW COLUMNS SQL builder with catalog/schema/table/column pattern filtering. |
| csharp/src/StatementExecution/ShowCatalogsCommand.cs | Adds SHOW CATALOGS SQL builder with pattern filtering. |
| csharp/src/StatementExecution/MetadataCommandBase.cs | Introduces shared quoting + pattern conversion helpers for metadata SQL builders. |
| csharp/src/Result/DescTableExtendedResult.cs | Refactors column metadata derivation to use shared ColumnMetadataHelper, with interval sizing special-case. |
| csharp/src/MetadataUtilities.cs | Adds shared catalog normalization and PK/FK gating helpers plus qualified table name builder. |
| csharp/src/DatabricksStatement.cs | Switches to shared metadata schema/result factories and simplifies empty-result construction. |
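The `ShowTablesCommand` row above describes a SHOW TABLES builder with catalog scope and pattern filtering. A sketch of what such a builder can look like; the class and property names are illustrative, only the `SHOW TABLES [IN catalog.schema] [LIKE 'pattern']` SQL shape follows documented Databricks syntax, and the backtick quoting/escaping mirrors the shared quoting helpers the table mentions without claiming to reproduce them:

```csharp
using System.Text;

// Illustrative SHOW TABLES builder in the spirit of ShowTablesCommand.
// Names are assumptions; the generated SQL shape is standard Databricks SQL.
internal sealed class ShowTablesSketch
{
    public string? Catalog { get; init; }
    public string? Schema { get; init; }
    public string? TablePattern { get; init; } // already in Databricks '*' syntax

    public string BuildSql()
    {
        var sb = new StringBuilder("SHOW TABLES");
        if (!string.IsNullOrEmpty(Schema))
        {
            sb.Append(" IN ");
            if (!string.IsNullOrEmpty(Catalog))
                sb.Append(Quote(Catalog!)).Append('.');
            sb.Append(Quote(Schema!));
        }
        if (!string.IsNullOrEmpty(TablePattern))
            sb.Append(" LIKE '").Append(TablePattern!.Replace("'", "''")).Append('\'');
        return sb.ToString();
    }

    // Backtick-quote an identifier, doubling embedded backticks.
    private static string Quote(string identifier) =>
        "`" + identifier.Replace("`", "``") + "`";
}
```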
…tTableTypes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
I'm not seeing it. Are you referring to the new hiveserver2 branch that contains the refactored code?

Can you try using the plugin command /AddressPR so Claude posts a response to each comment? Otherwise it is not clear whether they were accepted or rejected.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
```csharp
internal async Task<List<RecordBatch>> ExecuteMetadataSqlAsync(string sql, CancellationToken cancellationToken = default)
{
    var batches = new List<RecordBatch>();
    using var stmt = (StatementExecutionStatement)CreateStatement();
    stmt.SqlQuery = sql;
    stmt.IsMetadataExecution = true;
    var result = await stmt.ExecuteQueryAsync(cancellationToken).ConfigureAwait(false);
    using var stream = result.Stream;
    if (stream == null) return batches;
    while (true)
    {
        cancellationToken.ThrowIfCancellationRequested();
        var batch = await stream.ReadNextRecordBatchAsync(cancellationToken).ConfigureAwait(false);
        if (batch == null) break;
        batches.Add(batch);
    }
    return batches;
}
```
ExecuteMetadataSqlAsync accumulates RecordBatch instances into a list but never disposes them. Since RecordBatch holds unmanaged Arrow buffers (and elsewhere in the codebase batches are explicitly disposed when no longer needed), this can leak memory during metadata operations. Consider processing batches in a streaming fashion (dispose each batch after extracting needed values) or ensure callers dispose all returned batches.
The returned batches are consumed immediately by callers that extract column values and discard the batch references. The batches then become eligible for GC. Adding explicit disposal would require a streaming API change. Noted as follow-up tech debt.
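The streaming alternative mentioned in the review thread could look like the following sketch, which adapts `ExecuteMetadataSqlAsync` to dispose each `RecordBatch` as soon as a consumer has extracted its values instead of returning the whole list. The delegate-based shape and method name are illustrative, not part of the PR:

```csharp
// Illustrative streaming variant of ExecuteMetadataSqlAsync: each batch is
// disposed immediately after the consumer copies what it needs, so unmanaged
// Arrow buffers are released without an API-wide ownership change.
internal async Task ForEachMetadataBatchAsync(
    string sql,
    Action<RecordBatch> consume,
    CancellationToken cancellationToken = default)
{
    using var stmt = (StatementExecutionStatement)CreateStatement();
    stmt.SqlQuery = sql;
    stmt.IsMetadataExecution = true;
    var result = await stmt.ExecuteQueryAsync(cancellationToken).ConfigureAwait(false);
    using var stream = result.Stream;
    if (stream == null) return;
    while (true)
    {
        cancellationToken.ThrowIfCancellationRequested();
        var batch = await stream.ReadNextRecordBatchAsync(cancellationToken).ConfigureAwait(false);
        if (batch == null) break;
        using (batch) // disposed once the consumer returns
        {
            consume(batch);
        }
    }
}
```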
…iew comments

- Update hiveserver2 submodule to merged upstream main (e42efb4) with both PR #21 and #22
- Use shared DatabricksDriverName constant instead of hardcoded string in SEA GetInfo (addresses eric comment on line 425)
- Restore inline comment on ShouldReturnEmptyPkFkResult (addresses eric comment on line 623)
- Update Thrift parity comment to list all 8 preserved types (addresses Copilot comment on line 637)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement metadata operations for the Statement Execution API (REST) protocol, achieving parity with the existing Thrift implementation.

## Core Implementation
- IGetObjectsDataProvider async implementation for SEA in StatementExecutionConnection (GetCatalogs, GetSchemas, GetTables, GetColumns via SHOW commands)
- Statement-level metadata routing in StatementExecutionStatement (GetColumns, GetColumnsExtended, GetPrimaryKeys, GetCrossReference, GetTableSchema, GetTableTypes, GetInfo)
- SQL command builders (ShowCatalogs/Schemas/Tables/Columns/Keys/ForeignKeys) with pattern conversion (ADBC % → Databricks *)
- ColumnMetadataHelper for computing column metadata from type name strings (SEA-only, Thrift gets these from server)
- FlatColumnsResultBuilder for building flat GetColumns results
- MetadataUtilities for shared catalog/pattern helpers

## Connection Parameters
- EnablePKFK: gate PK/FK queries (default: true)
- EnableMultipleCatalogSupport: single vs multi-catalog mode (default: true), matching Thrift DatabricksConnection behavior
- DatabricksStatement-specific options silently accepted in SEA

## Design
- Shared MetadataSchemaFactory (hiveserver2) for Arrow schemas
- Shared GetObjectsResultBuilder for nested GetObjects structure
- Async internally, blocks once at ADBC sync boundary
- x-databricks-sea-can-run-fully-sync header for metadata queries
- Design doc: csharp/doc/sea-metadata-design.md

## Testing
- Unit tests for ShowCommands and MetadataUtilities (146 tests)
- E2E tests comparing Thrift and SEA parity (17 tests)
- Metadata comparator tool with schema type comparison

## Reviewed in
- hiveserver2: #21, #22 (merged)
- databricks: #257, #258, #259, #260, #261, #282 (reviewed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
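The "async internally, blocks once at ADBC sync boundary" design point can be sketched as follows. The class, method bodies, and return values are illustrative stand-ins, not the driver's actual code; the point is that the synchronous ADBC entry point blocks exactly once at the outermost call while everything beneath it stays async:

```csharp
using System.Threading;
using System.Threading.Tasks;

// Sketch of "async internally, block once at the ADBC sync boundary".
internal static class SyncBoundarySketch
{
    // Synchronous ADBC surface: the only place that blocks.
    public static string[] GetTableTypes() =>
        GetTableTypesAsync(CancellationToken.None).GetAwaiter().GetResult();

    // Internal implementation stays fully async (ConfigureAwait(false)
    // throughout in real code, so no sync-context deadlock at the boundary).
    private static async Task<string[]> GetTableTypesAsync(CancellationToken ct)
    {
        await Task.Yield(); // stand-in for the SHOW-command round trip
        return new[] { "TABLE", "VIEW" }; // illustrative values only
    }
}
```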
🥞 Stacked PR
Summary
Core SEA metadata infrastructure and connection-level operations.
Connection-level metadata
IGetObjectsDataProviderwith sharedCancellationTokenper operationHiveServer2Connection.GetArrowTypeMetadataSchemaFactory.BuildGetInfoResultSQL command builders (
MetadataCommands/)MetadataCommandBasewithConvertPattern,QuoteIdentifier,AppendCatalogScopeShowCatalogs,ShowSchemas,ShowTables,ShowColumns,ShowKeys,ShowForeignKeysHTTP & protocol
x-databricks-sea-can-run-fully-syncheader on metadata requests viaIsMetadataflagExecuteMetadataSqlAsync— async SQL execution for metadataCreateMetadataTimeoutCts— disposable CTS with configurable timeoutUtilities
MetadataUtilities:NormalizeSparkCatalog,IsInvalidPKFKCatalog,BuildQualifiedTableNameDescTableExtendedResult: REAL→FLOAT viaColumnMetadataHelper.GetDataTypeCodedirectlyAlso includes
DatabricksStatementwired to sharedMetadataSchemaFactorysea-metadata-design.mdarchitecture documentation🤖 Generated with Claude Code
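The `CreateMetadataTimeoutCts` bullet above describes a disposable CTS with a configurable timeout. A minimal sketch of that shape, using only standard `CancellationTokenSource` APIs; the signature and parameter names are assumptions:

```csharp
using System;
using System.Threading;

// Sketch of a CreateMetadataTimeoutCts-style helper: a disposable
// CancellationTokenSource combining the caller's token with a configurable
// metadata timeout. Names are illustrative, not the driver's actual API.
internal static class MetadataCtsSketch
{
    internal static CancellationTokenSource CreateMetadataTimeoutCts(
        CancellationToken callerToken,
        TimeSpan metadataTimeout)
    {
        // Linked CTS: fires when either the caller cancels or the timeout elapses.
        var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        if (metadataTimeout > TimeSpan.Zero)
            cts.CancelAfter(metadataTimeout);
        return cts; // caller owns disposal, e.g. 'using var cts = ...'
    }
}
```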