
feat(csharp): SEA connection metadata — GetObjects, GetTableSchema, GetTableTypes, GetInfo, SQL commands #257

Open
msrathore-db wants to merge 1 commit into main from feat/sea-metadata-part1

Conversation

msrathore-db (Collaborator) commented Feb 27, 2026

🥞 Stacked PR

Summary

Core SEA metadata infrastructure and connection-level operations.

Connection-level metadata

  • GetObjects: async IGetObjectsDataProvider with shared CancellationToken per operation
  • GetTableSchema: via SHOW COLUMNS + shared HiveServer2Connection.GetArrowType
  • GetTableTypes: returns TABLE, VIEW
  • GetInfo: via shared MetadataSchemaFactory.BuildGetInfoResult
  • GetObjects xdbc_column_size/xdbc_decimal_digits: nulled for non-DECIMAL/CHAR to match Thrift
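The GetTableTypes result described above is small enough to sketch directly. The following is a minimal illustration using the Apache.Arrow C# API — not the PR's actual code; the helper name and the `table_type` column name (taken from the ADBC GetTableTypes schema) are assumptions:

```csharp
using System;
using Apache.Arrow;
using Apache.Arrow.Types;

class TableTypesSketch
{
    // Hypothetical helper: build the one-column batch GetTableTypes returns.
    static RecordBatch BuildTableTypesBatch()
    {
        var schema = new Schema.Builder()
            .Field(new Field("table_type", StringType.Default, nullable: false))
            .Build();

        // SEA reports exactly two table types.
        var types = new StringArray.Builder()
            .Append("TABLE")
            .Append("VIEW")
            .Build();

        return new RecordBatch(schema, new IArrowArray[] { types }, 2);
    }

    static void Main()
    {
        using var batch = BuildTableTypesBatch();
        Console.WriteLine(batch.Length); // two rows: TABLE, VIEW
    }
}
```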

SQL command builders (MetadataCommands/)

  • MetadataCommandBase with ConvertPattern, QuoteIdentifier, AppendCatalogScope
  • ShowCatalogs, ShowSchemas, ShowTables, ShowColumns, ShowKeys, ShowForeignKeys
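A rough sketch of what the two base helpers do (illustrative only; signatures and edge-case handling are assumptions, not the PR's code): ADBC metadata filters use the SQL LIKE `%` wildcard, while Databricks `SHOW ... LIKE` patterns use `*`, and identifiers are backtick-quoted with embedded backticks doubled, the usual Spark SQL rule:

```csharp
using System;

static class MetadataCommandSketch
{
    // Assumed behavior: map the ADBC '%' wildcard to the '*' wildcard
    // that Databricks SHOW ... LIKE patterns expect.
    public static string ConvertPattern(string adbcPattern) =>
        adbcPattern.Replace("%", "*");

    // Backtick-quote an identifier, doubling any embedded backticks.
    public static string QuoteIdentifier(string identifier) =>
        "`" + identifier.Replace("`", "``") + "`";
}

// e.g. ConvertPattern("order%") yields "order*"
// and  QuoteIdentifier("my`table") yields "`my``table`"
```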

HTTP & protocol

  • x-databricks-sea-can-run-fully-sync header on metadata requests via IsMetadata flag
  • ExecuteMetadataSqlAsync — async SQL execution for metadata
  • CreateMetadataTimeoutCts — disposable CTS with configurable timeout
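A disposable timeout CTS of the kind described can be sketched as follows (an illustration under assumed naming, not the PR's implementation): link the caller's token to a new source that cancels after the configured metadata timeout, and dispose the linked source when the operation completes.

```csharp
using System;
using System.Threading;

static class TimeoutSketch
{
    // Hypothetical helper mirroring CreateMetadataTimeoutCts: the caller
    // must dispose the returned source when the metadata operation ends.
    public static CancellationTokenSource CreateMetadataTimeoutCts(
        CancellationToken callerToken, TimeSpan timeout)
    {
        var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
        if (timeout > TimeSpan.Zero)
        {
            // Fires on whichever comes first: caller cancellation or timeout.
            cts.CancelAfter(timeout);
        }
        return cts;
    }
}

// usage (assumed call site):
//   using var cts = TimeoutSketch.CreateMetadataTimeoutCts(token, TimeSpan.FromSeconds(30));
//   await connection.ExecuteMetadataSqlAsync(sql, cts.Token);
```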

Utilities

  • MetadataUtilities: NormalizeSparkCatalog, IsInvalidPKFKCatalog, BuildQualifiedTableName
  • DescTableExtendedResult: REAL→FLOAT via ColumnMetadataHelper.GetDataTypeCode directly
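BuildQualifiedTableName presumably composes a fully qualified, backtick-quoted name; a minimal sketch under that assumption (parameter names and null handling are guesses, not the PR's code):

```csharp
using System;
using System.Collections.Generic;

static class MetadataUtilitiesSketch
{
    // Assumed shape of BuildQualifiedTableName: quote each part and
    // join as `catalog`.`schema`.`table`, skipping absent parts.
    public static string BuildQualifiedTableName(string? catalog, string? schema, string table)
    {
        static string Quote(string id) => "`" + id.Replace("`", "``") + "`";
        var parts = new List<string>();
        if (!string.IsNullOrEmpty(catalog)) parts.Add(Quote(catalog!));
        if (!string.IsNullOrEmpty(schema)) parts.Add(Quote(schema!));
        parts.Add(Quote(table));
        return string.Join(".", parts);
    }
}

// e.g. BuildQualifiedTableName("main", "sales", "orders")
//      yields `main`.`sales`.`orders`
```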

Also includes

  • DatabricksStatement wired to shared MetadataSchemaFactory
  • sea-metadata-design.md architecture documentation

🤖 Generated with Claude Code

Copilot AI left a comment

Pull request overview

Implements core SEA metadata support for the C# Databricks driver when using the Statement Execution API, adding SQL-command-based metadata retrieval and refactoring schema/result construction to shared helpers.

Changes:

  • Implement GetObjects, GetTableSchema, GetTableTypes, and GetInfo for StatementExecutionConnection using SHOW ... SQL commands.
  • Add metadata SQL command builders (MetadataCommandBase, ShowCatalogs/Schemas/Tables/Columns/Keys/ForeignKeys).
  • Refactor/centralize metadata-related utilities and schema/result construction; update column metadata handling and related tests.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Summary per file:

  • csharp/test/Unit/Result/DescTableExtendedResultTest.cs — Updates expected column-size behavior for BINARY to align with the new metadata computation.
  • csharp/src/StatementExecution/StatementExecutionConnection.cs — Implements SEA metadata APIs and IGetObjectsDataProvider via SQL execution + tracing.
  • csharp/src/StatementExecution/ShowTablesCommand.cs — Adds SHOW TABLES SQL builder with catalog scope + pattern filtering.
  • csharp/src/StatementExecution/ShowSchemasCommand.cs — Adds SHOW SCHEMAS SQL builder with optional catalog scoping + pattern filtering.
  • csharp/src/StatementExecution/ShowKeysCommand.cs — Adds SHOW KEYS and SHOW FOREIGN KEYS SQL builders for PK/FK metadata.
  • csharp/src/StatementExecution/ShowColumnsCommand.cs — Adds SHOW COLUMNS SQL builder with catalog/schema/table/column pattern filtering.
  • csharp/src/StatementExecution/ShowCatalogsCommand.cs — Adds SHOW CATALOGS SQL builder with pattern filtering.
  • csharp/src/StatementExecution/MetadataCommandBase.cs — Introduces shared quoting + pattern conversion helpers for metadata SQL builders.
  • csharp/src/Result/DescTableExtendedResult.cs — Refactors column metadata derivation to use shared ColumnMetadataHelper, with an interval sizing special case.
  • csharp/src/MetadataUtilities.cs — Adds shared catalog normalization and PK/FK gating helpers plus a qualified table name builder.
  • csharp/src/DatabricksStatement.cs — Switches to shared metadata schema/result factories and simplifies empty-result construction.


…tTableTypes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@msrathore-db msrathore-db force-pushed the feat/sea-metadata-part1 branch from 7b9b26a to 3c9e2bf Compare March 2, 2026 09:46
@msrathore-db msrathore-db changed the title feat(csharp): implement SEA metadata — GetObjects, GetTableSchema, GetTableTypes feat(csharp): SEA connection metadata — GetObjects, GetTableSchema, GetTableTypes, GetInfo, SQL commands Mar 2, 2026
@eric-wang-1990
Copy link
Collaborator

I'm not seeing a reference to the new hiveserver2 branch that contains the refactored code — are you using it?

@eric-wang-1990
Copy link
Collaborator

Can you try the plugin command /AddressPR so Claude posts a response to each comment? Otherwise it isn't clear which comments were accepted or rejected.

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.



Comment on lines +658 to +675
internal async Task<List<RecordBatch>> ExecuteMetadataSqlAsync(string sql, CancellationToken cancellationToken = default)
{
    var batches = new List<RecordBatch>();
    using var stmt = (StatementExecutionStatement)CreateStatement();
    stmt.SqlQuery = sql;
    stmt.IsMetadataExecution = true;
    var result = await stmt.ExecuteQueryAsync(cancellationToken).ConfigureAwait(false);
    using var stream = result.Stream;
    if (stream == null) return batches;
    while (true)
    {
        cancellationToken.ThrowIfCancellationRequested();
        var batch = await stream.ReadNextRecordBatchAsync(cancellationToken).ConfigureAwait(false);
        if (batch == null) break;
        batches.Add(batch);
    }
    return batches;
}
Copilot AI Mar 5, 2026

ExecuteMetadataSqlAsync accumulates RecordBatch instances into a list but never disposes them. Since RecordBatch holds unmanaged Arrow buffers (and elsewhere in the codebase batches are explicitly disposed when no longer needed), this can leak memory during metadata operations. Consider processing batches in a streaming fashion (dispose each batch after extracting needed values) or ensure callers dispose all returned batches.

msrathore-db (Collaborator, Author) replied:

The returned batches are consumed immediately by callers that extract column values and discard the batch references. The batches then become eligible for GC. Adding explicit disposal would require a streaming API change. Noted as follow-up tech debt.
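For reference, the streaming alternative the review suggests could look roughly like this (a sketch, not committed code; the method name and delegate parameter are hypothetical): hand each batch to the caller through a callback and dispose it before reading the next, so undisposed Arrow buffers never accumulate.

```csharp
// Hypothetical streaming variant of ExecuteMetadataSqlAsync: each batch
// is disposed as soon as the caller has extracted what it needs.
internal async Task ExecuteMetadataSqlStreamingAsync(
    string sql,
    Action<RecordBatch> onBatch,
    CancellationToken cancellationToken = default)
{
    using var stmt = (StatementExecutionStatement)CreateStatement();
    stmt.SqlQuery = sql;
    stmt.IsMetadataExecution = true;
    var result = await stmt.ExecuteQueryAsync(cancellationToken).ConfigureAwait(false);
    using var stream = result.Stream;
    if (stream == null) return;
    while (true)
    {
        cancellationToken.ThrowIfCancellationRequested();
        var batch = await stream.ReadNextRecordBatchAsync(cancellationToken).ConfigureAwait(false);
        if (batch == null) break;
        using (batch) // dispose once the callback has consumed it
        {
            onBatch(batch);
        }
    }
}
```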

msrathore-db added a commit that referenced this pull request Mar 6, 2026
…iew comments

- Update hiveserver2 submodule to merged upstream main (e42efb4)
  with both PR #21 and #22
- Use shared DatabricksDriverName constant instead of hardcoded string
  in SEA GetInfo (addresses eric comment on line 425)
- Restore inline comment on ShouldReturnEmptyPkFkResult (addresses
  eric comment on line 623)
- Update Thrift parity comment to list all 8 preserved types
  (addresses Copilot comment on line 637)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@eric-wang-1990 eric-wang-1990 changed the title feat(csharp): SEA connection metadata — GetObjects, GetTableSchema, GetTableTypes, GetInfo, SQL commands feat(csharp): sEA connection metadata — GetObjects, GetTableSchema, GetTableTypes, GetInfo, SQL commands Mar 6, 2026
msrathore-db added a commit that referenced this pull request Mar 9, 2026
Implement metadata operations for the Statement Execution API (REST)
protocol, achieving parity with the existing Thrift implementation.

## Core Implementation
- IGetObjectsDataProvider async implementation for SEA in
  StatementExecutionConnection (GetCatalogs, GetSchemas, GetTables,
  GetColumns via SHOW commands)
- Statement-level metadata routing in StatementExecutionStatement
  (GetColumns, GetColumnsExtended, GetPrimaryKeys, GetCrossReference,
  GetTableSchema, GetTableTypes, GetInfo)
- SQL command builders (ShowCatalogs/Schemas/Tables/Columns/Keys/
  ForeignKeys) with pattern conversion (ADBC % → Databricks *)
- ColumnMetadataHelper for computing column metadata from type name
  strings (SEA-only, Thrift gets these from server)
- FlatColumnsResultBuilder for building flat GetColumns results
- MetadataUtilities for shared catalog/pattern helpers

## Connection Parameters
- EnablePKFK: gate PK/FK queries (default: true)
- EnableMultipleCatalogSupport: single vs multi-catalog mode
  (default: true), matching Thrift DatabricksConnection behavior
- DatabricksStatement-specific options silently accepted in SEA

## Design
- Shared MetadataSchemaFactory (hiveserver2) for Arrow schemas
- Shared GetObjectsResultBuilder for nested GetObjects structure
- Async internally, blocks once at ADBC sync boundary
- x-databricks-sea-can-run-fully-sync header for metadata queries
- Design doc: csharp/doc/sea-metadata-design.md

## Testing
- Unit tests for ShowCommands and MetadataUtilities (146 tests)
- E2E tests comparing Thrift and SEA parity (17 tests)
- Metadata comparator tool with schema type comparison

## Reviewed in
- hiveserver2: #21, #22 (merged)
- databricks: #257, #258, #259, #260, #261, #282 (reviewed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 10, 2026