Commit 7b9b26a

msrathore-db and claude committed
feat(csharp): implement SEA metadata — GetObjects, GetTableSchema, GetTableTypes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 4780d28 commit 7b9b26a

14 files changed: +884 -285 lines changed

csharp/doc/sea-metadata-design.md

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
# SEA Metadata Architecture

## Overview

The Databricks ADBC driver supports two protocols for metadata retrieval:

- **Thrift** (HiveServer2): Uses Thrift RPC calls to fetch metadata from the server
- **SEA** (Statement Execution API): Uses SQL commands (`SHOW CATALOGS`, `SHOW SCHEMAS`, etc.) via the REST API

Both protocols share the same Arrow result structure for `GetObjects`, `GetColumns`, `GetPrimaryKeys`, and other metadata operations. This document describes how shared code is structured to avoid duplication while allowing each protocol to fetch data from its respective backend.

## Shared Interface: IGetObjectsDataProvider

```
AdbcConnection.GetObjects()                              [sync ADBC API]
    │
    ▼
GetObjectsResultBuilder.BuildGetObjectsResultAsync()     [async orchestrator]
    │
    ├─ provider.GetCatalogsAsync()
    ├─ provider.GetSchemasAsync()
    ├─ provider.GetTablesAsync()
    └─ provider.PopulateColumnInfoAsync()
    │
    ▼
BuildResult() → HiveInfoArrowStream                      [Arrow structure construction]
```

`IGetObjectsDataProvider` is the abstraction between "how to fetch metadata" and "how to build Arrow results":

- **GetObjectsResultBuilder** knows how to construct the nested Arrow structures (catalog → schema → table → column) required by the ADBC GetObjects spec.
- **IGetObjectsDataProvider** implementations know how to retrieve the raw data from their protocol.

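The split between fetching and building can be illustrated with a much smaller stand-in. The sketch below is not the driver's code: `ICatalogTreeProvider`, `InMemoryProvider`, and `TreeBuilder` are hypothetical names, and the result is a plain dictionary rather than an Arrow structure. It only shows that the builder depends on the interface and never on the protocol behind it.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, simplified stand-in for IGetObjectsDataProvider.
public interface ICatalogTreeProvider
{
    Task<IReadOnlyList<string>> GetCatalogsAsync();
    Task<IReadOnlyList<(string Catalog, string Schema)>> GetSchemasAsync();
}

// In-memory provider standing in for a Thrift- or SEA-backed implementation.
public sealed class InMemoryProvider : ICatalogTreeProvider
{
    public Task<IReadOnlyList<string>> GetCatalogsAsync() =>
        Task.FromResult<IReadOnlyList<string>>(new string[] { "main", "samples" });

    public Task<IReadOnlyList<(string Catalog, string Schema)>> GetSchemasAsync() =>
        Task.FromResult<IReadOnlyList<(string Catalog, string Schema)>>(
            new (string Catalog, string Schema)[]
            {
                ("main", "default"),
                ("main", "sales"),
                ("samples", "nyctaxi"),
            });
}

public static class TreeBuilder
{
    // Nests schemas under catalogs without knowing where the data came from.
    public static async Task<Dictionary<string, List<string>>> BuildAsync(
        ICatalogTreeProvider provider)
    {
        var tree = new Dictionary<string, List<string>>();
        foreach (string catalog in await provider.GetCatalogsAsync())
        {
            tree[catalog] = new List<string>();
        }
        foreach (var (catalog, schema) in await provider.GetSchemasAsync())
        {
            // Schemas whose catalog was not returned are ignored in this sketch.
            if (tree.TryGetValue(catalog, out List<string> schemas))
            {
                schemas.Add(schema);
            }
        }
        return tree;
    }
}
```

Swapping `InMemoryProvider` for another implementation of the same interface leaves `TreeBuilder` untouched, which is the property the real builder relies on.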
### Thrift Implementation (HiveServer2Connection)
- Calls Thrift RPCs: `GetCatalogsAsync()`, `GetSchemasAsync()`, `GetTablesAsync()`, `GetColumnsAsync()`
- The server returns typed result sets that include precision, scale, and column size
- The `SetPrecisionScaleAndTypeName` override handles per-connection type mapping

### SEA Implementation (StatementExecutionConnection)
- Executes SQL: `SHOW CATALOGS`, `SHOW SCHEMAS IN ...`, `SHOW TABLES IN ...`, `SHOW COLUMNS IN ...`
- The server returns type name strings only, so the remaining metadata is computed locally via `ColumnMetadataHelper`
- `ColumnMetadataHelper.PopulateTableInfoFromTypeName` derives data type codes, column sizes, and decimal digits from type names

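To make "computed locally from type names" concrete, here is a minimal sketch of pulling a base type name and a decimal precision/scale out of a type string. `TypeNameSketch` and its methods are hypothetical and are not the `ColumnMetadataHelper` API; the fallback of `(10, 0)` for a bare `DECIMAL` is an assumption based on Spark's default decimal type.

```csharp
using System;
using System.Globalization;

public static class TypeNameSketch
{
    // Returns the upper-cased base type name, e.g. "decimal(10,2)" -> "DECIMAL".
    public static string GetBaseTypeName(string typeName)
    {
        int paren = typeName.IndexOf('(');
        string baseName = paren < 0 ? typeName : typeName.Substring(0, paren);
        return baseName.Trim().ToUpperInvariant();
    }

    // Parses "DECIMAL(p,s)" into (p, s). Falls back to (10, 0) when no usable
    // precision is present (assumed default, see the lead-in).
    public static (int Precision, int Scale) GetDecimalPrecisionScale(string typeName)
    {
        int open = typeName.IndexOf('(');
        int close = typeName.IndexOf(')');
        if (open < 0 || close < open)
        {
            return (10, 0);
        }

        string[] parts = typeName.Substring(open + 1, close - open - 1).Split(',');
        if (!int.TryParse(parts[0].Trim(), NumberStyles.Integer,
                CultureInfo.InvariantCulture, out int precision))
        {
            return (10, 0);
        }

        int scale = 0;
        if (parts.Length > 1)
        {
            // On failure TryParse leaves scale at 0.
            int.TryParse(parts[1].Trim(), NumberStyles.Integer,
                CultureInfo.InvariantCulture, out scale);
        }
        return (precision, scale);
    }
}
```

For example, `GetDecimalPrecisionScale("DECIMAL(38, 18)")` yields precision 38 and scale 18. Nested types such as `ARRAY<DECIMAL(10,2)>` are outside the scope of this sketch.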
## Async Design

The ADBC base class defines `GetObjects()` as synchronous:
```csharp
public abstract IArrowArrayStream GetObjects(GetObjectsDepth depth, ...);
```

Internally, the interface and builder are async:
```csharp
interface IGetObjectsDataProvider {
    Task<IReadOnlyList<string>> GetCatalogsAsync(...);
    // ...
}

static async Task<HiveInfoArrowStream> BuildGetObjectsResultAsync(
    IGetObjectsDataProvider provider, ...) { ... }
```

The sync ADBC boundary blocks once at the top level:
```csharp
public override IArrowArrayStream GetObjects(...) {
    return BuildGetObjectsResultAsync(this, ...).GetAwaiter().GetResult();
}
```

This avoids nested `.Result` blocking calls on every Thrift RPC while maintaining the sync ADBC API contract.

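The single-block pattern can be tried in isolation. The following is a self-contained sketch with hypothetical names (`SyncBoundarySketch`, `FetchAsync`); `Task.Delay` stands in for a server round trip. It also shows one practical difference from `.Result`: `GetAwaiter().GetResult()` rethrows the original exception instead of wrapping it in an `AggregateException`.

```csharp
using System;
using System.Threading.Tasks;

public static class SyncBoundarySketch
{
    // Stand-in for one server round trip.
    private static async Task<int> FetchAsync(int value)
    {
        await Task.Delay(1).ConfigureAwait(false);
        return value;
    }

    // Async all the way down: each step awaits instead of blocking.
    public static async Task<int> BuildAsync()
    {
        int catalogs = await FetchAsync(1).ConfigureAwait(false);
        int schemas = await FetchAsync(2).ConfigureAwait(false);
        int tables = await FetchAsync(3).ConfigureAwait(false);
        return catalogs + schemas + tables;
    }

    // The only blocking call sits at the synchronous API boundary.
    public static int Build() => BuildAsync().GetAwaiter().GetResult();

    // Always fails, to show which exception type reaches the sync caller.
    public static async Task<int> FailAsync()
    {
        await Task.Delay(1).ConfigureAwait(false);
        throw new InvalidOperationException("simulated server error");
    }
}
```

Blocking on a task can still deadlock when the caller runs under a single-threaded `SynchronizationContext`, which is why the sketch uses `ConfigureAwait(false)` on every await; whether the driver does the same is not stated here.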
## Shared Schema Factories

`MetadataSchemaFactory` (in hiveserver2) provides schema definitions used by both protocols:

| Factory Method | Used By |
|---|---|
| `CreateCatalogsSchema()` | DatabricksStatement, StatementExecutionStatement |
| `CreateSchemasSchema()` | DatabricksStatement, StatementExecutionStatement |
| `CreateTablesSchema()` | DatabricksStatement, StatementExecutionStatement |
| `CreateColumnMetadataSchema()` | DatabricksStatement, FlatColumnsResultBuilder |
| `CreatePrimaryKeysSchema()` | MetadataSchemaFactory builders |
| `CreateCrossReferenceSchema()` | MetadataSchemaFactory builders |
| `BuildGetInfoResult()` | HiveServer2Connection, StatementExecutionConnection |

## Type Mapping

### Thrift Path
```
Server result → SetPrecisionScaleAndTypeName (per-connection override)
    ├─ SparkConnection: parses DECIMAL/CHAR precision from type name
    └─ HiveServer2ExtendedConnection: uses server-provided values
```

For flat `GetColumns`, `EnhanceGetColumnsResult` (on `HiveServer2Statement`) adds a `BASE_TYPE_NAME` column and optionally overrides precision/scale by calling `SetPrecisionScaleAndTypeName` per row. This is Thrift-only; SEA builds the complete result from scratch via `FlatColumnsResultBuilder`.

### SEA Path
```
SHOW COLUMNS response → ColumnMetadataHelper.PopulateTableInfoFromTypeName
    └─ Computes: data type code, column size, decimal digits, base type name
```

### Shared GetArrowType
`HiveServer2Connection.GetArrowType()` (internal static) converts a column type ID to an Apache Arrow type. Both Thrift and SEA use this; SEA first derives the type ID via `ColumnMetadataHelper.GetDataTypeCode()`.

## SQL Command Builders

SEA metadata uses `MetadataCommandBase` with command subclasses:

| Command | SQL Generated |
|---|---|
| `ShowCatalogsCommand` | `SHOW CATALOGS [LIKE 'pattern']` |
| `ShowSchemasCommand` | `` SHOW SCHEMAS IN `catalog` [LIKE 'pattern'] `` |
| `ShowTablesCommand` | `` SHOW TABLES IN CATALOG `catalog` [SCHEMA LIKE ...] [LIKE ...] `` |
| `ShowColumnsCommand` | `` SHOW COLUMNS IN CATALOG `catalog` [SCHEMA LIKE ...] [TABLE LIKE ...] [LIKE ...] `` |
| `ShowKeysCommand` | `SHOW KEYS IN CATALOG ... IN SCHEMA ... IN TABLE ...` |
| `ShowForeignKeysCommand` | `SHOW FOREIGN KEYS IN CATALOG ... IN SCHEMA ... IN TABLE ...` |

Pattern conversion: ADBC `%` → Databricks `*`, ADBC `_` → Databricks `.`

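The pattern conversion and one of the generated statements can be sketched together. `PatternSketch` is a hypothetical helper, not one of the command classes listed above. The backtick doubling and the backslash escape for single quotes are assumed quoting rules, and escaped wildcards in the ADBC pattern (such as `\_`) are deliberately not handled.

```csharp
using System;
using System.Text;

public static class PatternSketch
{
    // Applies the conversion stated above: '%' becomes '*', '_' becomes '.'.
    public static string ConvertPattern(string adbcPattern)
    {
        var sb = new StringBuilder(adbcPattern.Length);
        foreach (char c in adbcPattern)
        {
            switch (c)
            {
                case '%': sb.Append('*'); break;
                case '_': sb.Append('.'); break;
                default: sb.Append(c); break;
            }
        }
        return sb.ToString();
    }

    // Builds SHOW SCHEMAS IN `catalog` [LIKE 'pattern'].
    // A null or empty schemaPattern omits the LIKE clause.
    public static string BuildShowSchemas(string catalog, string schemaPattern)
    {
        // Assumption: a backtick inside an identifier is escaped by doubling it.
        string quotedCatalog = "`" + catalog.Replace("`", "``") + "`";
        string sql = "SHOW SCHEMAS IN " + quotedCatalog;
        if (!string.IsNullOrEmpty(schemaPattern))
        {
            // Assumption: a single quote inside the literal is escaped with a backslash.
            string literal = ConvertPattern(schemaPattern).Replace("'", "\\'");
            sql += " LIKE '" + literal + "'";
        }
        return sql;
    }
}
```

Keeping the conversion separate from statement assembly means the same function could serve every command that accepts a `LIKE` clause.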
## GetObjects RPC Count

Each `IGetObjectsDataProvider` method makes one server call (a Thrift RPC or a SEA SQL statement), so each depth level adds one call to the previous level. Total RPCs by depth:

| Depth | Methods Called | RPCs |
|---|---|---|
| Catalogs | GetCatalogsAsync | 1 |
| DbSchemas | + GetSchemasAsync | 2 |
| Tables | + GetTablesAsync | 3 |
| All | + PopulateColumnInfoAsync | 4 |
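
The cumulative counts in the table follow from gating each provider call on the requested depth. The sketch below reproduces that with a counting fake. `DepthSketch`, `CountingProvider`, and `DepthGatingSketch` are hypothetical; in particular, the enum ordering is chosen for this sketch and may not match the ordering of the real ADBC depth enum.

```csharp
using System;
using System.Threading.Tasks;

// Depth names mirror the table; the ordering is an assumption made for this sketch.
public enum DepthSketch { Catalogs, DbSchemas, Tables, All }

// Fake provider that only counts how many server calls would be made.
public sealed class CountingProvider
{
    public int Calls { get; private set; }

    private Task CallAsync()
    {
        Calls++;
        return Task.CompletedTask;
    }

    public Task GetCatalogsAsync() => CallAsync();
    public Task GetSchemasAsync() => CallAsync();
    public Task GetTablesAsync() => CallAsync();
    public Task PopulateColumnInfoAsync() => CallAsync();
}

public static class DepthGatingSketch
{
    // Returns the number of provider calls made for the given depth.
    public static async Task<int> RunAsync(DepthSketch depth)
    {
        var provider = new CountingProvider();
        await provider.GetCatalogsAsync();
        if (depth >= DepthSketch.DbSchemas) await provider.GetSchemasAsync();
        if (depth >= DepthSketch.Tables) await provider.GetTablesAsync();
        if (depth >= DepthSketch.All) await provider.PopulateColumnInfoAsync();
        return provider.Calls;
    }
}
```

Running `RunAsync` for each depth gives 1, 2, 3, and 4 calls, matching the table.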
