Skip to content

fix(csharp): preserve schema for empty result sets in SEA mode (PECO-2949)#325

Merged
eric-wang-1990 merged 2 commits intomainfrom
fix/sea-empty-result-schema
Mar 10, 2026
Merged

fix(csharp): preserve schema for empty result sets in SEA mode (PECO-2949)#325
eric-wang-1990 merged 2 commits intomainfrom
fix/sea-empty-result-schema

Conversation

@eric-wang-1990
Copy link
Collaborator

@eric-wang-1990 eric-wang-1990 commented Mar 9, 2026

Problem

When executing a query against an empty table via the Statement Execution API (SEA / REST mode), the ResultManifest always contains column schema even when totalRowCount is 0. However, CreateReader fell through to EmptyArrowArrayStream which always returned new Schema.Builder().Build() (zero fields), silently discarding the manifest schema.

This means callers of IArrowArrayStream.Schema received a schema with no columns for any empty-table query in SEA mode, making it impossible to inspect column metadata without actually having data rows.

Root Cause

EmptyArrowArrayStream was schema-unaware — it hardcoded an empty schema regardless of context. The CreateReader branching logic only extracted schema from the manifest in the CloudFetch and inline-data paths, not in the empty-result fallback path.

Fix

Follow the JDBC driver pattern where DatabricksResultSetMetaData is always constructed from ResultManifest independently of data presence:

  • Refactor GetSchemaFromManifest into a null-returning TryGetSchemaFromManifest helper
  • In the no-data branch of CreateReader, extract schema from manifest and pass it to EmptyArrowArrayStream
  • Make EmptyArrowArrayStream accept an optional Schema parameter (defaults to empty schema for the null-manifest / DDL case)

Tests

Added StatementExecutionEmptyResultSchemaTests with 3 unit tests:

  • ExecuteQuery_EmptyTable_SchemaContainsCorrectColumns — column names preserved for empty table
  • ExecuteQuery_EmptyTable_ArrowTypesAreMappedCorrectly — INT/BIGINT/STRING/BOOLEAN/DOUBLE/DATE/TIMESTAMP all map correctly
  • ExecuteQuery_NullManifest_ReturnsEmptySchema — null manifest (DDL) still returns empty schema without error

Known CI Failure (Pre-existing)

TelemetryTests.CanEnableFileTracingExporterViaEnvVariable(exporterName: "adbcfile") fails with the c078a8ec version of the hiveserver2 submodule (current main baseline). This failure is unrelated to this PR — the same test passes in PR #282 which updates the submodule to e42efb47. Once PR #282 merges, this failure will be resolved.

The failure is caused by a bug in TelemetryTests.cs line 69: Assert.True(string.IsNullOrEmpty(tc.ActivitySourceName)) should be Assert.False. The assertion exception is swallowed by the catch block, leaving activitySourceName as "" and the trace file unclosed/unflushed before the length check.

Closes PECO-2949

@eric-wang-1990
Copy link
Collaborator Author

CI Failure Analysis

The failing test TelemetryTests.CanEnableFileTracingExporterViaEnvVariable(exporterName: "adbcfile") is unrelated to this PR's changes. My changes only touch StatementExecutionStatement.cs (schema extraction logic) and add StatementExecutionEmptyResultSchemaTests.cs.

Root causes in the test (both pre-existing)

1. Bug in TelemetryTests.cs line 69 (csharp/hiveserver2/csharp/test/Common/TelemetryTests.cs):

// Current (wrong): asserts ActivitySourceName IS empty, but message says non-empty
Assert.True(string.IsNullOrEmpty(tc.ActivitySourceName), "expecting non-empty ActivitySourceName");
activitySourceName = tc.ActivitySourceName;  // Never reached — exception is caught

Should be Assert.False. Because the assertion throws, activitySourceName stays "" and line 86 (Assert.StartsWith(activitySourceName, files.First().Name)) trivially passes for any filename.

2. Bug in TracingFile.cs WriteSingleLineAsync (arrow-adbc submodule):

await stream.CopyToAsync(_currentFileStream);
await stream.FlushAsync();  // Flushes the source MemoryStream, not _currentFileStream!

The file stream is never explicitly flushed after writes — only on Dispose(). In CI this may produce a 0-byte file if dispose/flush ordering is affected by the test assertion bug above.

Neither issue is introduced by this PR. The TelemetryTests.cs in the hiveserver2 submodule was last modified in 7c90d2d ("move source and test up one folder") and has no prior "Tests Workflow" runs in this repo to compare against.


This comment was generated with GitHub MCP.

eric-wang-1990 and others added 2 commits March 10, 2026 07:09
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When executing a query against an empty table via the Statement Execution
API, the ResultManifest always contains column schema even when
totalRowCount is 0. Previously, CreateReader fell through to
EmptyArrowArrayStream which always returned new Schema.Builder().Build()
(zero fields), discarding the manifest schema entirely.

Follow the JDBC driver pattern (DatabricksResultSetMetaData is always
built from ResultManifest independently of data presence) by:

- Refactoring GetSchemaFromManifest into a null-returning
  TryGetSchemaFromManifest helper
- Extracting schema from the manifest in the no-data branch and passing
  it to EmptyArrowArrayStream
- Making EmptyArrowArrayStream accept an optional Schema parameter

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@eric-wang-1990 eric-wang-1990 force-pushed the fix/sea-empty-result-schema branch from 5513ab2 to 0397659 Compare March 10, 2026 07:10
@eric-wang-1990 eric-wang-1990 added this pull request to the merge queue Mar 10, 2026
Merged via the queue into main with commit 440f8e8 Mar 10, 2026
16 checks passed
@eric-wang-1990 eric-wang-1990 deleted the fix/sea-empty-result-schema branch March 10, 2026 07:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants