@jadewang-db jadewang-db commented Nov 4, 2025

🥞 Stacked PR

Use this link to review incremental changes.


Summary

Implements protocol selection logic to enable choosing between
Thrift and REST (Statement Execution API) protocols. Users can
now specify the protocol via the adbc.databricks.protocol
configuration parameter.

Key Changes:

  • ✅ Protocol selection in DatabricksDatabase.Connect() using
    factory pattern
  • ✅ Extract HTTP client creation helper
    (CreateHttpClientForRestApi()) for code reuse
  • ✅ Comprehensive unit tests for protocol selection
  • ✅ Backward compatible - defaults to Thrift when protocol not
    specified

Protocol Options:

  • "thrift" (default) - Uses existing HiveServer2/Thrift
    protocol
  • "rest" - Uses new Statement Execution REST API
  • Case-insensitive validation with clear error messages for
    invalid values
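
For illustration, here is a minimal usage sketch for opting into the REST protocol via connection properties. Only adbc.databricks.protocol is introduced by this change; the placeholder comment stands in for the existing host, path, and credential parameters, which are reused unchanged:

```csharp
// Hypothetical usage sketch, not a confirmed public API surface.
var properties = new Dictionary<string, string>
{
    // ... existing Spark/Databricks host, path, and auth parameters ...
    ["adbc.databricks.protocol"] = "rest"   // "thrift" (default) or "rest", case-insensitive
};
// DatabricksDatabase.Connect() reads this value and dispatches to the
// Thrift or REST connection implementation accordingly.
```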

Implementation Approach:

  • Used lightweight factory pattern in DatabricksDatabase
    instead of composition pattern
  • Created static helper CreateHttpClientForRestApi() that
    reuses authentication handlers (OAuth, token exchange, token
    refresh) and infrastructure (retry, tracing, error handling)
  • All existing configuration parameters are reused - no new
    constants needed

Jade Wang and others added 3 commits November 6, 2025 11:24
Make the CloudFetch pipeline protocol-agnostic to support both Thrift and
REST API implementations, enabling code reuse and easier addition of new
protocols in the future.

Changes:
- Created BaseResultFetcher abstract class extracting common fetching logic
  - Background task management, error handling, queue management
  - Template method pattern with FetchAllResultsAsync() for protocol-specific logic
  - Reduces code duplication by ~175 lines across implementations

- Refactored CloudFetchResultFetcher to extend BaseResultFetcher
  - Reduced from 401 to 314 lines (22% reduction)
  - Kept Thrift-specific logic: URL caching, expiration, refresh
  - Removed duplicate background task and error handling code

- Made IDownloadResult protocol-agnostic
  - Replaced TSparkArrowResultLink with generic properties
  - Added FileUrl, StartRowOffset, RowCount, ByteCount, ExpirationTime, HttpHeaders
  - DownloadResult.FromThriftLink() factory method for backward compatibility

- Enhanced CloudFetchDownloader to support custom HTTP headers
  - Respects IDownloadResult.HttpHeaders property
  - Enables REST API authentication without code changes

Benefits:
- Enables REST API (Statement Execution API) to reuse CloudFetch pipeline
- No behavior changes to existing Thrift implementation
- Better separation of concerns and testability

Closes PECO-2788

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
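
For concreteness, a sketch of the protocol-agnostic IDownloadResult shape implied by this commit. The property names come from the commit message; the member types and nullability are assumptions:

```csharp
// Sketch only; not the verbatim interface from the PR.
internal interface IDownloadResult
{
    string FileUrl { get; }                                    // replaces TSparkArrowResultLink
    long StartRowOffset { get; }
    long RowCount { get; }
    long ByteCount { get; }
    DateTime ExpirationTime { get; }                           // used by Thrift URL refresh
    IReadOnlyDictionary<string, string>? HttpHeaders { get; }  // honored by CloudFetchDownloader
}
```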
The IDownloadResult interface was refactored to be protocol-agnostic,
removing the Link property in favor of FileUrl. Updated the test to
remove the obsolete mock setup that was causing build failures.

Visual Studio solution files (.sln) have a strict format requirement and must
start with the Microsoft Visual Studio Solution File header. License headers
break the file format and prevent Visual Studio from opening the solution.

- Removed license header from Apache.Arrow.Adbc.Drivers.Databricks.sln
- Added *.sln to .rat-excludes to prevent future license header additions
@jadewang-db jadewang-db force-pushed the stack/PECO-2840-protocol-selection branch from 7e7c3c0 to f61fc58 Compare November 6, 2025 22:25
@jadewang-db jadewang-db mentioned this pull request Nov 6, 2025
Jade Wang and others added 2 commits November 7, 2025 09:27
This commit improves PECO-2788 by making all CloudFetch components reusable
across different protocols (Thrift, REST, future protocols):

Key changes:
- Created CloudFetchConfiguration class for unified configuration parsing
- Refactored CloudFetchDownloadManager with protocol-agnostic constructor
- Refactored CloudFetchReader to accept ICloudFetchDownloadManager interface
- Added ChunkIndex to IDownloadResult for targeted URL refresh (REST API)
- Implemented RefreshUrlsAsync() in BaseResultFetcher for expired link handling
- Updated DatabricksCompositeReader to use new CloudFetch pattern
- Maintained backward compatibility with [Obsolete] constructors

Optimizations implemented:
- Initial links optimization: Process links from initial response to save API calls
- Expired link handling: Automatic URL refresh with configurable retry attempts

Test results: 42/45 CloudFetch E2E tests passing (93% success rate)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Make BaseDatabricksReader and CloudFetchReader work with both Thrift
and REST protocols by using ITracingStatement instead of
IHiveServer2Statement and making IResponse nullable.

Key changes:
- BaseDatabricksReader: Use ITracingStatement and nullable IResponse
- CloudFetchReader: Accept ITracingStatement and nullable IResponse
- DatabricksReader: Cast to IHiveServer2Statement when accessing
  Thrift-specific properties (BatchSize, Connection)
- BaseResultFetcher: Add Initialize() method for late resource injection
- CloudFetchDownloadManager: Call Initialize() on BaseResultFetcher
  after creating shared resources
- Design doc: Add section 2.4 documenting architectural changes

This allows REST API (StatementExecutionStatement) to use the same
CloudFetch components without code duplication.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
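
Taken together, the commits above imply roughly the following BaseResultFetcher shape (template method plus late initialization). FetchAllResultsAsync, RefreshUrlsAsync, and Initialize are named in the commit messages; the shared-resource types and everything else here are assumptions:

```csharp
// Sketch only. The base class owns the background task, download queue, and
// error handling; subclasses supply the protocol-specific fetch loop.
internal abstract class BaseResultFetcher
{
    // Shared resources injected by CloudFetchDownloadManager; exact types assumed.
    protected ICloudFetchMemoryBufferManager? MemoryManager;
    protected BlockingCollection<IDownloadResult>? DownloadQueue;

    // Called by CloudFetchDownloadManager after it creates the shared resources.
    public void Initialize(ICloudFetchMemoryBufferManager memoryManager,
                           BlockingCollection<IDownloadResult> downloadQueue)
    {
        MemoryManager = memoryManager;
        DownloadQueue = downloadQueue;
    }

    // Template method: the Thrift fetcher pages presigned links, the REST
    // fetcher walks the chunk chain (see later commits).
    protected abstract Task FetchAllResultsAsync(CancellationToken cancellationToken);

    // Expired-link handling: Thrift refreshes URLs, the REST fetcher returns empty.
    protected abstract Task<IReadOnlyList<IDownloadResult>> RefreshUrlsAsync(
        CancellationToken cancellationToken);
}
```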
@jadewang-db jadewang-db force-pushed the stack/PECO-2840-protocol-selection branch 2 times, most recently from 05f2087 to 196ee24 Compare November 7, 2025 20:12
@jadewang-db jadewang-db force-pushed the stack/PECO-2840-protocol-selection branch from 196ee24 to f229653 Compare November 12, 2025 01:59
Jade Wang and others added 9 commits November 13, 2025 11:15
Implements StatementExecutionResultFetcher to support the Statement Execution
REST API's result fetching functionality. This is part of making the CloudFetch
pipeline protocol-agnostic.

Key features:
- Extends BaseResultFetcher for common pipeline management
- Supports manifest-based fetching (all links available upfront)
- Supports incremental chunk fetching via GetResultChunkAsync()
- Handles HTTP headers from ExternalLink for cloud storage authentication
- Comprehensive unit tests covering both fetching patterns

Related: PECO-2790

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

…tialization

Update StatementExecutionResultFetcher to use the late initialization pattern
introduced in BaseResultFetcher, where CloudFetchDownloadManager creates
shared resources and calls Initialize() to inject them.

Changes:
- Remove memoryManager and downloadQueue from constructor parameters
- Pass null for both to base constructor, resources injected via Initialize()
- Add RefreshUrlsAsync() implementation (returns empty for REST API)
- Fix DownloadResult constructor to include chunkIndex parameter
- Update all unit tests to remove obsolete constructor parameters
- Update CreateFetcher helper to call Initialize() after construction

This aligns StatementExecutionResultFetcher with the protocol-agnostic
CloudFetch architecture where the manager controls resource lifecycle.

All 17 unit tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

CloudFetchDownloadManager was attempting to set HttpClient.Timeout
after the HttpClient had already been used to make requests. This
caused an InvalidOperationException.

The fix wraps the timeout setting in a try-catch block to handle
reused HttpClient instances gracefully. If the HttpClient has already
been used, the timeout cannot be changed, but this is acceptable as
the HttpClient may have been configured with an appropriate timeout
already.

Fixes the issue where StatementExecutionStatement reuses _httpClient
for both ExecuteStatement and CloudFetch downloads.

Instead of reusing the API client's HttpClient (which may have already
been used), create a dedicated HttpClient for CloudFetch downloads in
StatementExecutionStatement.

Benefits:
- Allows CloudFetch-specific timeout configuration without affecting API calls
- Cleaner separation of concerns
- Avoids HttpClient property modification after first use
- Each concern (API calls vs file downloads) has its own HttpClient

Changes:
- Added _cloudFetchHttpClient field to StatementExecutionStatement
- Create it lazily in CreateExternalLinksReader when needed
- Dispose it properly in Dispose method
- Reverted try-catch blocks in CloudFetchDownloadManager (no longer needed)
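
A small sketch of the dedicated-client pattern this commit describes. _cloudFetchHttpClient and CreateExternalLinksReader are named above; the helper method and its signature are assumptions:

```csharp
// Sketch only. HttpClient.Timeout throws InvalidOperationException if changed
// after the first request has been sent, so CloudFetch downloads get their own
// client rather than reusing the Statement Execution API client.
private HttpClient? _cloudFetchHttpClient;

private HttpClient GetOrCreateCloudFetchHttpClient(TimeSpan downloadTimeout)
{
    // Created lazily from CreateExternalLinksReader, only when EXTERNAL_LINKS
    // results are actually used; disposed from the statement's Dispose path.
    return _cloudFetchHttpClient ??= new HttpClient { Timeout = downloadTimeout };
}
```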
…dex chain

Previously, StatementExecutionResultFetcher incorrectly used the manifest's
chunk list to fetch all results upfront. This doesn't match the Databricks
Statement Execution API design.

Correct behavior:
1. Start with GetStatementResponse.Result field (first chunk data)
2. Follow next_chunk_index or next_chunk_internal_link to get subsequent chunks
3. Continue until no more chunks (next_chunk_index is null)

Changes:
- StatementExecutionResultFetcher now accepts full GetStatementResponse
- FetchAllResultsAsync follows the next_chunk_index chain
- Unit tests updated (14/17 passing, 3 need fixes for new behavior)

Per Databricks API docs:
https://docs.databricks.com/api/workspace/statementexecution/getstatement
- result field contains the initial chunk data
- next_chunk_index indicates there are more chunks
- Use GetResultChunk to fetch subsequent chunks by index

This fix ensures we correctly handle incremental chunk fetching for large
result sets.
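
The chunk-chain traversal described above can be sketched as follows. GetStatementResponse and GetResultChunkAsync appear in earlier commits; the C# property names (Result, ExternalLinks, NextChunkIndex) are assumed mirrors of the API's result, external_links, and next_chunk_index fields, and _client, statementId, and EnqueueLinkAsync are placeholders:

```csharp
// Sketch only: follow the chunk chain starting from the initial result.
var chunk = response.Result;   // first chunk embedded in GetStatementResponse
while (chunk != null)
{
    if (chunk.ExternalLinks != null)
    {
        foreach (var link in chunk.ExternalLinks)
        {
            await EnqueueLinkAsync(link, cancellationToken);   // feed the CloudFetch download queue
        }
    }

    // next_chunk_index == null means there are no more chunks.
    chunk = chunk.NextChunkIndex is long nextIndex
        ? await _client.GetResultChunkAsync(statementId, nextIndex, cancellationToken)
        : null;
}
```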
Jade Wang and others added 22 commits November 13, 2025 11:15
…upport

Add StatementExecutionStatement class that uses the Statement Execution REST API
and integrates with the protocol-agnostic CloudFetch pipeline.

Key features:
- Query execution via ExecuteStatementAsync with polling
- Support for EXTERNAL_LINKS disposition using CloudFetch
- Support for INLINE disposition with InlineReader
- Uses StatementExecutionResultFetcher with late initialization pattern
- Implements ITracingStatement for Activity tracing
- Configurable polling interval, timeouts, and result formats

The CreateExternalLinksReader method follows the protocol-agnostic pattern:
1. Create StatementExecutionResultFetcher (no manager resources needed)
2. Parse CloudFetchConfiguration from properties
3. Create CloudFetchDownloadManager (calls Initialize() on fetcher)
4. Start the manager
5. Create CloudFetchReader with ITracingStatement

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
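
Expressed as code, the five-step flow above might look roughly like this. The type names and ordering come from the commit message; constructor and method signatures are assumptions:

```csharp
// Sketch only, numbered to match the steps above.
var fetcher = new StatementExecutionResultFetcher(_client, statementId, response); // 1
var config  = CloudFetchConfiguration.Parse(properties);                           // 2 (parse helper name assumed)
var manager = new CloudFetchDownloadManager(fetcher, config, cloudFetchHttpClient);// 3 calls fetcher.Initialize(...)
await manager.StartAsync(cancellationToken);                                       // 4
return new CloudFetchReader(this /* ITracingStatement */, schema, manager);        // 5
```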
…ection

Make StatementExecutionConnection a proper ADBC connection by inheriting
from AdbcConnection base class. This establishes the correct architecture
for the REST API implementation.

Changes:
- StatementExecutionConnection now inherits from AdbcConnection
- Added StatementExecutionStatement stub class implementing AdbcStatement
- CreateStatement() now returns a StatementExecutionStatement instance
- Implemented required abstract methods (GetObjects, GetTableSchema, GetTableTypes)
  so they throw NotImplementedException (metadata operations to be added in a future PR)
- Statement execution methods throw NotImplementedException with note about
  PECO-2791-B implementation

Testing:
- Added 6 new unit tests verifying AdbcConnection implementation
- Updated E2E test to verify CreateStatement returns a statement
- All 42 tests pass (26 unit + 16 E2E)

This is a minimal implementation (Option 1) that establishes the proper
inheritance hierarchy while deferring full query execution to PECO-2791-B.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
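
The inheritance shape this establishes, sketched minimally (constructor arguments and stub bodies are assumptions):

```csharp
// Sketch only; the metadata methods are stubs for now.
internal class StatementExecutionConnection : AdbcConnection
{
    public override AdbcStatement CreateStatement() =>
        new StatementExecutionStatement(this);

    // GetObjects, GetTableSchema, and GetTableTypes currently throw
    // NotImplementedException; metadata operations are deferred to a later PR.
}
```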
Implements InlineReader class to handle INLINE disposition where results
are embedded as Arrow IPC stream in REST API responses. This supports
result sets ≤25 MiB that are returned inline in the Attachment field
of ResultChunk.

Key features:
- Implements IArrowArrayStream interface for standard Arrow integration
- Parses base64-encoded Arrow IPC stream data from ResultChunk.Attachment
- Handles multiple chunks with proper ordering
- Comprehensive unit tests (15 test cases) covering all scenarios

This is part of PECO-2791-C (Inline Results Support) and completes
PECO-2839. The integration with StatementExecutionStatement will be
handled in a follow-up PR.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
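
The core decoding step described above, sketched with the assumption that ResultChunk.Attachment arrives as a base64 string (if the model already exposes raw bytes, the Convert call drops out); ProcessBatch is a placeholder for however InlineReader buffers batches:

```csharp
// Sketch only: decode one inline chunk and read its Arrow record batches.
byte[] ipcBytes = Convert.FromBase64String(chunk.Attachment);
using var stream = new MemoryStream(ipcBytes);
using var reader = new Apache.Arrow.Ipc.ArrowStreamReader(stream);

RecordBatch? batch;
while ((batch = await reader.ReadNextRecordBatchAsync()) != null)
{
    // Batches are later surfaced in chunk order through InlineReader's
    // IArrowArrayStream implementation.
    ProcessBatch(batch);
}
```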
Add protocol selection logic to enable choosing between Thrift and REST protocols.
Users can now specify protocol via `adbc.databricks.protocol` parameter.

Key changes:
- DatabricksDatabase: Add factory pattern for protocol-based connection creation
  - CreateThriftConnection(): Existing Thrift/HiveServer2 path (default)
  - CreateRestConnection(): New Statement Execution API path
- DatabricksConnection: Add CreateHttpClientForRestApi() static helper
  - Reuses authentication handlers (OAuth, token exchange, token refresh)
  - Reuses retry, tracing, and error handling infrastructure
- Add comprehensive unit tests for protocol selection (8 tests)
- Update design doc with implementation notes

Protocol selection:
- Default: "thrift" (backward compatible)
- Options: "thrift" or "rest" (case-insensitive)
- Invalid protocol throws ArgumentException

Configuration parameters (already exist in DatabricksParameters):
- adbc.databricks.protocol: "thrift" (default) or "rest"
- Reuses existing Spark parameters for host, path, catalog, schema

Tests:
- 8/8 protocol selection tests passing
- 26/26 existing connection tests passing (no regressions)
- All framework targets build successfully (netstandard2.0, net472, net8.0)

Related: PECO-2840

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
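
The dispatch itself can be sketched as below. CreateThriftConnection and CreateRestConnection are named in the commit message; the surrounding details are assumptions:

```csharp
// Sketch only: lightweight factory dispatch inside DatabricksDatabase.Connect().
properties.TryGetValue("adbc.databricks.protocol", out string? protocol);

switch ((protocol ?? "thrift").ToLowerInvariant())
{
    case "thrift":
        return CreateThriftConnection(properties);   // existing HiveServer2/Thrift path
    case "rest":
        return CreateRestConnection(properties);     // Statement Execution API path
    default:
        throw new ArgumentException(
            $"Invalid adbc.databricks.protocol value '{protocol}'; expected 'thrift' or 'rest'.");
}
```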
…operties

This fixes a bug where properties passed via Connect() options weren't being found
in StatementExecutionStatement because the merged dictionary used a case-sensitive
key comparer instead of a case-insensitive one.

Without this fix, parameters like ResultDisposition weren't being applied correctly.
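
The fix amounts to building the merged dictionary with an ordinal, case-insensitive comparer, roughly as below (variable names illustrative):

```csharp
// Sketch only: merge database properties with Connect() options so later
// lookups such as ResultDisposition succeed regardless of key casing.
var merged = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
foreach (var kvp in databaseProperties) merged[kvp.Key] = kvp.Value;
foreach (var kvp in connectOptions)     merged[kvp.Key] = kvp.Value;   // options override
```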
@jadewang-db jadewang-db force-pushed the stack/PECO-2840-protocol-selection branch from f229653 to 641f3ac Compare November 13, 2025 19:15
jadewang-db added a commit that referenced this pull request Dec 1, 2025
## 🥞 Stacked PR
Use this [link](https://github.com/adbc-drivers/databricks/pull/14/files) to review incremental changes.
- [**stack/PECO-2788-cloudfetch-protocol-agnostic**](#14) [[Files changed](https://github.com/adbc-drivers/databricks/pull/14/files)]
- [stack/PECO-2790-statement-execution-result-fetcher](#15) [[Files changed](https://github.com/adbc-drivers/databricks/pull/15/files/2d7d1aecaa6ec5e3c101322fde7ce186ea6d02b8..318277c04119a86d4f3642f56eb60c0ab48ee0dc)]
- [stack/PECO-2791-connection-statement-rest-execution](#16) [[Files changed](https://github.com/adbc-drivers/databricks/pull/16/files/318277c04119a86d4f3642f56eb60c0ab48ee0dc..66f7736845877b85df769ce53c93b8960c9eb3c4)]
- [stack/PECO-2837-session-management](#17) [[Files changed](https://github.com/adbc-drivers/databricks/pull/17/files/66f7736845877b85df769ce53c93b8960c9eb3c4..d9ba98053d914f6f8d2195f4a4d37342d905f458)]
- [stack/PECO-2839-inline-reader](#18) [[Files changed](https://github.com/adbc-drivers/databricks/pull/18/files/d9ba98053d914f6f8d2195f4a4d37342d905f458..f1edfb7e0ba46a8cb6793c8811a0834313cd7516)]
- [stack/PECO-2840-protocol-selection](#19) [[Files changed](https://github.com/adbc-drivers/databricks/pull/19/files/f1edfb7e0ba46a8cb6793c8811a0834313cd7516..641f3ac3371d2fc4e16da5a6a0e6209aa41086be)]
- [stack/PECO-2838-statement-execution](#20) [[Files changed](https://github.com/adbc-drivers/databricks/pull/20/files/641f3ac3371d2fc4e16da5a6a0e6209aa41086be..356693f61bbe940717a4debf352e61f59018473c)]
- [stack/PECO-2792-feature-parity](#21) [[Files changed](https://github.com/adbc-drivers/databricks/pull/21/files/356693f61bbe940717a4debf352e61f59018473c..3cd98321863e76149c43b216a5822a8dcaf43eec)]
- [stack/PECO-2849-extend-cloudfetch-e2e-tests](#22) [[Files changed](https://github.com/adbc-drivers/databricks/pull/22/files/3cd98321863e76149c43b216a5822a8dcaf43eec..815b17a5cb9438b4b8f0d6b4727bf37a4cceffa2)]