add E2E test for SEA #22
Draft: jadewang-db wants to merge 91 commits into main from stack/PECO-2849-extend-cloudfetch-e2e-tests
Conversation
Make the CloudFetch pipeline protocol-agnostic to support both Thrift and REST API implementations, enabling code reuse and easier addition of new protocols in the future.

Changes:
- Created BaseResultFetcher abstract class extracting common fetching logic
  - Background task management, error handling, queue management
  - Template method pattern with FetchAllResultsAsync() for protocol-specific logic
  - Reduces code duplication by ~175 lines across implementations
- Refactored CloudFetchResultFetcher to extend BaseResultFetcher
  - Reduced from 401 to 314 lines (22% reduction)
  - Kept Thrift-specific logic: URL caching, expiration, refresh
  - Removed duplicate background task and error handling code
- Made IDownloadResult protocol-agnostic
  - Replaced TSparkArrowResultLink with generic properties
  - Added FileUrl, StartRowOffset, RowCount, ByteCount, ExpirationTime, HttpHeaders
  - DownloadResult.FromThriftLink() factory method for backward compatibility
- Enhanced CloudFetchDownloader to support custom HTTP headers
  - Respects the IDownloadResult.HttpHeaders property
  - Enables REST API authentication without code changes

Benefits:
- Enables the REST API (Statement Execution API) to reuse the CloudFetch pipeline
- No behavior changes to the existing Thrift implementation
- Better separation of concerns and testability

Closes PECO-2788

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
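The template method pattern named above can be sketched roughly as follows. Only BaseResultFetcher and FetchAllResultsAsync() appear in the commit message; the member names, the Channel-based queue, and the Initialize()/Start() shape are illustrative assumptions, not the driver's actual implementation.

```csharp
// Minimal sketch: common pipeline management in the base class, with the
// protocol-specific fetch loop deferred to subclasses (Thrift or REST).
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

internal abstract class BaseResultFetcher
{
    private Task? _fetchTask;

    // Hypothetical queue type; the real pipeline hands downloads to CloudFetchDownloader.
    protected Channel<object>? DownloadQueue { get; private set; }

    // Late initialization: the download manager creates shared resources
    // and injects them after construction.
    public void Initialize(Channel<object> downloadQueue) => DownloadQueue = downloadQueue;

    // Common logic: background task management and queue completion.
    public void Start(CancellationToken token)
    {
        _fetchTask = Task.Run(async () =>
        {
            try { await FetchAllResultsAsync(token); }
            finally { DownloadQueue!.Writer.TryComplete(); }
        }, token);
    }

    // Template method: each protocol supplies its own fetch loop.
    protected abstract Task FetchAllResultsAsync(CancellationToken token);
}
```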
The IDownloadResult interface was refactored to be protocol-agnostic, removing the Link property in favor of FileUrl. Updated the test to remove the obsolete mock setup that was causing build failures.
Visual Studio solution files (.sln) have a strict format requirement and must start with the "Microsoft Visual Studio Solution File" header. License headers break the file format and prevent Visual Studio from opening the solution.

- Removed the license header from Apache.Arrow.Adbc.Drivers.Databricks.sln
- Added *.sln to .rat-excludes to prevent future license header additions
This was referenced Nov 6, 2025
b78a205 to 18c1943
This commit improves PECO-2788 by making all CloudFetch components reusable across different protocols (Thrift, REST, future protocols).

Key changes:
- Created CloudFetchConfiguration class for unified configuration parsing
- Refactored CloudFetchDownloadManager with a protocol-agnostic constructor
- Refactored CloudFetchReader to accept the ICloudFetchDownloadManager interface
- Added ChunkIndex to IDownloadResult for targeted URL refresh (REST API)
- Implemented RefreshUrlsAsync() in BaseResultFetcher for expired link handling
- Updated DatabricksCompositeReader to use the new CloudFetch pattern
- Maintained backward compatibility with [Obsolete] constructors

Optimizations implemented:
- Initial links optimization: process links from the initial response to save API calls
- Expired link handling: automatic URL refresh with configurable retry attempts

Test results: 42/45 CloudFetch E2E tests passing (93% success rate)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
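Unified configuration parsing of the kind this commit describes could look like the sketch below. CloudFetchConfiguration and FromProperties come from the commit message; the property keys, field names, and defaults are hypothetical placeholders, not the driver's actual option names.

```csharp
// Illustrative sketch of CloudFetchConfiguration.FromProperties: pull
// CloudFetch settings out of the connection's property bag with defaults.
using System.Collections.Generic;

internal sealed class CloudFetchConfiguration
{
    public int MaxRetryAttempts { get; private set; } = 3;   // assumed default
    public int ParallelDownloads { get; private set; } = 4;  // assumed default

    public static CloudFetchConfiguration FromProperties(
        IReadOnlyDictionary<string, string> properties)
    {
        var config = new CloudFetchConfiguration();
        // Keys below are made up for the example.
        if (properties.TryGetValue("cloudfetch.max_retries", out var retries))
            config.MaxRetryAttempts = int.Parse(retries);
        if (properties.TryGetValue("cloudfetch.parallel_downloads", out var parallel))
            config.ParallelDownloads = int.Parse(parallel);
        return config;
    }
}
```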
Make BaseDatabricksReader and CloudFetchReader work with both Thrift and REST protocols by using ITracingStatement instead of IHiveServer2Statement and making IResponse nullable.

Key changes:
- BaseDatabricksReader: use ITracingStatement and nullable IResponse
- CloudFetchReader: accept ITracingStatement and nullable IResponse
- DatabricksReader: cast to IHiveServer2Statement when accessing Thrift-specific properties (BatchSize, Connection)
- BaseResultFetcher: add Initialize() method for late resource injection
- CloudFetchDownloadManager: call Initialize() on BaseResultFetcher after creating shared resources
- Design doc: add section 2.4 documenting the architectural changes

This allows the REST API (StatementExecutionStatement) to use the same CloudFetch components without code duplication.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
dc903e7 to fc80da0
fc80da0 to bf1f712
Implements StatementExecutionResultFetcher to support the Statement Execution REST API's result fetching. This is part of making the CloudFetch pipeline protocol-agnostic.

Key features:
- Extends BaseResultFetcher for common pipeline management
- Supports manifest-based fetching (all links available upfront)
- Supports incremental chunk fetching via GetResultChunkAsync()
- Handles HTTP headers from ExternalLink for cloud storage authentication
- Comprehensive unit tests covering both fetching patterns

Related: PECO-2790

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…tialization

Update StatementExecutionResultFetcher to use the late initialization pattern introduced in BaseResultFetcher, where CloudFetchDownloadManager creates shared resources and calls Initialize() to inject them.

Changes:
- Remove memoryManager and downloadQueue from the constructor parameters
- Pass null for both to the base constructor; resources are injected via Initialize()
- Add a RefreshUrlsAsync() implementation (returns empty for the REST API)
- Fix the DownloadResult constructor to include the chunkIndex parameter
- Update all unit tests to remove the obsolete constructor parameters
- Update the CreateFetcher helper to call Initialize() after construction

This aligns StatementExecutionResultFetcher with the protocol-agnostic CloudFetch architecture, where the manager controls the resource lifecycle. All 17 unit tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
CloudFetchDownloadManager was attempting to set HttpClient.Timeout after the HttpClient had already been used to make requests, which caused an InvalidOperationException.

The fix wraps the timeout setting in a try-catch block to handle reused HttpClient instances gracefully. If the HttpClient has already been used, the timeout cannot be changed, but this is acceptable because the HttpClient may already have been configured with an appropriate timeout.

Fixes the issue where StatementExecutionStatement reuses _httpClient for both ExecuteStatement and CloudFetch downloads.
Instead of reusing the API client's HttpClient (which may have already been used), create a dedicated HttpClient for CloudFetch downloads in StatementExecutionStatement.

Benefits:
- Allows CloudFetch-specific timeout configuration without affecting API calls
- Cleaner separation of concerns
- Avoids HttpClient property modification after first use
- Each concern (API calls vs. file downloads) has its own HttpClient

Changes:
- Added a _cloudFetchHttpClient field to StatementExecutionStatement
- Create it lazily in CreateExternalLinksReader when needed
- Dispose it properly in the Dispose method
- Reverted the try-catch blocks in CloudFetchDownloadManager (no longer needed)
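The dedicated-client approach works because .NET's HttpClient forbids changing properties such as Timeout once the instance has sent its first request. A minimal sketch of the pattern described above (the _cloudFetchHttpClient field name is from the commit; the timeout value and property shape are illustrative):

```csharp
// Sketch: lazily create a CloudFetch-only HttpClient so its Timeout can be
// set freely, and dispose it with the statement.
using System;
using System.Net.Http;

internal sealed class StatementExecutionStatementSketch : IDisposable
{
    private HttpClient? _cloudFetchHttpClient;

    private HttpClient CloudFetchHttpClient =>
        // Safe to set Timeout here: this client has never issued a request.
        // The API client's HttpClient is left untouched.
        _cloudFetchHttpClient ??= new HttpClient { Timeout = TimeSpan.FromMinutes(5) };

    public void Dispose() => _cloudFetchHttpClient?.Dispose();
}
```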
…dex chain

Previously, StatementExecutionResultFetcher incorrectly used the manifest's chunk list to fetch all results upfront. This doesn't match the Databricks Statement Execution API design.

Correct behavior:
1. Start with the GetStatementResponse.Result field (first chunk data)
2. Follow next_chunk_index or next_chunk_internal_link to get subsequent chunks
3. Continue until there are no more chunks (next_chunk_index is null)

Changes:
- StatementExecutionResultFetcher now accepts the full GetStatementResponse
- FetchAllResultsAsync follows the next_chunk_index chain
- Unit tests updated (14/17 passing, 3 need fixes for the new behavior)

Per the Databricks API docs (https://docs.databricks.com/api/workspace/statementexecution/getstatement):
- The result field contains the initial chunk data
- next_chunk_index indicates there are more chunks
- Use GetResultChunk to fetch subsequent chunks by index

This fix ensures we correctly handle incremental chunk fetching for large result sets.
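The three-step chain-following behavior can be sketched as a simple loop. The ResultChunk shape and the delegate standing in for GetResultChunkAsync are simplified assumptions; only the next_chunk_index semantics come from the API docs cited above.

```csharp
// Sketch: walk the next_chunk_index chain instead of trusting the manifest's
// full chunk list upfront.
using System;
using System.Threading.Tasks;

internal sealed class ResultChunk
{
    public long ChunkIndex { get; set; }
    public long? NextChunkIndex { get; set; }  // null => no more chunks
}

internal static class ChunkChain
{
    public static async Task FetchAllAsync(
        ResultChunk? firstChunk,                       // from GetStatementResponse.Result
        Func<long, Task<ResultChunk>> getResultChunkAsync,
        Action<ResultChunk> enqueue)
    {
        ResultChunk? chunk = firstChunk;
        while (chunk != null)
        {
            enqueue(chunk);
            // Follow next_chunk_index until it is null.
            chunk = chunk.NextChunkIndex is long next
                ? await getResultChunkAsync(next)
                : null;
        }
    }
}
```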
…upport

Add a StatementExecutionStatement class that uses the Statement Execution REST API and integrates with the protocol-agnostic CloudFetch pipeline.

Key features:
- Query execution via ExecuteStatementAsync with polling
- Support for the EXTERNAL_LINKS disposition using CloudFetch
- Support for the INLINE disposition with InlineReader
- Uses StatementExecutionResultFetcher with the late initialization pattern
- Implements ITracingStatement for Activity tracing
- Configurable polling interval, timeouts, and result formats

The CreateExternalLinksReader method follows the protocol-agnostic pattern:
1. Create StatementExecutionResultFetcher (no manager resources needed)
2. Parse CloudFetchConfiguration from properties
3. Create CloudFetchDownloadManager (calls Initialize() on the fetcher)
4. Start the manager
5. Create CloudFetchReader with ITracingStatement

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…sition support

Implements the StatementExecutionStatement class for executing queries via the Databricks Statement Execution REST API, with full support for both inline and external links result dispositions.

Key features:
- Query execution with async polling (default: 1000 ms interval)
- Query timeout and cancellation support
- Hybrid disposition support: automatically detects and handles INLINE, EXTERNAL_LINKS, or INLINE_OR_EXTERNAL_LINKS
- Full ITracingStatement interface implementation
- Statement properties: CatalogName, SchemaName, MaxRows, QueryTimeoutSeconds
- Schema conversion from the REST API format to Arrow types
- CloudFetch pipeline integration via SimpleCloudFetchReader
- Proper resource cleanup via the Dispose pattern

Implementation details:
- Created SimpleCloudFetchReader as a nested class for protocol-agnostic CloudFetch reading
- Made the CloudFetchDownloadManager statement parameter nullable to support the REST API
- Updated StatementExecutionConnection to pass HttpClient and properties to the statement
- Updated DatabricksDatabase to pass HttpClient when creating connections
- Basic type mapping with TODOs for comprehensive type conversion
- LZ4 decompression placeholder for future implementation

Related work:
- PECO-2838: StatementExecutionStatement (basic execution with external links)
- Depends on: PECO-2840 (protocol selection), PECO-2839 (InlineReader)

Files modified:
- StatementExecution/StatementExecutionStatement.cs (complete implementation, ~660 lines)
- StatementExecution/StatementExecutionConnection.cs (add HttpClient parameter)
- Reader/CloudFetch/CloudFetchDownloadManager.cs (make statement nullable)
- DatabricksDatabase.cs (pass HttpClient to connection)
- doc/statement-execution-api-design.md (update task status)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…expiration

- Added a RefreshUrlsAsync implementation in StatementExecutionResultFetcher
- REST API URLs expire after ~1 hour and need refresh via GetResultChunkAsync
- Fixed DownloadResult constructor calls to include the chunkIndex parameter
- Fixed CloudFetchDownloadManager test constructor usage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…tic CloudFetch pattern

Update CreateExternalLinksReader to use the protocol-agnostic CloudFetch pattern:
- Use StatementExecutionResultFetcher with the 3-parameter constructor (no manager resources)
- Use CloudFetchConfiguration.FromProperties to parse the config
- Use CloudFetchDownloadManager with the config (calls Initialize() on the fetcher)
- Use CloudFetchReader with ITracingStatement and nullable IResponse

Also fix the StatementExecutionResultFetcher constructor that was reverted during conflict resolution:
- Remove the memoryManager and downloadQueue parameters
- Pass null to the base constructor for late initialization
- Resources will be injected via Initialize() by CloudFetchDownloadManager

This completes the CloudFetch REST API integration using the unified protocol-agnostic pattern.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…mplementation for Statement Execution API

Adds comprehensive support for the Statement Execution API when warehouses don't support Arrow-format inline results, and implements full metadata operations.

**JSON_ARRAY Format Support:**
- Created JsonArrayReader to convert JSON data to Arrow format
- Supports multiple Arrow types: Int32, Int64, Double, String, Boolean, Date32, Timestamp
- Handles null values and type conversion with proper error handling
- Automatically detects the format from the response manifest

**GetObjects Metadata Implementation:**
- Implemented proper nested structure builders for hierarchical catalog/schema/table/column data
- Added BuildListArray() to manually construct ListArrays with struct types
- Added helper methods: BuildDbSchemasStruct(), BuildTablesStruct(), BuildColumnsStruct()
- Includes CreateEmptyArray() for empty array creation across various types
- Returns complete metadata per the ADBC standard schema

**E2E Test Infrastructure Improvements:**
- Rewrote StatementExecutionConnectionE2ETests to use the proper driver flow (Driver → Database → Connection)
- Removed complex mocking infrastructure (527 lines → 168 lines)
- Added DatabricksTestHelpers.GetPropertiesWithStatementExecutionEnabled for REST API config
- Created StatementExecutionFeatureParityTests with 8 comprehensive E2E tests

**Authentication & Connection Fixes:**
- Added Bearer token authentication support in DatabricksConnection
- Fixed a null reference in TracingDelegatingHandler for W3C trace context
- Always send warehouse_id in requests (required by the Databricks API)

**Test Results (all passing):**
✓ GetTableTypes_ReturnsStandardTypes
✓ GetObjects_CatalogDepth_ReturnsCatalogs
✓ GetTableSchema_KnownTable_ReturnsSchema
✓ ExecuteQuery_InlineResults_ReturnsData
✓ ExecuteQuery_InlineResults_MultipleRows_ReturnsAllData
✓ ExecuteQuery_HybridDisposition_ReturnsData
✓ ExecuteQuery_ExternalLinks_LargeResult_UsesCloudFetch
✓ ExecuteUpdate_DDLStatements_WorksCorrectly
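The JSON-to-Arrow conversion this commit describes amounts to walking string-typed JSON_ARRAY cells and appending them to typed Arrow builders. A minimal sketch for one column type, assuming the Apache.Arrow C# builder API; the real JsonArrayReader handles many more types, schema detection, and error handling:

```csharp
// Sketch: convert one JSON_ARRAY column of string cells to an Arrow Int32Array,
// mapping null cells to Arrow nulls.
using Apache.Arrow;

internal static class JsonArraySketch
{
    public static Int32Array ToInt32Column(string?[] cells)
    {
        var builder = new Int32Array.Builder();
        foreach (var cell in cells)
        {
            // JSON_ARRAY results arrive as strings (or null).
            if (cell is null) builder.AppendNull();
            else builder.Append(int.Parse(cell));
        }
        return builder.Build();
    }
}
```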
CloudFetchDownloadManager was attempting to set HttpClient.Timeout after the HttpClient had already been used to make requests. This caused an InvalidOperationException. The fix wraps the timeout setting in a try-catch block to handle reused HttpClient instances gracefully. If the HttpClient has already been used, the timeout cannot be changed, but this is acceptable as the HttpClient may have been configured with an appropriate timeout already. Fixes the issue where StatementExecutionStatement reuses _httpClient for both ExecuteStatement and CloudFetch downloads.
Instead of reusing the API client's HttpClient (which may have already been used), create a dedicated HttpClient for CloudFetch downloads in StatementExecutionStatement. Benefits: - Allows CloudFetch-specific timeout configuration without affecting API calls - Cleaner separation of concerns - Avoids HttpClient property modification after first use - Each concern (API calls vs file downloads) has its own HttpClient Changes: - Added _cloudFetchHttpClient field to StatementExecutionStatement - Create it lazily in CreateExternalLinksReader when needed - Dispose it properly in Dispose method - Reverted try-catch blocks in CloudFetchDownloadManager (no longer needed)
…dex chain Previously, StatementExecutionResultFetcher incorrectly used the manifest's chunk list to fetch all results upfront. This doesn't match the Databricks Statement Execution API design. Correct behavior: 1. Start with GetStatementResponse.Result field (first chunk data) 2. Follow next_chunk_index or next_chunk_internal_link to get subsequent chunks 3. Continue until no more chunks (next_chunk_index is null) Changes: - StatementExecutionResultFetcher now accepts full GetStatementResponse - FetchAllResultsAsync follows the next_chunk_index chain - Unit tests updated (14/17 passing, 3 need fixes for new behavior) Per Databricks API docs: https://docs.databricks.com/api/workspace/statementexecution/getstatement - result field contains the initial chunk data - next_chunk_index indicates there are more chunks - Use GetResultChunk to fetch subsequent chunks by index This fix ensures we correctly handle incremental chunk fetching for large result sets.
CloudFetchDownloadManager was attempting to set HttpClient.Timeout after the HttpClient had already been used to make requests. This caused an InvalidOperationException. The fix wraps the timeout setting in a try-catch block to handle reused HttpClient instances gracefully. If the HttpClient has already been used, the timeout cannot be changed, but this is acceptable as the HttpClient may have been configured with an appropriate timeout already. Fixes the issue where StatementExecutionStatement reuses _httpClient for both ExecuteStatement and CloudFetch downloads.
Instead of reusing the API client's HttpClient (which may have already been used), create a dedicated HttpClient for CloudFetch downloads in StatementExecutionStatement. Benefits: - Allows CloudFetch-specific timeout configuration without affecting API calls - Cleaner separation of concerns - Avoids HttpClient property modification after first use - Each concern (API calls vs file downloads) has its own HttpClient Changes: - Added _cloudFetchHttpClient field to StatementExecutionStatement - Create it lazily in CreateExternalLinksReader when needed - Dispose it properly in Dispose method - Reverted try-catch blocks in CloudFetchDownloadManager (no longer needed)
…upport Add StatementExecutionStatement class that uses the Statement Execution REST API and integrates with the protocol-agnostic CloudFetch pipeline. Key features: - Query execution via ExecuteStatementAsync with polling - Support for EXTERNAL_LINKS disposition using CloudFetch - Support for INLINE disposition with InlineReader - Uses StatementExecutionResultFetcher with late initialization pattern - Implements ITracingStatement for Activity tracing - Configurable polling interval, timeouts, and result formats The CreateExternalLinksReader method follows the protocol-agnostic pattern: 1. Create StatementExecutionResultFetcher (no manager resources needed) 2. Parse CloudFetchConfiguration from properties 3. Create CloudFetchDownloadManager (calls Initialize() on fetcher) 4. Start the manager 5. Create CloudFetchReader with ITracingStatement 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…upport Add StatementExecutionStatement class that uses the Statement Execution REST API and integrates with the protocol-agnostic CloudFetch pipeline. Key features: - Query execution via ExecuteStatementAsync with polling - Support for EXTERNAL_LINKS disposition using CloudFetch - Support for INLINE disposition with InlineReader - Uses StatementExecutionResultFetcher with late initialization pattern - Implements ITracingStatement for Activity tracing - Configurable polling interval, timeouts, and result formats The CreateExternalLinksReader method follows the protocol-agnostic pattern: 1. Create StatementExecutionResultFetcher (no manager resources needed) 2. Parse CloudFetchConfiguration from properties 3. Create CloudFetchDownloadManager (calls Initialize() on fetcher) 4. Start the manager 5. Create CloudFetchReader with ITracingStatement 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…sition support Implements the StatementExecutionStatement class for executing queries via the Databricks Statement Execution REST API with full support for both inline and external links result dispositions. Key features: - Query execution with async polling (default: 1000ms interval) - Query timeout and cancellation support - Hybrid disposition support: automatically detects and handles INLINE, EXTERNAL_LINKS, or INLINE_OR_EXTERNAL_LINKS - Full ITracingStatement interface implementation - Statement properties: CatalogName, SchemaName, MaxRows, QueryTimeoutSeconds - Schema conversion from REST API format to Arrow types - CloudFetch pipeline integration via SimpleCloudFetchReader - Proper resource cleanup via Dispose pattern Implementation details: - Created SimpleCloudFetchReader as nested class for protocol-agnostic CloudFetch reading - Made CloudFetchDownloadManager statement parameter nullable to support REST API - Updated StatementExecutionConnection to pass HttpClient and properties to statement - Updated DatabricksDatabase to pass HttpClient when creating connections - Basic type mapping with TODOs for comprehensive type conversion - LZ4 decompression placeholder for future implementation Related work: - PECO-2838: StatementExecutionStatement (Basic Execution with External Links) - Depends on: PECO-2840 (Protocol Selection), PECO-2839 (InlineReader) Files modified: - StatementExecution/StatementExecutionStatement.cs (complete implementation ~660 lines) - StatementExecution/StatementExecutionConnection.cs (add HttpClient parameter) - Reader/CloudFetch/CloudFetchDownloadManager.cs (make statement nullable) - DatabricksDatabase.cs (pass HttpClient to connection) - doc/statement-execution-api-design.md (update task status) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…expiration - Added RefreshUrlsAsync implementation in StatementExecutionResultFetcher - REST API URLs expire after ~1 hour and need refresh via GetResultChunkAsync - Fixed DownloadResult constructor calls to include chunkIndex parameter - Fixed CloudFetchDownloadManager test constructor usage 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…tic CloudFetch pattern Update CreateExternalLinksReader to use the protocol-agnostic CloudFetch pattern: - Use StatementExecutionResultFetcher with 3-parameter constructor (no manager resources) - Use CloudFetchConfiguration.FromProperties to parse config - Use CloudFetchDownloadManager with config (calls Initialize() on fetcher) - Use CloudFetchReader with ITracingStatement and nullable IResponse Also fix StatementExecutionResultFetcher constructor that was reverted during conflict resolution: - Remove memoryManager and downloadQueue parameters - Pass null to base constructor for late initialization - Resources will be injected via Initialize() by CloudFetchDownloadManager This completes the CloudFetch REST API integration using the unified protocol-agnostic pattern. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…upport Add StatementExecutionStatement class that uses the Statement Execution REST API and integrates with the protocol-agnostic CloudFetch pipeline. Key features: - Query execution via ExecuteStatementAsync with polling - Support for EXTERNAL_LINKS disposition using CloudFetch - Support for INLINE disposition with InlineReader - Uses StatementExecutionResultFetcher with late initialization pattern - Implements ITracingStatement for Activity tracing - Configurable polling interval, timeouts, and result formats The CreateExternalLinksReader method follows the protocol-agnostic pattern: 1. Create StatementExecutionResultFetcher (no manager resources needed) 2. Parse CloudFetchConfiguration from properties 3. Create CloudFetchDownloadManager (calls Initialize() on fetcher) 4. Start the manager 5. Create CloudFetchReader with ITracingStatement 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…sition support Implements the StatementExecutionStatement class for executing queries via the Databricks Statement Execution REST API with full support for both inline and external links result dispositions. Key features: - Query execution with async polling (default: 1000ms interval) - Query timeout and cancellation support - Hybrid disposition support: automatically detects and handles INLINE, EXTERNAL_LINKS, or INLINE_OR_EXTERNAL_LINKS - Full ITracingStatement interface implementation - Statement properties: CatalogName, SchemaName, MaxRows, QueryTimeoutSeconds - Schema conversion from REST API format to Arrow types - CloudFetch pipeline integration via SimpleCloudFetchReader - Proper resource cleanup via Dispose pattern Implementation details: - Created SimpleCloudFetchReader as nested class for protocol-agnostic CloudFetch reading - Made CloudFetchDownloadManager statement parameter nullable to support REST API - Updated StatementExecutionConnection to pass HttpClient and properties to statement - Updated DatabricksDatabase to pass HttpClient when creating connections - Basic type mapping with TODOs for comprehensive type conversion - LZ4 decompression placeholder for future implementation Related work: - PECO-2838: StatementExecutionStatement (Basic Execution with External Links) - Depends on: PECO-2840 (Protocol Selection), PECO-2839 (InlineReader) Files modified: - StatementExecution/StatementExecutionStatement.cs (complete implementation ~660 lines) - StatementExecution/StatementExecutionConnection.cs (add HttpClient parameter) - Reader/CloudFetch/CloudFetchDownloadManager.cs (make statement nullable) - DatabricksDatabase.cs (pass HttpClient to connection) - doc/statement-execution-api-design.md (update task status) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…expiration - Added RefreshUrlsAsync implementation in StatementExecutionResultFetcher - REST API URLs expire after ~1 hour and need refresh via GetResultChunkAsync - Fixed DownloadResult constructor calls to include chunkIndex parameter - Fixed CloudFetchDownloadManager test constructor usage 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…tic CloudFetch pattern Update CreateExternalLinksReader to use the protocol-agnostic CloudFetch pattern: - Use StatementExecutionResultFetcher with 3-parameter constructor (no manager resources) - Use CloudFetchConfiguration.FromProperties to parse config - Use CloudFetchDownloadManager with config (calls Initialize() on fetcher) - Use CloudFetchReader with ITracingStatement and nullable IResponse Also fix StatementExecutionResultFetcher constructor that was reverted during conflict resolution: - Remove memoryManager and downloadQueue parameters - Pass null to base constructor for late initialization - Resources will be injected via Initialize() by CloudFetchDownloadManager This completes the CloudFetch REST API integration using the unified protocol-agnostic pattern. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…mplementation for Statement Execution API Adds comprehensive support for Statement Execution API when warehouses don't support Arrow format inline results, and implements full metadata operations. **JSON_ARRAY Format Support:** - Created JsonArrayReader to convert JSON data to Arrow format - Supports multiple Arrow types: Int32, Int64, Double, String, Boolean, Date32, Timestamp - Handles null values and type conversion with proper error handling - Automatically detects format from response manifest **GetObjects Metadata Implementation:** - Implemented proper nested structure builders for hierarchical catalog/schema/table/column data - Added BuildListArray() to manually construct ListArrays with struct types - Added helper methods: BuildDbSchemasStruct(), BuildTablesStruct(), BuildColumnsStruct() - Includes CreateEmptyArray() for empty array creation across various types - Returns complete metadata per ADBC standard schema **E2E Test Infrastructure Improvements:** - Rewrote StatementExecutionConnectionE2ETests to use proper driver flow (Driver → Database → Connection) - Removed complex mocking infrastructure (527 lines → 168 lines) - Added DatabricksTestHelpers.GetPropertiesWithStatementExecutionEnabled for REST API config - Created StatementExecutionFeatureParityTests with 8 comprehensive E2E tests **Authentication & Connection Fixes:** - Added Bearer token authentication support in DatabricksConnection - Fixed null reference in TracingDelegatingHandler for W3C trace context - Always send warehouse_id in requests (required by Databricks API) **Test Results (all passing):** ✓ GetTableTypes_ReturnsStandardTypes ✓ GetObjects_CatalogDepth_ReturnsCatalogs ✓ GetTableSchema_KnownTable_ReturnsSchema ✓ ExecuteQuery_InlineResults_ReturnsData ✓ ExecuteQuery_InlineResults_MultipleRows_ReturnsAllData ✓ ExecuteQuery_HybridDisposition_ReturnsData ✓ ExecuteQuery_ExternalLinks_LargeResult_UsesCloudFetch ✓ ExecuteUpdate_DDLStatements_WorksCorrectly
…upport Add StatementExecutionStatement class that uses the Statement Execution REST API and integrates with the protocol-agnostic CloudFetch pipeline. Key features: - Query execution via ExecuteStatementAsync with polling - Support for EXTERNAL_LINKS disposition using CloudFetch - Support for INLINE disposition with InlineReader - Uses StatementExecutionResultFetcher with late initialization pattern - Implements ITracingStatement for Activity tracing - Configurable polling interval, timeouts, and result formats The CreateExternalLinksReader method follows the protocol-agnostic pattern: 1. Create StatementExecutionResultFetcher (no manager resources needed) 2. Parse CloudFetchConfiguration from properties 3. Create CloudFetchDownloadManager (calls Initialize() on fetcher) 4. Start the manager 5. Create CloudFetchReader with ITracingStatement 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…sition support Implements the StatementExecutionStatement class for executing queries via the Databricks Statement Execution REST API with full support for both inline and external links result dispositions. Key features: - Query execution with async polling (default: 1000ms interval) - Query timeout and cancellation support - Hybrid disposition support: automatically detects and handles INLINE, EXTERNAL_LINKS, or INLINE_OR_EXTERNAL_LINKS - Full ITracingStatement interface implementation - Statement properties: CatalogName, SchemaName, MaxRows, QueryTimeoutSeconds - Schema conversion from REST API format to Arrow types - CloudFetch pipeline integration via SimpleCloudFetchReader - Proper resource cleanup via Dispose pattern Implementation details: - Created SimpleCloudFetchReader as nested class for protocol-agnostic CloudFetch reading - Made CloudFetchDownloadManager statement parameter nullable to support REST API - Updated StatementExecutionConnection to pass HttpClient and properties to statement - Updated DatabricksDatabase to pass HttpClient when creating connections - Basic type mapping with TODOs for comprehensive type conversion - LZ4 decompression placeholder for future implementation Related work: - PECO-2838: StatementExecutionStatement (Basic Execution with External Links) - Depends on: PECO-2840 (Protocol Selection), PECO-2839 (InlineReader) Files modified: - StatementExecution/StatementExecutionStatement.cs (complete implementation ~660 lines) - StatementExecution/StatementExecutionConnection.cs (add HttpClient parameter) - Reader/CloudFetch/CloudFetchDownloadManager.cs (make statement nullable) - DatabricksDatabase.cs (pass HttpClient to connection) - doc/statement-execution-api-design.md (update task status) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
…expiration

- Added RefreshUrlsAsync implementation in StatementExecutionResultFetcher
- REST API URLs expire after ~1 hour and need refresh via GetResultChunkAsync
- Fixed DownloadResult constructor calls to include the chunkIndex parameter
- Fixed CloudFetchDownloadManager test constructor usage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
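The expiry handling above amounts to: check the link's expiration (with a safety buffer), and re-fetch the chunk when it is stale. A hedged Python sketch, where `refresh_chunk` stands in for GetResultChunkAsync and the 60-second buffer is an illustrative choice, not the driver's actual value:

```python
from datetime import datetime, timedelta, timezone

# Refresh slightly before the real expiry so an in-flight download
# never starts with an about-to-expire URL (buffer value is illustrative).
EXPIRY_BUFFER = timedelta(seconds=60)


def needs_refresh(expiration_time, now=None):
    """True when a presigned result URL is expired or about to expire."""
    now = now or datetime.now(timezone.utc)
    return now >= expiration_time - EXPIRY_BUFFER


def resolve_url(link, refresh_chunk):
    """Return a usable URL, re-fetching the chunk when the link is stale.

    link: dict with "url", "expiration", "chunk_index" keys (stand-in for
    the download-result object); refresh_chunk: callable taking a chunk
    index and returning a fresh link.
    """
    if needs_refresh(link["expiration"]):
        link = refresh_chunk(link["chunk_index"])
    return link["url"]
```

Keying the refresh on `chunk_index` matches the commit's fix of threading that parameter through the DownloadResult constructor.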
…tic CloudFetch pattern

Update CreateExternalLinksReader to use the protocol-agnostic CloudFetch pattern:
- Use StatementExecutionResultFetcher with the 3-parameter constructor (no manager resources)
- Use CloudFetchConfiguration.FromProperties to parse the config
- Use CloudFetchDownloadManager with the config (calls Initialize() on the fetcher)
- Use CloudFetchReader with ITracingStatement and a nullable IResponse

Also fix the StatementExecutionResultFetcher constructor that was reverted during conflict resolution:
- Remove the memoryManager and downloadQueue parameters
- Pass null to the base constructor for late initialization
- Resources will be injected via Initialize() by CloudFetchDownloadManager

This completes the CloudFetch REST API integration using the unified protocol-agnostic pattern.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
…mplementation for Statement Execution API

Adds comprehensive support for the Statement Execution API when warehouses don't support Arrow-format inline results, and implements full metadata operations.

**JSON_ARRAY Format Support:**
- Created JsonArrayReader to convert JSON data to Arrow format
- Supports multiple Arrow types: Int32, Int64, Double, String, Boolean, Date32, Timestamp
- Handles null values and type conversion with proper error handling
- Automatically detects the format from the response manifest

**GetObjects Metadata Implementation:**
- Implemented proper nested structure builders for hierarchical catalog/schema/table/column data
- Added BuildListArray() to manually construct ListArrays with struct types
- Added helper methods: BuildDbSchemasStruct(), BuildTablesStruct(), BuildColumnsStruct()
- Includes CreateEmptyArray() for empty array creation across various types
- Returns complete metadata per the ADBC standard schema

**E2E Test Infrastructure Improvements:**
- Rewrote StatementExecutionConnectionE2ETests to use the proper driver flow (Driver → Database → Connection)
- Removed complex mocking infrastructure (527 lines → 168 lines)
- Added DatabricksTestHelpers.GetPropertiesWithStatementExecutionEnabled for REST API config
- Created StatementExecutionFeatureParityTests with 8 comprehensive E2E tests

**Authentication & Connection Fixes:**
- Added Bearer token authentication support in DatabricksConnection
- Fixed a null reference in TracingDelegatingHandler for W3C trace context
- Always send warehouse_id in requests (required by the Databricks API)

**Test Results (all passing):**
✓ GetTableTypes_ReturnsStandardTypes
✓ GetObjects_CatalogDepth_ReturnsCatalogs
✓ GetTableSchema_KnownTable_ReturnsSchema
✓ ExecuteQuery_InlineResults_ReturnsData
✓ ExecuteQuery_InlineResults_MultipleRows_ReturnsAllData
✓ ExecuteQuery_HybridDisposition_ReturnsData
✓ ExecuteQuery_ExternalLinks_LargeResult_UsesCloudFetch
✓ ExecuteUpdate_DDLStatements_WorksCorrectly
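The JSON_ARRAY path above converts row-oriented JSON string arrays into typed columns. A simplified Python sketch of that conversion; the real JsonArrayReader builds Arrow arrays in C#, so plain Python lists stand in for Arrow builders here, and the type names are a small subset:

```python
# Convert JSON_ARRAY rows (lists of strings, null allowed) to typed columns.

def convert_column(values, type_name):
    """Convert one column of JSON string values, preserving nulls."""
    casts = {
        "INT": int,
        "BIGINT": int,
        "DOUBLE": float,
        "STRING": str,
        "BOOLEAN": lambda v: v == "true",  # JSON_ARRAY encodes booleans as text
    }
    cast = casts.get(type_name)
    if cast is None:
        raise ValueError(f"unsupported type: {type_name}")
    return [None if v is None else cast(v) for v in values]


def convert_rows(rows, schema):
    """rows: list of JSON arrays, one per row; schema: [(name, type), ...]."""
    # Transpose row-major JSON data into columns before casting.
    columns = list(zip(*rows)) if rows else [[] for _ in schema]
    return {name: convert_column(list(col), type_name)
            for (name, type_name), col in zip(schema, columns)}


result = convert_rows(
    [["1", "2.5", "true"], ["2", None, "false"]],
    [("a", "INT"), ("b", "DOUBLE"), ("c", "BOOLEAN")],
)
```

The transpose-then-cast shape matters because Arrow is columnar: each column is built in one pass with its own type, while nulls pass through untouched.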
- Remove merge conflict markers
- Remove duplicate constructors
- Clean up interleaved code from the manifest-based and Result-based implementations
- Keep the correct implementation using GetStatementResponse.Result and the next_chunk_index chain
- Remove 145+ merge conflict markers that were accidentally committed
- Restore the clean version from the stack/PECO-2790-statement-execution-result-fetcher branch
- Fixes corruption introduced during git stack sync
bf1f712 to 815b17a
jadewang-db added a commit that referenced this pull request on Dec 1, 2025
## 🥞 Stacked PR

Use this [link](https://github.com/adbc-drivers/databricks/pull/14/files) to review incremental changes.

- [**stack/PECO-2788-cloudfetch-protocol-agnostic**](#14) [[Files changed](https://github.com/adbc-drivers/databricks/pull/14/files)]
- [stack/PECO-2790-statement-execution-result-fetcher](#15) [[Files changed](https://github.com/adbc-drivers/databricks/pull/15/files/2d7d1aecaa6ec5e3c101322fde7ce186ea6d02b8..318277c04119a86d4f3642f56eb60c0ab48ee0dc)]
- [stack/PECO-2791-connection-statement-rest-execution](#16) [[Files changed](https://github.com/adbc-drivers/databricks/pull/16/files/318277c04119a86d4f3642f56eb60c0ab48ee0dc..66f7736845877b85df769ce53c93b8960c9eb3c4)]
- [stack/PECO-2837-session-management](#17) [[Files changed](https://github.com/adbc-drivers/databricks/pull/17/files/66f7736845877b85df769ce53c93b8960c9eb3c4..d9ba98053d914f6f8d2195f4a4d37342d905f458)]
- [stack/PECO-2839-inline-reader](#18) [[Files changed](https://github.com/adbc-drivers/databricks/pull/18/files/d9ba98053d914f6f8d2195f4a4d37342d905f458..f1edfb7e0ba46a8cb6793c8811a0834313cd7516)]
- [stack/PECO-2840-protocol-selection](#19) [[Files changed](https://github.com/adbc-drivers/databricks/pull/19/files/f1edfb7e0ba46a8cb6793c8811a0834313cd7516..641f3ac3371d2fc4e16da5a6a0e6209aa41086be)]
- [stack/PECO-2838-statement-execution](#20) [[Files changed](https://github.com/adbc-drivers/databricks/pull/20/files/641f3ac3371d2fc4e16da5a6a0e6209aa41086be..356693f61bbe940717a4debf352e61f59018473c)]
- [stack/PECO-2792-feature-parity](#21) [[Files changed](https://github.com/adbc-drivers/databricks/pull/21/files/356693f61bbe940717a4debf352e61f59018473c..3cd98321863e76149c43b216a5822a8dcaf43eec)]
- [stack/PECO-2849-extend-cloudfetch-e2e-tests](#22) [[Files changed](https://github.com/adbc-drivers/databricks/pull/22/files/3cd98321863e76149c43b216a5822a8dcaf43eec..815b17a5cb9438b4b8f0d6b4727bf37a4cceffa2)]

---------

Make the CloudFetch pipeline protocol-agnostic to support both Thrift and REST API implementations, enabling code reuse and easier addition of new protocols in the future.

Changes:
- Created BaseResultFetcher abstract class extracting common fetching logic
  - Background task management, error handling, queue management
  - Template method pattern with FetchAllResultsAsync() for protocol-specific logic
  - Reduces code duplication by ~175 lines across implementations
- Refactored CloudFetchResultFetcher to extend BaseResultFetcher
  - Reduced from 401 to 314 lines (22% reduction)
  - Kept Thrift-specific logic: URL caching, expiration, refresh
  - Removed duplicate background task and error handling code
- Made IDownloadResult protocol-agnostic
  - Replaced TSparkArrowResultLink with generic properties
  - Added FileUrl, StartRowOffset, RowCount, ByteCount, ExpirationTime, HttpHeaders
  - DownloadResult.FromThriftLink() factory method for backward compatibility
- Enhanced CloudFetchDownloader to support custom HTTP headers
  - Respects the IDownloadResult.HttpHeaders property
  - Enables REST API authentication without code changes

Benefits:
- Enables the REST API (Statement Execution API) to reuse the CloudFetch pipeline
- No behavior changes to the existing Thrift implementation
- Better separation of concerns and testability

Co-authored-by: Jade Wang <[email protected]>
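The template-method split described in this commit can be illustrated compactly: the base class owns the background task, queue, and error handling, and each protocol supplies only its fetch loop. A minimal Python sketch (the driver's BaseResultFetcher is async C#; a thread and `queue.Queue` stand in here, and the class names are illustrative):

```python
import queue
import threading


class BaseResultFetcher:
    """Owns the background loop, result queue, and shared error handling."""

    def __init__(self):
        self.download_queue = queue.Queue()
        self.error = None
        self._thread = None

    def start(self):
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        try:
            # Template method: subclasses yield protocol-specific links.
            for link in self.fetch_all_results():
                self.download_queue.put(link)
        except Exception as exc:
            self.error = exc  # shared error handling lives in the base class
        finally:
            self.download_queue.put(None)  # end-of-results sentinel

    def fetch_all_results(self):
        raise NotImplementedError

    def join(self):
        self._thread.join()


class RestResultFetcher(BaseResultFetcher):
    """Protocol-specific hook only; no queue or task management here."""

    def __init__(self, chunks):
        super().__init__()
        self.chunks = chunks

    def fetch_all_results(self):
        # A real implementation would follow next_chunk_index via the REST API.
        yield from self.chunks


fetcher = RestResultFetcher(["url-0", "url-1"])
fetcher.start()
fetcher.join()
```

A Thrift-flavored subclass would override the same hook with FetchResults calls, which is how the one pipeline serves both protocols.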
🥞 Stacked PR
Use this link to review incremental changes.
What's Changed
Please fill in a description of the changes here.
This contains breaking changes.
Closes #NNN.