Skip to content

Conversation

@jadewang-db
Copy link
Collaborator

@jadewang-db jadewang-db commented Nov 4, 2025

🥞 Stacked PR

Use this link to review incremental changes.


Make the CloudFetch pipeline protocol-agnostic to support both Thrift and REST API implementations, enabling code reuse and easier addition of new protocols in the future.

Changes:

  • Created BaseResultFetcher abstract class extracting common fetching logic

    • Background task management, error handling, queue management
    • Template method pattern with FetchAllResultsAsync() for protocol-specific logic
    • Reduces code duplication by ~175 lines across implementations
  • Refactored CloudFetchResultFetcher to extend BaseResultFetcher

    • Reduced from 401 to 314 lines (22% reduction)
    • Kept Thrift-specific logic: URL caching, expiration, refresh
    • Removed duplicate background task and error handling code
  • Made IDownloadResult protocol-agnostic

    • Replaced TSparkArrowResultLink with generic properties
    • Added FileUrl, StartRowOffset, RowCount, ByteCount, ExpirationTime, HttpHeaders
    • DownloadResult.FromThriftLink() factory method for backward compatibility
  • Enhanced CloudFetchDownloader to support custom HTTP headers

    • Respects IDownloadResult.HttpHeaders property
    • Enables REST API authentication without code changes

Benefits:

  • Enables REST API (Statement Execution API) to reuse CloudFetch pipeline
  • No behavior changes to existing Thrift implementation
  • Better separation of concerns and testability

var request = new ExecuteStatementRequest
{
Statement = sqlQuery,
WaitTimeout = enableDirect ? null : "10s" // ← null means wait until complete
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not seeing null as a valid WaitTimeout? https://docs.databricks.com/api/workspace/statementexecution/executestatement

Also this should probably map to disposition? https://docs.databricks.com/api/workspace/statementexecution/executestatement#disposition
In the SEA wiki it says if we assign inline but result is larger than 25MB it will fail. And for EXTERNAL_LINK there is also an upper limit of 100GB.
I think in order for SEA to work like thrift, we may need a 3rd option like AUTO, which just like thrift directResult capability, it will return either inline result or cloudfetch links automatically depending on the result data size.

Copy link
Collaborator

@eric-wang-1990 eric-wang-1990 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think overall it looks good to me, can we run this against all the tests to make sure no regression is introduced?
Also it's better if we can hold off this until we migrate all the existing PRs that is include in the arrow-adbc branch but not this driver branch, there are a handful of them like ~5ish

// Default values
private const int DefaultParallelDownloads = 3;
private const int DefaultPrefetchCount = 2;
private const int DefaultMemoryBufferSizeMB = 200;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was changed in one of existing PR: #45

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int DefaultMemoryBufferSizeMB = 100; need change it to 100

@jadewang-db jadewang-db force-pushed the stack/PECO-2788-cloudfetch-protocol-agnostic branch from 2d7d1ae to 53a4273 Compare November 24, 2025 19:41
@jadewang-db jadewang-db force-pushed the stack/PECO-2788-cloudfetch-protocol-agnostic branch from 53a4273 to 3c3ebde Compare November 24, 2025 19:44
@jadewang-db jadewang-db force-pushed the stack/PECO-2788-cloudfetch-protocol-agnostic branch from 3c3ebde to 1416c44 Compare November 24, 2025 19:44
@jadewang-db jadewang-db force-pushed the stack/PECO-2788-cloudfetch-protocol-agnostic branch from 1416c44 to 8a178e4 Compare November 24, 2025 19:47
@jadewang-db jadewang-db force-pushed the stack/PECO-2788-cloudfetch-protocol-agnostic branch from 8a178e4 to 08d12ed Compare November 25, 2025 01:00
@jadewang-db jadewang-db force-pushed the stack/PECO-2788-cloudfetch-protocol-agnostic branch from 08d12ed to 0d65328 Compare November 25, 2025 01:10
// Default values
private const int DefaultParallelDownloads = 3;
private const int DefaultPrefetchCount = 2;
private const int DefaultMemoryBufferSizeMB = 200;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int DefaultMemoryBufferSizeMB = 100; need change it to 100

@jadewang-db jadewang-db force-pushed the stack/PECO-2788-cloudfetch-protocol-agnostic branch from 0d65328 to b5001fd Compare November 25, 2025 01:15
@jadewang-db jadewang-db force-pushed the stack/PECO-2788-cloudfetch-protocol-agnostic branch from b5001fd to 7dd0465 Compare November 25, 2025 17:42
@jadewang-db jadewang-db force-pushed the stack/PECO-2788-cloudfetch-protocol-agnostic branch from 7dd0465 to 73ed0ec Compare November 25, 2025 18:29

// IActivityTracer implementation - delegates to statement
ActivityTrace IActivityTracer.Trace => _statement.Trace;
// IActivityTracer implementation - delegates to statement (or returns lazy-initialized fallback if null)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work? Is this for SEA?
For SEA does the activityTrace propogate successfully? If we need a follow up item can we add a TODO

@jadewang-db jadewang-db force-pushed the stack/PECO-2788-cloudfetch-protocol-agnostic branch from 73ed0ec to 0fed090 Compare November 26, 2025 17:05
squash and merge

refactor
@jadewang-db jadewang-db force-pushed the stack/PECO-2788-cloudfetch-protocol-agnostic branch from 0fed090 to 2900917 Compare November 26, 2025 18:37
@jadewang-db jadewang-db merged commit 0039ee3 into main Dec 1, 2025
8 checks passed
@jadewang-db jadewang-db deleted the stack/PECO-2788-cloudfetch-protocol-agnostic branch December 1, 2025 16:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants