
fix(rust): refetch presigned URL on HTTP 401/403/404 from cloud storage#249

Open
eric-wang-1990 wants to merge 15 commits into main from feat/peco-2918-cloudfetch-403-404-refetch

Conversation

@eric-wang-1990
Collaborator

Summary

Fixes a bug in StreamingCloudFetchProvider::download_chunk_with_retry() where HTTP 401/403/404 responses from cloud storage would cause the driver to retry indefinitely with the same invalid presigned URL.

Root cause: The function only called refetch_link() when a URL was expired by timestamp check (link.is_expired()). When cloud storage returned actual 401/403/404 errors (URL revoked or expired before timestamp), the cached link was never cleared, so retries kept using the same bad URL.

Fix: On HTTP 401/403/404 in the error handler, set entry.link = None before sleeping. The existing logic at the top of the retry loop already handles None links by calling refetch_link(), so no further changes are needed.
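The clear-then-refetch flow can be sketched with a synchronous mock. The real `download_chunk_with_retry()` is async and stores chunk state in a DashMap; the signature, closures, and mock URLs below are illustrative only:

```rust
/// Mirrors the helper described above: 401/403/404 mean the presigned URL
/// itself is bad, not that the download transiently failed.
fn is_cloud_storage_auth_error(status: u16) -> bool {
    matches!(status, 401 | 403 | 404)
}

/// Hypothetical, synchronous sketch of the retry-loop shape: a None link at
/// the top of the loop triggers a refetch; an auth error clears the link.
fn download_chunk_with_retry(
    mut link: Option<String>,
    max_retries: u32,
    mut download: impl FnMut(&str) -> Result<Vec<u8>, u16>,
    mut refetch_link: impl FnMut() -> String,
) -> Option<Vec<u8>> {
    for _attempt in 0..max_retries {
        // Top of the loop: a missing (or expired) link is refetched first.
        let url = match link.take() {
            Some(url) => url,
            None => refetch_link(),
        };
        match download(&url) {
            Ok(bytes) => return Some(bytes),
            Err(status) if is_cloud_storage_auth_error(status) => {
                link = None; // the fix: force a refetch on the next iteration
            }
            Err(_) => link = Some(url), // transient error: retry the same URL
        }
    }
    None
}
```

With a stale URL that keeps returning 403, the loop now obtains a fresh URL on the next iteration; without the `link = None` line it would retry the same stale URL until `max_retries` is exhausted.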

Changes:

  • rust/src/reader/cloudfetch/streaming_provider.rs: Add is_cloud_storage_auth_error() helper and clear cached link on 401/403/404
  • rust/spec/cloudfetch-implementation-plan.md: Document the 401/403/404 refetch behavior

Fixes: PECO-2918

Test plan

  • cargo test passes (117 tests, +4 new unit tests for is_cloud_storage_auth_error)
  • cargo clippy -- -D warnings clean
  • cargo fmt applied
  • New unit tests cover HTTP 401, 403, 404 detection and non-matching statuses (500, 429, network errors)

🤖 Generated with Claude Code

eric-wang-1990 and others added 8 commits February 22, 2026 11:15
- Rename adbc-pr-test → adbc-csharp-pr-test to match test repo
- Add adbc-rust-pr-test dispatch when rust/ files change
- For PRs: detect changed paths via GitHub API, dispatch only relevant tests
- For merge queue: output separate csharp-changed/rust-changed flags,
  dispatch only relevant tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ange

- Workflow file changes now trigger both C# and Rust tests
- Changes outside csharp/, rust/, .github/workflows/ default to all targets
- Remove auto-approve-no-relevant-changes (can never fire with always-true logic)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l tests

- Changes outside csharp/, rust/, .github/workflows/ now auto-approve
- Only workflow changes trigger all targets; otherwise per-folder targeting
- Restore auto-approve-no-relevant-changes for merge queue

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… to sub-workflows

- Create a single pending "Integration Tests" check before dispatching
- Pass check_run_id to both C# and Rust sub-workflow payloads
- Auto-pass immediately if no relevant driver files changed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rate checks

- Remove umbrella check approach; keep each driver's check independent
- Pass checks for non-dispatched drivers instead of creating an umbrella
- check-previous-run now validates both checks passed before auto-approving
- auto-approve jobs create both checks for no-changes and already-passed cases

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
During CloudFetch chunk downloads, cloud storage can return HTTP 401/403/404
when a presigned URL is revoked or expires before its timestamp. Previously,
download_chunk_with_retry() would retry the same invalid URL indefinitely.

Now, when a 401/403/404 is detected, the cached link is cleared (entry.link = None)
so the next retry iteration calls refetch_link() to obtain a fresh presigned URL
from the Databricks SEA API.

Fixes: PECO-2918

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@eric-wang-1990 eric-wang-1990 added the integration-test Trigger integration tests in internal repo label Feb 25, 2026
@github-actions

🚀 Integration tests triggered! View workflow run

The `entry` DashMap read guard was declared before the `match stored_link`
block, keeping its shard read lock alive across both `refetch_link().await`
and `chunks.get_mut()`. When `stored_link` is None/expired (i.e. after the
401/403/404 handler clears `entry.link`), the task would:

  1. Suspend at `refetch_link().await` while still holding the shard read lock
  2. Resume and call `chunks.get_mut()` on the same shard
  3. Block forever waiting for the exclusive write lock — which it can never
     acquire because it holds the read lock on the same shard

Fix: extract `stored_link` in a nested block so the read guard (`entry`) is
dropped before the match, `refetch_link().await`, and `get_mut()`. No logic
change — just scope narrowing to avoid the self-deadlock.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
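The scope-narrowing fix can be illustrated with a plain `std::sync::RwLock<HashMap>` standing in for DashMap's sharded locking. The names and types here are hypothetical and the real code is async; the point is only where the read guard is dropped:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical chunk entry; the real provider stores more state per chunk.
struct ChunkEntry {
    link: Option<String>,
}

/// Stand-in for the async SEA refetch call.
fn refetch_link() -> String {
    "fresh-presigned-url".to_string()
}

fn resolve_link(chunks: &RwLock<HashMap<u64, ChunkEntry>>, chunk_index: u64) -> String {
    // Nested block: the read guard lives only inside these braces, so it is
    // dropped BEFORE refetch_link() and before the write lock below.
    // Holding it across both is exactly the self-deadlock described above.
    let stored_link = {
        let map = chunks.read().unwrap();
        map.get(&chunk_index).and_then(|e| e.link.clone())
    }; // read guard dropped here

    match stored_link {
        Some(url) => url,
        None => {
            let url = refetch_link(); // would be `.await` in the real code
            // Safe to take the exclusive lock now: no read guard is held.
            if let Some(entry) = chunks.write().unwrap().get_mut(&chunk_index) {
                entry.link = Some(url.clone());
            }
            url
        }
    }
}
```

The same discipline applies to DashMap: its `Ref`/`RefMut` guards hold a shard lock, so calling `get_mut()` (or suspending at an `.await`) while a guard on the same shard is alive can deadlock.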
@github-actions github-actions bot removed the integration-test Trigger integration tests in internal repo label Feb 26, 2026
@github-actions

🔒 Integration test approval reset

New commits were pushed to this PR. The integration-test label has been automatically removed for security.

A maintainer must re-review the changes and re-add the label to trigger tests again.

Why is this necessary?
  • New code requires fresh security review
  • Prevents approved PRs from adding malicious code later
  • Ensures all tested code has been explicitly approved

Latest commit: 9205781

If Databricks repeatedly returns an already-expired presigned URL
(clock skew, server bug), the is_expired() refetch path would loop
indefinitely. Add a refresh_attempts counter capped at max_retries,
mirroring C# CloudFetchDownloader's maxUrlRefreshAttempts bound.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
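The bound described above can be sketched as a simple counter. The function name, closures, and error string are hypothetical; the real counter lives inside the retry loop:

```rust
/// Hypothetical sketch of the refresh cap: if refetched URLs keep coming back
/// already expired (clock skew, server bug), give up after a bounded number
/// of refresh attempts instead of looping forever.
fn fetch_fresh_url(
    max_refresh_attempts: u32,
    mut is_expired: impl FnMut(&str) -> bool,
    mut refetch: impl FnMut() -> String,
) -> Result<String, String> {
    let mut refresh_attempts = 0;
    loop {
        let url = refetch();
        if !is_expired(&url) {
            return Ok(url);
        }
        refresh_attempts += 1;
        if refresh_attempts >= max_refresh_attempts {
            return Err(format!(
                "presigned URL still expired after {refresh_attempts} refresh attempts"
            ));
        }
    }
}
```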

Copilot AI left a comment


Pull request overview


This PR fixes a bug in the CloudFetch download retry logic where HTTP 401/403/404 responses from cloud storage would cause indefinite retries with the same invalid presigned URL. The fix adds explicit detection of auth errors and triggers immediate URL refetch from Databricks.

Changes:

  • Added is_cloud_storage_auth_error() helper to detect HTTP 401/403/404 status codes in error messages
  • Refactored download_chunk_with_retry() to check for auth errors and refetch URLs immediately without sleeping
  • Updated documentation to describe the 401/403/404 refetch behavior
  • Added unit tests for the error detection helper

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

  • rust/src/reader/cloudfetch/streaming_provider.rs: Refactored retry logic to detect and handle auth errors (401/403/404) by refetching presigned URLs; added helper function and unit tests
  • rust/spec/cloudfetch-implementation-plan.md: Documented the new behavior of refetching URLs on 401/403/404 errors



Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.



if let Some(mut entry) = chunks.get_mut(&chunk_index) {
entry.state = ChunkState::DownloadRetry;

// HTTP 403/404 from cloud storage means the presigned URL was

Copilot AI Feb 26, 2026


The comment mentions "HTTP 403/404" but the actual implementation also handles HTTP 401 (line 493). Update the comment to include 401 for accuracy: "HTTP 401/403/404 from cloud storage means..."

Suggested change
// HTTP 403/404 from cloud storage means the presigned URL was
// HTTP 401/403/404 from cloud storage means the presigned URL was

Collaborator Author


✅ Fixed — updated the comment to say "HTTP 401/403/404" to match the actual implementation.

/// so we should refetch a fresh URL from Databricks rather than retrying
/// the same invalid URL.
fn is_cloud_storage_auth_error(err_str: &str) -> bool {
err_str.contains("HTTP 401") || err_str.contains("HTTP 403") || err_str.contains("HTTP 404")
Collaborator


@copilot is this the best to match http errors? i.e. is string matching required, is the code not available directly?

Collaborator


@eric-wang-1990 -> this change might be better: #254 (you can merge into this branch)

Collaborator Author


This is merged


Copilot AI commented Feb 26, 2026

@vikrantpuppala I've opened a new pull request, #254, to work on those changes. Once the pull request is ready, I'll request review from you.

…ing for presigned URL error detection (#254)

## What's Changed

`is_cloud_storage_auth_error` was detecting HTTP 401/403/404 via string
matching on the error message (`err_str.contains("HTTP 401")`). The
`driverbase::error::Error` type has a `vendor_code` field purpose-built
for structured codes — use it instead.

- **`client/http.rs`**: Attach `.vendor_code(status.as_u16() as i32)` to
non-retryable HTTP errors so the status code is available structurally
on the error
- **`streaming_provider.rs`**: Replace
`is_cloud_storage_auth_error(err_str: &str)` with
`is_cloud_storage_auth_error(e: &Error)`, using
`matches!(e.get_vendor_code(), 401 | 403 | 404)` — no string parsing

```rust
// Before
fn is_cloud_storage_auth_error(err_str: &str) -> bool {
    err_str.contains("HTTP 401") || err_str.contains("HTTP 403") || err_str.contains("HTTP 404")
}

// After
fn is_cloud_storage_auth_error(e: &Error) -> bool {
    matches!(e.get_vendor_code(), 401 | 403 | 404)
}
```


Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vikrantpuppala <21273801+vikrantpuppala@users.noreply.github.com>
"Chunk {} got auth/not-found error from cloud storage, clearing cached link for refetch",
chunk_index
);
entry.link = None;
Collaborator Author


Does this work? Only entry.link is set to None; does the current loop still sleep and then retry with the empty link, repeating until the attempt limit is met?

Collaborator Author


Yes, this works correctly. When entry.link = None is set, the loop sleeps (via tokio::time::sleep), then on the next iteration:

  1. stored_link is read as None
  2. The match stored_link hits the _ arm, which handles both None and expired links
  3. refetch_link() is called to get a fresh URL from the Databricks API
  4. The fresh URL is stored back in entry.link and used for the next download attempt

So we don't retry with an empty link — we refetch a valid one first. The sleep before the refetch is intentional (avoids hammering the API), and each auth error does count against attempts (also intentional — if Databricks keeps returning invalid URLs, we should eventually give up).

Collaborator Author


Good catch — updated in 9f6a9ac. When we get a 401/403/404 we now continue immediately instead of sleeping, so refetch_link() is called on the next iteration without the backoff delay. Calling refetch_link() directly inside the get_mut block would hold the DashMap write lock across an .await, risking the same deadlock described in the comment above — so the continue approach keeps things clean.

eric-wang-1990 and others added 3 commits February 27, 2026 19:52
…Fetch retry

When cloud storage returns an auth error, we know the fix is a fresh URL
from Databricks — sleeping first just adds unnecessary latency. Use
`continue` to jump straight to the next iteration's refetch_link() call.

Also fix comment text: HTTP 403/404 -> HTTP 401/403/404.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add max_refresh_retries (default 3) as a distinct config field, bounded
  independently from max_retries (mirrors C# maxUrlRefreshAttempts)
- Reset attempts=0 after a successful refetch so each fresh URL gets its
  own full download retry budget
- Make retry sleep cancellation-aware via tokio::select! so cancellation
  is not delayed by the full backoff window
- Document refetch_link(chunk_index, 0): row_offset=0 is intentional for
  the SEA (chunk-index based) backend; a Thrift backend would use it

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…haviour

Instead of the two-step dance (clear entry.link, continue, refetch on next
iteration), drop the DashMap write lock before the async call and refetch
directly in the same iteration — no sleep, no extra loop turn.

The lock guard is scoped so it drops before .await; a second brief get_mut
writes the fresh link back. This avoids the deadlock risk of holding a shard
write lock across an await point.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>