Closed
Conversation
- Add proxy-tests.yml workflow for CloudFetch failure injection tests - Runs on ubuntu-latest and macos-latest - Installs mitmproxy and Python dependencies - Trusts mitmproxy certificate for HTTPS interception - Runs ProxyInfrastructureTests and CloudFetchTests - Requires DATABRICKS_TEST_* secrets to be configured 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Update workflow to use existing repository secrets: - DATABRICKS_HOST (instead of DATABRICKS_TEST_HOST) - DATABRICKS_TOKEN (unchanged) - TEST_PECO_WAREHOUSE_HTTP_PATH (instead of DATABRICKS_TEST_HTTP_PATH) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add workflow_dispatch trigger to allow manual testing from any branch. This enables testing the workflow before merging to main. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Use 'timeout 5 mitmdump || true' instead of 'mitmdump --version' to actually generate certificates before trusting them. The --version flag only prints version info and doesn't create certs. Running mitmdump briefly triggers certificate generation on first run. Increased timeout from 3s to 5s for more reliable certificate generation.
Remove unnecessary mitmdump run - mitmproxy generates certificates automatically during installation. Just trust them after pip install.
Certificates are not auto-generated during pip install in CI. Need to actually run mitmdump briefly to trigger certificate generation.
Start mitmdump in background, wait for cert generation, then kill it. This works on both macOS and Linux without needing GNU timeout command. Tests will start their own proxy instances using ProxyServerManager.
- Use 'uri' instead of separate hostname/http_path fields - Add 'auth_type': 'token' for token authentication - Matches format used in e2e-tests.yml
- Use DATABRICKS_SERVER_HOSTNAME instead of DATABRICKS_HOST
- Use ${{ env.VAR }} syntax in heredoc for GitHub Actions substitution
- Add logging to verify URI is constructed correctly
- Matches pattern from e2e-tests.yml
The ~ character in --set confdir=~/.mitmproxy was not being expanded because ProcessStartInfo doesn't use a shell. This caused mitmdump to fail with exit code 1 when trying to start the proxy. Now properly expanding ~ to the actual home directory path using Environment.GetFolderPath(Environment.SpecialFolder.UserProfile). Also added quotes around addon script path and confdir to handle paths with spaces.
Added Console.WriteLine for mitmdump output/errors instead of Debug.WriteLine so the logs appear in test output. This will help diagnose why mitmdump is exiting with code 1. Also log the exact command being executed and working directory to help debug path and configuration issues.
Added System.Linq import to fix compilation error with ConcurrentBag.Any() Also removed unused exception variable to fix CS0168 warning.
Use kill -9 and pkill to forcefully terminate any mitmdump processes from the certificate generation step to prevent port conflicts when tests start their own mitmdump instances. Addresses port 18080/18081 already in use errors.
Tests were failing because ProxyTestBase only checked for HostName/Port/Path properties, but the config uses 'uri' format. Now check for Uri property first and fall back to individual components if not present. This matches the e2e-tests.yml configuration format.
Flask takes significantly longer to start on macOS CI runners. Increased timeout from 5s to 60s for macOS, 30s for other platforms. Added progress logging every 10s to track startup status. Ubuntu tests succeed with 30s timeout, macOS needs more time.
Flask's development server with reloader enabled is slow to bind to the port. Disabled use_reloader and debug mode for much faster startup (~1-2s instead of 10-60s). Reduced timeout from 60s to 15s for all platforms since Flask now starts quickly.
The mitmproxy addon was using time.sleep() which blocked the entire event loop, preventing all other requests (including GetOperationStatus to the warehouse) from being processed during CloudFetch delays. Changes: - Made request() and _handle_cloudfetch_request() async - Changed time.sleep() to await asyncio.sleep() - Increased proxy startup timeout to 30s for CI stability This allows CloudFetch downloads to be delayed for testing while Thrift operations continue to work normally. Fixes CloudFetchTimeout_RetriesWithExponentialBackoff test. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
macOS CI runners have a ~60 second localhost networking delay on each test, making the test suite 5x slower (13m vs 2.5m) without providing additional test coverage. The proxy behavior is OS-agnostic at the protocol level. Changes: - Remove matrix strategy (ubuntu-latest, macos-latest) - Simplify to single ubuntu-latest runner - Remove macOS-specific certificate trust step - Align with e2e-tests.yml which also runs Ubuntu only 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Match the trigger configuration from e2e-tests.yml: - Run on pushes to main branch only (not maint-*) - Run on all pull requests (not just PRs to main) - Keep workflow_dispatch for manual triggering Benefits: - Tests run on all PRs for better coverage - Simpler branch management (no maint-* for push) - Consistent with e2e-tests workflow 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implements the TODO items for Thrift call verification using the proxy's /thrift/calls endpoint. ProxyControlClient additions: - GetThriftCallsAsync() - Retrieves Thrift call history from proxy - CountThriftMethodCallsAsync() - Counts calls for a specific method - AssertThriftMethodCalledAsync() - Verifies minimum call count - ThriftCallHistory and ThriftCall DTOs for deserialization CloudFetchTests improvements: - CloudFetchExpiredLink: Verifies FetchResults called 2+ times (refresh) - CloudFetch403: Verifies FetchResults called 2+ times (refresh) - CloudFetchTimeout: Verifies FetchResults called once (no refresh) - CloudFetchConnectionReset: Verifies FetchResults called once (no refresh) - NormalCloudFetch: Verifies normal call pattern The tests now validate not just that data is retrieved, but that the driver uses the correct retry strategy for each failure type: - Link expiry/403: Refresh URL via FetchResults - Timeout/connection errors: Retry download without URL refresh 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Track cloud downloads separately from Thrift calls in mitmproxy addon - Add CountCloudDownloadsAsync() method to ProxyControlClient - Update CloudFetchExpiredLink/403 tests to verify baseline+1 FetchResults calls - Update CloudFetchTimeout/ConnectionReset tests to verify baseline+1 cloud downloads - All tests now establish dynamic baseline by running query without failure scenarios - Simplified NormalCloudFetch test to use at-least-once verification - Extract test query to constant to avoid duplication This ensures tests accurately verify driver retry/refresh behavior: - Expired link/403: Driver calls FetchResults again for fresh URL - Timeout/connection reset: Driver retries cloud download directly 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…eout config -Replace CreateProxiedConnectionWithCloudFetchTimeout with generic CreateProxiedConnectionWithParameters - Update CloudFetchTimeout test to set adbc.databricks.cloudfetch.timeout_minutes to 1 minute - Add System.Collections.Generic using for Dictionary type - Allows flexible parameter configuration for different test scenarios 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add debug logging to CloudFetchDownloader to trace retry attempts - Add CloudFetchTimeout test with configurable timeout parameter - Add CreateProxiedConnectionWithParameters helper for custom config - Add detailed diagnostics to understand CloudFetch behavior Investigation findings: - CloudFetchTimeout test fails because Thrift driver doesn't apply timeout - HttpClient uses default 100s timeout instead of configured 1min timeout - CloudFetchConfiguration.TimeoutMinutes is read but never applied - Need to fix: Apply timeout when creating HttpClient in DatabricksConnection.cs Note: CloudFetchTimeout_RetriesWithExponentialBackoff test currently fails with Expected: 3, Actual: 2 cloud downloads due to missing timeout config. The test expects file 1 to timeout and retry, but timeout never occurs.
Remove Console.WriteLine statements that were added for investigation. The retry behavior can be traced through existing activity tracing.
Collaborator
Author
|
Replaced by stack PR #111 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add test infrastructure to verify CloudFetch retry behavior and Thrift protocol call patterns. This PR introduces helper methods and diagnostic capabilities to track CloudFetch operations through the proxy server.
Changes
Test Infrastructure
CreateProxiedConnectionWithParametershelper method for flexible test parameter configurationTest Enhancements
Update CloudFetchTests.cs to include detailed diagnostics:
Test Coverage
The updated test suite verifies:
Benefits
Test Plan
Run all CloudFetch tests:
All tests exercise different failure scenarios and verify proper handling.
🤖 Generated with Claude Code