docs(rust): OAuth U2M and M2M authentication design #319

vikrantpuppala merged 6 commits into adbc-drivers:main
Conversation
```rust
/// Config values match the ODBC driver's AuthMech numeric codes.
#[derive(Debug, Clone, PartialEq)]
#[repr(u8)]
pub enum AuthMechanism {
```
This looks specific to JDBC — is this also followed in other drivers like ADBC and Python?
Good point — this is specific to the ODBC driver's config scheme. For ADBC, we adopted the same numeric codes (AuthMech=0/11, Auth_Flow=0/1/2) to stay aligned with ODBC since both drivers are maintained together and we want a consistent config surface for users switching between them. The Python SDK uses string-based config (auth_type="databricks-oauth-m2m") which is a different pattern — we intentionally chose the ODBC-aligned numeric approach for the Rust driver since it also backs the ODBC bridge layer.
Actually, rethinking this — you're right that numeric codes are an ODBC-ism that doesn't belong in the ADBC API. Updated to a single string-based databricks.auth.type option:
```
databricks.auth.type = "access_token"   # Personal access token
databricks.auth.type = "oauth_m2m"      # Client credentials (service principal)
databricks.auth.type = "oauth_u2m"      # Authorization code + PKCE (browser)
```
One key-value instead of two, self-documenting, no magic numbers. The ODBC bridge layer can map its own AuthMech/Auth_Flow DSN values to these strings internally.
Changes across the stack: design doc (this PR), AuthType enum + AuthConfig (PR #321), database.rs option parsing + new_connection() matching (PRs #321-#324), E2E tests (PR #323).
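To make the shape of that change concrete, here is a minimal sketch of parsing the proposed string option into an enum. The `AuthType` name and `FromStr` approach are assumptions based on this thread, not the merged code in PR #321:

```rust
// Hypothetical sketch: parse the `databricks.auth.type` option value.
// Variant names mirror the three strings agreed on in this discussion.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthType {
    AccessToken,
    OAuthM2m,
    OAuthU2m,
}

impl std::str::FromStr for AuthType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "access_token" => Ok(AuthType::AccessToken),
            "oauth_m2m" => Ok(AuthType::OAuthM2m),
            "oauth_u2m" => Ok(AuthType::OAuthU2m),
            other => Err(format!(
                "invalid databricks.auth.type {other:?}; expected \
                 access_token, oauth_m2m, or oauth_u2m"
            )),
        }
    }
}
```

Rejecting anything outside the three known strings (including the old numeric codes) keeps the ADBC surface self-documenting while the ODBC bridge maps its DSN values to these strings internally.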
| Component | Mechanism | Guarantee |
|-----------|-----------|-----------|
| `TokenStore.token` | `std::sync::RwLock` | Multiple readers, single writer |
`parking_lot::RwLock` would be better in a highly concurrent scenario.
Considered this — parking_lot::RwLock is better under high contention, but TokenStore is accessed once per HTTP request (read lock fast path), with writes only during token refresh. That's not a high-contention scenario, so std::sync::RwLock is sufficient here and avoids adding an extra dependency. If we see contention in benchmarks later we can revisit.
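A minimal sketch of the fast path being described, assuming the token and its expiry live together behind one `std::sync::RwLock`; field and method names here are illustrative, not the driver's actual API:

```rust
use std::sync::RwLock;
use std::time::{Duration, Instant};

// Illustrative TokenStore: per-request reads take only the read lock;
// the write lock is taken only when a refresh replaces the token.
pub struct TokenStore {
    token: RwLock<(String, Instant)>, // (access token, expiry instant)
}

impl TokenStore {
    pub fn new(token: String, ttl: Duration) -> Self {
        Self {
            token: RwLock::new((token, Instant::now() + ttl)),
        }
    }

    // Hot path: executed once per HTTP request.
    pub fn current(&self) -> String {
        self.token.read().unwrap().0.clone()
    }

    // Cold path: executed only during token refresh.
    pub fn replace(&self, token: String, ttl: Duration) {
        *self.token.write().unwrap() = (token, Instant::now() + ttl);
    }
}
```

With writes this rare, readers almost never contend, which is why the extra dependency on `parking_lot` was judged unnecessary here.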
**Contract:**
- `get_or_refresh(refresh_fn)`: Returns a valid token. If STALE, spawns background refresh via `std::thread::spawn` and returns current token. If EXPIRED, blocks caller until refresh completes.
Can we use tokio::spawn instead? It would be cheaper.
Good call — updated token_store.rs to use tokio::task::spawn_blocking instead of std::thread::spawn. This reuses tokio's blocking thread pool rather than spawning a new OS thread for each background refresh. The driver already has a tokio runtime available since both M2M and U2M providers use tokio::task::block_in_place + Handle::current().block_on(). Change is in PR #320.
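The FRESH/STALE/EXPIRED decision that drives when that background refresh fires can be sketched in isolation. This is a hedged illustration: the design says the stale threshold is derived from the initial TTL computed once at acquisition, but the 90% cutoff below is an assumed value, not a figure from the PR:

```rust
use std::time::{Duration, Instant};

// Illustrative freshness classification. STALE triggers a background
// refresh (now via tokio::task::spawn_blocking per this thread) while the
// current token is still served; EXPIRED blocks the caller until refresh
// completes. The 0.9 stale factor is an assumption for illustration.
#[derive(Debug, PartialEq)]
enum Freshness {
    Fresh,
    Stale,
    Expired,
}

fn classify(acquired: Instant, initial_ttl: Duration, now: Instant) -> Freshness {
    let age = now.duration_since(acquired);
    if age >= initial_ttl {
        Freshness::Expired
    } else if age >= initial_ttl.mul_f64(0.9) {
        // Stale threshold computed from the TTL captured at acquisition.
        Freshness::Stale
    } else {
        Freshness::Fresh
    }
}
```

Using the acquisition-time TTL (rather than re-reading a mutable expiry) keeps the threshold stable even while a background refresh is in flight.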
**Contract:**
- Binds to `localhost:{port}` (default 8020)
What if the provided port is in use?
It's also configurable via `auth_redirect_port`. We will add an auto-increment fallback later.
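A sketch of what that future auto-increment fallback could look like, using only the standard library. Function names are illustrative assumptions, not the driver's actual API:

```rust
use std::io;
use std::net::TcpListener;

// Try to bind the OAuth redirect listener on the configured port
// (default 8020 in the design).
fn bind_redirect_listener(port: u16) -> io::Result<TcpListener> {
    TcpListener::bind(("127.0.0.1", port))
}

// Possible auto-increment fallback: probe successive ports, but only
// keep going when the failure is AddrInUse; other errors are fatal.
fn bind_with_fallback(start: u16, attempts: u16) -> io::Result<TcpListener> {
    let mut last_err = None;
    for port in start..start.saturating_add(attempts) {
        match bind_redirect_listener(port) {
            Ok(listener) => return Ok(listener),
            Err(e) if e.kind() == io::ErrorKind::AddrInUse => last_err = Some(e),
            Err(e) => return Err(e),
        }
    }
    Err(last_err.unwrap_or_else(|| {
        io::Error::new(io::ErrorKind::AddrInUse, "no free port in range")
    }))
}
```

One caveat for U2M: the redirect URI registered with the IdP must allow whichever port is ultimately bound, so the fallback range cannot be unbounded.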
This implements SQLSTATE propagation from Databricks server errors to ODBC clients, replacing generic HY000 with specific error codes like 42601 (syntax error) and 42S02 (table not found).

## Changes

### databricks-adbc/rust/src/error.rs
- Added `extract_sqlstate_from_message()` to parse "SQLSTATE: XXXXX" from server error messages
- Added `map_error_code_to_sqlstate()` to map error codes like PARSE_SYNTAX_ERROR → 42601, TABLE_OR_VIEW_NOT_FOUND → 42S02
- Added `sqlstate_str_to_array()` helper to convert strings to c_char arrays
- Added comprehensive unit tests for all new functions

### databricks-adbc/rust/src/client/sea.rs
- Updated error handling in `wait_for_completion()` to set SQLSTATE on errors
- Tries 3 sources in order: sql_state field, extracted from message, mapped from error_code
- Preserves error message while adding SQLSTATE metadata

### databricks-adbc/rust/src/types/sea.rs
- Added optional `sql_state` field to `ServiceError` struct for future server support

### databricks-odbc/src/adbc_error_utils.cpp
- **No changes needed** — already correctly reads `error->sqlstate` and propagates to DriverException

## Test Results
- All Rust unit tests pass (122 passed)
- Standalone test verifies SQLSTATE extraction works correctly
- Error message "SQLSTATE: 42601" correctly extracted and propagated

## Exit Criteria
- ✓ Syntax errors return SQLSTATE 42601
- ✓ Table not found returns SQLSTATE 42S02
- ✓ Generic HY000 only used for truly unknown errors
- ✓ Code compiles successfully in both repos

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
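The commit names `extract_sqlstate_from_message()` but not its body; a plausible sketch of the "SQLSTATE: XXXXX" parsing it describes might look like the following. This is an assumption about the implementation in `databricks-adbc/rust/src/error.rs`, not the actual code:

```rust
// Hedged sketch: pull a five-character SQLSTATE out of a server error
// message such as "[PARSE_SYNTAX_ERROR] ... SQLSTATE: 42601".
fn extract_sqlstate_from_message(msg: &str) -> Option<String> {
    const MARKER: &str = "SQLSTATE: ";
    let idx = msg.find(MARKER)?;
    let code: String = msg[idx + MARKER.len()..].chars().take(5).collect();
    // A SQLSTATE is exactly five alphanumeric characters (e.g. 42601, 42S02);
    // anything shorter or malformed falls back to the next source.
    (code.len() == 5 && code.chars().all(|c| c.is_ascii_alphanumeric()))
        .then_some(code)
}
```

Returning `Option` fits the three-source fallback described above: the `sql_state` field first, then extraction from the message, then the error-code mapping, with HY000 only as a last resort.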
Design for OAuth 2.0 authentication in the Rust ADBC driver covering Authorization Code + PKCE (U2M) and Client Credentials (M2M) flows, including token refresh state machine, file-based caching, and OIDC discovery. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use oauth2 crate for PKCE, token exchange, client credentials, and refresh token flows. Eliminates hand-rolled pkce.rs module.
- Reuse DatabricksHttpClient for token endpoint calls via execute_without_auth(), giving unified retry/timeout/pooling.
- Two-phase initialization: HTTP client created first, auth provider set later via OnceLock (matching SeaClient's reader_factory pattern).
- OAuth providers route token requests through the shared HTTP client with a custom oauth2 HTTP function adapter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add AuthMechanism enum: 0=Pat, 11=OAuth (matches ODBC AuthMech)
- Add AuthFlow enum: 0=TokenPassthrough, 1=ClientCredentials, 2=Browser (matches ODBC Auth_Flow)
- Both mechanism and flow are mandatory, no auto-detection
- Accept numeric values only, parsed via TryFrom
- Use unified DatabricksHttpClient with two-phase init
- Adopt oauth2 crate for protocol-level operations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes:
- databricks.oauth.token_endpoint -> databricks.auth.token_endpoint
- Config type Int/String -> Int (numeric only)
- Clarify oauth2 HTTP adapter needs thin conversion layer
- Architecture diagram shows M2M/U2M using execute_without_auth()
- Token passthrough (flow=0) documents no auto-refresh
- Stale threshold uses initial_TTL computed once at acquisition
- Deduplicate http.rs changes (reference Concurrency section)

Test strategy additions:
- Wiremock integration tests for full M2M flow with mocked HTTP
- Database config validation tests for enum parsing and new_connection
- HTTP client two-phase init tests for OnceLock lifecycle

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed: cf7d895 → 4dd8f51 (range-diff against main)
Force-pushed: 4dd8f51 → d2cecd6 (range-diff against main)
3-task breakdown covering foundation + HTTP client changes, M2M provider, and U2M provider with full test coverage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed: d2cecd6 → 250ff3d
Summary
Key design decisions
- `databricks.auth.type` string config (`access_token`, `oauth_m2m`, `oauth_u2m`) instead of ODBC-style numeric `AuthMech`/`Auth_Flow`
- `TokenStore` state machine: FRESH → STALE (background refresh via `tokio::task::spawn_blocking`) → EXPIRED (blocking refresh)
- `TokenCache` for U2M disk persistence at `~/.config/databricks-adbc/oauth/` with SHA-256 hashed filenames
- `oauth2` crate for protocol-level operations
- `DatabricksHttpClient` initialization via `OnceLock` to break circular auth dependency

Key files
- `rust/docs/designs/oauth-u2m-m2m-design.md`
- `rust/docs/designs/oauth-sprint-plan.md`

This pull request was AI-assisted by Isaac.