docs(rust): OAuth U2M and M2M authentication design #318

Closed
vikrantpuppala wants to merge 5 commits into adbc-drivers:main from vikrantpuppala:oauth-u2m-m2m-design

Conversation

@vikrantpuppala (Collaborator)

Summary

Design document for adding OAuth 2.0 authentication to the Rust ADBC driver, covering:

  • U2M (User-to-Machine): Authorization Code flow with PKCE for interactive browser-based login
  • M2M (Machine-to-Machine): Client Credentials flow for service principal authentication
  • Shared infrastructure: Token lifecycle management (FRESH/STALE/EXPIRED state machine), file-based token caching, OIDC endpoint discovery, and PKCE generation

The design uses the Python Databricks SDK as the reference implementation, adapting its patterns to Rust's ownership and concurrency model. The key integration point is the existing AuthProvider trait, which remains unchanged -- new OAuth providers implement this trait with internal async bridging for token fetches.
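The FRESH/STALE/EXPIRED lifecycle can be sketched as a small state check over the token's age. This is an illustrative stand-in (type and field names are hypothetical, not from the design doc), using the design's stale threshold of min(TTL * 0.5, 20 min) and the rule that the TTL is captured once at acquisition:

```rust
use std::time::{Duration, Instant};

/// Lifecycle states: FRESH needs no action, STALE triggers a background
/// refresh, EXPIRED forces a blocking refresh before the request proceeds.
#[derive(Debug, PartialEq)]
enum TokenState {
    Fresh,
    Stale,
    Expired,
}

struct CachedToken {
    acquired_at: Instant,
    /// TTL captured once at acquisition time, per the design.
    initial_ttl: Duration,
}

impl CachedToken {
    fn state(&self, now: Instant) -> TokenState {
        let age = now.duration_since(self.acquired_at);
        if age >= self.initial_ttl {
            return TokenState::Expired;
        }
        // Stale threshold: min(TTL * 0.5, 20 minutes).
        let stale_after = (self.initial_ttl / 2).min(Duration::from_secs(20 * 60));
        if age >= stale_after {
            TokenState::Stale
        } else {
            TokenState::Fresh
        }
    }
}
```

For a one-hour token this yields a stale threshold of 20 minutes (since TTL * 0.5 = 30 min exceeds the cap), so a background refresh would kick in well before expiry.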

Key decisions and alternatives considered

  • Separate reqwest::Client for token endpoint calls instead of reusing DatabricksHttpClient -- avoids circular auth dependency since DatabricksHttpClient calls get_auth_header() on every request
  • Sync AuthProvider trait with internal async bridge (block_in_place) instead of making the trait async -- avoids changes across the entire call chain
  • Separate token cache from Python SDK (~/.config/databricks-adbc/oauth/) instead of sharing ~/.config/databricks-sdk-py/oauth/ -- cross-SDK cache sharing is fragile
  • Two separate provider types (ClientCredentialsProvider, AuthorizationCodeProvider) instead of a single OAuthProvider -- flows are fundamentally different (browser vs direct exchange, refresh_token vs re-auth)
  • Rejected oauth2 crate -- doesn't cover OIDC discovery, caching, or Databricks-specific token endpoint behavior; all reference SDKs implement OAuth from scratch
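The two-provider decision can be sketched as follows. The trait and provider names come from the design, but the bodies are stubs standing in for the real token logic (cached-token lookup, client-credentials exchange, browser re-auth):

```rust
/// Sync trait shared by both flows, per the design; real implementations
/// bridge to async token fetches internally.
trait AuthProvider {
    fn get_auth_header(&self) -> String;
}

/// M2M: service principal, direct client-credentials token exchange.
struct ClientCredentialsProvider {
    client_id: String,
    // client_secret, cached token, ... elided
}

/// U2M: interactive browser login with PKCE, refresh_token on expiry.
struct AuthorizationCodeProvider {
    client_id: String,
    redirect_port: u16, // e.g. the default 8020 callback port
    // refresh token, file cache handle, ... elided
}

impl AuthProvider for ClientCredentialsProvider {
    fn get_auth_header(&self) -> String {
        // Real impl: exchange client credentials at the token endpoint.
        format!("Bearer m2m-token-for-{}", self.client_id)
    }
}

impl AuthProvider for AuthorizationCodeProvider {
    fn get_auth_header(&self) -> String {
        // Real impl: cached token, then refresh_token, then browser re-auth.
        format!("Bearer u2m-token-for-{}", self.client_id)
    }
}
```

Keeping the types separate means each provider carries only the state its flow needs, while callers still see a single `dyn AuthProvider`.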

Areas needing specific review focus

  • Token refresh state machine: The FRESH -> STALE -> EXPIRED lifecycle with background refresh for stale tokens and blocking refresh for expired ones. Is the stale threshold formula (min(TTL * 0.5, 20 min)) appropriate?
  • Sync/async bridge: Using tokio::task::block_in_place + Handle::block_on inside the sync get_auth_header(). Are there concerns about blocking tokio worker threads?
  • Auth type auto-detection: When auth_type is not set, inferring from which credentials are provided. Should we require explicit auth_type instead?
  • Callback server port: Hardcoded default of 8020 matching the Python SDK. Should we support port range fallback (Go SDK tries 8020-8040)?
  • Cache directory: ~/.config/databricks-adbc/oauth/ -- is this the right location? Should we align with any other driver?
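For reference on the callback-port question, the Go-SDK-style port range fallback under discussion could look roughly like this in Rust (a sketch only -- the design as written hardcodes 8020):

```rust
use std::net::TcpListener;

/// Try ports 8020..=8040 in order and bind the first one that is free,
/// returning the listener and the port actually bound. The range mirrors
/// the Go SDK behavior mentioned in the review question.
fn bind_callback_listener() -> Option<(TcpListener, u16)> {
    (8020u16..=8040).find_map(|port| {
        TcpListener::bind(("127.0.0.1", port))
            .ok()
            .map(|listener| (listener, port))
    })
}
```

The bound port would then be substituted into the `redirect_uri` sent in the authorization request, since the server must accept whatever port the driver actually obtained.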

Generated with Claude Code

vikrantpuppala and others added 5 commits March 7, 2026 08:48
Design for OAuth 2.0 authentication in the Rust ADBC driver covering
Authorization Code + PKCE (U2M) and Client Credentials (M2M) flows,
including token refresh state machine, file-based caching, and OIDC
discovery.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use oauth2 crate for PKCE, token exchange, client credentials,
  and refresh token flows. Eliminates hand-rolled pkce.rs module.
- Reuse DatabricksHttpClient for token endpoint calls via
  execute_without_auth(), giving unified retry/timeout/pooling.
- Two-phase initialization: HTTP client created first, auth provider
  set later via OnceLock (matching SeaClient's reader_factory pattern).
- OAuth providers route token requests through the shared HTTP client
  with a custom oauth2 HTTP function adapter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
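The two-phase initialization described above might be sketched with `std::sync::OnceLock` like so; `DatabricksHttpClient`'s real fields and the auth provider type are stand-ins (here a boxed closure) chosen for illustration:

```rust
use std::sync::{Arc, OnceLock};

/// Phase one: the HTTP client is constructed with an empty OnceLock,
/// so OAuth providers can already route token requests through it.
struct DatabricksHttpClient {
    auth: OnceLock<Arc<dyn Fn() -> String + Send + Sync>>,
}

impl DatabricksHttpClient {
    fn new() -> Self {
        Self { auth: OnceLock::new() }
    }

    /// Phase two: install the auth provider exactly once; a second call
    /// fails instead of silently replacing the provider.
    fn set_auth_provider(
        &self,
        provider: Arc<dyn Fn() -> String + Send + Sync>,
    ) -> Result<(), ()> {
        self.auth.set(provider).map_err(|_| ())
    }

    /// Requests made before phase two (e.g. token endpoint calls) see no
    /// auth provider, which breaks the circular dependency.
    fn auth_header(&self) -> Option<String> {
        self.auth.get().map(|p| p())
    }
}
```

This mirrors the SeaClient `reader_factory` pattern referenced in the commit: construction and wiring are split, and `OnceLock` enforces the set-once contract without locking on the read path.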
- Add AuthMechanism enum: 0=Pat, 11=OAuth (matches ODBC AuthMech)
- Add AuthFlow enum: 0=TokenPassthrough, 1=ClientCredentials, 2=Browser
  (matches ODBC Auth_Flow)
- Both mechanism and flow are mandatory, no auto-detection
- Accept numeric values only, parsed via TryFrom
- Use unified DatabricksHttpClient with two-phase init
- Adopt oauth2 crate for protocol-level operations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
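As a sketch, the two enums with numeric `TryFrom` parsing might look like this; the variant values come from the commit message (ODBC `AuthMech`/`Auth_Flow`), while the error type is illustrative:

```rust
use std::convert::TryFrom;

/// ODBC-aligned authentication mechanism: 0 = PAT, 11 = OAuth.
#[derive(Debug, PartialEq)]
enum AuthMechanism {
    Pat = 0,
    OAuth = 11,
}

impl TryFrom<u32> for AuthMechanism {
    type Error = String;
    fn try_from(v: u32) -> Result<Self, Self::Error> {
        match v {
            0 => Ok(AuthMechanism::Pat),
            11 => Ok(AuthMechanism::OAuth),
            other => Err(format!("unsupported auth mechanism: {other}")),
        }
    }
}

/// ODBC-aligned OAuth flow: 0 = token passthrough, 1 = client
/// credentials (M2M), 2 = browser (U2M).
#[derive(Debug, PartialEq)]
enum AuthFlow {
    TokenPassthrough = 0,
    ClientCredentials = 1,
    Browser = 2,
}

impl TryFrom<u32> for AuthFlow {
    type Error = String;
    fn try_from(v: u32) -> Result<Self, Self::Error> {
        match v {
            0 => Ok(AuthFlow::TokenPassthrough),
            1 => Ok(AuthFlow::ClientCredentials),
            2 => Ok(AuthFlow::Browser),
            other => Err(format!("unsupported auth flow: {other}")),
        }
    }
}
```

Rejecting unknown numbers at parse time keeps misconfiguration errors at connection setup rather than at first request.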
Fixes:
- databricks.oauth.token_endpoint -> databricks.auth.token_endpoint
- Config type Int/String -> Int (numeric only)
- Clarify oauth2 HTTP adapter needs thin conversion layer
- Architecture diagram shows M2M/U2M using execute_without_auth()
- Token passthrough (flow=0) documents no auto-refresh
- Stale threshold uses initial_TTL computed once at acquisition
- Deduplicate http.rs changes (reference Concurrency section)

Test strategy additions:
- Wiremock integration tests for full M2M flow with mocked HTTP
- Database config validation tests for enum parsing and new_connection
- HTTP client two-phase init tests for OnceLock lifecycle

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
3-task breakdown covering foundation + HTTP client changes,
M2M provider, and U2M provider with full test coverage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vikrantpuppala (Collaborator, Author)

Closing in favor of a new PR from the stacked branch.

vikrantpuppala added a commit that referenced this pull request Mar 13, 2026
## 🥞 Stacked PR
Use this [link](https://github.com/adbc-drivers/databricks/pull/319/files) to review incremental changes.

- [**stack/oauth-u2m-m2m-design**](#319) [[Files changed](https://github.com/adbc-drivers/databricks/pull/319/files)]
- [stack/pr-oauth-foundation](#320) [[Files changed](https://github.com/adbc-drivers/databricks/pull/320/files/250ff3d91c3001f671f08084f68e949e556bc5d2..bd474c189621aa70c1f14e97c32d64605275e07d)]
- [stack/pr-database-config](#321) [[Files changed](https://github.com/adbc-drivers/databricks/pull/321/files/bd474c189621aa70c1f14e97c32d64605275e07d..296931cd396d82dccb1b548a51f6b9d31be3683e)]
- [stack/pr-u2m-provider](#322) [[Files changed](https://github.com/adbc-drivers/databricks/pull/322/files/296931cd396d82dccb1b548a51f6b9d31be3683e..c96689981e79c04f43e8251f2cbd5690371dfca5)]
- [stack/pr-integration-tests](#323) [[Files changed](https://github.com/adbc-drivers/databricks/pull/323/files/c96689981e79c04f43e8251f2cbd5690371dfca5..83d639337ca30688abb7bdba85aa16426d76eb31)]
- [stack/pr-final-validation](#324) [[Files changed](https://github.com/adbc-drivers/databricks/pull/324/files/83d639337ca30688abb7bdba85aa16426d76eb31..e2cd82bf1e9510169735774784591074f30351d3)]

---------
## Summary

- Design document for adding OAuth 2.0 authentication to the Rust ADBC
driver covering both U2M (Authorization Code + PKCE) and M2M (Client
Credentials) flows
- Sprint plan breaking the implementation into 3 tasks: foundation +
HTTP client changes, M2M provider, U2M provider
- Uses the `oauth2` crate for protocol-level operations, unified
`DatabricksHttpClient` with two-phase `OnceLock` init, and ODBC-aligned
numeric config values (`AuthMech`/`Auth_Flow`)

## Key decisions and alternatives considered

- **`oauth2` crate adoption** over hand-rolling OAuth protocol
(eliminates ~200 lines of boilerplate, handles PKCE/token
exchange/refresh)
- **Unified HTTP client** (`DatabricksHttpClient` with `OnceLock`) over
separate `reqwest::Client` for token calls (shared retry logic,
connection pooling)
- **ODBC-aligned numeric config** (`mechanism=0/11`, `flow=0/1/2`) over
string-based or auto-detection (explicit, predictable, matches ODBC
driver)
- **Separate U2M/M2M providers** over single OAuthProvider (different
flows, refresh strategies, caching needs)
- **Separate token cache** (`~/.config/databricks-adbc/oauth/`) over
sharing Python SDK cache (fragile cross-SDK compatibility)

## Areas needing specific review focus

- Two-phase HTTP client initialization pattern (OnceLock for auth
provider) — is this the right approach for breaking the circular
dependency?
- Token refresh state machine (FRESH/STALE/EXPIRED) — are the thresholds
(40s expiry buffer, min(TTL*0.5, 20min) stale) appropriate?
- Config option naming (`databricks.auth.mechanism`,
`databricks.auth.flow`) — alignment with ODBC driver
- Sprint plan task breakdown — is the scope realistic for 2 weeks?

---

*Replaces #318 (closed — converted to stacked branch)*

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>