Commit 6d8d906

Support Databricks Workload Identity Federation for GitHub tokens (#933)
## What changes are proposed in this pull request?

This PR adds support for Databricks Workload Identity Federation (WIF) using GitHub tokens. This allows users to use WIF from their GitHub workflows and authenticate their workloads without long-lived secrets.

The new credentials strategy is added to the DefaultCredentialsStrategy after the other Databricks credentials strategies and before the cloud-specific authentication methods.

WIF credentials use a subset of the configuration values of the other Databricks authentication methods. Adding WIF after them ensures that it is not used when another Databricks authentication method is configured.

WIF uses the Databricks client ID, which is not used by cloud-specific authentication methods. Therefore, it will not be used when cloud-specific authentication methods are configured.

## How is this tested?

Added tests.
1 parent 62e6a7e commit 6d8d906
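
The ordering described in the pull request text can be sketched with a toy resolver. This is a minimal illustration under stated assumptions, not the SDK's implementation: `requires`, `CHAIN`, and `resolve` are hypothetical names, the strategy list is abbreviated, and the real WIF strategy additionally needs a GitHub-issued ID token before it applies.

```python
from typing import Callable, Dict, List, Optional

Config = Dict[str, str]
# A toy strategy returns its auth type when it applies, otherwise None.
Strategy = Callable[[Config], Optional[str]]


def requires(auth_type: str, attrs: List[str]) -> Strategy:
    """Build a toy strategy that applies only when every attribute in `attrs` is set."""

    def strategy(cfg: Config) -> Optional[str]:
        if all(cfg.get(attr) for attr in attrs):
            return auth_type
        return None

    return strategy


# Databricks-native strategies first, then WIF, then cloud-specific ones.
# The names and required attributes are illustrative and abbreviated.
CHAIN: List[Strategy] = [
    requires("pat", ["host", "token"]),
    requires("oauth-m2m", ["host", "client_id", "client_secret"]),
    requires("github-oidc", ["host", "client_id"]),
    requires("github-oidc-azure", ["host", "azure_client_id"]),
]


def resolve(cfg: Config) -> Optional[str]:
    """Return the auth type of the first strategy that accepts the config."""
    for strategy in CHAIN:
        result = strategy(cfg)
        if result is not None:
            return result
    return None
```

Because WIF needs only `host` and `client_id`, any earlier strategy that also needs a secret or token wins when that extra value is present, and WIF is reached only when it is absent.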

7 files changed: 175 additions and 21 deletions

NEXT_CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -6,6 +6,10 @@
 * Enabled asynchronous token refreshes by default. A new `disable_async_token_refresh` configuration option has been added to allow disabling this feature if necessary ([#952](https://github.com/databricks/databricks-sdk-py/pull/952)).
 To disable asynchronous token refresh, set the environment variable `DATABRICKS_DISABLE_ASYNC_TOKEN_REFRESH=true` or configure it within your configuration object.
 The previous `enable_experimental_async_token_refresh` option has been removed as asynchronous refresh is now the default behavior.
+* Introduce support for Databricks Workload Identity Federation in GitHub workflows ([933](https://github.com/databricks/databricks-sdk-py/pull/933)).
+See README.md for instructions.
+* [Breaking] Users running their workflows in GitHub Actions, which use Cloud native authentication and also have a `DATABRICKS_CLIENT_ID` and `DATABRICKS_HOST`
+environment variables set may see their authentication start failing due to the order in which the SDK tries different authentication methods.

 ### Bug Fixes

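
The breaking note in this changelog hunk can be illustrated with a small model. It is a sketch only: `ORDER` and `pick_auth` are hypothetical, the list is abbreviated, and it assumes that an explicitly configured `auth_type` (the same argument the integration tests in this commit pass to the clients) makes the chain skip every other strategy.

```python
from typing import Dict, List, Optional, Tuple

# (auth_type, required config attributes) in the order they are tried.
# Abbreviated and illustrative; not the SDK's full list.
ORDER: List[Tuple[str, List[str]]] = [
    ("github-oidc", ["host", "client_id"]),
    ("github-oidc-azure", ["host", "azure_client_id"]),
]


def pick_auth(cfg: Dict[str, str], auth_type: Optional[str] = None) -> Optional[str]:
    """Pick the first applicable strategy; a given `auth_type` skips all others."""
    for name, required in ORDER:
        if auth_type and name != auth_type:
            continue
        if all(cfg.get(attr) for attr in required):
            return name
    return None


# A workflow that relies on Azure federation but also exports DATABRICKS_CLIENT_ID.
cfg = {
    "host": "https://example.azuredatabricks.net",  # placeholder host
    "client_id": "sp-app-id",  # placeholder value
    "azure_client_id": "azure-app-id",  # placeholder value
}

default_choice = pick_auth(cfg)
pinned_choice = pick_auth(cfg, auth_type="github-oidc-azure")
```

In this model the default choice is the new `github-oidc` strategy, while pinning the auth type restores the Azure path. Whether pinning is the right remedy for a given workflow depends on its setup; unsetting the unused client ID is another option.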
README.md

Lines changed: 10 additions & 10 deletions
@@ -126,18 +126,18 @@ Depending on the Databricks authentication method, the SDK uses the following in

 ### Databricks native authentication

-By default, the Databricks SDK for Python initially tries [Databricks token authentication](https://docs.databricks.com/dev-tools/api/latest/authentication.html) (`auth_type='pat'` argument). If the SDK is unsuccessful, it then tries Databricks basic (username/password) authentication (`auth_type="basic"` argument).
+By default, the Databricks SDK for Python initially tries [Databricks token authentication](https://docs.databricks.com/dev-tools/api/latest/authentication.html) (`auth_type='pat'` argument). If the SDK is unsuccessful, it then tries Databricks Workload Identity Federation (WIF) authentication using OIDC (`auth_type="github-oidc"` argument).

 - For Databricks token authentication, you must provide `host` and `token`; or their environment variable or `.databrickscfg` file field equivalents.
-- For Databricks basic authentication, you must provide `host`, `username`, and `password` _(for AWS workspace-level operations)_; or `host`, `account_id`, `username`, and `password` _(for AWS, Azure, or GCP account-level operations)_; or their environment variable or `.databrickscfg` file field equivalents.
-
-| Argument | Description | Environment variable |
-|--------------|-------------|-------------------|
-| `host` | _(String)_ The Databricks host URL for either the Databricks workspace endpoint or the Databricks accounts endpoint. | `DATABRICKS_HOST` |
-| `account_id` | _(String)_ The Databricks account ID for the Databricks accounts endpoint. Only has effect when `Host` is either `https://accounts.cloud.databricks.com/` _(AWS)_, `https://accounts.azuredatabricks.net/` _(Azure)_, or `https://accounts.gcp.databricks.com/` _(GCP)_. | `DATABRICKS_ACCOUNT_ID` |
-| `token` | _(String)_ The Databricks personal access token (PAT) _(AWS, Azure, and GCP)_ or Azure Active Directory (Azure AD) token _(Azure)_. | `DATABRICKS_TOKEN` |
-| `username` | _(String)_ The Databricks username part of basic authentication. Only possible when `Host` is `*.cloud.databricks.com` _(AWS)_. | `DATABRICKS_USERNAME` |
-| `password` | _(String)_ The Databricks password part of basic authentication. Only possible when `Host` is `*.cloud.databricks.com` _(AWS)_. | `DATABRICKS_PASSWORD` |
+- For Databricks OIDC authentication, you must provide the `host`, `client_id` and `token_audience` _(optional)_ either directly, through the corresponding environment variables, or in your `.databrickscfg` configuration file.
+
+| Argument | Description | Environment variable |
+|------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------|
+| `host` | _(String)_ The Databricks host URL for either the Databricks workspace endpoint or the Databricks accounts endpoint. | `DATABRICKS_HOST` |
+| `account_id` | _(String)_ The Databricks account ID for the Databricks accounts endpoint. Only has effect when `Host` is either `https://accounts.cloud.databricks.com/` _(AWS)_, `https://accounts.azuredatabricks.net/` _(Azure)_, or `https://accounts.gcp.databricks.com/` _(GCP)_. | `DATABRICKS_ACCOUNT_ID` |
+| `token` | _(String)_ The Databricks personal access token (PAT) _(AWS, Azure, and GCP)_ or Azure Active Directory (Azure AD) token _(Azure)_. | `DATABRICKS_TOKEN` |
+| `client_id` | _(String)_ The Databricks Service Principal Application ID. | `DATABRICKS_CLIENT_ID` |
+| `token_audience` | _(String)_ When using Workload Identity Federation, the audience to specify when fetching an ID token from the ID token supplier. | `TOKEN_AUDIENCE` |

 For example, to use Databricks token authentication:

databricks/sdk/__init__.py

Lines changed: 4 additions & 0 deletions
Some generated files are not rendered by default.

databricks/sdk/config.py

Lines changed: 1 addition & 0 deletions
@@ -61,6 +61,7 @@ class Config:
     host: str = ConfigAttribute(env="DATABRICKS_HOST")
     account_id: str = ConfigAttribute(env="DATABRICKS_ACCOUNT_ID")
     token: str = ConfigAttribute(env="DATABRICKS_TOKEN", auth="pat", sensitive=True)
+    token_audience: str = ConfigAttribute(env="DATABRICKS_TOKEN_AUDIENCE", auth="github-oidc")
     username: str = ConfigAttribute(env="DATABRICKS_USERNAME", auth="basic")
     password: str = ConfigAttribute(env="DATABRICKS_PASSWORD", auth="basic", sensitive=True)
     client_id: str = ConfigAttribute(env="DATABRICKS_CLIENT_ID", auth="oauth")

databricks/sdk/credentials_provider.py

Lines changed: 57 additions & 11 deletions
@@ -23,6 +23,7 @@
 from .azure import add_sp_management_token, add_workspace_id_header
 from .oauth import (ClientCredentials, OAuthClient, Refreshable, Token,
                     TokenCache, TokenSource)
+from .oidc_token_supplier import GitHubOIDCTokenSupplier

 CredentialsProvider = Callable[[], Dict[str, str]]

@@ -314,6 +315,58 @@ def token() -> Token:
     return OAuthCredentialsProvider(refreshed_headers, token)


+@oauth_credentials_strategy("github-oidc", ["host", "client_id"])
+def databricks_wif(cfg: "Config") -> Optional[CredentialsProvider]:
+    """
+    DatabricksWIFCredentials uses a Token Supplier to get a JWT Token and exchanges
+    it for a Databricks Token.
+
+    Supported suppliers:
+    - GitHub OIDC
+    """
+    supplier = GitHubOIDCTokenSupplier()
+
+    audience = cfg.token_audience
+    if audience is None and cfg.is_account_client:
+        audience = cfg.account_id
+    if audience is None and not cfg.is_account_client:
+        audience = cfg.oidc_endpoints.token_endpoint
+
+    # Try to get an idToken. If no supplier returns a token, we cannot use this authentication mode.
+    id_token = supplier.get_oidc_token(audience)
+    if not id_token:
+        return None
+
+    def token_source_for(audience: str) -> TokenSource:
+        id_token = supplier.get_oidc_token(audience)
+        if not id_token:
+            # Should not happen, since we checked it above.
+            raise Exception("Cannot get OIDC token")
+        params = {
+            "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
+            "subject_token": id_token,
+            "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
+        }
+        return ClientCredentials(
+            client_id=cfg.client_id,
+            client_secret="",  # we have no (rotatable) secrets in OIDC flow
+            token_url=cfg.oidc_endpoints.token_endpoint,
+            endpoint_params=params,
+            scopes=["all-apis"],
+            use_params=True,
+            disable_async=cfg.disable_async_token_refresh,
+        )
+
+    def refreshed_headers() -> Dict[str, str]:
+        token = token_source_for(audience).token()
+        return {"Authorization": f"{token.token_type} {token.access_token}"}
+
+    def token() -> Token:
+        return token_source_for(audience).token()
+
+    return OAuthCredentialsProvider(refreshed_headers, token)
+
+
 @oauth_credentials_strategy("github-oidc-azure", ["host", "azure_client_id"])
 def github_oidc_azure(cfg: "Config") -> Optional[CredentialsProvider]:
     if "ACTIONS_ID_TOKEN_REQUEST_TOKEN" not in os.environ:
@@ -325,16 +378,8 @@ def github_oidc_azure(cfg: "Config") -> Optional[CredentialsProvider]:
     if not cfg.is_azure:
         return None

-    # See https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-cloud-providers
-    headers = {"Authorization": f"Bearer {os.environ['ACTIONS_ID_TOKEN_REQUEST_TOKEN']}"}
-    endpoint = f"{os.environ['ACTIONS_ID_TOKEN_REQUEST_URL']}&audience=api://AzureADTokenExchange"
-    response = requests.get(endpoint, headers=headers)
-    if not response.ok:
-        return None
-
-    # get the ID Token with aud=api://AzureADTokenExchange sub=repo:org/repo:environment:name
-    response_json = response.json()
-    if "value" not in response_json:
+    token = GitHubOIDCTokenSupplier().get_oidc_token("api://AzureADTokenExchange")
+    if not token:
         return None

     logger.info(
@@ -344,7 +389,7 @@ def github_oidc_azure(cfg: "Config") -> Optional[CredentialsProvider]:
     params = {
         "client_assertion_type": "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
         "resource": cfg.effective_azure_login_app_id,
-        "client_assertion": response_json["value"],
+        "client_assertion": token,
     }
     aad_endpoint = cfg.arm_environment.active_directory_endpoint
     if not cfg.azure_tenant_id:
@@ -927,6 +972,7 @@ def __init__(self) -> None:
             basic_auth,
             metadata_service,
             oauth_service_principal,
+            databricks_wif,
             azure_service_principal,
             github_oidc_azure,
             azure_cli,
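
The two decisions inside `databricks_wif`, choosing the audience and building the token-exchange parameters, can be isolated as pure functions. This is a hedged sketch: `resolve_audience` and `token_exchange_form` are hypothetical helpers, and the exact request that `ClientCredentials` sends is not shown in the diff, so the form body below (including where `client_id` and `scope` go) is an assumption based on the parameters passed in the diff.

```python
from typing import Optional
from urllib.parse import urlencode


def resolve_audience(
    token_audience: Optional[str],
    is_account_client: bool,
    account_id: Optional[str],
    token_endpoint: str,
) -> Optional[str]:
    """Mirror the fallback order used in `databricks_wif`."""
    if token_audience is not None:
        return token_audience  # an explicit setting always wins
    if is_account_client:
        return account_id  # account-level clients fall back to the account ID
    return token_endpoint  # workspace clients fall back to the token endpoint


def token_exchange_form(id_token: str, client_id: str, scope: str = "all-apis") -> str:
    """Build a form-encoded token-exchange body with the same parameters as the diff."""
    params = {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
        "subject_token": id_token,
        "client_id": client_id,  # assumed to travel in the body (use_params=True)
        "scope": scope,
    }
    return urlencode(params)
```

Keeping the audience fallback separate from the network call makes the three cases (explicit, account, workspace) easy to check without a GitHub Actions environment.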
databricks/sdk/oidc_token_supplier.py

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+import os
+from typing import Optional
+
+import requests
+
+
+class GitHubOIDCTokenSupplier:
+    """
+    Supplies OIDC tokens from GitHub Actions.
+    """
+
+    def get_oidc_token(self, audience: str) -> Optional[str]:
+        if "ACTIONS_ID_TOKEN_REQUEST_TOKEN" not in os.environ or "ACTIONS_ID_TOKEN_REQUEST_URL" not in os.environ:
+            # not in GitHub actions
+            return None
+        # See https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-cloud-providers
+        headers = {"Authorization": f"Bearer {os.environ['ACTIONS_ID_TOKEN_REQUEST_TOKEN']}"}
+        endpoint = f"{os.environ['ACTIONS_ID_TOKEN_REQUEST_URL']}&audience={audience}"
+        response = requests.get(endpoint, headers=headers)
+        if not response.ok:
+            return None
+
+        # get the ID Token with aud=api://AzureADTokenExchange sub=repo:org/repo:environment:name
+        response_json = response.json()
+        if "value" not in response_json:
+            return None
+
+        return response_json["value"]
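
The supplier appends the audience to the request URL by string concatenation. Since the audience is now caller-supplied and may contain characters such as `:` and `/`, one possible variation is to percent-encode it. The sketch below is an alternative under stated assumptions, not what the commit does: `build_oidc_request` is a hypothetical helper that takes the environment as a mapping, so it can be exercised without network access or a GitHub runner.

```python
from typing import Dict, Mapping, Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_oidc_request(
    env: Mapping[str, str], audience: str
) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (url, headers) for the ID token request, or None outside GitHub Actions."""
    request_token = env.get("ACTIONS_ID_TOKEN_REQUEST_TOKEN")
    request_url = env.get("ACTIONS_ID_TOKEN_REQUEST_URL")
    if not request_token or not request_url:
        return None  # same early exit as the supplier: not running in GitHub Actions

    # Keep the query parameters already present and add an encoded audience.
    parts = urlsplit(request_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("audience", audience))
    url = urlunsplit(parts._replace(query=urlencode(query)))

    return url, {"Authorization": f"Bearer {request_token}"}
```

Note that this re-encodes the existing query string as well, which is harmless for simple parameters but worth checking against the real request URL before adopting.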

tests/integration/test_auth.py

Lines changed: 71 additions & 0 deletions
@@ -12,6 +12,8 @@

 import pytest

+from databricks.sdk import AccountClient, WorkspaceClient
+from databricks.sdk.service import iam, oauth2
 from databricks.sdk.service.compute import (ClusterSpec, DataSecurityMode,
                                             Library, ResultType, SparkVersion)
 from databricks.sdk.service.jobs import NotebookTask, Task, ViewType
@@ -198,3 +200,72 @@ def _task_outputs(w, run):
             output += data["data"]
         task_outputs[task_run.task_key] = output
     return task_outputs
+
+
+def test_wif_account(ucacct, env_or_skip, random):
+
+    sp = ucacct.service_principals.create(
+        active=True,
+        display_name="py-sdk-test-" + random(),
+        roles=[iam.ComplexValue(value="account_admin")],
+    )
+
+    ucacct.service_principal_federation_policy.create(
+        policy=oauth2.FederationPolicy(
+            oidc_policy=oauth2.OidcFederationPolicy(
+                issuer="https://token.actions.githubusercontent.com",
+                audiences=["https://github.com/databricks-eng"],
+                subject="repo:databricks-eng/eng-dev-ecosystem:environment:integration-tests",
+            )
+        ),
+        service_principal_id=sp.id,
+    )
+
+    ac = AccountClient(
+        host=ucacct.config.host,
+        account_id=ucacct.config.account_id,
+        client_id=sp.application_id,
+        auth_type="github-oidc",
+        token_audience="https://github.com/databricks-eng",
+    )
+
+    groups = ac.groups.list()
+
+    next(groups)
+
+
+def test_wif_workspace(ucacct, env_or_skip, random):
+
+    workspace_id = env_or_skip("TEST_WORKSPACE_ID")
+    workspace_url = env_or_skip("TEST_WORKSPACE_URL")
+
+    sp = ucacct.service_principals.create(
+        active=True,
+        display_name="py-sdk-test-" + random(),
+    )
+
+    ucacct.service_principal_federation_policy.create(
+        policy=oauth2.FederationPolicy(
+            oidc_policy=oauth2.OidcFederationPolicy(
+                issuer="https://token.actions.githubusercontent.com",
+                audiences=["https://github.com/databricks-eng"],
+                subject="repo:databricks-eng/eng-dev-ecosystem:environment:integration-tests",
+            )
+        ),
+        service_principal_id=sp.id,
+    )
+
+    ucacct.workspace_assignment.update(
+        workspace_id=workspace_id,
+        principal_id=sp.id,
+        permissions=[iam.WorkspacePermission.ADMIN],
+    )
+
+    ws = WorkspaceClient(
+        host=workspace_url,
+        client_id=sp.application_id,
+        auth_type="github-oidc",
+        token_audience="https://github.com/databricks-eng",
+    )
+
+    ws.current_user.me()
