Skip to content

Conversation

@sreekanth-db
Copy link
Contributor

@sreekanth-db sreekanth-db commented Jul 11, 2025

What changes are proposed in this pull request?

This PR modifies Azure service principal credential provider to attempt to load the tenant ID of the workspace if not provided before authenticating. Tenant ID is indirectly exposed via the redirect URL used when logging into a workspace. In this PR, we fetch the tenant ID from this endpoint and configure it if not already set.

Reference PR: databricks/databricks-sdk-py#638

Key changes:

  • Added inferTenantId() method in AzureUtils that makes an HTTP request to {host}/aad/auth and extracts the tenant ID from the redirect URL
  • Modified AzureServicePrincipalCredentialsProvider to remove the explicit azureTenantId requirement and automatically call tenant ID discovery
  • Added comprehensive unit tests covering success scenarios, error handling, and edge cases

Technical implementation:

  • Makes HTTP GET request to https://<workspace-host>/aad/auth endpoint
  • Follows redirect chain to extract tenant ID from URLs like https://login.microsoftonline.com/{tenant-id}/oauth2/authorize
  • Handles various error scenarios gracefully (404, missing headers, malformed URLs)
  • Only works for Azure workspaces (skips for non-Azure hosts)
  • Respects existing tenant ID if already configured

Why are these changes needed?

Currently, Azure Databricks users must manually specify the tenant-id when using Service Principal authentication. With this feature, users don't need to manually specify tenant ID, thus improving the user experience.

How is this tested?

Unit Tests

Comprehensive unit tests in AzureUtilsTest covering:

  • ✅ Happy path (successful tenant ID extraction)
  • ✅ Error scenarios (404, missing Location header, malformed URLs)
  • ✅ Edge cases (non-Azure workspaces, already set tenant ID, missing host)
  • ✅ Integration with existing functionality

Manual Testing

  • ✅ Locally built SDK JAR imported into JDBC driver and tested connection without passing azure_tenant_id
  • ✅ Verified successful authentication using auto-discovered tenant ID
  • ✅ Confirmed backward compatibility - still works when explicit tenant ID is provided

Signed-off-by: Sreekanth Vadigi <[email protected]>
Signed-off-by: Sreekanth Vadigi <[email protected]>
Signed-off-by: Sreekanth Vadigi <[email protected]>
}
AzureUtils.ensureHostPresent(
config, mapper, AzureServicePrincipalCredentialsProvider::tokenSourceFor);
config.loadAzureTenantId();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you return null if azure tenant id fails? to be consistent with line 26?
Also, can we keep the azureSPCredentials specific helpers for extracting tenant id here in credentialsProvider itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have updated the code to return null if loading tenant id fails.

I think we should keep loadAzureTenantId() in DatabricksConfig for reusability. Looking at the Python SDK's implementation, load_azure_tenant_id() is called from multiple Azure credential providers (azure_service_principal and azure_cli), which demonstrates it's designed as a shared utility in the main Config class. This approach ensures consistency across our SDKs and allows for future reuse by other Azure authentication methods.


public DatabricksConfig clone() {
return clone(new HashSet<>());
return clone(new HashSet<>(Collections.singletonList("logger")));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed?

Copy link
Contributor Author

@sreekanth-db sreekanth-db Aug 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logger field is excluded from cloning because it's a static final field. The clone(Set<String> fieldsToSkip) method uses reflection to copy field values with f.set(newConfig, f.get(this)).

Attempting to clone a static field causes an IllegalAccessException with the message: "Can not set static final org.slf4j.Logger field com.databricks.sdk.core.DatabricksConfig.logger to org.slf4j.reload4j.Reload4jLoggerAdapter"

Signed-off-by: Sreekanth Vadigi <[email protected]>
return newConfig;
}

public DatabricksConfig clone() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to clone this method for all our config classes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate on this comment? Are you asking whether we need to implement similar clone() methods in other configuration classes, or are you suggesting a different architectural approach for handling object cloning?

return true; // Tenant ID already available - success
}

if (!isAzure() || host == null) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not the isAzure check be the first check in this method & return false even if azureTenantId is having some value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of this method is to load the tenant ID, so if it's already set (explicitly by the user), we should respect that and return true. The isAzure() check is only needed when we need to discover the tenant ID from the workspace host. If the tenant ID is set and the host is not Azure, it should get caught at appropriate layers - this method should not be responsible for returning false and blocking the flow.

*
* @return true if tenant ID is available (either was already set or successfully loaded), false otherwise
*/
public boolean loadAzureTenantId() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that there's a lot of exceptions in the current code but I would recommend treating the config as an immutable object. That is, do not change the tenantId in the config and rather have a mutable copy in the AzureServicePrincipalCredentialsProvider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, I have updated the code to not make any changes to the state of DatabricksConfig

Comment on lines 31 to 34
boolean tenantIdLoaded = config.loadAzureTenantId();
if (!tenantIdLoaded) {
return null;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in another comment, I would recommend implementing the loadAzureTenantId as an immutable function that would return the tenantId or fail. The tenantId can be stored in this class.

Something like this:

try {
  this.tenantId = inferTenantId(config)
} catch (Exception e) {
  logger.debug("Failed to extract tenant ID: {}", e.getMessage())
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code as per your suggestions:

  • Not changing the state of DatabricksConfig
  • Maintaining local variable for tenantId in AzureServicePrincipalCredentialsProvider
  • Moved the tenant id inferring logic to AzureUtils for reusability

Signed-off-by: Sreekanth Vadigi <[email protected]>
Comment on lines 128 to 129
logger.warn("Failed to get redirect location from Azure auth endpoint: {}", loginUrl);
return null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's throw an exception instead, the caller should be the one deciding whether these should be logged or not. Same in other places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code in AzureUtils to throw exception, caller will handle it now and logs the message

Signed-off-by: Sreekanth Vadigi <[email protected]>
Copy link
Contributor

@renaudhartert-db renaudhartert-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo remaining comment to remove the logger from AzureUtils.

try {
String redirectLocation = getRedirectLocation(config, loginUrl);
String extractedTenantId = extractTenantIdFromUrl(redirectLocation);
logger.info("Successfully discovered Azure tenant ID: {}", extractedTenantId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also remove private static final Logger logger = LoggerFactory.getLogger(AzureUtils.class);

Suggested change
logger.info("Successfully discovered Azure tenant ID: {}", extractedTenantId);

@renaudhartert-db renaudhartert-db changed the title infer azure tenant id Infer azure tenant ID. Aug 13, 2025
Signed-off-by: Sreekanth Vadigi <[email protected]>
@github-actions
Copy link

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-java

Inputs:

  • PR number: 482
  • Commit SHA: ea7ebc302324e8e9306780a23e055ebf364d179f

Checks will be approved automatically on success.

@renaudhartert-db renaudhartert-db added this pull request to the merge queue Aug 19, 2025
Merged via the queue into databricks:main with commit fe28261 Aug 19, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants