OCPBUGS-59734: fix(azure): resolve credential caching issues around UAMI support #1238

Open
wants to merge 3 commits into base: main

Conversation

bryan-cox
Member

@bryan-cox bryan-cox commented Aug 1, 2025

Summary

This PR fixes credential caching issues in Azure storage operations and adds caching support for User Assigned Managed Identity (UAMI) credentials.

Caching at the driver level was not enough, so a global cache was introduced to avoid repeatedly requesting new credentials from Azure.

Changes

  • fix(azure): fix credential caching key mismatch in driver storageAccountsClient
    Resolves credential caching key inconsistencies in the storage accounts client

  • fix(azure): fix credential caching key mismatch in azureclient
    Fixes credential caching key mismatches in the Azure client implementation

  • feat(azure): add ensureUAMICredentials function with comprehensive tests

    • Adds new ensureUAMICredentials function to obtain and cache Azure TokenCredential using User Assigned Managed Identity (UAMI)
    • Function loads credentials from global cache or creates new ones with proper environment configuration
    • Comprehensive unit tests covering environment variable handling, cache behavior, and error scenarios
    • Tests follow table-driven pattern and use t.Setenv for proper environment handling
    • Updates Azure client instantiation to use UAMI credentials when available

Testing

  • Added comprehensive unit tests for the new ensureUAMICredentials function
  • Tests follow established codebase patterns and conventions

Impact

  • Improves credential management reliability in Azure storage operations
  • Enables proper UAMI support for managed Azure environments
  • No breaking changes to existing functionality

@sjenning
Contributor

sjenning commented Aug 1, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Aug 1, 2025
Contributor

openshift-ci bot commented Aug 1, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: bryan-cox, sjenning
Once this PR has been reviewed and has the lgtm label, please assign ricardomaraschini for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot removed the lgtm label (indicates that a PR is ready to be merged) on Aug 1, 2025
Contributor

openshift-ci bot commented Aug 1, 2025

New changes are detected. LGTM label has been removed.

@bryan-cox bryan-cox marked this pull request as draft August 1, 2025 17:36
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Aug 1, 2025
@bryan-cox
Member Author

/test hypershift-e2e-aks


@bryan-cox bryan-cox changed the title from "fix(azure): fix credential caching key mismatch in UserAssignedIdentityCredentials" to "OCPBUGS-60103: fix(azure): fix credential caching key mismatch in UserAssignedIdentityCredentials" on Aug 4, 2025
@openshift-ci-robot openshift-ci-robot added the labels jira/valid-reference (indicates that this PR references a valid Jira ticket of any type) and jira/invalid-bug (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) on Aug 4, 2025
@openshift-ci-robot
Contributor

@bryan-cox: This pull request references Jira Issue OCPBUGS-60103, which is invalid:

  • expected the bug to target the "4.20.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Use consistent azureCredentialsKey for both storing and loading cached credentials instead of mixing azureCredentialsKey and userAssignedIdentityCredentialsFilePath. This prevents repeated credential recreation and eliminates redundant log messages.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

…untsClient

Similar to the azureclient fix, use consistent azureCredentialsKey for both
storing and loading cached credentials instead of mixing azureCredentialsKey
and userAssignedIdentityCredentialsFilePath. This prevents repeated credential
recreation in the Azure driver's storageAccountsClient method.
Use consistent azureCredentialsKey for both storing and loading cached
credentials instead of mixing azureCredentialsKey and
userAssignedIdentityCredentialsFilePath. This prevents repeated credential
recreation in the Azure driver's storageAccountsClient method.

Signed-off-by: Bryan Cox <[email protected]>
@bryan-cox
Member Author

/test hypershift-e2e-aks

@bryan-cox
Member Author

/test hypershift-e2e-aks

@bryan-cox
Member Author

/test hypershift-e2e-aks

@bryan-cox
Member Author

/test hypershift-e2e-aks

@flavianmissi
Member

Hey @bryan-cox! Can you elaborate on your decision of caching the entire storage driver, vs for example caching the credentials within the driver?

@bryan-cox
Member Author

> Hey @bryan-cox! Can you elaborate on your decision of caching the entire storage driver, vs for example caching the credentials within the driver?

@flavianmissi it's still in WIP so I wouldn't consider what is here to be the final solution.

@flavianmissi
Member

Understood, I'll hold my horses 🤠

@bryan-cox bryan-cox changed the title from "OCPBUGS-60103: fix(azure): fix credential caching key mismatch in UserAssignedIdentityCredentials" to "OCPBUGS-59734: fix(azure): fix credential caching key mismatch in UserAssignedIdentityCredentials" on Aug 12, 2025
@openshift-ci-robot
Contributor

@bryan-cox: This pull request references Jira Issue OCPBUGS-59734, which is invalid:

  • expected the bug to target the "4.20.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Use consistent azureCredentialsKey for both storing and loading cached credentials instead of mixing azureCredentialsKey and userAssignedIdentityCredentialsFilePath. This prevents repeated credential recreation and eliminates redundant log messages.

@bryan-cox bryan-cox force-pushed the inotify-bug branch 2 times, most recently from 860c446 to e704b6b on August 12, 2025 13:13
@bryan-cox
Member Author

/test hypershift-e2e-aks

@bryan-cox
Member Author

/test hypershift-e2e-aks

@bryan-cox bryan-cox changed the title from "OCPBUGS-59734: fix(azure): fix credential caching key mismatch in UserAssignedIdentityCredentials" to "OCPBUGS-59734: fix(azure): resolve credential caching issues around UAMI support" on Aug 12, 2025
@bryan-cox bryan-cox marked this pull request as ready for review August 12, 2025 20:43
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Aug 12, 2025
@bryan-cox
Member Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug label (indicates that a referenced Jira bug is valid for the branch this PR is targeting) and removed the jira/invalid-bug label (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) on Aug 12, 2025
@openshift-ci-robot
Contributor

@bryan-cox: This pull request references Jira Issue OCPBUGS-59734, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.

In response to this:

/jira refresh

@openshift-ci-robot
Contributor

@bryan-cox: This pull request references Jira Issue OCPBUGS-59734, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.20.0) matches configured target version for branch (4.20.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

No GitHub users were found matching the public email listed for the QA contact in Jira ([email protected]), skipping review request.


- Add ensureUAMICredentials function to obtain and cache Azure TokenCredential using User Assigned Managed Identity (UAMI)
- Function loads credentials from global cache or creates new ones with proper environment configuration
- Add comprehensive unit tests covering environment variable handling, cache behavior, and error scenarios
- Tests use table-driven pattern following codebase conventions and t.Setenv for proper environment handling
- Update Azure client instantiation to use UAMI credentials when available
Contributor

openshift-ci bot commented Aug 12, 2025

@bryan-cox: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-hypershift-conformance | 2cff080 | link | true | /test e2e-hypershift-conformance |
| ci/prow/okd-scos-e2e-aws-ovn | 2cff080 | link | false | /test okd-scos-e2e-aws-ovn |
| ci/prow/verify | 2cff080 | link | true | /test verify |
| ci/prow/e2e-aws-ovn | 2cff080 | link | true | /test e2e-aws-ovn |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@bryan-cox
Member Author

@flavianmissi this is ready for review now 😄

@bryan-cox
Member Author

The logs show the credentials being stored once and then loaded from the cache from then on:

```
I0813 00:12:11.956008 1 azure.go:1376] Storing UAMI credentials to global cache
...

I0813 00:21:04.989721 1 azure.go:1348] Loaded UAMI credentials from cache
I0813 00:21:05.603952 1 azure.go:1348] Loaded UAMI credentials from cache
I0813 00:21:26.034294 1 azure.go:1348] Loaded UAMI credentials from cache
I0813 00:21:26.333995 1 azure.go:1348] Loaded UAMI credentials from cache
```

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_cluster-image-registry-operator/1238/pull-ci-openshift-cluster-image-registry-operator-main-hypershift-e2e-aks/1955410821643243520/artifacts/hypershift-e2e-aks/hypershift-azure-run-e2e/artifacts/TestCreateCluster/namespaces/e2e-clusters-xhkq8-create-cluster-vb2c6/core/pods/logs/cluster-image-registry-operator-f599f47bd-qbwvt-cluster-image-registry-operator.log

Labels
jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

4 participants