fangluguomsft/azure_rbac_latency

Azure RBAC Propagation Latency Test

This tool measures how long it takes for an Azure role-based access control (RBAC) assignment to become effective. It repeatedly creates a resource group, assigns a role to a service principal, and measures the time until the service principal can successfully access the resource group.

Prerequisites

  1. Go (1.18 or later) installed.
  2. Azure CLI (az) installed and logged in.
    az login
  3. An Azure Subscription where you have Owner or User Access Administrator permissions to create service principals and assign roles.

Usage

  1. (Optional) Set your Azure Subscription ID. If omitted, the tool automatically detects it via az account show.

    export AZURE_SUBSCRIPTION_ID=$(az account show --query id -o tsv)
  2. (Optional) Set the number of test cycles (default is 100):

    export TOTAL_CYCLES=10
  3. Run the test using go test. Since 100 cycles can take a significant amount of time (latency varies from 30s to minutes), ensure you set a sufficient timeout.

    go test -v -timeout 120m
    • -v: Verbose output to see progress of each cycle.
    • -timeout 120m: Sets the test timeout to 2 hours (the go test default is 10m).
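
The configuration rules above (environment variable first, fallback otherwise) can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names subscriptionID, azAccountShow, and totalCycles are hypothetical, and rejecting non-positive TOTAL_CYCLES values is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// defaultCycles mirrors the default of 100 cycles stated in the README.
const defaultCycles = 100

// subscriptionID prefers AZURE_SUBSCRIPTION_ID and otherwise calls fallback.
// getenv is injected so the lookup can be exercised without touching the real environment.
func subscriptionID(getenv func(string) string, fallback func() (string, error)) (string, error) {
	if id := strings.TrimSpace(getenv("AZURE_SUBSCRIPTION_ID")); id != "" {
		return id, nil
	}
	return fallback()
}

// azAccountShow runs `az account show --query id -o tsv` and returns the trimmed output.
func azAccountShow() (string, error) {
	out, err := exec.Command("az", "account", "show", "--query", "id", "-o", "tsv").Output()
	if err != nil {
		return "", fmt.Errorf("az account show: %w", err)
	}
	return strings.TrimSpace(string(out)), nil
}

// totalCycles parses TOTAL_CYCLES and falls back to defaultCycles when it is unset.
// Rejecting zero and negative values is an assumption of this sketch.
func totalCycles(getenv func(string) string) (int, error) {
	raw := strings.TrimSpace(getenv("TOTAL_CYCLES"))
	if raw == "" {
		return defaultCycles, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("TOTAL_CYCLES must be a positive integer, got %q", raw)
	}
	return n, nil
}

func main() {
	cycles, err := totalCycles(os.Getenv)
	if err != nil {
		fmt.Println(err)
		return
	}
	id, err := subscriptionID(os.Getenv, azAccountShow)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("subscription=%s cycles=%d\n", id, cycles)
}
```

Passing getenv and the fallback as parameters keeps the lookup testable on machines where az is not installed.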

Conclusion & Strategy

Based on test results, we observed the following behavior regarding Azure RBAC propagation and token caching:

  1. Old Token Efficacy: The RBAC validation logic in Azure Resource Manager (ARM) often evaluates effective permissions in real time, or near real time, against the backing store. Consequently, an existing access token, even one issued before a new role assignment, will often work as soon as the assignment propagates in the backend. You do not always need a fresh token to see new permissions.
  2. Token Refresh Necessity: However, there are edge cases and timing windows where the cached claims in an old token are insufficient, or where the "ARM cache" requires a fresh token to re-evaluate access correctly. In these cases, relying solely on the old token results in persistent 403 Forbidden errors.
  3. Recommended Strategy:
    • Optimistic Attempt: Start by using the existing, cached access token to attempt the operation.
    • Retry with Fresh Token: If the operation fails with an authorization error (e.g., 403 Forbidden) after a few retries (e.g., 3 attempts), force a token refresh. Fetching a new token forces ARM to re-validate the identity's permissions against the latest state.

This hybrid approach minimizes unnecessary token issuance overhead (added latency and load on Azure AD) while remaining robust when RBAC caches are slow to update.
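
A minimal sketch of this hybrid strategy, under stated assumptions: the tokenSource interface, callWithHybridRetry, and fakeSource are hypothetical names introduced for illustration, errForbidden stands in for a real 403 response from ARM, and the attempt counts are parameters rather than values taken from the tool.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errForbidden stands in for an HTTP 403 authorization failure from ARM.
var errForbidden = errors.New("403 Forbidden")

// tokenSource is a hypothetical abstraction over a credential:
// Cached returns the token already in hand, Refresh forces issuance of a new one.
type tokenSource interface {
	Cached() string
	Refresh() (string, error)
}

// callWithHybridRetry tries op with the cached token up to cachedAttempts times,
// then forces one token refresh and retries up to freshAttempts more times.
// Only authorization errors are retried; any other error is returned immediately.
func callWithHybridRetry(ts tokenSource, op func(token string) error, cachedAttempts, freshAttempts int, wait time.Duration) error {
	total := cachedAttempts + freshAttempts
	if total <= 0 {
		return errors.New("no attempts configured")
	}
	token := ts.Cached()
	var err error
	for i := 0; i < total; i++ {
		if i > 0 {
			time.Sleep(wait) // pause between attempts, not before the first one
		}
		if i == cachedAttempts {
			// Optimistic attempts are exhausted: switch to a fresh token.
			fresh, rerr := ts.Refresh()
			if rerr != nil {
				return fmt.Errorf("refresh token: %w", rerr)
			}
			token = fresh
		}
		err = op(token)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errForbidden) {
			return err
		}
	}
	return err
}

// fakeSource is a test double that counts how often a refresh was forced.
type fakeSource struct{ refreshes int }

func (f *fakeSource) Cached() string { return "old" }

func (f *fakeSource) Refresh() (string, error) {
	f.refreshes++
	return "new", nil
}

func main() {
	src := &fakeSource{}
	// Simulated operation that succeeds only once a fresh token is presented.
	op := func(token string) error {
		if token != "new" {
			return errForbidden
		}
		return nil
	}
	err := callWithHybridRetry(src, op, 3, 3, 10*time.Millisecond)
	fmt.Println("error:", err, "refreshes:", src.refreshes)
}
```

In real code, Refresh would map to whatever mechanism the credential library offers for bypassing its token cache; that mapping is left out here because it depends on the library in use.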

Authentication Options

Option 1: Auto-Generated Service Principal (Default)

The tool will attempt to create a temporary Service Principal using az ad sp create-for-rbac. This requires your az login context to have permissions to create App Registrations and assign roles.

Note: If you encounter AADSTS530084 or other policy errors, try running az login to refresh your credentials or use Option 2.

Option 2: Existing Service Principal

If you cannot create a Service Principal due to policies, you can provide an existing one via environment variables:

export AZURE_CLIENT_ID="<app-id>"
export AZURE_CLIENT_SECRET="<password>"
export AZURE_TENANT_ID="<tenant-id>"

The tool will skip creation and use these credentials for the test.
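
The choice between the two options can be sketched as a simple environment check. This is an illustrative assumption about how such a switch might look, not the tool's actual logic: spCredentials and credentialsFromEnv are hypothetical names, and requiring all three variables to be set is an assumption.

```go
package main

import (
	"fmt"
	"os"
)

// spCredentials holds the three values named in Option 2.
type spCredentials struct {
	ClientID     string
	ClientSecret string
	TenantID     string
}

// credentialsFromEnv returns the existing service principal's credentials and true
// when all three variables are set; otherwise it returns false, meaning the caller
// should fall back to creating a temporary service principal (Option 1).
func credentialsFromEnv(getenv func(string) string) (spCredentials, bool) {
	c := spCredentials{
		ClientID:     getenv("AZURE_CLIENT_ID"),
		ClientSecret: getenv("AZURE_CLIENT_SECRET"),
		TenantID:     getenv("AZURE_TENANT_ID"),
	}
	if c.ClientID == "" || c.ClientSecret == "" || c.TenantID == "" {
		return spCredentials{}, false
	}
	return c, true
}

func main() {
	if c, ok := credentialsFromEnv(os.Getenv); ok {
		// Never print the secret; the client ID is enough to identify the principal.
		fmt.Println("using existing service principal", c.ClientID)
		return
	}
	fmt.Println("no complete credentials in environment; a temporary service principal would be created")
}
```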

How it Works

The test performs the following steps:

  1. Service Principal Setup:

    • Creates a temporary SP or uses the existing one provided.
    • If creating, waits 30 seconds for propagation.
  2. Measurement Loop (repeated for each cycle):

    • Create Resource Group: Creates a unique Resource Group for the iteration.
    • Assign Role: Assigns the Reader role to the SP on the new Resource Group.
    • Measure Latency:
      • Immediately tries to access the Resource Group using the SP's credentials.
      • Retries every 2 seconds if access is denied (403 Forbidden).
      • Records the time elapsed between assignment and successful access.
    • Cleanup: Deletes the Resource Group.
  3. Teardown:

    • Deletes the temporary Service Principal (if created by the tool).
  4. Reporting:

    • Outputs the Minimum, Maximum, and Average latency observed across all successful cycles.
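
The measurement and reporting steps above can be sketched as follows. This is a simplified illustration rather than the tool's source: pollUntilAllowed and summarize are hypothetical names, errForbidden stands in for a 403 response, and the overall time limit is an assumption added so the loop cannot run forever.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errForbidden stands in for a 403 Forbidden response from ARM.
var errForbidden = errors.New("403 Forbidden")

// pollUntilAllowed calls check repeatedly, sleeping interval between attempts,
// until check stops returning errForbidden. It returns the time elapsed since
// polling began, which approximates the propagation latency.
func pollUntilAllowed(check func() error, interval, limit time.Duration) (time.Duration, error) {
	start := time.Now()
	for {
		err := check()
		if err == nil {
			return time.Since(start), nil
		}
		if !errors.Is(err, errForbidden) {
			return 0, err // unexpected failure: do not keep retrying
		}
		if time.Since(start) > limit {
			return 0, fmt.Errorf("access still denied after %s: %w", limit, err)
		}
		time.Sleep(interval)
	}
}

// summarize returns the minimum, maximum, and mean of the samples.
// ok is false when no samples were provided.
func summarize(samples ...time.Duration) (lo, hi, mean time.Duration, ok bool) {
	if len(samples) == 0 {
		return 0, 0, 0, false
	}
	lo, hi = samples[0], samples[0]
	var total time.Duration
	for _, s := range samples {
		if s < lo {
			lo = s
		}
		if s > hi {
			hi = s
		}
		total += s
	}
	return lo, hi, total / time.Duration(len(samples)), true
}

func main() {
	var samples []time.Duration
	for cycle := 0; cycle < 3; cycle++ {
		attempts := 0
		// Simulated access check: denied twice, then allowed.
		check := func() error {
			attempts++
			if attempts < 3 {
				return errForbidden
			}
			return nil
		}
		// The README's loop retries every 2 seconds; a short interval keeps the demo fast.
		d, err := pollUntilAllowed(check, 10*time.Millisecond, time.Second)
		if err != nil {
			fmt.Println("cycle failed:", err)
			continue
		}
		samples = append(samples, d)
	}
	if lo, hi, mean, ok := summarize(samples...); ok {
		fmt.Printf("min=%s max=%s avg=%s\n", lo, hi, mean)
	}
}
```

Only successful cycles contribute samples, matching the reporting step, so a failed cycle lowers the sample count rather than skewing the average.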

Troubleshooting

  • "az ad sp create-for-rbac failed": Ensure az login is successful and your user has permissions to create App Registrations. Run az login interactively if instructed by the error message.
  • Context Deadline Exceeded: Increase the -timeout flag value.

Manual Cleanup

If the test is interrupted, you may have leftover resource groups. You can delete them with:

az group list --query "[?starts_with(name, 'rbac-latency-test-rg-')].name" -o tsv | xargs -I {} az group delete --name {} --yes --no-wait
