Add Azure Kubernetes Service (AKS) hosting support by mitchdenny · Pull Request #16088 · microsoft/aspire

mitchdenny · 2026-04-12T11:25:53Z

Description

WIP — Adds first-class Azure Kubernetes Service (AKS) support to Aspire via a new Aspire.Hosting.Azure.Kubernetes package.

Motivation

Aspire's Aspire.Hosting.Kubernetes package supports end-to-end deployment to any conformant Kubernetes cluster via Helm charts, but it has no awareness of Azure-specific capabilities. Users who deploy to AKS must manually provision the cluster, configure workload identity, set up monitoring, and manage networking outside of Aspire.

What's here so far (Phase 1)

New Aspire.Hosting.Azure.Kubernetes package with dependencies on Aspire.Hosting.Kubernetes and Aspire.Hosting.Azure
AzureKubernetesEnvironmentResource — unified resource that extends AzureProvisioningResource and implements IAzureComputeEnvironmentResource, internally wrapping a KubernetesEnvironmentResource for Helm deployment
AddAzureKubernetesEnvironment() entry point (mirrors AddAzureContainerAppEnvironment() pattern)
Configuration extensions: WithVersion, WithSkuTier, WithNodePool, AsPrivateCluster, WithContainerInsights, WithAzureLogAnalyticsWorkspace
AzureKubernetesInfrastructure eventing subscriber
Implementation spec at docs/specs/aks-support.md

What's planned next

Workload identity (federated credentials + ServiceAccount YAML generation)
VNet integration (WithDelegatedSubnet)
Full Bicep provisioning (pending Azure.Provisioning.ContainerService package availability in internal feeds)
Unit tests with Bicep snapshot verification
E2E deployment tests

Validation

Package builds successfully with dotnet build /p:SkipNativeBuild=true
Follows established patterns from Aspire.Hosting.Azure.AppContainers

Fixes # (issue)

Checklist

Is this feature complete?
- Yes. Ready to ship.
- No. Follow-up changes expected.
Are you including unit tests for the changes and scenario tests if relevant?
- Yes
- No
Did you add public API?
- Yes
  - If yes, did you have an API Review for it?
    - Yes
    - No
  - Did you add <remarks /> and <code /> elements on your triple slash comments?
    - Yes
    - No
- No
Does the change make any security assumptions or guarantees?
- Yes
- No
Does the change require an update in our Aspire docs?
- Yes
  - Link to aspire.dev issue:
    - New issue
- No

Create core implementation files for AKS hosting support: - AzureKubernetesEnvironmentResource: resource class with BicepOutputReference properties - AzureKubernetesEnvironmentExtensions: AddAzureKubernetesEnvironment and configuration methods - AzureKubernetesInfrastructure: eventing subscriber for compute resource processing - AksNodePoolConfig, AksSkuTier, AksNetworkProfile: supporting types - Project file with dependencies on Hosting.Azure, Hosting.Kubernetes, etc. - Add project to Aspire.slnx solution - Add InternalsVisibleTo in Aspire.Hosting.Kubernetes for internal API access Note: Azure.Provisioning.ContainerService package is not yet available in internal NuGet feeds. ConfigureAksInfrastructure uses placeholder outputs. When the package becomes available, replace with typed ContainerServiceManagedCluster. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-04-12T11:26:08Z

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 16088

Or

Run remotely in PowerShell:

iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 16088"

- Bicep snapshot verification tests - Configuration extension tests (version, SKU, node pools, private cluster) - Monitoring integration tests (Container Insights, Log Analytics) - Argument validation tests Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- AzureKubernetesEnvironmentResource now implements IAzureDelegatedSubnetResource and IAzureNspAssociationTarget for VNet and network perimeter integration - WithWorkloadIdentity() on AKS environment enables OIDC and workload identity - WithAzureWorkloadIdentity<T>() on compute resources for federated credential setup with auto-create identity support - AksWorkloadIdentityAnnotation for ServiceAccount YAML generation - AsExisting() works automatically via AzureProvisioningResource base class - Additional unit tests for all new functionality Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Changed WithNodePool to AddNodePool returning IResourceBuilder<AksNodePoolResource> - AksNodePoolResource is a child resource (IResourceWithParent) of AKS environment - WithNodePoolAffinity<T> extension lets compute resources target specific node pools - AksNodePoolAffinityAnnotation carries scheduling info for Helm chart nodeSelector - Made AksNodePoolConfig and AksNodePoolMode public (exposed via AksNodePoolResource) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When no user node pool is explicitly added via AddNodePool(), the AzureKubernetesInfrastructure subscriber creates a default 'workload' user pool (Standard_D4s_v5, 1-10 nodes) during BeforeStartEvent. Compute resources without explicit WithNodePoolAffinity() are automatically assigned to the first available user pool (either explicitly created or the auto-generated default). This ensures workloads are never scheduled on the system pool, which should only run system pods (kube-system, etc.). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…tion Two issues with aspire publish: 1. No Helm chart output: The inner KubernetesEnvironmentResource was stored as a property but never added to the application model. KubernetesInfrastructure looks for KubernetesEnvironmentResource instances in the model to generate Helm charts. Fix: add the inner K8s environment to the model (excluded from manifest) with the default Helm engine. 2. Duplicate DeploymentTargetAnnotation: AzureKubernetesInfrastructure was adding its own DeploymentTargetAnnotation, conflicting with the one that KubernetesInfrastructure adds (which points to the correct KubernetesResource deployment target with Helm chart data). Fix: remove the duplicate annotation from our subscriber — KubernetesInfrastructure handles it. Also made EnsureDefaultHelmEngine internal (was private) so the AKS package can call it to set up the Helm deployment engine on the inner K8s environment. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Override GetBicepTemplateString and GetBicepTemplateFile to generate proper AKS ManagedCluster Bicep directly, bypassing the Azure.Provisioning SDK infrastructure (which requires the unavailable Azure.Provisioning.ContainerService package). The generated Bicep includes: - Microsoft.ContainerService/managedClusters resource with SystemAssigned identity - Configurable SKU tier, Kubernetes version, DNS prefix - Agent pool profiles with autoscaling from NodePools config - OIDC issuer profile and workload identity security profile - Optional private cluster API server access profile - Optional network profile (Azure CNI) - All outputs: id, name, clusterFqdn, oidcIssuerUrl, kubeletIdentityObjectId, nodeResourceGroup Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Container Registry: - Auto-create a default Azure Container Registry when AddAzureKubernetesEnvironment is called (same pattern as Container Apps) - WithContainerRegistry() extension to use an explicit ACR, replacing the default - FlowContainerRegistry() in AzureKubernetesInfrastructure propagates the registry to the inner KubernetesEnvironmentResource via ContainerRegistryReferenceAnnotation so KubernetesInfrastructure can discover it for image push/pull Localhive fix: - Added SuppressFinalPackageVersion to csproj (required for new packages in Arcade) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Packages with SuppressFinalPackageVersion=true (like Aspire.Hosting.Kubernetes and Aspire.Hosting.Azure.Kubernetes) are placed in the NonShipping output directory by Arcade SDK. The localhive script was only looking in the Shipping directory, causing these packages to be missing from the hive. Changes: - Added Get-AllPackagePaths that returns both Shipping and NonShipping dirs - Package collection now scans all available package directories - When packages span multiple directories, auto-uses copy mode (can't symlink to two dirs) - Single-dir case still uses symlink/junction for performance Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-04-12T14:37:42Z

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as retry-safe transient failures in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

Tests / Cli / Cli (macos-latest) - Ignored failed step 'Run tests (Linux/macOS)' matched the job-level infrastructure override allowlist.

davidfowl · 2026-04-12T15:28:33Z

src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentExtensions.cs

+    /// <param name="builder">The resource builder.</param>
+    /// <param name="version">The Kubernetes version (e.g., "1.30").</param>
+    /// <returns>A reference to the <see cref="IResourceBuilder{AzureKubernetesEnvironmentResource}"/> for chaining.</returns>
+    [AspireExportIgnore(Reason = "AKS hosting is not yet supported in ATS")]


Will be addressed.

docs/specs/aks-support.md

Internal methods from Aspire.Hosting.Kubernetes (AddKubernetesInfrastructureCore, EnsureDefaultHelmEngine, KubernetesInfrastructure, HelmDeploymentEngine) are not accessible at runtime across NuGet package boundaries, even with InternalsVisibleTo set. The InternalsVisibleTo attribute only works at compile time with project references, not with signed NuGet packages. Fix: call the public AddKubernetesEnvironment() API instead. This handles all the internal setup (registering KubernetesInfrastructure subscriber, creating the resource, setting up Helm engine) through a single public entry point. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The localhive.ps1 modifications were unnecessary - packages with SuppressFinalPackageVersion go to Shipping, not NonShipping. The package discovery issue was caused by running localhive from the wrong worktree, not a script problem. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…placeholders The ConfigureAksInfrastructure callback was still adding ProvisioningOutput objects with no values, even though GetBicepTemplateString/GetBicepTemplateFile now generate the Bicep directly. While our overrides prevent these from being used for Bicep generation, the stale outputs could confuse the AzureResourcePreparer's parameter analysis. Emptied the callback body since all Bicep generation is handled by the resource's overrides. Also removed unused Azure.Provisioning using directive. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Two changes to ensure Helm/kubectl target the AKS cluster instead of the user's default kubectl context: 1. KubernetesEnvironmentResource.KubeConfigPath (Aspire.Hosting.Kubernetes): New public property. When set, HelmDeploymentEngine passes --kubeconfig to all helm and kubectl commands. This is non-breaking — null means use default behavior. 2. AzureKubernetesInfrastructure get-credentials step (Aspire.Hosting.Azure.Kubernetes): Adds a pipeline step that runs after AKS Bicep provisioning and before Helm prepare. It calls 'az aks get-credentials --file <isolated-path>' to write credentials to a temp kubeconfig file, then sets KubeConfigPath on the inner KubernetesEnvironmentResource. This ensures Helm deploys to the provisioned AKS cluster without mutating ~/.kube/config. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…meterResource The resourceGroupName is a ParameterResource that requires configuration key 'Parameters:resourceGroupName' to be set. During deploy, this isn't available as a raw parameter value — it's resolved by the Azure provisioning context and stored in the 'Azure:ResourceGroup' configuration key. Changed GetAksCredentialsAsync to read from IConfiguration['Azure:ResourceGroup'] which is populated by the Azure provisioner during context creation, before our get-credentials step runs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…luster name BicepOutputReference.GetValueAsync() triggers parameter resolution on the AzureProvisioningResource, which tries to resolve the 'location' parameter that depends on 'resourceGroup().location'. In a fresh environment without Parameters:resourceGroupName configured, this fails. Since we set the cluster name directly in the Bicep template (name: '{Name}'), we can just use environment.Name as the cluster name. This avoids the parameter resolution chain entirely. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The aks-get-credentials step was depending on provision-{name} (individual AKS resource step) but the Azure:ResourceGroup config key is set by the create-provisioning-context step. In a fresh environment, the step ordering wasn't guaranteed to have the config available. Changed to depend on the provision-azure-bicep-resources aggregation step which gates on ALL provisioning completing, ensuring both the provisioning context (with resource group) and the AKS cluster are ready. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…uring event The ContainerRegistryReferenceAnnotation was being added to the inner KubernetesEnvironmentResource during BeforeStartEvent via FlowContainerRegistry. But KubernetesInfrastructure also runs during BeforeStartEvent and reads the registry annotation — if it ran first, it wouldn't see the annotation, resulting in no push steps being created and images never getting pushed. Fix: Add the ContainerRegistryReferenceAnnotation to the inner K8s environment immediately in AddAzureKubernetesEnvironment and WithContainerRegistry, at resource creation time before any events fire. This guarantees KubernetesInfrastructure always sees the registry regardless of subscriber execution order. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Push steps call registry.Endpoint.GetValueAsync() which awaits the BicepOutputReference for loginServer. If the ACR hasn't been provisioned yet, this blocks indefinitely — the push step just hangs after push-prereq. Push steps depend on build + push-prereq, but neither of those depend on the ACR's provision step. Added a PipelineConfigurationAnnotation on the inner K8s environment that makes all compute resource push steps depend on the ACR's provision step. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…step Changed from depending on individual ACR provision step (which required resource-to-step lookup that may not resolve correctly) to depending on the provision-azure-bicep-resources aggregation step by name. This is simpler and ensures ALL Azure provisioning (including ACR output population) completes before any image push begins. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Previous attempts tried to wire dependencies via GetSteps(resource, tag) which uses the StepToResourceMap. This approach failed because the push steps are keyed to the compute resources, not the K8s environment. New approach: find the push-prereq step by name in the Steps collection and directly call DependsOn(provision-azure-bicep-resources). Since all push steps already depend on push-prereq, this ensures the entire push chain waits for Azure provisioning to complete. This mirrors how ACA works: ACA doesn't need this because it implements IContainerRegistry directly on the environment resource, so the endpoint values are resolved differently. For AKS, the ACR is a separate Bicep resource whose outputs need to be populated before push can proceed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Temporary console output to debug why push steps hang after push-prereq. Logs whether the PipelineConfigurationAnnotation runs, whether push-prereq is found, how many push steps exist, and their dependencies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Diagnostics revealed push steps had EMPTY DependsOnSteps lists. The standard wiring from ProjectResource's PipelineConfigurationAnnotation (pushSteps.DependsOn(buildSteps, push-prereq)) wasn't working because context.GetSteps(resource, tag) returned empty — the resource lookup via ResourceNameComparer didn't match when K8s deployment targets are involved. Fix: directly find push steps by tag in the Steps collection and explicitly wire dependencies on: - provision-azure-bicep-resources (ACR must be provisioned for endpoint) - push-prereq (ACR login must complete) - build-{resourceName} (container image must be built) This ensures the correct execution order: provision → push-prereq → build → push → helm-deploy Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The Azure provisioning context internals (ProvisioningContextTask, AzureProvisionerOptions) are all internal to Aspire.Hosting.Azure and inaccessible from our package. IConfiguration['Azure:ResourceGroup'] is also not reliably set when our step runs because the deployment state manager writes to a different configuration scope. New approach: query Azure directly with 'az aks list --query' to find the cluster's resource group. This is guaranteed to work after provisioning completes, regardless of internal configuration state. The az CLI is already available (validated by validate-azure-login step). Also wires push step dependencies directly by finding steps by tag in the Steps collection, fixing the issue where push steps had empty DependsOnSteps lists in K8s compute environments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… group - Quote --resource-group and --name values to handle special characters - Strip line endings from az aks list output to prevent argument parsing issues - Add logging of cluster name and resource group values for debugging Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The JMESPath query in 'az aks list --query [?name==...].resourceGroup' had quote-escaping issues on Windows when passed via ProcessStartInfo. The quotes in the JMESPath expression were being mangled by cmd.exe, producing truncated/malformed resource group names. Switched to 'az resource list --resource-type ... --name ... --query [0].resourceGroup' which uses --name as a proper CLI argument (no embedded quotes in JMESPath) and the simpler [0].resourceGroup query has no quote escaping issues. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The publish context and step factory used GetDeploymentTargetAnnotation(environment) where environment is the inner K8s env. But WithComputeEnvironment(aksEnv) sets the compute env to the AKS resource, and KubernetesInfrastructure now sets DeploymentTargetAnnotation.ComputeEnvironment to match the resource's actual compute env (the AKS resource, not the inner K8s env). Updated all GetDeploymentTargetAnnotation calls to use ParentComputeEnvironment when available, so the lookup matches correctly. Also fixed KubernetesInfrastructure to set ComputeEnvironment on the DeploymentTargetAnnotation to the resource's actual compute env rather than always using the inner K8s env. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

BicepOutputReference.ValueExpression uses single braces ({storage.outputs.blobEndpoint}) but ResolveUnknownValue only stripped double braces ({{ }}) via HelmExtensions delimiters. Single braces passed through to the Helm template, causing a parse error. Fix: also strip single { and } characters when sanitizing the values key. Fixes #16114 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-04-13T13:41:18Z

🎬 CLI E2E Test Recordings — 68 recordings uploaded (commit 0748329)

View recordings

Test	Recording
AddPackageInteractiveWhileAppHostRunningDetached	▶️ View Recording
AddPackageWhileAppHostRunningDetached	▶️ View Recording
AgentCommands_AllHelpOutputs_AreCorrect	▶️ View Recording
AgentInitCommand_DefaultSelection_InstallsSkillOnly	▶️ View Recording
AgentInitCommand_MigratesDeprecatedConfig	▶️ View Recording
AllPublishMethodsBuildDockerImages	▶️ View Recording
AspireAddPackageVersionToDirectoryPackagesProps	▶️ View Recording
AspireUpdateRemovesAppHostPackageVersionFromDirectoryPackagesProps	▶️ View Recording
Banner_DisplayedOnFirstRun	▶️ View Recording
Banner_DisplayedWithExplicitFlag	▶️ View Recording
Banner_NotDisplayedWithNoLogoFlag	▶️ View Recording
CertificatesClean_RemovesCertificates	▶️ View Recording
CertificatesTrust_WithNoCert_CreatesAndTrustsCertificate	▶️ View Recording
CertificatesTrust_WithUntrustedCert_TrustsCertificate	▶️ View Recording
ConfigSetGet_CreatesNestedJsonFormat	▶️ View Recording
CreateAndRunAspireStarterProject	▶️ View Recording
CreateAndRunAspireStarterProjectWithBundle	▶️ View Recording
CreateAndRunEmptyAppHostProject	▶️ View Recording
CreateAndRunJavaEmptyAppHostProject	▶️ View Recording
CreateAndRunJsReactProject	▶️ View Recording
CreateAndRunPythonReactProject	▶️ View Recording
CreateAndRunTypeScriptEmptyAppHostProject	▶️ View Recording
CreateAndRunTypeScriptStarterProject	▶️ View Recording
CreateJavaAppHostWithViteApp	▶️ View Recording
CreateStartAndStopAspireProject	▶️ View Recording
CreateTypeScriptAppHostWithViteApp	▶️ View Recording
DashboardRunWithOtelTracesReturnsNoTraces	▶️ View Recording
DeployK8sBasicApiService	▶️ View Recording
DeployK8sWithGarnet	▶️ View Recording
DeployK8sWithMongoDB	▶️ View Recording
DeployK8sWithMySql	▶️ View Recording
DeployK8sWithPostgres	▶️ View Recording
DeployK8sWithRabbitMQ	▶️ View Recording
DeployK8sWithRedis	▶️ View Recording
DeployK8sWithSqlServer	▶️ View Recording
DeployK8sWithValkey	▶️ View Recording
DeployTypeScriptAppToKubernetes	▶️ View Recording
DescribeCommandResolvesReplicaNames	▶️ View Recording
DescribeCommandShowsRunningResources	▶️ View Recording
DetachFormatJsonProducesValidJson	▶️ View Recording
DoctorCommand_DetectsDeprecatedAgentConfig	▶️ View Recording
DoctorCommand_WithSslCertDir_ShowsTrusted	▶️ View Recording
DoctorCommand_WithoutSslCertDir_ShowsPartiallyTrusted	▶️ View Recording
GlobalMigration_HandlesCommentsAndTrailingCommas	▶️ View Recording
GlobalMigration_HandlesMalformedLegacyJson	▶️ View Recording
GlobalMigration_PreservesAllValueTypes	▶️ View Recording
GlobalMigration_SkipsWhenNewConfigExists	▶️ View Recording
GlobalSettings_MigratedFromLegacyFormat	▶️ View Recording
InitTypeScriptAppHost_AugmentsExistingViteRepoAtRoot	▶️ View Recording
InvalidAppHostPathWithComments_IsHealedOnRun	▶️ View Recording
LegacySettingsMigration_AdjustsRelativeAppHostPath	▶️ View Recording
LogsCommandShowsResourceLogs	▶️ View Recording
PsCommandListsRunningAppHost	▶️ View Recording
PsFormatJsonOutputsOnlyJsonToStdout	▶️ View Recording
PublishWithDockerComposeServiceCallbackSucceeds	▶️ View Recording
RestoreGeneratesSdkFiles	▶️ View Recording
RestoreSupportsConfigOnlyHelperPackageAndCrossPackageTypes	▶️ View Recording
RunFromParentDirectory_UsesExistingConfigNearAppHost	▶️ View Recording
SecretCrudOnDotNetAppHost	▶️ View Recording
SecretCrudOnTypeScriptAppHost	▶️ View Recording
StagingChannel_ConfigureAndVerifySettings_ThenSwitchChannels	▶️ View Recording
StartAndWaitForTypeScriptSqlServerAppHostWithNativeAssets	▶️ View Recording
StopAllAppHostsFromAppHostDirectory	▶️ View Recording
StopAllAppHostsFromUnrelatedDirectory	▶️ View Recording
StopNonInteractiveMultipleAppHostsShowsError	▶️ View Recording
StopNonInteractiveSingleAppHost	▶️ View Recording
StopWithNoRunningAppHostExitsSuccessfully	▶️ View Recording
UnAwaitedChainsCompileWithAutoResolvePromises	▶️ View Recording

_{📹 Recordings uploaded automatically from CI run #24345283063}

When WithDelegatedSubnet(subnet) is called, the generated AKS Bicep now: - Accepts a subnetId parameter wired from the subnet's ID output - Sets vnetSubnetID on all agent pool profiles - Configures Azure CNI network profile (networkPlugin: azure) with default service CIDR and DNS service IP This follows the ACA pattern where DelegatedSubnetAnnotation is read during infrastructure configuration and the subnet ID is passed as a provisioning parameter. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

AKS requires plain (non-delegated) subnets for node pools. The previous approach used WithDelegatedSubnet which implements IAzureDelegatedSubnetResource and adds a 'Microsoft.ContainerService/managedClusters' service delegation to the subnet. Azure rejects this with 'SubnetIsDelegated' error. Changes: - Removed IAzureDelegatedSubnetResource from AzureKubernetesEnvironmentResource - Added WithSubnet(subnet) extension that stores the subnet ID without adding a service delegation (via AksSubnetAnnotation) - Added Aspire.Hosting.Azure.Network project reference for AzureSubnetResource - WithDelegatedSubnet still works as fallback but users should use WithSubnet Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

During publish, config values backed by IValueProvider (e.g., Bicep output references like {storage.outputs.blobEndpoint}) were written as raw expression strings to values.yaml. The prepare step only resolved ParameterResource values, leaving IValueProvider expressions unresolved. Fix (all in Aspire.Hosting.Kubernetes — no Azure dependency): 1. HelmValue.ValueProviderSource: new optional IValueProvider property set when ResolveUnknownValue detects the expression provider also implements IValueProvider 2. KubernetesPublishingContext: when a HelmValue has a ValueProviderSource, writes an empty placeholder and captures a CapturedHelmValueProvider for deploy-time resolution 3. KubernetesEnvironmentResource.CapturedHelmValueProvider: new record storing (Section, ResourceKey, ValueKey, IValueProvider) 4. HelmDeploymentEngine Phase 4: calls GetValueAsync() on captured IValueProvider entries to resolve values from external sources This is cloud-provider agnostic — works with any IValueProvider implementation (Azure Bicep outputs, AWS outputs, etc.). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Three fixes: 1. Composite expressions (e.g., 'Endpoint={storage.outputs.blobEndpoint};ContainerName=photos') containing IValueProvider references (like BicepOutputReference) are now deferred for deploy-time resolution. Added IsUnresolvedAtPublishTime() check before processing sub-expressions — if any sub-expression would fall through to ResolveUnknownValue, the entire composite expression is captured as a CapturedHelmValueProvider. 2. Helm chart names are now scoped per AKS environment (e.g., 'k8stest5-corek8s' instead of 'k8stest5-apphost') to avoid conflicts when multiple environments deploy to the same cluster or when re-deploying with different environment names. 3. Updated K8s snapshot tests for the new deferred value handling. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ressions Previous check only detected direct IValueProvider references. For composite expressions like connection strings (Endpoint={storage.outputs.blobEndpoint};ContainerName=photos), the BicepOutputReference is nested inside a ReferenceExpression chain: ConnectionStringReference → ReferenceExpression → BicepOutputReference Fixed IsUnresolvedAtPublishTime to recursively check: - ConnectionStringReference → check inner ConnectionStringExpression - IResourceWithConnectionString → check inner ConnectionStringExpression - ReferenceExpression → check all value providers recursively - IManifestExpressionProvider + IValueProvider → true (leaf unresolvable) Also added early deferral in ProcessValueAsync for ConnectionStringReference and IResourceWithConnectionString before they get unwrapped. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…ed values The deferred value's Helm key was derived from ValueExpression which contains format strings like 'Endpoint={storage.outputs.blobEndpoint};ContainerName=photos'. This produced invalid Helm paths with = and ; characters. Fix: moved deferral check to ProcessEnvironmentAsync (outer loop) where the env var key name is available. CreateDeferredHelmValue now takes the env var key directly, producing clean paths like '{{ .Values.config.apiservice.ConnectionStrings__photos }}'. Removed deferral checks from ProcessValueAsync — all deferral is now handled before ProcessValueAsync is called. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add KubernetesNodePoolResource as base class in Aspire.Hosting.Kubernetes - Add KubernetesNodePoolAnnotation for nodeSelector scheduling - Add AddNodePool and WithNodePool extensions on K8s environment - AksNodePoolResource now extends KubernetesNodePoolResource - Remove WithNodePoolAffinity (replaced by WithNodePool) - Apply nodeSelector in KubernetesPublishingContext when annotation present - Delete AksNodePoolAffinityAnnotation (replaced by KubernetesNodePoolAnnotation) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- AzureVmSizes.Generated.cs with common VM sizes grouped by family (GeneralPurpose, ComputeOptimized, MemoryOptimized, GpuAccelerated, StorageOptimized, Burstable, Arm) - GenVmSizes.cs tool that fetches VM SKUs from Azure REST API - update-azure-vm-sizes.yml workflow (monthly, like GitHub Models pattern) - Users can now write: aks.AddNodePool("gpu", AzureVmSizes.GpuAccelerated.StandardNC6sV3, 0, 5) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-04-14T08:51:02Z

Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as retry-safe transient failures in the CI run attempt.
GitHub was asked to rerun all failed jobs for that attempt, and the rerun is being tracked in the rerun attempt.
The job links below point to the failed attempt jobs that matched the retry-safe transient failure rules.

Tests / Hosting-1 / Hosting-1 (windows-latest) - Job-level runner or infrastructure failure matched the transient allowlist.

- Add WithSubnet overload for IResourceBuilder<AksNodePoolResource> - Per-pool subnets generate separate Bicep params (subnetId_{poolName}) - Each pool uses its own subnet if set, else falls back to env-level default - Environment-level WithSubnet remains unchanged as the default - Network profile auto-configured when any subnet (default or per-pool) is set - Add 2 new tests for per-pool subnet scenarios Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Remove WithAzureWorkloadIdentity and AksWorkloadIdentityAnnotation - Honor AppIdentityAnnotation (same mechanism as ACA/AppService) - AzureKubernetesInfrastructure detects AppIdentityAnnotation and: - Enables OIDC + workload identity on the AKS cluster - Generates K8s ServiceAccount with azure.workload.identity/client-id - Sets serviceAccountName on pod spec - Adds azure.workload.identity/use pod label via customization annotation - Generate federated identity credential Bicep per workload identity - Add ServiceAccountV1 resource to Aspire.Hosting.Kubernetes - AKS admission controller injects AZURE_CLIENT_ID automatically Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The identity clientId was captured but never wired as a deferred Helm value, resulting in an empty azure.workload.identity/client-id annotation on the ServiceAccount. This caused pods to authenticate with the identity but lack the actual client ID, leading to 403 AuthorizationPermissionMismatch errors on Azure resources. Fix: Add identity.ClientId as a CapturedHelmValueProvider so it gets resolved from Bicep output at deploy time and written into the Helm override values under parameters.<name>.identityClientId. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The azure.workload.identity/use label was only added to the ServiceAccount but NOT to the pod template. The AKS workload identity admission webhook requires this label on the pod to inject AZURE_CLIENT_ID, AZURE_TENANT_ID, and token volume mounts. Without the pod label, the webhook doesn't fire and the pod authenticates with the default SA token instead of the federated identity, resulting in 403 AuthorizationPermissionMismatch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

More specific name that clarifies these are VM sizes for AKS node pools, not general Azure VM sizes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mitchdenny

Code review: found 7 issues — 2 bugs (deadlock + missing exit code check), 1 security concern (credential file leak), 2 correctness issues (orphaned resources, redundant allocation), 1 behavioral concern (FindNodePoolResource identity), 1 documentation gap (region-locked VM sizes).

mitchdenny · 2026-04-14T14:33:14Z

src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesInfrastructure.cs

+        process.Start();
+
+        var stdout = await process.StandardOutput.ReadToEndAsync(context.CancellationToken).ConfigureAwait(false);
+        await process.StandardError.ReadToEndAsync(context.CancellationToken).ConfigureAwait(false);


Bug: Potential deadlock — sequential ReadToEndAsync on stdout then stderr.

If the process writes enough to stderr to fill its buffer while stdout is being drained, this will deadlock. The correct pattern (already used in GetAksCredentialsAsync above) is to start both reads concurrently:

`csharp
var stdoutTask = process.StandardOutput.ReadToEndAsync(context.CancellationToken);
var stderrTask = process.StandardError.ReadToEndAsync(context.CancellationToken);

await process.WaitForExitAsync(context.CancellationToken).ConfigureAwait(false);

var stdout = await stdoutTask.ConfigureAwait(false);
var stderr = await stderrTask.ConfigureAwait(false);
`

mitchdenny · 2026-04-14T14:33:14Z

src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesInfrastructure.cs

+        var stdout = await process.StandardOutput.ReadToEndAsync(context.CancellationToken).ConfigureAwait(false);
+        await process.StandardError.ReadToEndAsync(context.CancellationToken).ConfigureAwait(false);
+
+        await process.WaitForExitAsync(context.CancellationToken).ConfigureAwait(false);


Bug: Process exit code is never checked.

If �z resource list fails, stdout will be empty and the error thrown is the misleading "Could not resolve resource group" with no indication of the actual �z CLI error. The stderr output is read and discarded.

Check process.ExitCode after WaitForExitAsync (like GetAksCredentialsAsync does above) and include stderr in the error message.

mitchdenny · 2026-04-14T14:33:14Z

src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesInfrastructure.cs

+    {
+        return new AksNodePoolResource(poolName,
+            environment.NodePools.First(p => p.Name == poolName),
+            environment);


Bug: FindNodePoolResource always creates a new instance instead of finding an existing one.

This method creates a
ew AksNodePoolResource(...) every time, so the node pool object assigned to workloads via KubernetesNodePoolAnnotation will always be a different object from any pool added via AddNodePool(). If any downstream code relies on reference equality or object identity between the annotation's pool and the pool added to the model, it will break silently.

Consider searching the app model's resources for an existing AksNodePoolResource matching the given name.

mitchdenny · 2026-04-14T14:33:14Z

src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesInfrastructure.cs

+                    .ConfigureAwait(false);
+
+                // Write credentials to an isolated kubeconfig file
+                var kubeConfigDir = Directory.CreateTempSubdirectory("aspire-aks");


Security/Resource leak: Temp directory containing cluster credentials never cleaned up.

Directory.CreateTempSubdirectory("aspire-aks") creates a directory that persists after the pipeline completes. The kubeconfig file — which contains cluster credentials — remains on disk indefinitely. Consider registering cleanup (e.g., via IAsyncDisposable on the step or a inally block), or at minimum document that credentials persist.

mitchdenny · 2026-04-14T14:33:14Z

src/Aspire.Hosting.Kubernetes/KubernetesResource.cs

-        return new(helmExpression, parameter.ValueExpression);
+        var helmValue = new HelmValue(helmExpression, parameter.ValueExpression);
+
+        // If the expression provider also implements IValueProvider, attach it


Correctness: Redundant HelmValue allocation.

A HelmValue is created on line 681, then immediately discarded and a new one with identical constructor args is created here inside the if block. Simplify to:

csharp var helmValue = new HelmValue(helmExpression, parameter.ValueExpression) { ValueProviderSource = parameter as IValueProvider }; return helmValue;

mitchdenny · 2026-04-14T14:33:14Z

src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesInfrastructure.cs

+        var defaultConfig = new AksNodePoolConfig("workload", "Standard_D4s_v5", 1, 10, AksNodePoolMode.User);
+        environment.NodePools.Add(defaultConfig);
+
+        var defaultPool = new AksNodePoolResource("workload", defaultConfig, environment);


Bug: Default node pool is created but never added to the distributed application model.

When no user pool exists, a new AksNodePoolResource("workload", ...) is created and returned for use in annotations, but it's never registered via AddResource(). This means it won't appear in manifests, pipelines, or any resource enumeration — unlike pools created via AddNodePool() which do call �uilder.AddResource(). The Bicep generation adds the pool config to NodePools, but the resource object is orphaned.

mitchdenny · 2026-04-14T14:33:14Z

src/Aspire.Hosting.Azure.Kubernetes/tools/GenVmSizes.cs

+
+// Fetch resource SKUs filtered to virtualMachines
+var url = $"https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Compute/skus?api-version=2021-07-01&$filter=location eq 'eastus'";
+var json = await RunAzCommand($"rest --method get --url \"{url}\"").ConfigureAwait(false);


Correctness: VM size query is hardcoded to �astus region.

Some VM sizes are region-specific and may not be available in �astus, while others available in other regions will be missing from the generated constants. Consider documenting this limitation in the generated file's header comment, or querying across multiple representative regions to produce a more complete list.

- Mark implemented features (Phase 1-3, node pools, IValueProvider) - Update workload identity to document AppIdentityAnnotation approach - Update VNet to document WithSubnet (not WithDelegatedSubnet) - Document removed APIs (WithAzureWorkloadIdentity, AksWorkloadIdentityAnnotation) - List remaining gaps: monitoring Bicep, WithHelm/WithDashboard, AsExisting Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

eerhardt · 2026-04-14T14:51:09Z

src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentResource.cs

+
+    private string GenerateAksBicep()
+    {
+        var sb = new StringBuilder();


Can we get Azure Provisoining for this?

eerhardt · 2026-04-14T14:52:29Z

src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentResource.cs

+            // Fallback: check for DelegatedSubnetAnnotation (legacy WithDelegatedSubnet usage)
+            hasDefaultSubnet = this.TryGetLastAnnotation<DelegatedSubnetAnnotation>(out var delegatedAnnotation);


If AKS doesn't use delegated subnets, we shouldn't support DelegatedSubnetAnnotation. There is no "legacy" here...

eerhardt · 2026-04-14T14:54:20Z

src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentResource.cs

+        }
+
+        // AKS cluster resource
+        sb.Append("resource ").Append(id).AppendLine(" 'Microsoft.ContainerService/managedClusters@2024-06-02-preview' = {");


The latest version is 2026-01-01. Can we use that? Is there a reason we are using a couple years old "preview" version?

eerhardt · 2026-04-14T14:57:18Z

src/Aspire.Hosting.Azure.Kubernetes/AzureKubernetesEnvironmentResource.cs

+            sb.AppendLine("      serviceCidr: '10.0.0.0/16'");
+            sb.AppendLine("      dnsServiceIP: '10.0.0.10'");


Are we OK with these being hard-coded?

eerhardt · 2026-04-14T14:59:27Z

src/Aspire.Hosting.Azure.Kubernetes/AksNodeVmSizes.Generated.cs

+/// This class is auto-generated. To update, run the GenVmSizes tool:
+/// <code>dotnet run --project src/Aspire.Hosting.Azure.Kubernetes/tools GenVmSizes.cs</code>
+/// </remarks>
+public static partial class AksNodeVmSizes


In other places, we've used Azure.Provisioning types for things like this that are defined by Azure.

mitchdenny and others added 2 commits April 12, 2026 21:24

Add AKS support implementation spec

3a70f2a

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mitchdenny and others added 8 commits April 12, 2026 21:31

davidfowl reviewed Apr 12, 2026

View reviewed changes

docs/specs/aks-support.md Show resolved Hide resolved

mitchdenny and others added 16 commits April 13, 2026 09:41

mitchdenny and others added 2 commits April 13, 2026 21:42

mitchdenny and others added 8 commits April 14, 2026 11:27

mitchdenny and others added 5 commits April 14, 2026 19:45

Rename AzureVmSizes to AksNodeVmSizes

b7f809b

More specific name that clarifies these are VM sizes for AKS node pools, not general Azure VM sizes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

mitchdenny commented Apr 14, 2026

View reviewed changes

eerhardt reviewed Apr 14, 2026

View reviewed changes

		// Fallback: check for DelegatedSubnetAnnotation (legacy WithDelegatedSubnet usage)
		hasDefaultSubnet = this.TryGetLastAnnotation<DelegatedSubnetAnnotation>(out var delegatedAnnotation);

		sb.AppendLine(" serviceCidr: '10.0.0.0/16'");
		sb.AppendLine(" dnsServiceIP: '10.0.0.10'");

Conversation

mitchdenny commented Apr 12, 2026

Description

Motivation

What's here so far (Phase 1)

What's planned next

Validation

Checklist

Uh oh!

github-actions bot commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Apr 12, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Apr 13, 2026

Uh oh!

github-actions bot commented Apr 14, 2026

Uh oh!

mitchdenny left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

github-actions bot commented Apr 12, 2026 •

edited

Loading