Add Azure Kubernetes Service (AKS) hosting support #16088
mitchdenny wants to merge 49 commits into main
Conversation
Create core implementation files for AKS hosting support:
- AzureKubernetesEnvironmentResource: resource class with BicepOutputReference properties
- AzureKubernetesEnvironmentExtensions: AddAzureKubernetesEnvironment and configuration methods
- AzureKubernetesInfrastructure: eventing subscriber for compute resource processing
- AksNodePoolConfig, AksSkuTier, AksNetworkProfile: supporting types
- Project file with dependencies on Hosting.Azure, Hosting.Kubernetes, etc.
- Add project to Aspire.slnx solution
- Add InternalsVisibleTo in Aspire.Hosting.Kubernetes for internal API access

Note: the Azure.Provisioning.ContainerService package is not yet available in internal NuGet feeds. ConfigureAksInfrastructure uses placeholder outputs. When the package becomes available, replace with typed ContainerServiceManagedCluster.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🚀 Dogfood this PR with:

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 16088

Or:

iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 16088"
- Bicep snapshot verification tests
- Configuration extension tests (version, SKU, node pools, private cluster)
- Monitoring integration tests (Container Insights, Log Analytics)
- Argument validation tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- AzureKubernetesEnvironmentResource now implements IAzureDelegatedSubnetResource and IAzureNspAssociationTarget for VNet and network perimeter integration
- WithWorkloadIdentity() on AKS environment enables OIDC and workload identity
- WithAzureWorkloadIdentity<T>() on compute resources for federated credential setup with auto-create identity support
- AksWorkloadIdentityAnnotation for ServiceAccount YAML generation
- AsExisting() works automatically via AzureProvisioningResource base class
- Additional unit tests for all new functionality

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Changed WithNodePool to AddNodePool returning IResourceBuilder<AksNodePoolResource>
- AksNodePoolResource is a child resource (IResourceWithParent) of AKS environment
- WithNodePoolAffinity<T> extension lets compute resources target specific node pools
- AksNodePoolAffinityAnnotation carries scheduling info for Helm chart nodeSelector
- Made AksNodePoolConfig and AksNodePoolMode public (exposed via AksNodePoolResource)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When no user node pool is explicitly added via AddNodePool(), the AzureKubernetesInfrastructure subscriber creates a default 'workload' user pool (Standard_D4s_v5, 1-10 nodes) during BeforeStartEvent. Compute resources without explicit WithNodePoolAffinity() are automatically assigned to the first available user pool (either explicitly created or the auto-generated default). This ensures workloads are never scheduled on the system pool, which should only run system pods (kube-system, etc.). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tion

Two issues with aspire publish:

1. No Helm chart output: The inner KubernetesEnvironmentResource was stored as a property but never added to the application model. KubernetesInfrastructure looks for KubernetesEnvironmentResource instances in the model to generate Helm charts. Fix: add the inner K8s environment to the model (excluded from manifest) with the default Helm engine.
2. Duplicate DeploymentTargetAnnotation: AzureKubernetesInfrastructure was adding its own DeploymentTargetAnnotation, conflicting with the one that KubernetesInfrastructure adds (which points to the correct KubernetesResource deployment target with Helm chart data). Fix: remove the duplicate annotation from our subscriber — KubernetesInfrastructure handles it.

Also made EnsureDefaultHelmEngine internal (was private) so the AKS package can call it to set up the Helm deployment engine on the inner K8s environment.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Override GetBicepTemplateString and GetBicepTemplateFile to generate proper AKS ManagedCluster Bicep directly, bypassing the Azure.Provisioning SDK infrastructure (which requires the unavailable Azure.Provisioning.ContainerService package).

The generated Bicep includes:
- Microsoft.ContainerService/managedClusters resource with SystemAssigned identity
- Configurable SKU tier, Kubernetes version, DNS prefix
- Agent pool profiles with autoscaling from NodePools config
- OIDC issuer profile and workload identity security profile
- Optional private cluster API server access profile
- Optional network profile (Azure CNI)
- All outputs: id, name, clusterFqdn, oidcIssuerUrl, kubeletIdentityObjectId, nodeResourceGroup

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Container Registry:
- Auto-create a default Azure Container Registry when AddAzureKubernetesEnvironment is called (same pattern as Container Apps)
- WithContainerRegistry() extension to use an explicit ACR, replacing the default
- FlowContainerRegistry() in AzureKubernetesInfrastructure propagates the registry to the inner KubernetesEnvironmentResource via ContainerRegistryReferenceAnnotation so KubernetesInfrastructure can discover it for image push/pull

Localhive fix:
- Added SuppressFinalPackageVersion to csproj (required for new packages in Arcade)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Packages with SuppressFinalPackageVersion=true (like Aspire.Hosting.Kubernetes and Aspire.Hosting.Azure.Kubernetes) are placed in the NonShipping output directory by the Arcade SDK. The localhive script was only looking in the Shipping directory, causing these packages to be missing from the hive.

Changes:
- Added Get-AllPackagePaths that returns both Shipping and NonShipping dirs
- Package collection now scans all available package directories
- When packages span multiple directories, auto-uses copy mode (can't symlink to two dirs)
- Single-dir case still uses symlink/junction for performance

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as a retry-safe transient failure in the CI run attempt.
/// <param name="builder">The resource builder.</param>
/// <param name="version">The Kubernetes version (e.g., "1.30").</param>
/// <returns>A reference to the <see cref="IResourceBuilder{AzureKubernetesEnvironmentResource}"/> for chaining.</returns>
[AspireExportIgnore(Reason = "AKS hosting is not yet supported in ATS")]
Internal methods from Aspire.Hosting.Kubernetes (AddKubernetesInfrastructureCore, EnsureDefaultHelmEngine, KubernetesInfrastructure, HelmDeploymentEngine) are not accessible at runtime across NuGet package boundaries, even with InternalsVisibleTo set. The InternalsVisibleTo attribute only works at compile time with project references, not with signed NuGet packages. Fix: call the public AddKubernetesEnvironment() API instead. This handles all the internal setup (registering KubernetesInfrastructure subscriber, creating the resource, setting up Helm engine) through a single public entry point. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The localhive.ps1 modifications were unnecessary - packages with SuppressFinalPackageVersion go to Shipping, not NonShipping. The package discovery issue was caused by running localhive from the wrong worktree, not a script problem. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…placeholders

The ConfigureAksInfrastructure callback was still adding ProvisioningOutput objects with no values, even though GetBicepTemplateString/GetBicepTemplateFile now generate the Bicep directly. While our overrides prevent these from being used for Bicep generation, the stale outputs could confuse the AzureResourcePreparer's parameter analysis.

Emptied the callback body since all Bicep generation is handled by the resource's overrides. Also removed the unused Azure.Provisioning using directive.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two changes to ensure Helm/kubectl target the AKS cluster instead of the user's default kubectl context:

1. KubernetesEnvironmentResource.KubeConfigPath (Aspire.Hosting.Kubernetes): New public property. When set, HelmDeploymentEngine passes --kubeconfig to all helm and kubectl commands. This is non-breaking — null means use default behavior.
2. AzureKubernetesInfrastructure get-credentials step (Aspire.Hosting.Azure.Kubernetes): Adds a pipeline step that runs after AKS Bicep provisioning and before Helm prepare. It calls 'az aks get-credentials --file <isolated-path>' to write credentials to a temp kubeconfig file, then sets KubeConfigPath on the inner KubernetesEnvironmentResource.

This ensures Helm deploys to the provisioned AKS cluster without mutating ~/.kube/config.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
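A minimal sketch of how the new property might thread through, based on the shapes named in this commit (KubernetesEnvironmentResource.KubeConfigPath, HelmDeploymentEngine passing --kubeconfig). The HelmArgs helper is hypothetical, purely for illustration:

```csharp
// Sketch, not the actual Aspire source: null KubeConfigPath preserves
// default kubectl-context behavior, making the change non-breaking.
public sealed partial class KubernetesEnvironmentResource
{
    /// <summary>
    /// Optional path to an isolated kubeconfig file written by the AKS
    /// get-credentials step. When null, helm/kubectl use ~/.kube/config.
    /// </summary>
    public string? KubeConfigPath { get; set; }
}

internal static class HelmArgs
{
    // Hypothetical helper: append --kubeconfig only when an isolated
    // credentials file is available.
    public static string Build(string baseArgs, string? kubeConfigPath) =>
        kubeConfigPath is null
            ? baseArgs
            : $"{baseArgs} --kubeconfig \"{kubeConfigPath}\"";
}
```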
…meterResource

The resourceGroupName is a ParameterResource that requires configuration key 'Parameters:resourceGroupName' to be set. During deploy, this isn't available as a raw parameter value — it's resolved by the Azure provisioning context and stored in the 'Azure:ResourceGroup' configuration key.

Changed GetAksCredentialsAsync to read from IConfiguration['Azure:ResourceGroup'], which is populated by the Azure provisioner during context creation, before our get-credentials step runs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…luster name
BicepOutputReference.GetValueAsync() triggers parameter resolution on
the AzureProvisioningResource, which tries to resolve the 'location'
parameter that depends on 'resourceGroup().location'. In a fresh
environment without Parameters:resourceGroupName configured, this fails.
Since we set the cluster name directly in the Bicep template (name: '{Name}'),
we can just use environment.Name as the cluster name. This avoids the
parameter resolution chain entirely.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The aks-get-credentials step was depending on provision-{name} (individual
AKS resource step) but the Azure:ResourceGroup config key is set by the
create-provisioning-context step. In a fresh environment, the step ordering
wasn't guaranteed to have the config available.
Changed to depend on the provision-azure-bicep-resources aggregation step
which gates on ALL provisioning completing, ensuring both the provisioning
context (with resource group) and the AKS cluster are ready.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…uring event

The ContainerRegistryReferenceAnnotation was being added to the inner KubernetesEnvironmentResource during BeforeStartEvent via FlowContainerRegistry. But KubernetesInfrastructure also runs during BeforeStartEvent and reads the registry annotation — if it ran first, it wouldn't see the annotation, resulting in no push steps being created and images never getting pushed.

Fix: Add the ContainerRegistryReferenceAnnotation to the inner K8s environment immediately in AddAzureKubernetesEnvironment and WithContainerRegistry, at resource creation time before any events fire. This guarantees KubernetesInfrastructure always sees the registry regardless of subscriber execution order.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Push steps call registry.Endpoint.GetValueAsync() which awaits the BicepOutputReference for loginServer. If the ACR hasn't been provisioned yet, this blocks indefinitely — the push step just hangs after push-prereq. Push steps depend on build + push-prereq, but neither of those depend on the ACR's provision step. Added a PipelineConfigurationAnnotation on the inner K8s environment that makes all compute resource push steps depend on the ACR's provision step. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…step Changed from depending on individual ACR provision step (which required resource-to-step lookup that may not resolve correctly) to depending on the provision-azure-bicep-resources aggregation step by name. This is simpler and ensures ALL Azure provisioning (including ACR output population) completes before any image push begins. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous attempts tried to wire dependencies via GetSteps(resource, tag) which uses the StepToResourceMap. This approach failed because the push steps are keyed to the compute resources, not the K8s environment. New approach: find the push-prereq step by name in the Steps collection and directly call DependsOn(provision-azure-bicep-resources). Since all push steps already depend on push-prereq, this ensures the entire push chain waits for Azure provisioning to complete. This mirrors how ACA works: ACA doesn't need this because it implements IContainerRegistry directly on the environment resource, so the endpoint values are resolved differently. For AKS, the ACR is a separate Bicep resource whose outputs need to be populated before push can proceed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
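The wiring described above can be sketched roughly as follows. The step names (push-prereq, provision-azure-bicep-resources) and the DependsOn call come from the commit text; the exact context.Steps API shape is an assumption:

```csharp
// Sketch: find push-prereq by name in the Steps collection and gate it
// on the Azure provisioning aggregation step. Because every push step
// already depends on push-prereq, one dependency edge covers the whole
// push chain.
var pushPrereq = context.Steps
    .FirstOrDefault(s => s.Name == "push-prereq");

var provisionAll = context.Steps
    .FirstOrDefault(s => s.Name == "provision-azure-bicep-resources");

if (pushPrereq is not null && provisionAll is not null)
{
    // ACR outputs (e.g. loginServer) are populated only after this
    // aggregation step completes, so pushes can no longer hang.
    pushPrereq.DependsOn(provisionAll);
}
```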
Temporary console output to debug why push steps hang after push-prereq. Logs whether the PipelineConfigurationAnnotation runs, whether push-prereq is found, how many push steps exist, and their dependencies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Diagnostics revealed push steps had EMPTY DependsOnSteps lists. The
standard wiring from ProjectResource's PipelineConfigurationAnnotation
(pushSteps.DependsOn(buildSteps, push-prereq)) wasn't working because
context.GetSteps(resource, tag) returned empty — the resource lookup
via ResourceNameComparer didn't match when K8s deployment targets are
involved.
Fix: directly find push steps by tag in the Steps collection and
explicitly wire dependencies on:
- provision-azure-bicep-resources (ACR must be provisioned for endpoint)
- push-prereq (ACR login must complete)
- build-{resourceName} (container image must be built)
This ensures the correct execution order:
provision → push-prereq → build → push → helm-deploy
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Azure provisioning context internals (ProvisioningContextTask, AzureProvisionerOptions) are all internal to Aspire.Hosting.Azure and inaccessible from our package. IConfiguration['Azure:ResourceGroup'] is also not reliably set when our step runs because the deployment state manager writes to a different configuration scope. New approach: query Azure directly with 'az aks list --query' to find the cluster's resource group. This is guaranteed to work after provisioning completes, regardless of internal configuration state. The az CLI is already available (validated by validate-azure-login step). Also wires push step dependencies directly by finding steps by tag in the Steps collection, fixing the issue where push steps had empty DependsOnSteps lists in K8s compute environments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… group

- Quote --resource-group and --name values to handle special characters
- Strip line endings from az aks list output to prevent argument parsing issues
- Add logging of cluster name and resource group values for debugging

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The JMESPath query in 'az aks list --query [?name==...].resourceGroup' had quote-escaping issues on Windows when passed via ProcessStartInfo. The quotes in the JMESPath expression were being mangled by cmd.exe, producing truncated/malformed resource group names. Switched to 'az resource list --resource-type ... --name ... --query [0].resourceGroup' which uses --name as a proper CLI argument (no embedded quotes in JMESPath) and the simpler [0].resourceGroup query has no quote escaping issues. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The publish context and step factory used GetDeploymentTargetAnnotation(environment) where environment is the inner K8s env. But WithComputeEnvironment(aksEnv) sets the compute env to the AKS resource, and KubernetesInfrastructure now sets DeploymentTargetAnnotation.ComputeEnvironment to match the resource's actual compute env (the AKS resource, not the inner K8s env). Updated all GetDeploymentTargetAnnotation calls to use ParentComputeEnvironment when available, so the lookup matches correctly. Also fixed KubernetesInfrastructure to set ComputeEnvironment on the DeploymentTargetAnnotation to the resource's actual compute env rather than always using the inner K8s env. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
BicepOutputReference.ValueExpression uses single braces ({storage.outputs.blobEndpoint})
but ResolveUnknownValue only stripped double braces ({{ }}) via HelmExtensions delimiters.
Single braces passed through to the Helm template, causing a parse error.
Fix: also strip single { and } characters when sanitizing the values key.
Fixes #16114
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
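A minimal sketch of the sanitization fix described in this commit, assuming a simple string-replace approach (SanitizeValuesKey is an illustrative name, not the actual Aspire API):

```csharp
// Sketch: strip both the Helm {{ }} delimiters and stray single braces
// so an expression like {storage.outputs.blobEndpoint} cannot leak
// template syntax into the generated chart and break Helm parsing.
static string SanitizeValuesKey(string key) =>
    key.Replace("{{", string.Empty)
       .Replace("}}", string.Empty)
       .Replace("{", string.Empty)
       .Replace("}", string.Empty);

// SanitizeValuesKey("{storage.outputs.blobEndpoint}")
//   → "storage.outputs.blobEndpoint"
```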
🎬 CLI E2E Test Recordings — 68 recordings uploaded
📹 Recordings uploaded automatically from CI run #24345283063
When WithDelegatedSubnet(subnet) is called, the generated AKS Bicep now:
- Accepts a subnetId parameter wired from the subnet's ID output
- Sets vnetSubnetID on all agent pool profiles
- Configures Azure CNI network profile (networkPlugin: azure) with default service CIDR and DNS service IP

This follows the ACA pattern where DelegatedSubnetAnnotation is read during infrastructure configuration and the subnet ID is passed as a provisioning parameter.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
AKS requires plain (non-delegated) subnets for node pools. The previous approach used WithDelegatedSubnet, which implements IAzureDelegatedSubnetResource and adds a 'Microsoft.ContainerService/managedClusters' service delegation to the subnet. Azure rejects this with a 'SubnetIsDelegated' error.

Changes:
- Removed IAzureDelegatedSubnetResource from AzureKubernetesEnvironmentResource
- Added WithSubnet(subnet) extension that stores the subnet ID without adding a service delegation (via AksSubnetAnnotation)
- Added Aspire.Hosting.Azure.Network project reference for AzureSubnetResource
- WithDelegatedSubnet still works as a fallback, but users should use WithSubnet

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
During publish, config values backed by IValueProvider (e.g., Bicep output
references like {storage.outputs.blobEndpoint}) were written as raw
expression strings to values.yaml. The prepare step only resolved
ParameterResource values, leaving IValueProvider expressions unresolved.
Fix (all in Aspire.Hosting.Kubernetes — no Azure dependency):
1. HelmValue.ValueProviderSource: new optional IValueProvider property
set when ResolveUnknownValue detects the expression provider also
implements IValueProvider
2. KubernetesPublishingContext: when a HelmValue has a ValueProviderSource,
writes an empty placeholder and captures a CapturedHelmValueProvider
for deploy-time resolution
3. KubernetesEnvironmentResource.CapturedHelmValueProvider: new record
storing (Section, ResourceKey, ValueKey, IValueProvider)
4. HelmDeploymentEngine Phase 4: calls GetValueAsync() on captured
IValueProvider entries to resolve values from external sources
This is cloud-provider agnostic — works with any IValueProvider
implementation (Azure Bicep outputs, AWS outputs, etc.).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
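The deferral pattern described above can be sketched as follows. The record fields and the "Phase 4" resolution loop come from the commit text; the exact signatures and the overrideValues structure are assumptions:

```csharp
// Sketch: at publish time the chart gets an empty placeholder and the
// provider is captured; at deploy time the engine resolves each capture.
public sealed record CapturedHelmValueProvider(
    string Section,      // e.g. "config"
    string ResourceKey,  // e.g. "apiservice"
    string ValueKey,     // e.g. "ConnectionStrings__photos"
    IValueProvider Provider);

// Deploy-time resolution (HelmDeploymentEngine "Phase 4"):
foreach (var captured in environment.CapturedHelmValueProviders)
{
    // Works for any IValueProvider (Azure Bicep outputs, AWS outputs, ...)
    var value = await captured.Provider
        .GetValueAsync(cancellationToken)
        .ConfigureAwait(false);

    // Hypothetical override-values shape: section -> resource -> key.
    overrideValues[captured.Section][captured.ResourceKey][captured.ValueKey] =
        value ?? string.Empty;
}
```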
Three fixes:
1. Composite expressions (e.g., 'Endpoint={storage.outputs.blobEndpoint};ContainerName=photos')
containing IValueProvider references (like BicepOutputReference) are now deferred for
deploy-time resolution. Added IsUnresolvedAtPublishTime() check before processing
sub-expressions — if any sub-expression would fall through to ResolveUnknownValue,
the entire composite expression is captured as a CapturedHelmValueProvider.
2. Helm chart names are now scoped per AKS environment (e.g., 'k8stest5-corek8s' instead of
'k8stest5-apphost') to avoid conflicts when multiple environments deploy to the same
cluster or when re-deploying with different environment names.
3. Updated K8s snapshot tests for the new deferred value handling.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ressions
Previous check only detected direct IValueProvider references. For
composite expressions like connection strings (Endpoint={storage.outputs.blobEndpoint};ContainerName=photos),
the BicepOutputReference is nested inside a ReferenceExpression chain:
ConnectionStringReference → ReferenceExpression → BicepOutputReference
Fixed IsUnresolvedAtPublishTime to recursively check:
- ConnectionStringReference → check inner ConnectionStringExpression
- IResourceWithConnectionString → check inner ConnectionStringExpression
- ReferenceExpression → check all value providers recursively
- IManifestExpressionProvider + IValueProvider → true (leaf unresolvable)
Also added early deferral in ProcessValueAsync for ConnectionStringReference
and IResourceWithConnectionString before they get unwrapped.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
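The recursive check described above might look roughly like this. The type names appear in the commit; the traversal shape (property names, switch structure) is an assumption:

```csharp
// Sketch: walk expression wrappers until we hit a leaf that is both a
// manifest expression and a live value provider (e.g. BicepOutputReference),
// which can only be resolved at deploy time.
static bool IsUnresolvedAtPublishTime(object value) => value switch
{
    // Unwrap connection-string wrappers to their inner expression.
    ConnectionStringReference csr =>
        IsUnresolvedAtPublishTime(csr.Resource.ConnectionStringExpression),

    IResourceWithConnectionString cs =>
        IsUnresolvedAtPublishTime(cs.ConnectionStringExpression),

    // Composite expressions: unresolved if ANY sub-provider is.
    ReferenceExpression expr =>
        expr.ValueProviders.Any(IsUnresolvedAtPublishTime),

    // Leaf: defer to deploy time.
    IManifestExpressionProvider and IValueProvider => true,

    _ => false,
};
```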
…ed values
The deferred value's Helm key was derived from ValueExpression which
contains format strings like 'Endpoint={storage.outputs.blobEndpoint};ContainerName=photos'.
This produced invalid Helm paths with = and ; characters.
Fix: moved deferral check to ProcessEnvironmentAsync (outer loop) where
the env var key name is available. CreateDeferredHelmValue now takes the
env var key directly, producing clean paths like
'{{ .Values.config.apiservice.ConnectionStrings__photos }}'.
Removed deferral checks from ProcessValueAsync — all deferral is now
handled before ProcessValueAsync is called.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add KubernetesNodePoolResource as base class in Aspire.Hosting.Kubernetes
- Add KubernetesNodePoolAnnotation for nodeSelector scheduling
- Add AddNodePool and WithNodePool extensions on K8s environment
- AksNodePoolResource now extends KubernetesNodePoolResource
- Remove WithNodePoolAffinity (replaced by WithNodePool)
- Apply nodeSelector in KubernetesPublishingContext when annotation present
- Delete AksNodePoolAffinityAnnotation (replaced by KubernetesNodePoolAnnotation)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- AzureVmSizes.Generated.cs with common VM sizes grouped by family
(GeneralPurpose, ComputeOptimized, MemoryOptimized, GpuAccelerated,
StorageOptimized, Burstable, Arm)
- GenVmSizes.cs tool that fetches VM SKUs from Azure REST API
- update-azure-vm-sizes.yml workflow (monthly, like GitHub Models pattern)
- Users can now write: aks.AddNodePool("gpu", AzureVmSizes.GpuAccelerated.StandardNC6sV3, 0, 5)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as a retry-safe transient failure in the CI run attempt.
- Add WithSubnet overload for IResourceBuilder<AksNodePoolResource>
- Per-pool subnets generate separate Bicep params (subnetId_{poolName})
- Each pool uses its own subnet if set, else falls back to env-level default
- Environment-level WithSubnet remains unchanged as the default
- Network profile auto-configured when any subnet (default or per-pool) is set
- Add 2 new tests for per-pool subnet scenarios
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove WithAzureWorkloadIdentity and AksWorkloadIdentityAnnotation
- Honor AppIdentityAnnotation (same mechanism as ACA/AppService)
- AzureKubernetesInfrastructure detects AppIdentityAnnotation and:
  - Enables OIDC + workload identity on the AKS cluster
  - Generates K8s ServiceAccount with azure.workload.identity/client-id
  - Sets serviceAccountName on pod spec
  - Adds azure.workload.identity/use pod label via customization annotation
- Generate federated identity credential Bicep per workload identity
- Add ServiceAccountV1 resource to Aspire.Hosting.Kubernetes
- AKS admission controller injects AZURE_CLIENT_ID automatically

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The identity clientId was captured but never wired as a deferred Helm value, resulting in an empty azure.workload.identity/client-id annotation on the ServiceAccount. This caused pods to authenticate with the identity but lack the actual client ID, leading to 403 AuthorizationPermissionMismatch errors on Azure resources. Fix: Add identity.ClientId as a CapturedHelmValueProvider so it gets resolved from Bicep output at deploy time and written into the Helm override values under parameters.<name>.identityClientId. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The azure.workload.identity/use label was only added to the ServiceAccount but NOT to the pod template. The AKS workload identity admission webhook requires this label on the pod to inject AZURE_CLIENT_ID, AZURE_TENANT_ID, and token volume mounts. Without the pod label, the webhook doesn't fire and the pod authenticates with the default SA token instead of the federated identity, resulting in 403 AuthorizationPermissionMismatch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
More specific name that clarifies these are VM sizes for AKS node pools, not general Azure VM sizes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
mitchdenny left a comment:
Code review: found 7 issues — 2 bugs (deadlock + missing exit code check), 1 security concern (credential file leak), 2 correctness issues (orphaned resources, redundant allocation), 1 behavioral concern (FindNodePoolResource identity), 1 documentation gap (region-locked VM sizes).
process.Start();
var stdout = await process.StandardOutput.ReadToEndAsync(context.CancellationToken).ConfigureAwait(false);
await process.StandardError.ReadToEndAsync(context.CancellationToken).ConfigureAwait(false);
Bug: Potential deadlock — sequential ReadToEndAsync on stdout then stderr.

If the process writes enough to stderr to fill its buffer while stdout is being drained, this will deadlock. The correct pattern (already used in GetAksCredentialsAsync above) is to start both reads concurrently:

```csharp
var stdoutTask = process.StandardOutput.ReadToEndAsync(context.CancellationToken);
var stderrTask = process.StandardError.ReadToEndAsync(context.CancellationToken);
await process.WaitForExitAsync(context.CancellationToken).ConfigureAwait(false);
var stdout = await stdoutTask.ConfigureAwait(false);
var stderr = await stderrTask.ConfigureAwait(false);
```
var stdout = await process.StandardOutput.ReadToEndAsync(context.CancellationToken).ConfigureAwait(false);
await process.StandardError.ReadToEndAsync(context.CancellationToken).ConfigureAwait(false);
await process.WaitForExitAsync(context.CancellationToken).ConfigureAwait(false);
Bug: Process exit code is never checked.

If az resource list fails, stdout will be empty and the error thrown is the misleading "Could not resolve resource group" with no indication of the actual az CLI error. The stderr output is read and discarded.

Check process.ExitCode after WaitForExitAsync (like GetAksCredentialsAsync does above) and include stderr in the error message.
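The two review fixes above (concurrent stream reads and an exit-code check) could be combined as follows; this is a sketch of the suggested pattern, not the PR's actual code:

```csharp
// Start both reads before waiting so a full stderr buffer can't deadlock
// the process while stdout is being drained.
var stdoutTask = process.StandardOutput.ReadToEndAsync(context.CancellationToken);
var stderrTask = process.StandardError.ReadToEndAsync(context.CancellationToken);

await process.WaitForExitAsync(context.CancellationToken).ConfigureAwait(false);

var stdout = await stdoutTask.ConfigureAwait(false);
var stderr = await stderrTask.ConfigureAwait(false);

// Surface the real az CLI failure instead of a misleading
// "Could not resolve resource group" message.
if (process.ExitCode != 0)
{
    throw new InvalidOperationException(
        $"'az resource list' failed with exit code {process.ExitCode}: {stderr}");
}
```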
{
    return new AksNodePoolResource(poolName,
        environment.NodePools.First(p => p.Name == poolName),
        environment);

Bug: FindNodePoolResource always creates a new instance instead of finding an existing one.

This method creates a new AksNodePoolResource(...) every time, so the node pool object assigned to workloads via KubernetesNodePoolAnnotation will always be a different object from any pool added via AddNodePool(). If any downstream code relies on reference equality or object identity between the annotation's pool and the pool added to the model, it will break silently.

Consider searching the app model's resources for an existing AksNodePoolResource matching the given name.
    .ConfigureAwait(false);

// Write credentials to an isolated kubeconfig file
var kubeConfigDir = Directory.CreateTempSubdirectory("aspire-aks");

Security/Resource leak: Temp directory containing cluster credentials never cleaned up.

Directory.CreateTempSubdirectory("aspire-aks") creates a directory that persists after the pipeline completes. The kubeconfig file — which contains cluster credentials — remains on disk indefinitely. Consider registering cleanup (e.g., via IAsyncDisposable on the step or a finally block), or at minimum document that credentials persist.
return new(helmExpression, parameter.ValueExpression);
var helmValue = new HelmValue(helmExpression, parameter.ValueExpression);

// If the expression provider also implements IValueProvider, attach it

Correctness: Redundant HelmValue allocation.

A HelmValue is created on line 681, then immediately discarded and a new one with identical constructor args is created here inside the if block. Simplify to:

```csharp
var helmValue = new HelmValue(helmExpression, parameter.ValueExpression)
{
    ValueProviderSource = parameter as IValueProvider
};
return helmValue;
```
var defaultConfig = new AksNodePoolConfig("workload", "Standard_D4s_v5", 1, 10, AksNodePoolMode.User);
environment.NodePools.Add(defaultConfig);

var defaultPool = new AksNodePoolResource("workload", defaultConfig, environment);

Bug: Default node pool is created but never added to the distributed application model.

When no user pool exists, a new AksNodePoolResource("workload", ...) is created and returned for use in annotations, but it's never registered via AddResource(). This means it won't appear in manifests, pipelines, or any resource enumeration — unlike pools created via AddNodePool(), which do call builder.AddResource(). The Bicep generation adds the pool config to NodePools, but the resource object is orphaned.
// Fetch resource SKUs filtered to virtualMachines
var url = $"https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.Compute/skus?api-version=2021-07-01&$filter=location eq 'eastus'";
var json = await RunAzCommand($"rest --method get --url \"{url}\"").ConfigureAwait(false);

Correctness: VM size query is hardcoded to the eastus region.

Some VM sizes are region-specific and may not be available in eastus, while others available in other regions will be missing from the generated constants. Consider documenting this limitation in the generated file's header comment, or querying across multiple representative regions to produce a more complete list.
- Mark implemented features (Phase 1-3, node pools, IValueProvider)
- Update workload identity to document AppIdentityAnnotation approach
- Update VNet to document WithSubnet (not WithDelegatedSubnet)
- Document removed APIs (WithAzureWorkloadIdentity, AksWorkloadIdentityAnnotation)
- List remaining gaps: monitoring Bicep, WithHelm/WithDashboard, AsExisting

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
private string GenerateAksBicep()
{
    var sb = new StringBuilder();

Can we get Azure Provisioning for this?
// Fallback: check for DelegatedSubnetAnnotation (legacy WithDelegatedSubnet usage)
hasDefaultSubnet = this.TryGetLastAnnotation<DelegatedSubnetAnnotation>(out var delegatedAnnotation);

If AKS doesn't use delegated subnets, we shouldn't support DelegatedSubnetAnnotation. There is no "legacy" here...
}

// AKS cluster resource
sb.Append("resource ").Append(id).AppendLine(" 'Microsoft.ContainerService/managedClusters@2024-06-02-preview' = {");

The latest version is 2026-01-01. Can we use that? Is there a reason we are using a couple-of-years-old "preview" version?
sb.AppendLine("      serviceCidr: '10.0.0.0/16'");
sb.AppendLine("      dnsServiceIP: '10.0.0.10'");

Are we OK with these being hard-coded?
/// This class is auto-generated. To update, run the GenVmSizes tool:
/// <code>dotnet run --project src/Aspire.Hosting.Azure.Kubernetes/tools GenVmSizes.cs</code>
/// </remarks>
public static partial class AksNodeVmSizes

In other places, we've used Azure.Provisioning types for things like this that are defined by Azure.
Description

WIP — Adds first-class Azure Kubernetes Service (AKS) support to Aspire via a new Aspire.Hosting.Azure.Kubernetes package.

Motivation

Aspire's Aspire.Hosting.Kubernetes package supports end-to-end deployment to any conformant Kubernetes cluster via Helm charts, but it has no awareness of Azure-specific capabilities. Users who deploy to AKS must manually provision the cluster, configure workload identity, set up monitoring, and manage networking outside of Aspire.

What's here so far (Phase 1)

- New Aspire.Hosting.Azure.Kubernetes package with dependencies on Aspire.Hosting.Kubernetes and Aspire.Hosting.Azure
- AzureKubernetesEnvironmentResource — unified resource that extends AzureProvisioningResource and implements IAzureComputeEnvironmentResource, internally wrapping a KubernetesEnvironmentResource for Helm deployment
- AddAzureKubernetesEnvironment() entry point (mirrors the AddAzureContainerAppEnvironment() pattern)
- WithVersion, WithSkuTier, WithNodePool, AsPrivateCluster, WithContainerInsights, WithAzureLogAnalyticsWorkspace configuration extensions
- AzureKubernetesInfrastructure eventing subscriber
- docs/specs/aks-support.md

What's planned next

- Subnet support (WithDelegatedSubnet)
- Typed Bicep generation (pending Azure.Provisioning.ContainerService package availability in internal feeds)

Validation

- dotnet build /p:SkipNativeBuild=true
- Patterned after Aspire.Hosting.Azure.AppContainers

Fixes # (issue)

Checklist

- Does the change include <remarks /> and <code /> elements on your triple slash comments?
- Is there an aspire.dev issue: