Skip to content

Commit ab4cebc

Browse files
mitchdennyMitch Denny
andauthored
AKS E2E tests: Redis variant, port fix, and reliability improvements (dotnet#14371)
* Add AKS starter deployment E2E test (Phase 1) This adds a new end-to-end deployment test that validates Azure Kubernetes Service (AKS) infrastructure creation: - Creates resource group, ACR, and AKS cluster - Configures kubectl credentials - Verifies cluster connectivity - Cleans up resources after test Phase 1 focuses on infrastructure only - Aspire deployment will be added in subsequent phases. * Fix AKS test: register required resource providers Add step to register Microsoft.ContainerService and Microsoft.ContainerRegistry resource providers before attempting to create AKS resources. This fixes the MissingSubscriptionRegistration error when the subscription hasn't been configured for AKS usage. * Fix AKS test: use Standard_B2s_v2 VM size The subscription in westus3 doesn't have access to Standard_B2s, only the v2 series VMs. Changed to Standard_B2s_v2 which is available. * Fix AKS test: use Standard_D2s_v3 VM size The subscription has zero quota for B-series VMs in westus3. Changed to Standard_D2s_v3 which is a widely-available D-series VM with typical quota. * Add Phase 2 & 3: Aspire project creation, Helm chart generation, and AKS deployment Phase 2 additions: - Create Aspire starter project using 'aspire new' - Add Aspire.Hosting.Kubernetes package via 'aspire add' - Modify AppHost.cs to call AddKubernetesEnvironment() with ACR config - Login to ACR for Docker image push - Run 'aspire publish' to generate Helm charts and push images Phase 3 additions: - Deploy Helm chart to AKS using 'helm install' - Verify pods are running with kubectl - Verify deployments are healthy This completes the full end-to-end flow: AKS cluster creation -> Aspire project creation -> Helm chart generation -> Deployment to Kubernetes * Fix Kubernetes deployment: Add container build/push step Changes: - Remove invalid ContainerRegistry property from AddKubernetesEnvironment - Add pragma warning disable for experimental ASPIREPIPELINES001 - Add container build step using dotnet publish /t:PublishContainer - Push container images to ACR before Helm deployment - Override Helm image values with ACR image references The Kubernetes publisher generates Helm charts but doesn't build containers. We need to build and push containers separately using dotnet publish. * Fix duplicate Service ports in Kubernetes publisher When multiple endpoints resolve to the same port number, the Service manifest generator was creating duplicate port entries, which Kubernetes rejects as invalid. This fix deduplicates ports by (port, protocol) before adding them to the Service spec. Fixes the error: Service 'xxx-service' is invalid: spec.ports[1]: Duplicate value * Add explicit AKS-ACR attachment verification step Added Step 6 to explicitly run 'az aks update --attach-acr' after AKS cluster creation to ensure the AcrPull role assignment has properly propagated. This addresses potential image pull permission issues where AKS cannot pull images from the attached ACR. Also renumbered all subsequent steps to maintain proper ordering. * Fix AKS image pull: correct Helm value paths and add ACR check * Fix duplicate Service/container ports: compare underlying values not Helm expressions * Re-enable AppService deployment tests * Add endpoint verification via kubectl port-forward to AKS test * Wait for pods to be ready before port-forward verification * Use retry loop for health endpoint verification and log HTTP status codes * Use real app endpoints: /weatherforecast and / instead of /health * Improve comments explaining duplicate port dedup rationale * Refactor cleanup to async pattern matching other deployment tests * Fix duplicate K8s ports: skip DefaultHttpsEndpoint in ProcessEndpoints The Kubernetes publisher was generating duplicate Service/container ports (both 8080/TCP) for ProjectResources with default http+https endpoints. The root cause is that GenerateDefaultProjectEndpointMapping assigns the same default port 8080 to every endpoint with None target port. The proper fix mirrors the core framework's SetBothPortsEnvVariables() behavior: skip the DefaultHttpsEndpoint (which the container won't listen on — TLS termination happens at ingress/service mesh). The https endpoint still gets an EndpointMapping (for service discovery) but reuses the http endpoint's HelmValue, so no duplicate K8s port is generated. Added Aspire.Hosting.Kubernetes to InternalsVisibleTo to access ProjectResource.DefaultHttpsEndpoint. The downstream dedup in ToService() and WithContainerPorts() remains as defense-in-depth. Fixes dotnet#14029 * Add AKS + Redis E2E deployment test Validates the Aspire starter template with Redis cache enabled deploys correctly to AKS. Exercises the full pipeline: webfrontend → apiservice → Redis by hitting the /weather page (SSR, uses Redis output caching). Key differences from the base AKS test: - Selects 'Yes' for Redis Cache in aspire new prompts - Redis uses public container image (no ACR push needed) - Verifies /weather page content (confirms Redis integration works) * Fix ACR name collision between parallel AKS tests Both AKS tests generated the same ACR name from RunId+RunAttempt. Use different prefixes (acrs/acrr) to ensure uniqueness. * Fix Redis Helm deployment: provide missing cross-resource secret value Work around K8s publisher bug where cross-resource secret references create Helm value paths under the consuming resource instead of referencing the owning resource's secret. The webfrontend template expects secrets.webfrontend.cache_password but values.yaml only has secrets.cache.REDIS_PASSWORD. Provide the missing value via --set. * Move ACR login before AKS creation to avoid OIDC token expiration The OIDC federated token expires after ~5 minutes, but AKS cluster creation takes 10-15 minutes. By the time the test reaches az acr login, the assertion is stale. Moving ACR auth to right after ACR creation ensures the OIDC token is still fresh, and Docker credentials persist in ~/.docker/config.json for later use. * Add pod diagnostics for Redis test: accept kubectl wait failure gracefully The kubectl wait step was blocking the test when Redis pods failed to start. Now accepts either OK or ERR exit, captures pod logs for diagnostics, and continues to verify what we can. * Wait only for project resource pods, skip Redis (K8s publisher bug dotnet#14370) Redis container crashes with 'cannot open redis-server: No such file' due to incorrect container command generated by the K8s publisher. The webfrontend handles Redis being unavailable gracefully. Wait only for apiservice and webfrontend pods using label selectors, and capture Redis pod logs for diagnostics. * Remove redundant weather page grep check (Blazor SSR streaming issue) The curl -sf /weather | grep 'Weather' step fails because Blazor SSR streaming rendering returns incomplete initial HTML via curl. The /weather endpoint already returns 200 (verified in previous step), which is sufficient to confirm the full pipeline works. * Add --max-time 10 to /weather curl (Blazor SSR streaming keeps connection open) Blazor SSR streaming rendering keeps the HTTP connection open to stream updates to the browser. curl waits indefinitely for the response to complete, causing the WaitForSuccessPrompt to time out. Adding --max-time ensures curl returns after receiving the initial 200 status code. * Fix /weather curl: capture status code in variable to handle SSR streaming curl --max-time exits with code 28 (timeout) even when HTTP 200 was received, because Blazor SSR streaming keeps the connection open. This causes the && chain to fail, so echo/break never execute. Fix by using semicolons and capturing the status code in a variable, then checking it explicitly with [ "$S" = "200" ]. * Fix K8s publisher: set ExecutionContext on CommandLineArgsCallbackContext The K8s publisher was not setting ExecutionContext when creating the CommandLineArgsCallbackContext in ProcessArgumentsAsync, causing it to default to Run mode. This made Redis's WithArgs callback produce individual args instead of a single -c shell command string, resulting in '/bin/sh redis-server' (open as script) instead of '/bin/sh -c "redis-server ..."' (execute as command). Matches the Docker Compose publisher which correctly sets ExecutionContext = executionContext. Also updates the Redis E2E test to wait for all pods (including cache) and verify Redis responds to PING. * Avoid leaking Redis password in test logs Expand $REDIS_PASSWORD inside the container shell instead of extracting it from the K8s secret on the host. Also use --no-auth-warning to suppress redis-cli's password-on-command-line warning. * Replace kubectl exec redis-cli with pod status check to avoid container name issue * Set REDIS_PASSWORD in helm install to prevent redis-server --requirepass with empty arg * Fix Helm secret path: use parameter name (cache_password) not env var key (REDIS_PASSWORD) The K8s publisher AllocateParameter creates Helm expressions using the parameter name (cache-password -> cache_password), but AddValuesToHelmSectionAsync writes values using the env var key (REDIS_PASSWORD). The template references .Values.secrets.cache.cache_password but values.yaml has REDIS_PASSWORD, so the password is always empty and Redis crashes with 'requirepass' having no argument. * Use dynamically generated GUID for Redis password instead of hardcoded value * Pass same Redis password to webfrontend so it can authenticate to Redis cache --------- Co-authored-by: Mitch Denny <mitch@mitchdeny.com>
1 parent a6b68c7 commit ab4cebc

File tree

3 files changed

+55
-34
lines changed

3 files changed

+55
-34
lines changed

src/Aspire.Hosting.Kubernetes/KubernetesResource.cs

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -287,7 +287,10 @@ private async Task ProcessArgumentsAsync(KubernetesEnvironmentContext environmen
287287
{
288288
if (resource.TryGetAnnotationsOfType<CommandLineArgsCallbackAnnotation>(out var commandLineArgsCallbackAnnotations))
289289
{
290-
var context = new CommandLineArgsCallbackContext([], resource, cancellationToken: cancellationToken);
290+
var context = new CommandLineArgsCallbackContext([], resource, cancellationToken: cancellationToken)
291+
{
292+
ExecutionContext = executionContext
293+
};
291294

292295
foreach (var c in commandLineArgsCallbackAnnotations)
293296
{

tests/Aspire.Deployment.EndToEnd.Tests/AksStarterDeploymentTests.cs

Lines changed: 11 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -139,6 +139,15 @@ private async Task DeployStarterTemplateToAksCore(CancellationToken cancellation
139139
.Enter()
140140
.WaitForSuccessPrompt(counter, TimeSpan.FromMinutes(3));
141141

142+
// Step 4b: Login to ACR immediately (before AKS creation which takes 10-15 min).
143+
// The OIDC federated token expires after ~5 minutes, so we must authenticate with
144+
// ACR while it's still fresh. Docker credentials persist in ~/.docker/config.json.
145+
output.WriteLine("Step 4b: Logging into Azure Container Registry (early, before token expires)...");
146+
sequenceBuilder
147+
.Type($"az acr login --name {acrName}")
148+
.Enter()
149+
.WaitForSuccessPrompt(counter, TimeSpan.FromSeconds(60));
150+
142151
// Step 5: Create AKS cluster with ACR attached
143152
// Using minimal configuration: 1 node, Standard_D2s_v3 (widely available with quota)
144153
output.WriteLine("Step 5: Creating AKS cluster (this may take 10-15 minutes)...");
@@ -274,12 +283,8 @@ private async Task DeployStarterTemplateToAksCore(CancellationToken cancellation
274283
.Enter()
275284
.WaitForSuccessPrompt(counter);
276285

277-
// Step 16: Login to ACR for Docker push
278-
output.WriteLine("Step 16: Logging into Azure Container Registry...");
279-
sequenceBuilder
280-
.Type($"az acr login --name {acrName}")
281-
.Enter()
282-
.WaitForSuccessPrompt(counter, TimeSpan.FromSeconds(60));
286+
// Step 16: ACR login was already done in Step 4b (before AKS creation).
287+
// Docker credentials persist in ~/.docker/config.json.
283288

284289
// Step 17: Build and push container images to ACR
285290
// The starter template creates webfrontend and apiservice projects

tests/Aspire.Deployment.EndToEnd.Tests/AksStarterWithRedisDeploymentTests.cs

Lines changed: 40 additions & 27 deletions
Original file line numberDiff line numberDiff line change
@@ -57,6 +57,7 @@ private async Task DeployStarterTemplateWithRedisToAksCore(CancellationToken can
5757
// Generate unique names for Azure resources
5858
var resourceGroupName = DeploymentE2ETestHelpers.GenerateResourceGroupName("aksredis");
5959
var clusterName = $"aks-{DeploymentE2ETestHelpers.GetRunId()}-{DeploymentE2ETestHelpers.GetRunAttempt()}";
60+
var redisPassword = Guid.NewGuid().ToString("N");
6061
// ACR names must be alphanumeric only, 5-50 chars, globally unique
6162
var acrName = $"acrr{DeploymentE2ETestHelpers.GetRunId()}{DeploymentE2ETestHelpers.GetRunAttempt()}".ToLowerInvariant();
6263
acrName = new string(acrName.Where(char.IsLetterOrDigit).Take(50).ToArray());
@@ -139,6 +140,15 @@ private async Task DeployStarterTemplateWithRedisToAksCore(CancellationToken can
139140
.Enter()
140141
.WaitForSuccessPrompt(counter, TimeSpan.FromMinutes(3));
141142

143+
// Step 4b: Login to ACR immediately (before AKS creation which takes 10-15 min).
144+
// The OIDC federated token expires after ~5 minutes, so we must authenticate with
145+
// ACR while it's still fresh. Docker credentials persist in ~/.docker/config.json.
146+
output.WriteLine("Step 4b: Logging into Azure Container Registry (early, before token expires)...");
147+
sequenceBuilder
148+
.Type($"az acr login --name {acrName}")
149+
.Enter()
150+
.WaitForSuccessPrompt(counter, TimeSpan.FromSeconds(60));
151+
142152
// Step 5: Create AKS cluster with ACR attached
143153
output.WriteLine("Step 5: Creating AKS cluster (this may take 10-15 minutes)...");
144154
sequenceBuilder
@@ -271,12 +281,8 @@ private async Task DeployStarterTemplateWithRedisToAksCore(CancellationToken can
271281
.Enter()
272282
.WaitForSuccessPrompt(counter);
273283

274-
// Step 16: Login to ACR for Docker push
275-
output.WriteLine("Step 16: Logging into Azure Container Registry...");
276-
sequenceBuilder
277-
.Type($"az acr login --name {acrName}")
278-
.Enter()
279-
.WaitForSuccessPrompt(counter, TimeSpan.FromSeconds(60));
284+
// Step 16: ACR login was already done in Step 4b (before AKS creation).
285+
// Docker credentials persist in ~/.docker/config.json.
280286

281287
// Step 17: Build and push container images to ACR
282288
// Only project resources need to be built — Redis uses a public container image
@@ -331,25 +337,38 @@ private async Task DeployStarterTemplateWithRedisToAksCore(CancellationToken can
331337

332338
// Step 21: Deploy Helm chart to AKS with ACR image overrides
333339
// Only project resources need image overrides — Redis uses the public image from the chart
334-
// Note: secrets.webfrontend.cache_password is a workaround for a K8s publisher bug where
335-
// cross-resource secret references create Helm value paths under the consuming resource
336-
// instead of referencing the owning resource's secret path (secrets.cache.REDIS_PASSWORD).
340+
// Note: Two K8s publisher Helm value bugs require workarounds:
341+
// 1. secrets.cache.cache_password: The Helm template expression uses the parameter name
342+
// (cache_password from "cache-password") but values.yaml uses the env var key (REDIS_PASSWORD).
343+
// We must set the parameter name path for the password to reach the K8s Secret.
344+
// 2. secrets.webfrontend.cache_password: Cross-resource secret references create Helm value
345+
// paths under the consuming resource instead of the owning resource (issue #14370).
337346
output.WriteLine("Step 21: Deploying Helm chart to AKS...");
338347
sequenceBuilder
339348
.Type($"helm install aksredis ../charts --namespace default --wait --timeout 10m " +
340349
$"--set parameters.webfrontend.webfrontend_image={acrName}.azurecr.io/webfrontend:latest " +
341350
$"--set parameters.apiservice.apiservice_image={acrName}.azurecr.io/apiservice:latest " +
342-
$"--set secrets.webfrontend.cache_password=\"\"")
351+
$"--set secrets.cache.cache_password={redisPassword} " +
352+
$"--set secrets.webfrontend.cache_password={redisPassword}")
343353
.Enter()
344354
.WaitForSuccessPrompt(counter, TimeSpan.FromMinutes(12));
345355

346-
// Step 22: Wait for all pods to be ready (including Redis)
347-
output.WriteLine("Step 22: Waiting for pods to be ready...");
356+
// Step 22: Wait for all pods to be ready (including Redis cache)
357+
output.WriteLine("Step 22: Waiting for all pods to be ready...");
348358
sequenceBuilder
349-
.Type("kubectl wait --for=condition=ready pod --all -n default --timeout=120s")
359+
.Type("kubectl wait --for=condition=ready pod -l app.kubernetes.io/component=apiservice --timeout=120s -n default && " +
360+
"kubectl wait --for=condition=ready pod -l app.kubernetes.io/component=webfrontend --timeout=120s -n default && " +
361+
"kubectl wait --for=condition=ready pod -l app.kubernetes.io/component=cache --timeout=120s -n default")
350362
.Enter()
351363
.WaitForSuccessPrompt(counter, TimeSpan.FromMinutes(3));
352364

365+
// Step 22b: Verify Redis container is running and stable (no restarts)
366+
output.WriteLine("Step 22b: Verifying Redis container is stable...");
367+
sequenceBuilder
368+
.Type("kubectl get pod cache-statefulset-0 -o jsonpath='{.status.containerStatuses[0].ready} restarts:{.status.containerStatuses[0].restartCount}'")
369+
.Enter()
370+
.WaitForSuccessPrompt(counter, TimeSpan.FromSeconds(30));
371+
353372
// Step 23: Verify all pods are running
354373
output.WriteLine("Step 23: Verifying pods are running...");
355374
sequenceBuilder
@@ -392,29 +411,23 @@ private async Task DeployStarterTemplateWithRedisToAksCore(CancellationToken can
392411
.WaitForSuccessPrompt(counter, TimeSpan.FromSeconds(60));
393412

394413
// Step 28: Verify webfrontend /weather page (exercises webfrontend → apiservice → Redis pipeline)
395-
// The /weather page is server-side rendered and fetches data from the apiservice.
396-
// Redis output caching is used, so this validates the full Redis integration.
414+
// The /weather page uses Blazor SSR streaming rendering which keeps the HTTP connection open.
415+
// We use -m 5 (max-time) to avoid curl hanging, and capture the status code in a variable
416+
// because --max-time causes curl to exit non-zero (code 28) even on HTTP 200.
397417
output.WriteLine("Step 28: Verifying webfrontend /weather page (exercises Redis cache)...");
398418
sequenceBuilder
399-
.Type("for i in $(seq 1 10); do sleep 3 && curl -sf http://localhost:18081/weather -o /dev/null -w '%{http_code}' && echo ' OK' && break; done")
419+
.Type("for i in $(seq 1 10); do sleep 3; S=$(curl -so /dev/null -w '%{http_code}' -m 5 http://localhost:18081/weather); [ \"$S\" = \"200\" ] && echo \"$S OK\" && break; done")
400420
.Enter()
401-
.WaitForSuccessPrompt(counter, TimeSpan.FromSeconds(60));
402-
403-
// Step 29: Verify /weather page actually returns weather data
404-
output.WriteLine("Step 29: Verifying weather page content...");
405-
sequenceBuilder
406-
.Type("curl -sf http://localhost:18081/weather | grep -q 'Weather' && echo 'Weather page content verified'")
407-
.Enter()
408-
.WaitForSuccessPrompt(counter, TimeSpan.FromSeconds(30));
421+
.WaitForSuccessPrompt(counter, TimeSpan.FromSeconds(120));
409422

410-
// Step 30: Clean up port-forwards
411-
output.WriteLine("Step 30: Cleaning up port-forwards...");
423+
// Step 29: Clean up port-forwards
424+
output.WriteLine("Step 29: Cleaning up port-forwards...");
412425
sequenceBuilder
413426
.Type("kill %1 %2 2>/dev/null; true")
414427
.Enter()
415428
.WaitForSuccessPrompt(counter, TimeSpan.FromSeconds(10));
416429

417-
// Step 31: Exit terminal
430+
// Step 30: Exit terminal
418431
sequenceBuilder
419432
.Type("exit")
420433
.Enter();

0 commit comments

Comments
 (0)