feat (in-cluster): [webapprouting] replace traefik with nginx addon #446
ferantivero wants to merge 27 commits into mspnp:main from
highlights:
- configure the application routing add-on to automatically create records in the Ingress private DNS zones
…ancer

highlights:
- configure nginx with an internal load balancer (ILB)
- update the managed cluster API version
…ing AKS addon (nginx)
… load balancer should be bound to
…routing-system managed identity
Pull Request Overview
This PR replaces the manually deployed Traefik ingress controller with the AKS-managed NGINX Web App Routing addon to streamline the deployment process. The change enables a built-in AKS feature for ingress management with integrated Azure Key Vault and Private DNS Zone support.
Key Changes:
- Enabled the built-in Web App Routing addon with NGINX ingress controller configured for internal load balancing
- Integrated managed identity for Key Vault certificate access and Private DNS Zone record management
- Removed all Traefik-related resources and dependencies (deployment manifests, CSI provider configuration, workload identity setup)
Reviewed Changes
Copilot reviewed 18 out of 20 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| workload/traefik.yaml | Removed entire Traefik ingress controller manifest including ServiceAccount, RBAC, ConfigMap, Service, and Deployment |
| workload/02-aspnetapp-ingress.yaml | Updated ingress resource to use nginx-internal class with Key Vault certificate integration and NGINX-specific annotations |
| workload/01-aspnetapp.yaml | Modified pod security context, added health probes, updated affinity rules to target NGINX ingress controller, added NET_BIND_SERVICE capability |
| workload-team/cluster-stamp.bicep | Enabled webAppRouting profile, configured DNS zone integration, removed podmi-ingress-controller resources, added role assignments for web app routing managed identity, upgraded API version to 2025-07-02-preview |
| cluster-manifests/a0008/nginx-internal.yaml | Added new NginxIngressController custom resource defining internal load balancer configuration |
| cluster-manifests/a0008/ingress-network-policy.yaml | Updated network policy to allow traffic from app-routing-system namespace and nginx-internal ingress controller pods |
| workload-team/modules/policies.bicep | Removed Traefik-related policy violation comments and updated resource limit comments |
| docs/* | Updated documentation to reflect NGINX ingress controller, removed Traefik installation steps, updated navigation links |
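For reference, a minimal sketch of what an internal `NginxIngressController` custom resource for the application routing add-on could look like. It follows the add-on's public documentation, not the actual contents of `cluster-manifests/a0008/nginx-internal.yaml` in this PR. The subnet annotation and the subnet name `snet-clusteringressservices` are assumptions standing in for whichever subnet the internal load balancer should be bound to.

```yaml
# Sketch only: values are assumptions, not the contents of nginx-internal.yaml in this PR.
apiVersion: approuting.kubernetes.azure.com/v1alpha1
kind: NginxIngressController
metadata:
  name: nginx-internal
spec:
  # Ingress resources opt in to this controller with spec.ingressClassName: nginx-internal
  ingressClassName: nginx-internal
  # Prefix for the controller resources the add-on creates in app-routing-system
  controllerNamePrefix: nginx-internal
  loadBalancerAnnotations:
    # Ask the Azure cloud provider for an internal load balancer instead of a public one
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
    # Hypothetical subnet name: bind the ILB to the subnet reserved for ingress services
    service.beta.kubernetes.io/azure-load-balancer-internal-subnet: "snet-clusteringressservices"
```

The workload's Ingress then selects this controller through the `nginx-internal` ingress class, which is what the `workload/02-aspnetapp-ingress.yaml` change describes.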
`cat nginx-iternal-ingress-controller-tls.crt nginx-iternal-ingress-controller-tls.key > nginx-iternal-ingress-controller-tls.pem`
`export INGRESS_CONTROLLER_KV_CERT_URI=$(az keyvault certificate import -f nginx-iternal-ingress-controller-tls.pem -n nginx-iternal-ingress-controller-tls --vault-name $KEYVAULT_NAME_AKS_BASELINE --query id -o tsv)`
Corrected the spelling of 'iternal' to 'internal' in the certificate file names and the Key Vault certificate name.
Suggested change:
`cat nginx-internal-ingress-controller-tls.crt nginx-internal-ingress-controller-tls.key > nginx-internal-ingress-controller-tls.pem`
`export INGRESS_CONTROLLER_KV_CERT_URI=$(az keyvault certificate import -f nginx-internal-ingress-controller-tls.pem -n nginx-internal-ingress-controller-tls --vault-name $KEYVAULT_NAME_AKS_BASELINE --query id -o tsv)`
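A minimal sketch of how a Key Vault certificate URI such as the one captured in `INGRESS_CONTROLLER_KV_CERT_URI` could be consumed by an Ingress served by the `nginx-internal` class. The `kubernetes.azure.com/tls-cert-keyvault-uri` annotation is the application routing add-on's documented mechanism for Key Vault-backed TLS. The Ingress name, namespace, host, backend service, and port are hypothetical placeholders, and the `${...}` value assumes a substitution step (for example `envsubst`) before the manifest is applied; the real manifest in this PR may differ.

```yaml
# Sketch only: name, namespace, host, and backend are hypothetical placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: aspnetapp-ingress
  namespace: a0008
  annotations:
    # The add-on syncs this Key Vault certificate into the TLS secret referenced below.
    # Assumes the placeholder is substituted (e.g. with envsubst) before kubectl apply.
    kubernetes.azure.com/tls-cert-keyvault-uri: "${INGRESS_CONTROLLER_KV_CERT_URI}"
spec:
  # Served by the internal NGINX controller rather than the default one
  ingressClassName: nginx-internal
  tls:
    - hosts:
        - aspnetapp.aks-ingress.contoso.com
      # The add-on names the synced secret keyvault-<ingress name>
      secretName: keyvault-aspnetapp-ingress
  rules:
    - host: aspnetapp.aks-ingress.contoso.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: aspnetapp-service
                port:
                  number: 80
```

Keeping the certificate in Key Vault and letting the add-on sync it avoids the separate CSI provider and workload identity setup that the Traefik deployment needed.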
docs/deploy/02-ca-certificates.md
```bash
# Before (Traefik):
# openssl req -x509 -nodes -days 365 -newkey rsa:2048 -out traefik-ingress-internal-aks-ingress-tls.crt -keyout traefik-ingress-internal-aks-ingress-tls.key -subj "/CN=*.aks-ingress.${DOMAIN_NAME_AKS_BASELINE}/O=Contoso AKS Ingress"
# After (NGINX):
openssl req -x509 -nodes -days 365 -newkey rsa:2048 -out nginx-iternal-ingress-controller-tls.crt -keyout nginx-iternal-ingress-controller-tls.key -subj "/CN=*.aks-ingress.${DOMAIN_NAME_AKS_BASELINE}/O=Contoso AKS Ingress"
az role assignment delete --ids $TEMP_ROLEASSIGNMENT_TO_UPLOAD_CERT
```
`## Check internal NGINX ingress controller is up and runnning`
Corrected spelling of 'runnning' to 'running'.
Suggested change: `## Check internal NGINX ingress controller is up and running`
workload/01-aspnetapp.yaml
Diff: the pod-level `securityContext` with `runAsUser: 10001` and `runAsGroup: 3000` is replaced by `securityContext: {}`.
The pod-level securityContext is now empty, but container-level security settings (runAsNonRoot, runAsUser, runAsGroup) are defined at lines 77-79. Consider whether the pod-level fsGroup setting should be retained for consistent group ownership of volumes, or document why it was intentionally removed.
Suggested change: replace `securityContext: {}` with `securityContext: {fsGroup: 3000}`.
workload-team/cluster-stamp.bicep
- Before: `// Built-in Azure RBAC role that is applied a Key Vault to grant with metadata, certificates, keys and secrets read privileges. Granted to App Gateway's managed identity.`
- After: `// Built-in Azure RBAC role that is applied a Key Vault to grant with metadata, certificates, keys and secrets read privileges. Granted to App Gateway's managed identity and our web app routing profile's managed identiy.`
workload-team/cluster-stamp.bicep
- Before: `// Built-in Azure RBAC role that is applied to a Key Vault to grant with secrets content read privileges. Granted to both Key Vault and our workload's identity.`
- After: `// Built-in Azure RBAC role that is applied to a Key Vault to grant with secrets content read privileges. Granted to our web app routing profile's managed identiy.`
workload-team/cluster-stamp.bicep
- Added: `// Built-in Azure RBAC role that is applied to a Private DNS Zone to grant with contributor privileges. Granted our web app routing profile's managed identiy.`
highlights:
- CIS Benchmarks and most cluster-hardening guides recommend running as a non-root user and setting fsGroup, which enforces the principle of least privilege while keeping volumes writable.
- fsGroup is not redundant, since the workload is not configured as read-only.
- Sharing one value between fsGroup and runAsGroup (the primary process GID) is fine for simple cases: the process can read and write volume files without additional permission adjustments, because its own group is enough to access its volumes. In other words, the same group ID governs both "who I am" and "what I can write."
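A minimal sketch of the combination argued for above: a pod-level `fsGroup` that shares its value with the container's `runAsGroup`, next to the non-root container settings mentioned in the review. The pod name, container name, image, and `emptyDir` volume are hypothetical; only the IDs 10001 and 3000 come from this PR.

```yaml
# Sketch only: names, image, and volume are hypothetical; the IDs come from the PR.
apiVersion: v1
kind: Pod
metadata:
  name: aspnetapp-sketch
spec:
  securityContext:
    # Volumes that support ownership management are group-owned by GID 3000,
    # and GID 3000 is added as a supplemental group for every container process
    fsGroup: 3000
  containers:
    - name: aspnetapp
      image: example.azurecr.io/aspnetapp:latest
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        # Same GID as fsGroup, so the process's primary group can already write the volume
        runAsGroup: 3000
      volumeMounts:
        - name: scratch
          mountPath: /tmp
  volumes:
    - name: scratch
      emptyDir: {}
```

With distinct values, the process would still reach the volume through the fsGroup supplemental group; sharing the value simply keeps one GID to reason about.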
Co-authored-by: John Downs <john@johndowns.co.nz>
WHY
We wanted to experiment with the NGINX add-on (web app routing) as a replacement for the existing ingress controller (Traefik) in the AKS Baseline Reference Implementation. This removes one of the "manual" dependencies we took on, streamlines the deployment process, and relies on built-in AKS features.
WHAT Changed?
internal
type

TEST
Tested end to end (e2e); we plan to test again after addressing feedback and upon approval.