Fix all services failing in AKS: add missing secrets, fix image refs #530
Root cause: nearly all services were stuck at 0/N ready replicas because the deploy workflow created only 3 K8s secrets, while the services reference 9 distinct secrets.

Secrets added:
- `mongodb-secret` (from the Cosmos DB connection string, MongoDB API)
- `cosmos-db-secret` (endpoint + key for services using the Cosmos DB SDK)
- `cosmos-config` (same values, used by `trading-partner-service`)
- `redis-secret` (for `claims-service`, `benefit-plan-service`)
- `kafka-secret` (for `claims-scrubbing-service`)
- `azure-storage-secret` (for appeals, claims-scrubbing)
- `azure-ad-config`: added the missing `Audience` key

Also fixed:
- The `sed` replacement now handles `ghcr.io` image refs (6 services).
- `trading-partner-service` `containerPort` 80 → 8080 (matches the health probes).

Required GitHub secrets to add: `COSMOS_DB_ENDPOINT`, `COSMOS_DB_KEY`, `REDIS_CONNECTION_STRING`, `KAFKA_SASL_USERNAME`, `KAFKA_SASL_PASSWORD`, `AZURE_STORAGE_CONNECTION_STRING`, `AZURE_AD_AUDIENCE`

https://claude.ai/code/session_01A95Uah18uxLJpuAR5HShNS
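For context on the root cause: a container that reads a secret through `secretKeyRef` cannot start while the referenced Secret, or the referenced key inside it, is missing (unless the reference is marked `optional: true`). The kubelet reports `CreateContainerConfigError`, so the Deployment never reaches its ready replica count. If services consume `azure-ad-config` the same way, that would also explain why the missing `Audience` key mattered. A minimal excerpt, where only `mongodb-secret` and `connectionString` come from this PR and the environment variable name is hypothetical:

```yaml
# Hypothetical container env excerpt; only the Secret name and key
# (mongodb-secret / connectionString) come from this PR.
env:
  - name: MONGODB_CONNECTION_STRING   # hypothetical variable name
    valueFrom:
      secretKeyRef:
        name: mongodb-secret          # Secret must exist in the namespace
        key: connectionString         # key must exist inside the Secret
```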
Pull request overview
This PR updates the AKS deployment workflow and a service manifest to prevent pods from staying at 0/N replicas by ensuring required Kubernetes Secrets exist, normalizing image references to the built ACR images, and aligning the trading-partner-service container port with its probes.
Changes:
- Add creation of several missing Kubernetes Secrets (MongoDB/Cosmos/Redis/Kafka/Azure Storage) and include Azure AD Audience in the existing Azure AD secret.
- Update `sed` substitutions so service manifests using `ghcr.io/...` image refs are rewritten to `${ACR}/...` during deploy.
- Fix `trading-partner-service` container port to `8080`.
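One way the GHCR → ACR rewrite could look, as a hypothetical sketch rather than the PR's actual `sed` expression (which is not shown here). It assumes manifests live under `src/services/*/k8s/` (as the `trading-partner-service` path suggests), that `ACR` is already set to the registry login server earlier in the workflow, and that GHCR refs have the unquoted form `ghcr.io/<owner>/<image>:<tag>`:

```yaml
- name: Apply service manifests (sketch)
  run: |
    # Abort on the first failed sed/kubectl call or unset variable.
    set -euo pipefail
    # ACR is assumed to be set earlier in the workflow (e.g. myregistry.azurecr.io).
    for f in src/services/*/k8s/*.yaml; do
      # Rewrite "image: ghcr.io/<owner>/<image>:<tag>" to "image: ${ACR}/<image>:<tag>";
      # only unquoted image values are matched.
      sed -E "s#image: *ghcr\.io/[^/]+/#image: ${ACR}/#" "$f" \
        | kubectl apply -n ${{ env.NAMESPACE }} -f -
    done
```

Dropping the `<owner>` path segment assumes the ACR repositories are named after the image alone; adjust the replacement if they keep the owner prefix.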
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/services/trading-partner-service/k8s/trading-partner-service-deployment.yaml` | Updates container port to 8080 to match health probes/targetPort. |
| `.github/workflows/deploy-azure-aks.yml` | Creates additional Secrets required by services, adds Audience to Azure AD secret, and rewrites GHCR image refs to ACR during manifest apply. |
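A sketch of what the aligned `trading-partner-service` container spec might look like. Only the port number comes from this PR; the probe paths are hypothetical placeholders. Declaring `containerPort` is mostly informational in Kubernetes, but keeping it equal to the probe `port` and the Service `targetPort` (and to the port the process actually listens on) keeps the manifest consistent, and it becomes required when probes or the Service refer to the port by name.

```yaml
# Hypothetical excerpt of trading-partner-service-deployment.yaml;
# only the port value (8080) comes from this PR.
containers:
  - name: trading-partner-service
    ports:
      - containerPort: 8080       # was 80
    readinessProbe:
      httpGet:
        path: /health/ready       # hypothetical path
        port: 8080                # same port the app listens on
    livenessProbe:
      httpGet:
        path: /health/live        # hypothetical path
        port: 8080
```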
```yaml
- name: Create MongoDB secret (Cosmos DB MongoDB API)
  run: |
    kubectl create secret generic mongodb-secret \
      --from-literal=connectionString="${{ secrets.COSMOS_DB_CONNECTION_STRING }}" \
      -n ${{ env.NAMESPACE }} \
      --dry-run=client -o yaml | kubectl apply -f -
```
The workflow is interpolating GitHub secrets directly into the shell script (e.g., `"${{ secrets.COSMOS_DB_CONNECTION_STRING }}"`). If any secret contains characters like `$(` or backticks, bash will evaluate them when the script runs, and even benign characters can cause quoting/escaping issues. Prefer passing secrets via the step `env:` block and referencing them as normal shell variables when building `kubectl create secret` commands (this avoids re-parsing secret contents by the shell).
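Following that suggestion, the MongoDB step could pass the secret through `env:` so bash only ever sees a variable reference. A minimal sketch; the secret name, key, and `NAMESPACE` usage are taken from the original step:

```yaml
- name: Create MongoDB secret (Cosmos DB MongoDB API)
  env:
    # The value arrives as an environment variable, so it is never
    # spliced into the script text that bash parses.
    COSMOS_DB_CONNECTION_STRING: ${{ secrets.COSMOS_DB_CONNECTION_STRING }}
  run: |
    kubectl create secret generic mongodb-secret \
      --from-literal=connectionString="$COSMOS_DB_CONNECTION_STRING" \
      -n ${{ env.NAMESPACE }} \
      --dry-run=client -o yaml | kubectl apply -f -
```

Inside double quotes, the result of `$COSMOS_DB_CONNECTION_STRING` is not subject to further command substitution, so `$(`, backticks, and quotes in the value pass through literally.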
```yaml
- name: Create Cosmos DB endpoint/key secret
  run: |
    kubectl create secret generic cosmos-db-secret \
      --from-literal=endpoint="${{ secrets.COSMOS_DB_ENDPOINT }}" \
      --from-literal=key="${{ secrets.COSMOS_DB_KEY }}" \
      -n ${{ env.NAMESPACE }} \
      --dry-run=client -o yaml | kubectl apply -f -
```
These secret-creation steps will still succeed if the required GitHub secrets are missing (the GitHub expressions evaluate to empty strings), resulting in Kubernetes Secrets with empty values and hard-to-diagnose runtime failures. Add an explicit validation/guard in the `run` block (or a dedicated step) to fail the job when required values like `COSMOS_DB_ENDPOINT`/`COSMOS_DB_KEY` are unset or empty before calling `kubectl create secret`.
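A sketch of such a guard for the Cosmos DB step, combined with the `env:` approach. It assumes the default bash shell on a Linux runner (the loop uses bash indirect expansion, `${!name}`) and uses the `::error::` workflow command so the failure shows up as an annotation; the loop itself is an illustrative choice, not something the review prescribes:

```yaml
- name: Create Cosmos DB endpoint/key secret
  env:
    COSMOS_DB_ENDPOINT: ${{ secrets.COSMOS_DB_ENDPOINT }}
    COSMOS_DB_KEY: ${{ secrets.COSMOS_DB_KEY }}
  run: |
    # Fail the job before touching the cluster if any required value is empty.
    for name in COSMOS_DB_ENDPOINT COSMOS_DB_KEY; do
      if [ -z "${!name}" ]; then
        echo "::error::Required GitHub secret $name is missing or empty"
        exit 1
      fi
    done
    kubectl create secret generic cosmos-db-secret \
      --from-literal=endpoint="$COSMOS_DB_ENDPOINT" \
      --from-literal=key="$COSMOS_DB_KEY" \
      -n ${{ env.NAMESPACE }} \
      --dry-run=client -o yaml | kubectl apply -f -
```

The same loop extends to the other new steps by listing their variables (e.g. `REDIS_CONNECTION_STRING`, `KAFKA_SASL_USERNAME`).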