|
| 1 | +# Infra FAQ |
| 2 | + |
| 3 | +- [Terraform](#terraform) |
| 4 | + |
| 5 | +- [GitHub Action triggering Azure DevOps pipeline](#github-action-triggering-azure-devops-pipeline) |
| 6 | +- [Bicep errors](#bicep-errors) |
| 7 | +- [Front Door](#front-door) |
| 8 | +- [Smoke Testing](#smoke-testing) |
| 9 | + |
| 10 | +## Terraform |
| 11 | + |
| 12 | +### Import into terraform state file |
| 13 | + |
| 14 | +To import Azure resources into the Terraform state file, you can use the following command. If you're working on an AVD machine, you may need to set the environment variables: |
| 15 | + |
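| 16 | +- `ARM_USE_AZUREAD` to authenticate to the state storage account with Entra ID (formerly Azure AD) instead of a shared key |
| 17 | +- `MSYS_NO_PATHCONV` to stop Git Bash from converting arguments that start with `/`, such as Azure resource IDs, into Windows paths |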
| 18 | + |
| 19 | +Below is an example of how to do it. |
| 20 | + |
| 21 | +```shell |
| 22 | +export ARM_USE_AZUREAD=true |
| 23 | +export MSYS_NO_PATHCONV=true |
| 24 | + |
| 25 | +terraform -chdir=infrastructure/terraform import -var-file ../environments/${ENV_CONFIG}/variables.tfvars module.infra[0].module.postgres_subnet.azurerm_subnet.subnet /subscriptions/xxx/resourceGroups/rg-lungrc-review-uks/providers/Microsoft.Network/virtualNetworks/vnet-review-uks-lungrc/subnets/snet-postgres |
| 26 | +``` |
| 27 | + |
| 28 | +### Error: Failed to load state |
| 29 | + |
| 30 | +This happens when running Terraform commands that access the state file, such as [import](#import-into-terraform-state-file), `state list` or `force-unlock`. |
| 31 | + |
| 32 | +```shell |
| 33 | +Failed to load state: blobs.Client#Get: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="KeyBasedAuthenticationNotPermitted" Message="Key based authentication is not permitted on this storage account. |
| 34 | +``` |
| 35 | +
|
| 36 | +By default, Terraform authenticates to the state storage account with a shared key, which this storage account does not permit. To force it to use Entra ID instead, set `ARM_USE_AZUREAD=true`. |
| 37 | +
|
| 38 | +```shell |
| 39 | +ARM_USE_AZUREAD=true terraform force-unlock xxx-yyy |
| 40 | +``` |
| 41 | +
|
| 42 | +## GitHub Action triggering Azure DevOps pipeline |
| 43 | +
|
| 44 | +### Application with identifier '\*\*\*' was not found in the directory |
| 45 | +
|
| 46 | +Example: |
| 47 | +
|
| 48 | +```shell |
| 49 | +Running Azure CLI Login. |
| 50 | +... |
| 51 | +Attempting Azure CLI login by using OIDC... |
| 52 | +Error: AADSTS700016: Application with identifier '***' was not found in the directory 'NHS Strategic Tenant'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in the tenant. You may have sent your authentication request to the wrong tenant. Trace ID: xxx Correlation ID: xxx Timestamp: xxx |
| 53 | +
|
| 54 | +Error: Interactive authentication is needed. Please run: |
| 55 | +az login |
| 56 | +``` |
| 57 | +
|
| 58 | +Either the managed identity does not exist, or the GitHub secrets used for the Azure login (such as the client ID or tenant ID) are not set correctly. |
| 59 | +
|
| 60 | +### The client '\*\*\*' has no configured federated identity credentials |
| 61 | +
|
| 62 | +Example: |
| 63 | +
|
| 64 | +```shell |
| 65 | +Running Azure CLI Login. |
| 66 | +... |
| 67 | +Attempting Azure CLI login by using OIDC... |
| 68 | +Error: AADSTS70025: The client '***'(mi-lungrc-ado-review-temp) has no configured federated identity credentials. Trace ID: xxx Correlation ID: xxx Timestamp: xxx |
| 69 | +
|
| 70 | +Error: Interactive authentication is needed. Please run: |
| 71 | +az login |
| 72 | +``` |
| 73 | +
|
| 74 | +Federated credentials are not configured on the managed identity. |
| 75 | +
|
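To confirm whether any federated credentials exist on the managed identity, they can be listed with the Azure CLI. This is a minimal sketch, assuming you know the identity name (shown in brackets in the error above) and its resource group; `list_federated_credentials` is a hypothetical helper name, not something defined in the repository.

```shell
# Hypothetical helper: list the federated credentials configured on a
# user-assigned managed identity.
# Usage: list_federated_credentials <identity-name> <resource-group>
list_federated_credentials() {
  local identity_name="$1" resource_group="$2"
  az identity federated-credential list \
    --identity-name "$identity_name" \
    --resource-group "$resource_group" \
    --output json
}
```

An empty list is consistent with the AADSTS70025 error. For GitHub OIDC, the credential's issuer is normally `https://token.actions.githubusercontent.com`.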
| 76 | +### No subscriptions found for \*\*\* |
| 77 | +
|
| 78 | +Example: |
| 79 | +
|
| 80 | +```shell |
| 81 | +Running Azure CLI Login. |
| 82 | +... |
| 83 | +Attempting Azure CLI login by using OIDC... |
| 84 | +Error: No subscriptions found for ***. |
| 85 | +``` |
| 86 | +
|
| 87 | +Give the managed identity the Reader role on a subscription (normally Devops). |
| 88 | +
|
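The role can also be granted from the command line. A minimal sketch, assuming you have permission to create role assignments on the subscription and know the managed identity's principal (object) ID; `grant_reader` is a hypothetical helper name.

```shell
# Hypothetical helper: grant the Reader role on a subscription to a managed identity.
# Usage: grant_reader <principal-object-id> <subscription-id>
grant_reader() {
  local principal_id="$1" subscription_id="$2"
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "Reader" \
    --scope "/subscriptions/${subscription_id}"
}
```

Passing the principal type explicitly avoids a directory lookup, which helps when the identity was created only moments ago.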
| 89 | +### Pipeline permissions |
| 90 | +
|
| 91 | +Examples: |
| 92 | +
|
| 93 | +```shell |
| 94 | +ERROR: TF401444: Please sign-in at least once as ***\***\xxx in a web browser to enable access to the service. |
| 95 | +Error: Process completed with exit code 1. |
| 96 | +``` |
| 97 | +
|
| 98 | +Or |
| 99 | +
|
| 100 | +```shell |
| 101 | +ERROR: TF400813: The user 'xxx' is not authorized to access this resource. |
| 102 | +Error: Process completed with exit code 1. |
| 103 | +``` |
| 104 | +
|
| 105 | +Or |
| 106 | +
|
| 107 | +```shell |
| 108 | +ERROR: VS800075: The project with id 'vstfs:///Classification/TeamProject/' does not exist, or you do not have permission to access it. |
| 109 | +Error: Process completed with exit code 1. |
| 110 | +``` |
| 111 | +
|
| 112 | +The GitHub secret must reference the right managed identity, and that managed identity must have the following permissions on the pipeline, via its ADO group: |
| 113 | +
|
| 114 | +- Edit queue build configuration |
| 115 | +- Queue builds |
| 116 | +- View build pipeline |
| 117 | +
|
| 118 | +The ADO group must have the "View project-level information" permission. |
| 119 | +
|
| 120 | +### The service connection does not exist |
| 121 | +
|
| 122 | +Example: |
| 123 | +
|
| 124 | +```shell |
| 125 | +The pipeline is not valid. Job DeployApp: Step input azureSubscription references service connection lungrc-review which could not be found. The service connection does not exist, has been disabled or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz. Job DeployApp: Step input azureSubscription references service connection lungrc-review which could not be found. The service connection does not exist, has been disabled or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz. |
| 126 | +``` |
| 127 | +
|
| 128 | +The Azure service connection `lungrc-[environment]` is missing, disabled or not authorized for the pipeline. |
| 129 | +
|
| 130 | +## Bicep errors |
| 131 | +
|
| 132 | +### RoleAssignmentUpdateNotPermitted |
| 133 | +
|
| 134 | +Example: |
| 135 | +
|
| 136 | +```shell |
| 137 | +ERROR: {"status":"Failed","error":{"code":"DeploymentFailed","target":"/subscriptions/xxx/providers/Microsoft.Resources/deployments/main","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.","details":[{"code":"RoleAssignmentUpdateNotPermitted","message":"Tenant ID, application ID, principal ID, and scope are not allowed to be updated."},{"code":"RoleAssignmentUpdateNotPermitted","message":"Tenant ID, application ID, principal ID, and scope are not allowed to be updated."},{"code":"RoleAssignmentUpdateNotPermitted","message":"Tenant ID, application ID, principal ID, and scope are not allowed to be updated."}]}} |
| 138 | +``` |
| 139 | +
|
| 140 | +When a managed identity (MI) is deleted, its role assignments are not deleted with it. When the MI is recreated, Bicep tries to update the existing role assignment and is not allowed to. Solution: |
| 141 | +
|
| 142 | +- Find the role assignment ID (`abcd-123` in the example below) |
| 143 | +- In the portal, open IAM on the subscription and on the resource group, and search for the role assignment ID |
| 144 | +- Delete the role assignment via the portal |
| 145 | +
|
| 146 | +If you can't find the right scope, follow this process: |
| 147 | +
|
| 148 | +- Find the role assignment id. Here: abcd-123 |
| 149 | +
|
| 150 | +```shell |
| 151 | + ~ Microsoft.Authorization/roleAssignments/abcd-123 [2022-04-01] |
| 152 | + ~ properties.principalId: "xxx" => "[reference('/subscriptions/xxx/resourceGroups/rg-mi-review-uks/providers/Microsoft.ManagedIdentity/userAssignedIdentities/mi-lungrc-ado-review-uks', '2024-11-30').principalId]" |
| 153 | +``` |
| 154 | +
|
| 155 | +- Get the subscription id |
| 156 | +- List role assignments: `az role assignment list --scope "/subscriptions/[subscription id]"` |
| 157 | +- Look for the role assignment ID abcd-123 to retrieve the other details. It may be named: Unknown. |
| 158 | +- Delete the role assignment via the portal |
| 159 | +
|
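The lookup in the steps above can be narrowed with a JMESPath filter instead of scanning the full list by eye. This is a minimal sketch, assuming the role assignment's `name` field holds the ID shown in the output (`abcd-123` here) and that the assignment sits at subscription scope; `find_role_assignment` is a hypothetical helper name.

```shell
# Hypothetical helper: look up one role assignment by its ID at subscription scope.
# Usage: find_role_assignment <subscription-id> <role-assignment-id>
find_role_assignment() {
  local subscription_id="$1" assignment_id="$2"
  az role assignment list \
    --scope "/subscriptions/${subscription_id}" \
    --query "[?name=='${assignment_id}']" \
    --output json
}
```

If nothing is returned, the assignment may live at a different scope, such as a resource group, and the same query can be repeated with that scope.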
| 160 | +### PrincipalNotFound |
| 161 | +
|
| 162 | +Example: |
| 163 | +
|
| 164 | +```shell |
| 165 | +ERROR: {"status":"Failed","error":{"code":"DeploymentFailed","target":"/subscriptions/exxx/providers/Microsoft.Resources/deployments/main","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.","details":[{"code":"PrincipalNotFound","message":"Principal xxx does not exist in the directory xxx. Check that you have the correct principal ID. If you are creating this principal and then immediately assigning a role, this error might be related to a replication delay. In this case, set the role assignment principalType property to a value, such as ServicePrincipal, User, or Group. See https://aka.ms/docs-principaltype"}... |
| 166 | +``` |
| 167 | +
|
| 168 | +Race condition: the managed identity is not created in time for the resources that depend on it. Solution: rerun the command. |
| 169 | +
|
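Rerunning can be automated with a small retry wrapper. This is a sketch only: `retry` is a hypothetical helper, and the attempt count and delay are arbitrary values to tune.

```shell
# Hypothetical helper: run a command, retrying on failure.
# Usage: retry <max-attempts> <delay-seconds> <command> [args...]
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  # Keep running the command until it succeeds or attempts run out
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt} failed, retrying in ${delay}s..." >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For example, `retry 3 30` followed by the deployment command that failed would try it up to three times, 30 seconds apart.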
| 170 | +### The client does not have permission |
| 171 | +
|
| 172 | +```shell |
| 173 | +{"code": "InvalidTemplateDeployment", "message": "Deployment failed with multiple errors: 'Authorization failed for template resource 'xxx' of type 'Microsoft.Authorization/roleAssignments'. The client 'xxx' with object id 'xxx' does not have permission to perform action 'Microsoft.Authorization/roleAssignments/write' at scope '/subscriptions/xxx/providers/Microsoft.Authorization/roleAssignments/xxx'... |
| 174 | +``` |
| 175 | +
|
| 176 | +Request the Owner role on the relevant subscriptions via PIM. |
| 177 | +
|
| 178 | +## Front Door |
| 179 | +
|
| 180 | +### Error 504 |
| 181 | +
|
| 182 | +When an environment is freshly created, accessing the app via Front Door may result in a blank page and an HTTP 504 error. |
| 183 | +
|
| 184 | +This is because the private link between Front Door and the container app environment must be manually approved: |
| 185 | +
|
| 186 | +- Navigate to the container app environment, Settings, Networking, Private Endpoints |
| 187 | +- It should show "1 Private Endpoint". Click on it. |
| 188 | +- You should see a connection with Connection State = "Pending" |
| 189 | +- Click on the connection name (a long ID in black, not the blue private endpoint link) |
| 190 | +- Click "✔️ Approve" at the top |
| 191 | +- Wait a few minutes until Connection State shows Approved |
| 192 | +
|
| 193 | +### Private link not created |
| 194 | +
|
| 195 | +When the first origin is created, a single private link is created between Front Door and the container app environment. The private link automatically creates a private endpoint associated with the container app environment. Origins added later reuse the same link. |
| 196 | +
|
| 197 | +If the private endpoint is deleted, for example because the container app environment is deleted, the private link is gone and the origins are silently orphaned. When the container app environment is recreated, Azure will not recreate the private link, even if the apps and origins are redeployed. |
| 198 | +
|
| 199 | +All the deployed apps then show a blank page and an HTTP 504 error. |
| 200 | +
|
| 201 | +The solution is to delete all the origins that point to this particular container app environment. When the first origin is re-added, the private link is created again. Recreate the other origins and they will use the same link. |
| 202 | +
|
| 203 | +### Unable to write state file to blob storage |
| 204 | +
|
| 205 | +When Terraform first runs for an environment, the pipeline tries to create a state file in blob storage. This sometimes fails with an error like the one below. |
| 206 | +
|
| 207 | +Example: |
| 208 | +
|
| 209 | +```shell |
| 210 | +Failed to get existing workspaces: containers.Client#ListBlobs: Failure sending request: StatusCode=0 -- Original Error: Get "https://salungrcpreprodtfstate.blob.core.windows.net/terraform-state?comp=list&prefix=preprod.tfstateenv%3A&restype=container": dial tcp: lookup salungrcpreprodtfstate.blob.core.windows.net on *.*.*.*:53: no such host |
| 211 | +``` |
| 212 | +
|
| 213 | +To check whether the blob storage account is reachable, log in to the VDI machine and run `nslookup` against the storage account hostname: |
| 214 | +
|
| 215 | +```shell |
| 216 | +$ nslookup salungrcpreprodtfstate.blob.core.windows.net |
| 217 | +Server: UnKnown |
| 218 | +Address: _._._._ |
| 219 | +
|
| 220 | +Non-authoritative answer: |
| 221 | +Name: salungrcpreprodtfstate.privatelink.blob.core.windows.net |
| 222 | +Address: _._._._ |
| 223 | +Aliases: salungrcpreprodtfstate.blob.core.windows.net |
| 224 | +``` |
| 225 | +
|
| 226 | +In the above example, it was discovered that the pipeline was running on the wrong ADO management pool: `private-pool-dev-uks` instead of `private-pool-prod-uks`. |
| 227 | +
|
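Because the failure depends on where the lookup runs, the same check is worth running from the pipeline agent as well as from the VDI. This is a minimal sketch, assuming a Linux agent where `getent` is available; the helper names are hypothetical.

```shell
# Hypothetical helper: blob endpoint hostname for a storage account name.
blob_host() {
  printf '%s.blob.core.windows.net\n' "$1"
}

# Hypothetical helper: succeed only if the hostname resolves on this machine.
# Assumes a Linux agent where getent is available.
can_resolve() {
  getent hosts "$1" >/dev/null
}

# Hypothetical helper: report whether the state storage account resolves from here.
# Usage: check_state_storage_dns <storage-account-name>
check_state_storage_dns() {
  local host
  host="$(blob_host "$1")"
  if can_resolve "$host"; then
    echo "OK: ${host} resolves from this machine"
  else
    echo "FAIL: ${host} does not resolve from this machine" >&2
    return 1
  fi
}
```

A failure on the agent alongside a success on the VDI would suggest the agent sits on a network or pool that cannot see the private DNS zone.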
| 228 | +## Smoke Testing |
| 229 | +
|
| 230 | +### Smoke test failing with 404 or timeout |
| 231 | +
|
| 232 | +The smoke test verifies the deployed application is accessible and serving the correct version. |
| 233 | +
|
| 234 | +**Common causes:** |
| 235 | +
|
| 236 | +1. **Apex domain misconfiguration** |
| 237 | + - Production uses apex domain (`manage-breast-screening.nhs.uk`) |
| 238 | + - Other environments use subdomain (`{env}.manage-breast-screening.nhs.uk`) |
| 239 | + - Ensure `use_apex_domain = true` is set in `infrastructure/environments/prod/variables.tfvars` |
| 240 | +
|
| 241 | +2. **Front Door not approved** |
| 242 | + - See [Error 504](#error-504) for private link approval steps |
| 243 | +
|
| 244 | +3. **Container app not ready** |
| 245 | + - The test waits up to 5 minutes for the app to become available |
| 246 | + - Check container app logs in Azure Portal |
| 247 | +
|
| 248 | +4. **Wrong SHA deployed** |
| 249 | + - Verify the correct docker image tag was used in deployment |
| 250 | + - Check the `/sha` endpoint manually from AVD |
| 251 | +
|
| 252 | +**Script location:** `scripts/bash/container_app_smoke_test.sh` |
| 253 | +
|
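The checks above can be reproduced by hand when the pipeline's smoke test fails. The sketch below is not the repository script. It assumes the production environment is named `prod`, that the app is served over HTTPS, and that `/sha` returns the bare commit SHA as its response body. Both helper names are hypothetical.

```shell
# Hypothetical helper: hostname for an environment, following the domain rule above.
app_host() {
  local env="$1"
  if [ "$env" = "prod" ]; then
    echo "manage-breast-screening.nhs.uk"
  else
    echo "${env}.manage-breast-screening.nhs.uk"
  fi
}

# Hypothetical helper: poll /sha until it returns the expected value or time runs out.
# Usage: wait_for_sha <env> <expected-sha> [timeout-seconds] [interval-seconds]
wait_for_sha() {
  local url="https://$(app_host "$1")/sha" expected="$2"
  local timeout="${3:-300}" interval="${4:-10}" elapsed=0 actual
  while [ "$elapsed" -lt "$timeout" ]; do
    # A failed request yields an empty string, which never matches
    actual="$(curl --silent --fail --max-time 10 "$url" || true)"
    if [ "$actual" = "$expected" ]; then
      echo "Deployed SHA matches: ${actual}"
      return 0
    fi
    sleep "$interval"
    # Counts sleep time only, so the real wait is slightly longer
    elapsed=$((elapsed + interval))
  done
  echo "Timed out waiting for ${url} to report ${expected}" >&2
  return 1
}
```

The default timeout of 300 seconds mirrors the 5 minute wait mentioned above. Run it from AVD, as with the manual `/sha` check.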
| 254 | +### InsufficientCoreQuota |
| 255 | +
|
| 256 | +- Error code: `InsufficientCoreQuota` |
| 257 | +- Cores needed: 4 |
| 258 | +- Current limit: 0 |
| 259 | +- SKU family: `standardDSv4Family` |
| 260 | +- Region: `uksouth` |
| 261 | +
|
| 262 | +This means: |
| 263 | +- The subscription currently has zero cores approved for DSv4 VMs in UK South. |
| 264 | +- Managed DevOps Pools try to allocate a minimum of 4 cores. |
| 265 | +- Azure blocks the request before any VM is created. |
| 266 | +- This is a quota problem, not a permissions, configuration or DevOps problem. |
| 267 | +
|
| 268 | +The long-term fix is to request a quota increase. |
| 269 | +Follow the link included in the Azure error message: |
| 270 | +
|
| 271 | +[Azure Portal](https://portal.azure.com/#view/Microsoft_Azure_Support/NewSupportRequestV3Blade/issueType/quota/%E2%80%A6) |
| 272 | +
|
| 273 | +Request: |
| 274 | +- Region: UK South |
| 275 | +- SKU family: Standard DSv4 |
| 276 | +- Requested cores: at least 8 (ask for headroom rather than the bare minimum of 4) |
| 277 | +- Reason: “Azure DevOps Managed DevOps Pool – build agents” |
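
Before and after raising the request, the current limit can be confirmed from the Azure CLI. This is a minimal sketch, assuming the SKU family name matches the one in the error (`standardDSv4Family`); `show_core_quota` is a hypothetical helper name.

```shell
# Hypothetical helper: show current usage and limit for one VM SKU family in a region.
# Usage: show_core_quota <location> <sku-family>
show_core_quota() {
  local location="$1" family="$2"
  az vm list-usage \
    --location "$location" \
    --query "[?name.value=='${family}'].{family:name.value, used:currentValue, limit:limit}" \
    --output table
}
```

For the case above this would be `show_core_quota uksouth standardDSv4Family`. A limit of 0 confirms that the quota request is still needed.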