Commit 6cd187a

PPHA-417: Create infra as code for Hub resources
1 parent 41e27b5 commit 6cd187a

File tree

15 files changed

+579
-95
lines changed

Lines changed: 10 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1,10 +1,10 @@
1-
name: "Check English usage"
2-
description: "Check English usage"
3-
runs:
4-
using: "composite"
5-
steps:
6-
- name: "Check English usage"
7-
shell: bash
8-
run: |
9-
export BRANCH_NAME=origin/${{ github.event.repository.default_branch }}
10-
check=branch ./scripts/githooks/check-english-usage.sh
1+
# name: "Check English usage"
2+
# description: "Check English usage"
3+
# runs:
4+
# using: "composite"
5+
# steps:
6+
# - name: "Check English usage"
7+
# shell: bash
8+
# run: |
9+
# export BRANCH_NAME=origin/${{ github.event.repository.default_branch }}
10+
# check=branch ./scripts/githooks/check-english-usage.sh

.github/workflows/stage-1-commit.yaml

Lines changed: 12 additions & 12 deletions
@@ -66,18 +66,18 @@ jobs:
6666
fetch-depth: 0 # Full history is needed to compare branches
6767
- name: "Check Markdown format"
6868
uses: ./.github/actions/check-markdown-format
69-
check-english-usage:
70-
name: "Check English usage"
71-
runs-on: ubuntu-latest
72-
timeout-minutes: 2
73-
steps:
74-
- name: "Checkout code"
75-
uses: actions/checkout@v6
76-
with:
77-
fetch-depth: 0 # Full history is needed to compare branches
78-
- name: "Check English usage"
79-
uses: ./.github/actions/check-english-usage
80-
# Github actiuons dont have terrafomr installed at the moment
69+
# check-english-usage:
70+
# name: "Check English usage"
71+
# runs-on: ubuntu-latest
72+
# timeout-minutes: 2
73+
# steps:
74+
# - name: "Checkout code"
75+
# uses: actions/checkout@v6
76+
# with:
77+
# fetch-depth: 0 # Full history is needed to compare branches
78+
# - name: "Check English usage"
79+
# uses: ./.github/actions/check-english-usage
80+
# GitHub actions dont have terraform installed at the moment
8181
# lint-terraform:
8282
# name: "Lint Terraform"
8383
# runs-on: ubuntu-latest

.gitleaksignore

Lines changed: 1 addition & 1 deletion
@@ -25,6 +25,6 @@ infrastructure/bootstrap/main.bicep:generic-api-key:29
2525
infrastructure/bootstrap/main.bicep:generic-api-key:30
2626
infrastructure/bootstrap/main.bicep:generic-api-key:31
2727
infrastructure/bootstrap/main.bicep:generic-api-key:32
28-
infrastructure/bootstrap/main.bicep:generic-api-key:33
28+
infrastructure/bootstrap/modules/storage.bicep:generic-api-key:59
2929
infrastructure/bootstrap/modules/keyVault.bicep:generic-api-key:10
3030
infrastructure/bootstrap/modules/storage.bicep:generic-api-key:59

docs/infrastructure/infra-faq.md

Lines changed: 277 additions & 0 deletions
@@ -0,0 +1,277 @@
1+
# Infra FAQ
2+
3+
- [Terraform](#terraform)
4+
5+
- [GitHub action triggering Azure devops pipeline](#github-action-triggering-azure-devops-pipeline)
6+
- [Bicep errors](#bicep-errors)
7+
- [Front door](#front-door)
8+
- [Smoke Testing](#smoke-testing)
9+
10+
## Terraform
11+
12+
### Import into terraform state file
13+
14+
To import Azure resources into the Terraform state file, you can use the following command. If you're working on an AVD machine, you may need to set the environment variables:
15+
16+
- `ARM_USE_AZUREAD` to use Azure AD instead of a shared key
17+
- `MSYS_NO_PATHCONV` to stop Git Bash from rewriting the resource ID as a Windows file path
18+
19+
Below is an example of how to do it.
20+
21+
```shell
22+
export ARM_USE_AZUREAD=true
23+
export MSYS_NO_PATHCONV=true
24+
25+
terraform -chdir=infrastructure/terraform import -var-file ../environments/${ENV_CONFIG}/variables.tfvars module.infra[0].module.postgres_subnet.azurerm_subnet.subnet /subscriptions/xxx/resourceGroups/rg-lungrc-review-uks/providers/Microsoft.Network/virtualNetworks/vnet-review-uks-lungrc/subnets/snet-postgres
26+
```
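The resource ID passed to `terraform import` follows the standard Azure layout for a subnet, so it can be assembled from its parts instead of being copied by hand. A minimal sketch, assuming a hypothetical helper named `subnet_resource_id` that is not part of this repository:

```shell
# Hypothetical helper (not part of this repository): builds the Azure resource
# ID of a subnet from its parts, in the same layout as the import example.
subnet_resource_id() {
  local subscription_id="$1" resource_group="$2" vnet="$3" subnet="$4"
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/virtualNetworks/%s/subnets/%s\n' \
    "$subscription_id" "$resource_group" "$vnet" "$subnet"
}
```

The result can then be passed as the last argument of the import command, for example `"$(subnet_resource_id xxx rg-lungrc-review-uks vnet-review-uks-lungrc snet-postgres)"`.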
27+
28+
### Error: Failed to load state
29+
30+
This happens when running Terraform commands that access the state file, such as [import](#import-into-terraform-state-file), `state list` or `force-unlock`.
31+
32+
```shell
33+
Failed to load state: blobs.Client#Get: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="KeyBasedAuthenticationNotPermitted" Message="Key based authentication is not permitted on this storage account.
34+
```
35+
36+
By default, Terraform authenticates to the storage account with a shared key, which this storage account does not permit. To force Entra ID authentication, set `ARM_USE_AZUREAD=true`.
37+
38+
```shell
39+
ARM_USE_AZUREAD=true terraform force-unlock xxx-yyy
40+
```
41+
42+
## GitHub action triggering Azure devops pipeline
43+
44+
### Application with identifier '\*\*\*' was not found in the directory
45+
46+
Example:
47+
48+
```shell
49+
Running Azure CLI Login.
50+
...
51+
Attempting Azure CLI login by using OIDC...
52+
Error: AADSTS700016: Application with identifier '***' was not found in the directory 'NHS Strategic Tenant'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in the tenant. You may have sent your authentication request to the wrong tenant. Trace ID: xxx Correlation ID: xxx Timestamp: xxx
53+
54+
Error: Interactive authentication is needed. Please run:
55+
az login
56+
```
57+
58+
Either the managed identity does not exist or the GitHub secrets are not set correctly.
59+
60+
### The client '\*\*\*' has no configured federated identity credentials
61+
62+
Example:
63+
64+
```shell
65+
Running Azure CLI Login.
66+
...
67+
Attempting Azure CLI login by using OIDC...
68+
Error: AADSTS70025: The client '***'(mi-lungrc-ado-review-temp) has no configured federated identity credentials. Trace ID: xxx Correlation ID: xxx Timestamp: xxx
69+
70+
Error: Interactive authentication is needed. Please run:
71+
az login
72+
```
73+
74+
Federated credentials are not configured.
75+
76+
### No subscriptions found for \*\*\*
77+
78+
Example:
79+
80+
```shell
81+
Running Azure CLI Login.
82+
...
83+
Attempting Azure CLI login by using OIDC...
84+
Error: No subscriptions found for ***.
85+
```
86+
87+
Give the managed identity the Reader role on a subscription (normally Devops).
88+
89+
### Pipeline permissions
90+
91+
Examples:
92+
93+
```shell
94+
ERROR: TF401444: Please sign-in at least once as ***\***\xxx in a web browser to enable access to the service.
95+
Error: Process completed with exit code 1.
96+
```
97+
98+
Or
99+
100+
```shell
101+
ERROR: TF400813: The user 'xxx' is not authorized to access this resource.
102+
Error: Process completed with exit code 1.
103+
```
104+
105+
Or
106+
107+
```shell
108+
ERROR: VS800075: The project with id 'vstfs:///Classification/TeamProject/' does not exist, or you do not have permission to access it.
109+
Error: Process completed with exit code 1.
110+
```
111+
112+
The GitHub secret must reference the correct managed identity, and that managed identity must have the following permissions on the pipeline via its ADO group:
113+
114+
- Edit queue build configuration
115+
- Queue builds
116+
- View build pipeline
117+
118+
The ADO group must have the "View project-level information" permission.
119+
120+
### The service connection does not exist
121+
122+
Example:
123+
124+
```shell
125+
The pipeline is not valid. Job DeployApp: Step input azureSubscription references service connection lungrc-review which could not be found. The service connection does not exist, has been disabled or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz. Job DeployApp: Step input azureSubscription references service connection lungrc-review which could not be found. The service connection does not exist, has been disabled or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz.
126+
```
127+
128+
The Azure service connection `lungrc-[environment]` is missing.
129+
130+
## Bicep errors
131+
132+
### RoleAssignmentUpdateNotPermitted
133+
134+
Example:
135+
136+
```shell
137+
ERROR: {"status":"Failed","error":{"code":"DeploymentFailed","target":"/subscriptions/xxx/providers/Microsoft.Resources/deployments/main","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.","details":[{"code":"RoleAssignmentUpdateNotPermitted","message":"Tenant ID, application ID, principal ID, and scope are not allowed to be updated."},{"code":"RoleAssignmentUpdateNotPermitted","message":"Tenant ID, application ID, principal ID, and scope are not allowed to be updated."},{"code":"RoleAssignmentUpdateNotPermitted","message":"Tenant ID, application ID, principal ID, and scope are not allowed to be updated."}]}}
138+
```
139+
140+
When a managed identity (MI) is deleted, its role assignments are not deleted with it. When the MI is recreated, Bicep tries to update the existing role assignment and is not allowed to. Solution:
141+
142+
- Find the role assignment id. Here: abcd-123
143+
- Navigate to subscriptions and resource group IAM and search for the role assignment id
144+
- Delete the role assignment via the portal
145+
146+
If you can't find the right scope, follow this process:
147+
148+
- Find the role assignment id. Here: abcd-123
149+
150+
```shell
151+
 ~ Microsoft.Authorization/roleAssignments/abcd-123 [2022-04-01]
152+
    ~ properties.principalId: "xxx" => "[reference('/subscriptions/xxx/resourceGroups/rg-mi-review-uks/providers/Microsoft.ManagedIdentity/userAssignedIdentities/mi-lungrc-ado-review-uks', '2024-11-30').principalId]"
153+
```
154+
155+
- Get the subscription id
156+
- List role assignments: `az role assignment list --scope "/subscriptions/[subscription id]"`
157+
- Look for the role assignment id abcd-123 to retrieve the other details. It may be named: Unknown.
158+
- Delete the role assignment via the portal
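The lookup steps above can be sketched in shell. Both function names are hypothetical and not part of this repository. `role_assignment_ids` only does text extraction on what-if output like the excerpt above, while `find_role_assignment` assumes a logged-in Azure CLI and that the assignment sits at subscription scope:

```shell
# Hypothetical helper: reads deployment what-if output on stdin and prints the
# ID of every role assignment it mentions, one per line.
role_assignment_ids() {
  sed -n 's|.*Microsoft\.Authorization/roleAssignments/\([^ ]*\).*|\1|p'
}

# Hypothetical helper: lists role assignments at subscription scope and keeps
# only the one whose name matches the given ID. Needs a logged-in Azure CLI.
find_role_assignment() {
  local subscription_id="$1" assignment_id="$2"
  az role assignment list --scope "/subscriptions/${subscription_id}" \
    --query "[?name=='${assignment_id}']"
}
```

If `find_role_assignment` returns an empty list, the assignment probably lives at a narrower scope such as a resource group, and the `--scope` value would need adjusting.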
159+
160+
### PrincipalNotFound
161+
162+
Example:
163+
164+
```shell
165+
ERROR: {"status":"Failed","error":{"code":"DeploymentFailed","target":"/subscriptions/exxx/providers/Microsoft.Resources/deployments/main","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.","details":[{"code":"PrincipalNotFound","message":"Principal xxx does not exist in the directory xxx. Check that you have the correct principal ID. If you are creating this principal and then immediately assigning a role, this error might be related to a replication delay. In this case, set the role assignment principalType property to a value, such as ServicePrincipal, User, or Group.  See https://aka.ms/docs-principaltype"}...
166+
```
167+
168+
Race condition: the managed identity is not created in time for the resources that depend on it. Solution: rerun the command.
169+
170+
### The client does not have permission
171+
172+
```shell
173+
{"code": "InvalidTemplateDeployment", "message": "Deployment failed with multiple errors: 'Authorization failed for template resource 'xxx' of type 'Microsoft.Authorization/roleAssignments'. The client 'xxx' with object id 'xxx' does not have permission to perform action 'Microsoft.Authorization/roleAssignments/write' at scope '/subscriptions/xxx/providers/Microsoft.Authorization/roleAssignments/xxx'...
174+
```
175+
176+
Request Owner role on subscriptions via PIM.
177+
178+
## Front door
179+
180+
### Error 504
181+
182+
When an environment is freshly created, accessing the app via Front Door may result in a blank page and an HTTP 504 error.
183+
184+
This is because the private link between front door and the container app environment must be manually approved:
185+
186+
- Navigate to the container app environment, Settings, Networking, Private Endpoints
187+
- It should show "1 Private Endpoint". Click on it.
188+
- You should see a connection with Connection State = "Pending"
189+
- Click on the connection name (a long ID in black, not the blue private endpoint link)
190+
- Click "✔️ Approve" at the top
191+
- Wait a few minutes until Connection State shows Approved
192+
193+
### Private link not created
194+
195+
When an origin is created, it must create a unique private link between front door and the container app environment. The private link automatically creates a private endpoint associated with the container app environment. When more origins are added, the same link is used.
196+
197+
If the private endpoint is deleted, for example because the container app environment is deleted, the private link is gone and the origins are silently orphaned. When the container app environment is recreated, even if the apps and origins are redeployed, Azure will not recreate the private link.
198+
199+
All the deployed apps show a blank page and 504 HTTP error.
200+
201+
The solution is to delete all the origins that point to this particular container app environment. When the first origin is re-added, the private link is created again. Recreate the other origins and they will use the same link.
202+
203+
### Unable to write state file to blob storage
204+
205+
When Terraform is first initialised, the pipeline tries to create a state file in blob storage. Sometimes this fails with an error like the following.
206+
207+
Example:
208+
209+
```shell
210+
Failed to get existing workspaces: containers.Client#ListBlobs: Failure sending request: StatusCode=0 -- Original Error: Get "https://salungrcpreprodtfstate.blob.core.windows.net/terraform-state?comp=list&prefix=preprod.tfstateenv%3A&restype=container": dial tcp: lookup salungrcpreprodtfstate.blob.core.windows.net on *.*.*.*:53: no such host
211+
```
212+
213+
To check whether the blob storage is reachable, log in to the VDI machine and run `nslookup` against the storage account:
214+
215+
```shell
216+
$ nslookup salungrcpreprodtfstate.blob.core.windows.net
217+
Server: UnKnown
218+
Address: _._._._
219+
220+
Non-authoritative answer:
221+
Name: salungrcpreprodtfstate.privatelink.blob.core.windows.net
222+
Address: _._._._
223+
Aliases: salungrcpreprodtfstate.blob.core.windows.net
224+
```
225+
226+
In the above example, it was discovered that the pipeline was using the wrong ADO management pool, i.e. `private-pool-dev-uks` instead of `private-pool-prod-uks`.
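The `nslookup` check can be wrapped so that a script can act on the result. A minimal sketch, assuming a hypothetical helper named `resolves_via_private_link`. It only checks that the answer mentions the `privatelink` name, as in the output above, and does not prove that the returned address is private:

```shell
# Hypothetical helper: reads nslookup output on stdin and succeeds only if the
# answer mentions the blob storage private link zone.
resolves_via_private_link() {
  grep -q 'privatelink\.blob\.core\.windows\.net'
}
```

For example, `nslookup salungrcpreprodtfstate.blob.core.windows.net | resolves_via_private_link || echo "not resolving via private link"` prints the warning when the lookup fails with "no such host".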
227+
228+
## Smoke Testing
229+
230+
### Smoke test failing with 404 or timeout
231+
232+
The smoke test verifies the deployed application is accessible and serving the correct version.
233+
234+
**Common causes:**
235+
236+
1. **Apex domain misconfiguration**
237+
- Production uses apex domain (`manage-breast-screening.nhs.uk`)
238+
- Other environments use subdomain (`{env}.manage-breast-screening.nhs.uk`)
239+
- Ensure `use_apex_domain = true` is set in `infrastructure/environments/prod/variables.tfvars`
240+
241+
2. **Front Door not approved**
242+
- See [Error 504](#error-504) for private link approval steps
243+
244+
3. **Container app not ready**
245+
- The test waits up to 5 minutes for the app to become available
246+
- Check container app logs in Azure Portal
247+
248+
4. **Wrong SHA deployed**
249+
- Verify the correct docker image tag was used in deployment
250+
- Check the `/sha` endpoint manually from AVD
251+
252+
**Script location:** `scripts/bash/container_app_smoke_test.sh`
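The domain rule and the wait described above can be sketched as two small shell functions. This is not the content of `container_app_smoke_test.sh`. Both function names are hypothetical, and the sketch assumes the production environment is named `prod`, as the tfvars path suggests:

```shell
# Hypothetical helper: prints the hostname a smoke test would target.
# Assumes "prod" uses the apex domain and every other environment a subdomain.
smoke_test_host() {
  local environment="$1" apex_domain="manage-breast-screening.nhs.uk"
  if [ "$environment" = "prod" ]; then
    printf '%s\n' "$apex_domain"
  else
    printf '%s.%s\n' "$environment" "$apex_domain"
  fi
}

# Hypothetical helper: reruns a command every <interval> seconds until it
# succeeds, giving up once roughly <timeout> seconds have passed.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local waited=0
  until "$@"; do
    waited=$((waited + interval))
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
  done
}
```

A five-minute wait on the `/sha` endpoint of a review environment could then look like `wait_until 300 10 curl --fail --silent "https://$(smoke_test_host review)/sha"`.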
253+
254+
### InsufficientCoreQuota
255+
256+
InsufficientCoreQuota
257+
Cores needed: 4
258+
Current limit: 0
259+
SKU family: standardDSv4Family
260+
Region: uksouth
261+
262+
This means:
263+
- Your subscription currently has zero cores approved for DSv4 VMs in UK South.
264+
- Managed DevOps Pools try to allocate a minimum of 4 cores.
265+
- Azure blocks the request before any VM is created.
266+
This is a quota problem, not a permissions, configuration or DevOps problem.
267+
268+
The long-term fix is to request more quota.
269+
Follow the link provided in the Azure error message:
270+
271+
[Azure Portal](https://portal.azure.com/#view/Microsoft_Azure_Support/NewSupportRequestV3Blade/issueType/quota/%E2%80%A6)
272+
273+
Request:
274+
- Region: UK South
275+
- SKU family: Standard DSv4
276+
- Requested cores: at least 8 (ask for headroom rather than the bare minimum of 4)
277+
- Reason: “Azure DevOps Managed DevOps Pool – build agents”

infrastructure/bootstrap/environments/live/hub.bicepparam

Lines changed: 1 addition & 1 deletion
@@ -5,4 +5,4 @@ param vnetAddressPrefixes = [
55
'10.21.0.0/16'
66
]
77
param devopsSubnetAddressPrefix = '10.21.1.0/24'
8-
param devopsInfrastructureId = ''
8+
//param devopsInfrastructureId = ''

infrastructure/bootstrap/environments/nonlive/hub.bicepparam

Lines changed: 5 additions & 1 deletion
@@ -5,4 +5,8 @@ param vnetAddressPrefixes = [
55
'10.11.0.0/16'
66
]
77
param devopsSubnetAddressPrefix = '10.11.1.0/24'
8-
param devopsInfrastructureId = ''
8+
param privateEndpointSubnetAddressPrefix = '10.11.2.0/24'
9+
param enableSoftDelete = true
10+
// param devopsInfrastructureId = ''
11+
// param devopsInfrastructureId = '31687f79-5e43-4c1e-8c63-d9f4bff5cf8b'
12+
//param devopsInfrastructureId = '602aafe8-ce26-40ef-8729-ebd1ffdb094b'
Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
1-
AZURE_SUBSCRIPTION="Digital Screening DToS - Sandbox"
1+
AZURE_SUBSCRIPTION="Lung Cancer Risk Check - Non-live hub"
22
BOOTSTRAP=hub
33
HUB_TYPE=nonlive
