Commit 0af2db4

committed
Document deployment
1 parent e9e9582 commit 0af2db4

4 files changed

+251
-47
lines changed

README.md

Lines changed: 2 additions & 45 deletions
Original file line numberDiff line numberDiff line change
@@ -110,52 +110,9 @@ To generate a new app, run:
110110
poetry run ./manage.py startapp <app_name> manage_breast_screening/`
111111
```
112112

113-
## Manual Deployment
113+
## Deployment
114114

115-
The build pipeline builds and pushes a docker image to [Github container registry](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry). The app is deployed to an [Azure container app](https://azure.microsoft.com/en-us/products/container-apps) using terraform.
116-
117-
For each environment, e.g. 'dev':
118-
119-
1. Connect to [Azure virtual desktop](https://azure.microsoft.com/en-us/products/virtual-desktop). Ask the platform team for access with Administrator role.
120-
1. If not present, install the following software: terraform (version 1.7.0), git, make, jq.
121-
- Run a Command prompt as administrator
122-
- choco install terraform --version 1.7.0
123-
- choco install terraform git make jq
124-
1. Open git bash
125-
1. Clone the repository: `git clone https://github.com/NHSDigital/dtos-manage-breast-screening.git`
126-
1. Enter the directory and select the branch, tag, commit...
127-
1. Login: `az login`
128-
1. Create the resource group: `make dev resource-group-init`. This is only required when creating the environment from scratch.
129-
1. Deploy:
130-
```shell
131-
make dev terraform-plan DOCKER_IMAGE_TAG=git-sha-af32637e7e6a07e36158dcb8d7ed90be49be1xyz
132-
```
133-
1. The web app URL will be displayed as output. Copy it into a browser on the AVD to access the app.
134-
135-
## Manual deployment of the review environments
136-
137-
Review environments differ slightly from other environments. They are lightweight versions of the application and are designed to share much of the core Azure infrastructure. As a result, there is a one-to-many relationship between the container apps and the container app environment.
138-
139-
### Step 1
140-
If you run the following command *without* the `PR_NUMBER` parameter, it will apply only the infrastructure module:
141-
142-
```shell
143-
make review terraform-apply
144-
```
145-
146-
### Step 2
147-
148-
If you include the `PR_NUMBER` parameter, it will apply the container_app module instead of the infrastructure module:
149-
150-
```shell
151-
make review terraform-apply DOCKER_IMAGE_TAG=git-sha-01ecb79d561f55be60072a093dd167fe8eb5b42e PR_NUMBER=123
152-
```
153-
154-
## Continuous deployment
155-
156-
When a PR is merged, Github actions securely triggers the deployment pipeline on the Azure devops pool running on the internal network. It currently deploys the dev environment automatically.
157-
158-
Access [Azure devops](https://dev.azure.com/nhse-dtos/dtos-manage-breast-screening/_build?definitionId=86) to see the pipeline.
115+
See [Deployment](docs/infrastructure/deployment.md).
159116

160117
## Application secrets
161118

Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,56 @@
1+
# Create an environment
2+
3+
This is the initial manual process to create a new environment like review, dev, production...
4+
- Create the configuration files in `infrastructure/environments/[environment]`
5+
- Create postgres Entra ID group in DTOS Administrative Unit (AU): `postgres_manbrs_[environment]_uks_admin`
6+
- Ask CCOE to assign role:
7+
- [Form for PIM](https://nhsdigitallive.service-now.com/nhs_digital?id=sc_cat_item&sys_id=28f3ab4f1bf3ca1078ac4337b04bcb78&sysparm_category=114fced51bdae1502eee65b9bd4bcbdc)
8+
- Approver: Add someone from the infrastructure team
9+
- Role Name: `Group.Read.All`
10+
- Application Name: `mi-manbrs-[environment]-adotoaz-uks`
11+
- Application ID: [client.id]
12+
- Managed identity: `mi-manbrs-[environment]-adotoaz-uks`
13+
- Description:
14+
- Managed identity: `mi-manbrs-[environment]-adotoaz-uks`
15+
- Role: permanent on Directory
16+
- Run bicep from AVD: `make [environment] resource-group-init`
17+
- Create ADO group
18+
- Name: `Run pipeline - [environment]`
19+
- Members: `mi-manbrs-[environment]-ghtoado-uks`. There may be more than 1 in the list. Check client id printed below the name.
20+
- Permissions:
21+
- View project-level information
22+
- Create new pipeline:
23+
- Name: `Deploy to Azure - [environment]`
24+
- Pipeline yaml: `.azuredevops/pipelines/deploy.yml`
25+
- Manage pipeline security:
26+
- Add group: `Run pipeline - [environment]`
27+
- Permissions:
28+
- Edit queue build configuration
29+
- Queue builds
30+
- View build pipeline
31+
- View builds
32+
- Create ADO environment: [environment]
33+
- Set: exclusive lock (except for review)
34+
- Add pipeline permission for `Deploy to Azure - [environment]` pipeline
35+
- Create Github environment [environment]
36+
- Add environment secrets, from `mi-manbrs-[environment]-ghtoado-uks` in github
37+
- AZURE_CLIENT_ID
38+
- AZURE_SUBSCRIPTION_ID
39+
[TODO]- Add branch protection rule so only the main branch can be deployed (except for review) [TODO]
40+
- Create service connection (ADO)
41+
- Connection type: `Azure Resource Manager`
42+
- Identity type: `Managed identity`
43+
- Subscription for managed identity: `Digital Screening DToS - Devops`
44+
- Resource group for managed identity: `rg-mi-[environment]-uks`
45+
- Managed identity: `mi-manbrs-[environment]-adotoaz-uks`
46+
- Scope level: `Subscription`
47+
- Subscription: `Digital Screening DToS - Core Services Dev`
48+
- Resource group for Service connection: leave blank
49+
- Service Connection Name: `manbrs-[environment]`
50+
- Do NOT tick: Grant access permission to all pipelines
51+
- Security: allow `Deploy to Azure - [environment]` pipeline
52+
- Add environment to the list of environments in `deploy-stage` step of `cicd-2-main-branch.yaml`. For the review environment, there is a single item in `cicd-1-pull-request.yaml`.
53+
- Run Github workflow
54+
- Check ADO pipeline. You may be prompted to authorise:
55+
- Pipeline: service connection
56+
- Environment: service connection and agent pool

docs/infrastructure/deployment.md

Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,81 @@
1+
# Deployment
2+
3+
## Infrastructure
4+
The code is packaged into a docker image which is deployed to [Azure container apps](https://learn.microsoft.com/en-us/azure/container-apps/). There are two apps. The main one is a web application with an HTTP ingress. The second is an [Azure container app job](https://learn.microsoft.com/en-us/azure/container-apps/jobs?tabs=azure-cli), triggered on demand to run the database migration.
5+
6+
The web application does not have a public endpoint. It is only accessible via [Azure front door](https://learn.microsoft.com/en-us/azure/frontdoor/) which is a CDN providing TLS certificates, firewall, scaling and caching. The internal endpoint is accessible via [Azure Virtual Desktop](https://learn.microsoft.com/en-us/azure/virtual-desktop/).
7+
8+
The data is hosted on [Azure postgres flexible server](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/overview).
9+
10+
## Docker build
11+
The build pipeline builds and pushes a docker image to [Github container registry](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry). The image is tagged with:
12+
- branch name: for docker build caching
13+
- commit SHA: to uniquely identify the image during deployment, prefixed by "git-sha-".
14+
- image digest sha: immutable tag
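
As an illustration of the commit SHA tag format, here is a minimal shell sketch. The `image_tag` helper is hypothetical and not part of the repository; only the `git-sha-` prefix comes from the list above.

```shell
# Hypothetical helper (not part of the repository): build the deployment
# tag for a given commit SHA by adding the "git-sha-" prefix.
image_tag() {
  printf 'git-sha-%s\n' "$1"
}

# Example usage inside a git checkout (commented out so the snippet has no
# side effects when sourced):
# DOCKER_IMAGE_TAG="$(image_tag "$(git rev-parse HEAD)")"
```

The resulting value has the shape expected by the `DOCKER_IMAGE_TAG` parameter used in the manual deployment commands.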
15+
16+
## Automated deployment
17+
The deployment is split between:
18+
- [Github actions](https://github.com/features/actions) for Continuous Integration (CI)
19+
- [Azure devops](https://azure.microsoft.com/en-us/products/devops) pipelines for Continuous Deployment (CD)
20+
21+
### Github actions
22+
Github actions run on Github-hosted runners on the internet. They run all our tests (unit, functional, security, linting...). They have no access to our internal network or to any sensitive data.
23+
24+
To deploy an environment, they authenticate to Azure and delegate the work to [Azure devops pipelines](#azure-devops-pipelines).
25+
26+
See [all Github actions](https://github.com/NHSDigital/dtos-manage-breast-screening/actions).
27+
28+
### Azure devops pipelines
29+
We use a public repository as required by the [NHS Service standard](https://service-manual.nhs.uk/standards-and-technology/service-standard-points/12-make-new-source-code-open). For security reasons, deployments cannot run from Github actions and run instead on Azure devops private runners inside our internal network. They have access to the network and any Azure resource deployed onto it.
30+
31+
See [all Azure devops pipelines](https://dev.azure.com/nhse-dtos/dtos-manage-breast-screening/_build).
32+
33+
### Pull request
34+
When a pull request is raised, add a "deploy" label to deploy a review app (concept borrowed from [Heroku](https://devcenter.heroku.com/articles/github-integration-review-apps)). It triggers the [CI/CD pull request](https://github.com/NHSDigital/dtos-manage-breast-screening/actions/workflows/cicd-1-pull-request.yaml) Github action workflow, which runs tests then authenticates to Azure and triggers the [Deploy review app](https://dev.azure.com/nhse-dtos/dtos-manage-breast-screening/_build?definitionId=102) Azure devops pipeline. It runs terraform to deploy the application, database and front door configuration.
35+
36+
To make this process faster and less costly, most of the infrastructure is reused for all review apps: networking, key vaults, container app environments... The base infrastructure is only updated by the pipeline on the main branch.
37+
38+
When the pull request is closed or merged, and if it has the "deploy" label, the [Delete review app](https://github.com/NHSDigital/dtos-manage-breast-screening/actions/workflows/cicd-1-pull-request-closed.yaml) workflow is triggered, followed by the [Delete review app](https://dev.azure.com/nhse-dtos/dtos-manage-breast-screening/_build?definitionId=103) Azure devops pipeline. It runs *terraform destroy* to delete the resources.
39+
40+
Note: terraform currently deploys a postgres server with a locked database. It must be deleted manually from the Azure portal before the pipeline runs.
41+
42+
### Main branch
43+
When a pull request is merged to the main branch, the [CI/CD main branch](https://github.com/NHSDigital/dtos-manage-breast-screening/actions/workflows/cicd-2-main-branch.yaml) Github action workflow is triggered. It runs tests then authenticates to Azure and triggers the [Deploy to Azure](https://dev.azure.com/nhse-dtos/dtos-manage-breast-screening/_build?definitionId=93) Azure devops pipeline. It runs terraform to deploy the entire environment, including both infrastructure and applications. Any manual change is overwritten by terraform.
44+
45+
## Manual deployment
46+
For each environment, e.g. 'dev':
47+
48+
1. Connect to [Azure virtual desktop](https://azure.microsoft.com/en-us/products/virtual-desktop). Ask the platform team for access with Administrator role.
49+
1. If not present, install the following software: terraform (version 1.7.0), git, make, jq.
50+
- Run a Command prompt as administrator
51+
- choco install terraform --version 1.7.0
52+
- choco install git make jq
53+
1. Open git bash
54+
1. Clone the repository: `git clone https://github.com/NHSDigital/dtos-manage-breast-screening.git`
55+
1. Enter the directory and select the branch, tag, commit...
56+
1. Login: `az login`
57+
1. Create the resource group: `make dev resource-group-init`. This is only required when creating the environment from scratch.
58+
1. Deploy:
59+
```shell
60+
make dev terraform-plan DOCKER_IMAGE_TAG=git-sha-af32637e7e6a07e36158dcb8d7ed90be49be1xyz
61+
```
62+
1. The web app URL will be displayed as output. Copy it into a browser on the AVD to access the app.
63+
64+
## Manual deployment of the review environments
65+
66+
Review environments differ slightly from other environments. They are lightweight versions of the application and are designed to share much of the core Azure infrastructure. As a result, there is a one-to-many relationship between the container apps and the container app environment.
67+
68+
### Step 1
69+
If you run the following command *without* the `PR_NUMBER` parameter, it will apply only the infrastructure module:
70+
71+
```shell
72+
make review terraform-apply
73+
```
74+
75+
### Step 2
76+
77+
If you include the `PR_NUMBER` parameter, it will apply the container_app module instead of the infrastructure module:
78+
79+
```shell
80+
make review terraform-apply DOCKER_IMAGE_TAG=git-sha-01ecb79d561f55be60072a093dd167fe8eb5b42e PR_NUMBER=123
81+
```
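
The two steps above can be summarised with a small shell sketch. It only mirrors the behaviour described in this section and is not the project's Makefile logic; the `review_module_for` function is hypothetical.

```shell
# Illustration only: mirrors the behaviour described above, it is not the
# project's Makefile logic. Prints the module that would be applied.
review_module_for() {
  pr_number="${1:-}"
  if [ -z "$pr_number" ]; then
    # No PR_NUMBER: only the shared infrastructure is applied (step 1).
    echo "infrastructure"
  else
    # PR_NUMBER given: the per-PR container app is applied (step 2).
    echo "container_app"
  fi
}

review_module_for        # prints: infrastructure
review_module_for 123    # prints: container_app
```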

docs/infrastructure/infra-faq.md

Lines changed: 112 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,12 @@
1-
## Import into terraform state file
1+
# Infra FAQ
2+
3+
- [Terraform](#terraform)
4+
- [Github action triggering Azure devops pipeline](#github-action-triggering-azure-devops-pipeline)
5+
- [Bicep errors](#bicep-errors)
6+
7+
8+
## Terraform
9+
### Import into terraform state file
210

311
To import Azure resources into the Terraform state file, you can use the following command. If you're working on an AVD machine, you may need to set the environment variables:
412
- `ARM_USE_AZUREAD` to use Azure AD instead of a shared key
@@ -13,7 +21,7 @@ export MSYS_NO_PATHCONV=true
1321
terraform -chdir=infrastructure/terraform import -var-file ../environments/${ENV_CONFIG}/variables.tfvars module.infra[0].module.postgres_subnet.azurerm_subnet.subnet /subscriptions/xxx/resourceGroups/rg-manbrs-review-uks/providers/Microsoft.Network/virtualNetworks/vnet-review-uks-manbrs/subnets/snet-postgres
1422
```
1523

16-
## Error: Failed to load state
24+
### Error: Failed to load state
1725
This happens when running terraform commands accessing the state file like [import](#import-into-terraform-state-file), `state list` or `force-unlock`.
1826
```
1927
Failed to load state: blobs.Client#Get: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="KeyBasedAuthenticationNotPermitted" Message="Key based authentication is not permitted on this storage account.
@@ -24,3 +32,105 @@ By default terraform tries using a shared key, which is not allowed. To force us
2432
```shell
2533
ARM_USE_AZUREAD=true terraform force-unlock xxx-yyy
2634
```
35+
36+
## Github action triggering Azure devops pipeline
37+
### Application with identifier '***' was not found in the directory
38+
Example:
39+
```
40+
Running Azure CLI Login.
41+
...
42+
Attempting Azure CLI login by using OIDC...
43+
Error: AADSTS700016: Application with identifier '***' was not found in the directory 'NHS Strategic Tenant'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in the tenant. You may have sent your authentication request to the wrong tenant. Trace ID: xxx Correlation ID: xxx Timestamp: xxx
44+
45+
Error: Interactive authentication is needed. Please run:
46+
az login
47+
```
48+
The managed identity does not exist or the Github secrets are not set correctly.
49+
50+
### The client '***' has no configured federated identity credentials
51+
Example:
52+
```
53+
Running Azure CLI Login.
54+
...
55+
Attempting Azure CLI login by using OIDC...
56+
Error: AADSTS70025: The client '***'(mi-manbrs-ado-review-temp) has no configured federated identity credentials. Trace ID: xxx Correlation ID: xxx Timestamp: xxx
57+
58+
Error: Interactive authentication is needed. Please run:
59+
az login
60+
```
61+
Federated credentials are not configured.
62+
63+
### No subscriptions found for ***
64+
Example:
65+
```
66+
Running Azure CLI Login.
67+
...
68+
Attempting Azure CLI login by using OIDC...
69+
Error: No subscriptions found for ***.
70+
```
71+
Give the managed identity Reader role on a subscription (normally Devops)
72+
73+
### Pipeline permissions
74+
Examples:
75+
```
76+
ERROR: TF401444: Please sign-in at least once as ***\***\xxx in a web browser to enable access to the service.
77+
Error: Process completed with exit code 1.
78+
```
79+
Or
80+
```
81+
ERROR: TF400813: The user 'xxx' is not authorized to access this resource.
82+
Error: Process completed with exit code 1.
83+
```
84+
Or
85+
```
86+
ERROR: VS800075: The project with id 'vstfs:///Classification/TeamProject/' does not exist, or you do not have permission to access it.
87+
Error: Process completed with exit code 1.
88+
```
89+
The Github secret must reflect the right managed identity. The managed identity must have the following permissions on the pipeline, via its ADO group:
90+
- Edit queue build configuration
91+
- Queue builds
92+
- View build pipeline
93+
94+
The ADO group must have the "View project-level information" permission.
95+
96+
### The service connection does not exist
97+
Example:
98+
```
99+
The pipeline is not valid. Job DeployApp: Step input azureSubscription references service connection manbrs-review which could not be found. The service connection does not exist, has been disabled or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz. Job DeployApp: Step input azureSubscription references service connection manbrs-review which could not be found. The service connection does not exist, has been disabled or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz.
100+
```
101+
The Azure service connection `manbrs-[environment]` is missing.
102+
103+
## Bicep errors
104+
### RoleAssignmentUpdateNotPermitted
105+
Example:
106+
```
107+
ERROR: {"status":"Failed","error":{"code":"DeploymentFailed","target":"/subscriptions/xxx/providers/Microsoft.Resources/deployments/main","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.","details":[{"code":"RoleAssignmentUpdateNotPermitted","message":"Tenant ID, application ID, principal ID, and scope are not allowed to be updated."},{"code":"RoleAssignmentUpdateNotPermitted","message":"Tenant ID, application ID, principal ID, and scope are not allowed to be updated."},{"code":"RoleAssignmentUpdateNotPermitted","message":"Tenant ID, application ID, principal ID, and scope are not allowed to be updated."}]}}
108+
```
109+
When a managed identity (MI) is deleted, its role assignments are not deleted with it. When the MI is recreated, bicep tries to update the stale role assignment and is not allowed to. Solution:
110+
- Find the role assignment id. Here: abcd-123
111+
- Navigate to subscriptions and resource group IAM and search for the role assignment id
112+
- Delete the role assignment via the portal
113+
114+
If you can't find the right scope, follow this process:
115+
- Find the role assignment id. Here: abcd-123
116+
```
117+
 ~ Microsoft.Authorization/roleAssignments/abcd-123 [2022-04-01]
118+
    ~ properties.principalId: "xxx" => "[reference('/subscriptions/xxx/resourceGroups/rg-mi-review-uks/providers/Microsoft.ManagedIdentity/userAssignedIdentities/mi-manbrs-ado-review-uks', '2024-11-30').principalId]"
119+
```
120+
- Get the subscription id
121+
- List role assignments: `az role assignment list --scope "/subscriptions/[subscription id]"`
122+
- Look for the role assignment id abcd-123 to retrieve the other details. It may be named: Unknown.
123+
- Delete the role assignment via the portal
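
The lookup in the steps above can be narrowed with a JMESPath filter. This is a sketch, assuming the `name` field of each role assignment holds the role assignment id; the `subscription_scope` and `find_role_assignment` helpers are hypothetical and not part of the repository.

```shell
# Hypothetical helpers (not part of the repository).

# Build the subscription scope string passed to --scope.
subscription_scope() {
  printf '/subscriptions/%s\n' "$1"
}

# List the role assignments at subscription scope and keep only the one whose
# name (assumed to be the role assignment id) matches. Requires `az login`.
find_role_assignment() {
  subscription_id="$1"
  role_assignment_id="$2"
  az role assignment list \
    --scope "$(subscription_scope "$subscription_id")" \
    --query "[?name=='${role_assignment_id}']" \
    --output json
}

# Example usage (commented out: needs Azure access):
# find_role_assignment "[subscription id]" "abcd-123"
```

If the filter returns an empty list, the assignment may live at a different scope, such as a resource group.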
124+
125+
### PrincipalNotFound
126+
Example:
127+
```
128+
ERROR: {"status":"Failed","error":{"code":"DeploymentFailed","target":"/subscriptions/exxx/providers/Microsoft.Resources/deployments/main","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.","details":[{"code":"PrincipalNotFound","message":"Principal xxx does not exist in the directory xxx. Check that you have the correct principal ID. If you are creating this principal and then immediately assigning a role, this error might be related to a replication delay. In this case, set the role assignment principalType property to a value, such as ServicePrincipal, User, or Group.  See https://aka.ms/docs-principaltype"}...
129+
```
130+
Race condition: the managed identity is not created in time for the resources that depend on it. Solution: rerun the command.
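
Rerunning can be scripted with a small retry wrapper. This is a minimal sketch: the `retry` helper is hypothetical and not part of the repository, and the attempt count and delay are arbitrary values to adjust.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# between failed attempts. Returns 0 on the first success, 1 otherwise.
retry() {
  max_attempts="$1"
  delay_seconds="$2"
  shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt} failed, retrying in ${delay_seconds}s..." >&2
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done
}

# Example usage (commented out: requires az login and the repository Makefile):
# retry 3 30 make review resource-group-init
```

A short delay between attempts gives the new managed identity time to replicate before the role assignment is retried.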
131+
132+
### The client does not have permission
133+
```
134+
{"code": "InvalidTemplateDeployment", "message": "Deployment failed with multiple errors: 'Authorization failed for template resource 'xxx' of type 'Microsoft.Authorization/roleAssignments'. The client 'xxx' with object id 'xxx' does not have permission to perform action 'Microsoft.Authorization/roleAssignments/write' at scope '/subscriptions/xxx/providers/Microsoft.Authorization/roleAssignments/xxx'...
135+
```
136+
Request Owner role on subscriptions via PIM.
