generated from NHSDigital/repository-template
Ppha 417 create infra as code for hub resources #200
Open
mrlockstar wants to merge 2 commits into main from PPHA-417-Create-infra-as-code-for-Hub-resources-v2
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,277 @@ | ||
| # Infra FAQ | ||
|
|
||
| - [Terraform](#terraform) | ||
|
|
||
| - [GitHub action triggering Azure devops pipeline](#github-action-triggering-azure-devops-pipeline) | ||
| - [Bicep errors](#bicep-errors) | ||
| - [Front door](#front-door) | ||
| - [Smoke Testing](#smoke-testing) | ||
|
|
||
| ## Terraform | ||
|
|
||
| ### Import into terraform state file | ||
|
|
||
| To import Azure resources into the Terraform state file, you can use the following command. If you're working on an AVD machine, you may need to set the environment variables: | ||
|
|
||
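| - `ARM_USE_AZUREAD` to authenticate to the state storage account with Entra ID (Azure AD) instead of a shared key | ||
| - `MSYS_NO_PATHCONV` to stop Git Bash from converting POSIX-style paths, such as `/subscriptions/...` resource IDs, into Windows paths | ||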
|
|
||
| Below is an example of how to do it. | ||
|
|
||
| ```shell | ||
| export ARM_USE_AZUREAD=true | ||
| export MSYS_NO_PATHCONV=true | ||
|
|
||
| terraform -chdir=infrastructure/terraform import -var-file ../environments/${ENV_CONFIG}/variables.tfvars module.infra[0].module.postgres_subnet.azurerm_subnet.subnet /subscriptions/xxx/resourceGroups/rg-lungrc-review-uks/providers/Microsoft.Network/virtualNetworks/vnet-review-uks-lungrc/subnets/snet-postgres | ||
| ``` | ||
|
|
||
| ### Error: Failed to load state | ||
|
|
||
| This happens when running Terraform commands that access the state file, such as [import](#import-into-terraform-state-file), `state list` or `force-unlock`. | ||
|
|
||
| ```shell | ||
| Failed to load state: blobs.Client#Get: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="KeyBasedAuthenticationNotPermitted" Message="Key based authentication is not permitted on this storage account. | ||
| ``` | ||
|
|
||
| By default, Terraform tries to authenticate with a shared key, which this storage account does not permit. To force Entra ID authentication, set `ARM_USE_AZUREAD`. | ||
|
|
||
| ```shell | ||
| ARM_USE_AZUREAD=true terraform force-unlock xxx-yyy | ||
| ``` | ||
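The same environment variables apply to every command that touches the state file, so a small wrapper can save retyping them. The sketch below is a hypothetical helper (the name `tf` is not part of this repository) and assumes it is run from the repository root:

```shell
# Hypothetical helper: run Terraform against infrastructure/terraform
# with the environment variables described in this FAQ.
tf() {
  # ARM_USE_AZUREAD: use Entra ID instead of a shared key for the state backend.
  # MSYS_NO_PATHCONV: stop Git Bash from rewriting /subscriptions/... resource IDs.
  ARM_USE_AZUREAD=true MSYS_NO_PATHCONV=true \
    terraform -chdir=infrastructure/terraform "$@"
}

# Usage (not run here):
#   tf state list
#   tf force-unlock xxx-yyy
```

The variables are set only for the duration of each call, so the rest of the shell session is unaffected.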
|
|
||
| ## GitHub action triggering Azure devops pipeline | ||
|
|
||
| ### Application with identifier '\*\*\*' was not found in the directory | ||
|
|
||
| Example: | ||
|
|
||
| ```shell | ||
| Running Azure CLI Login. | ||
| ... | ||
| Attempting Azure CLI login by using OIDC... | ||
| Error: AADSTS700016: Application with identifier '***' was not found in the directory 'NHS Strategic Tenant'. This can happen if the application has not been installed by the administrator of the tenant or consented to by any user in the tenant. You may have sent your authentication request to the wrong tenant. Trace ID: xxx Correlation ID: xxx Timestamp: xxx | ||
|
|
||
| Error: Interactive authentication is needed. Please run: | ||
| az login | ||
| ``` | ||
|
|
||
| The managed identity does not exist, or the GitHub secrets are not set correctly. | ||
|
|
||
| ### The client '\*\*\*' has no configured federated identity credentials | ||
|
|
||
| Example: | ||
|
|
||
| ```shell | ||
| Running Azure CLI Login. | ||
| ... | ||
| Attempting Azure CLI login by using OIDC... | ||
| Error: AADSTS70025: The client '***'(mi-lungrc-ado-review-temp) has no configured federated identity credentials. Trace ID: xxx Correlation ID: xxx Timestamp: xxx | ||
|
|
||
| Error: Interactive authentication is needed. Please run: | ||
| az login | ||
| ``` | ||
|
|
||
| Federated credentials are not configured on the managed identity. | ||
|
|
||
| ### No subscriptions found for \*\*\* | ||
|
|
||
| Example: | ||
|
|
||
| ```shell | ||
| Running Azure CLI Login. | ||
| ... | ||
| Attempting Azure CLI login by using OIDC... | ||
| Error: No subscriptions found for ***. | ||
| ``` | ||
|
|
||
| Give the managed identity the Reader role on a subscription (normally Devops). | ||
|
|
||
| ### Pipeline permissions | ||
|
|
||
| Examples: | ||
|
|
||
| ```shell | ||
| ERROR: TF401444: Please sign-in at least once as ***\***\xxx in a web browser to enable access to the service. | ||
| Error: Process completed with exit code 1. | ||
| ``` | ||
|
|
||
| Or | ||
|
|
||
| ```shell | ||
| ERROR: TF400813: The user 'xxx' is not authorized to access this resource. | ||
| Error: Process completed with exit code 1. | ||
| ``` | ||
|
|
||
| Or | ||
|
|
||
| ```shell | ||
| ERROR: VS800075: The project with id 'vstfs:///Classification/TeamProject/' does not exist, or you do not have permission to access it. | ||
| Error: Process completed with exit code 1. | ||
| ``` | ||
|
|
||
| The GitHub secret must reference the right managed identity. The managed identity must have the following permissions on the pipeline, via its ADO group: | ||
|
|
||
| - Edit queue build configuration | ||
| - Queue builds | ||
| - View build pipeline | ||
|
|
||
| The ADO group must have the "View project-level information" permission. | ||
|
|
||
| ### The service connection does not exist | ||
|
|
||
| Example: | ||
|
|
||
| ```shell | ||
| The pipeline is not valid. Job DeployApp: Step input azureSubscription references service connection lungrc-review which could not be found. The service connection does not exist, has been disabled or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz. Job DeployApp: Step input azureSubscription references service connection lungrc-review which could not be found. The service connection does not exist, has been disabled or has not been authorized for use. For authorization details, refer to https://aka.ms/yamlauthz. | ||
| ``` | ||
|
|
||
| The Azure service connection `lungrc-[environment]` is missing, disabled, or not authorized for the pipeline. | ||
|
|
||
| ## Bicep errors | ||
|
|
||
| ### RoleAssignmentUpdateNotPermitted | ||
|
|
||
| Example: | ||
|
|
||
| ```shell | ||
| ERROR: {"status":"Failed","error":{"code":"DeploymentFailed","target":"/subscriptions/xxx/providers/Microsoft.Resources/deployments/main","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.","details":[{"code":"RoleAssignmentUpdateNotPermitted","message":"Tenant ID, application ID, principal ID, and scope are not allowed to be updated."},{"code":"RoleAssignmentUpdateNotPermitted","message":"Tenant ID, application ID, principal ID, and scope are not allowed to be updated."},{"code":"RoleAssignmentUpdateNotPermitted","message":"Tenant ID, application ID, principal ID, and scope are not allowed to be updated."}]}} | ||
| ``` | ||
|
|
||
| When a managed identity (MI) is deleted, its role assignment is not deleted with it. When the MI is recreated, Bicep tries to update the existing role assignment and is not allowed to. Solution: | ||
|
|
||
| - Find the role assignment ID (`abcd-123` in the example further down) | ||
| - In the portal, open Access control (IAM) on the subscription and on the resource group, and search for the role assignment ID | ||
| - Delete the role assignment via the portal | ||
|
|
||
| If you can't find the right scope, follow this process: | ||
|
|
||
| - Find the role assignment ID in the deployment output, for example `abcd-123` in: | ||
|
|
||
| ```shell | ||
| ~ Microsoft.Authorization/roleAssignments/abcd-123 [2022-04-01] | ||
| ~ properties.principalId: "xxx" => "[reference('/subscriptions/xxx/resourceGroups/rg-mi-review-uks/providers/Microsoft.ManagedIdentity/userAssignedIdentities/mi-lungrc-ado-review-uks', '2024-11-30').principalId]" | ||
| ``` | ||
|
|
||
| - Get the subscription ID | ||
| - List role assignments: `az role assignment list --scope "/subscriptions/[subscription id]"` | ||
| - Look for the role assignment ID `abcd-123` to retrieve the other details. The principal may be shown as "Unknown". | ||
| - Delete the role assignment via the portal | ||
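The lookup steps above can be sketched in shell. This is a minimal sketch, not the project's tooling: the function names are hypothetical, the input is assumed to be deployment output in the format shown above, and the lookup assumes the role assignment ID appears as the `name` field in the `az role assignment list` output.

```shell
# Extract role assignment IDs from deployment output read on stdin.
role_assignment_ids() {
  sed -n 's|.*Microsoft\.Authorization/roleAssignments/\([^ ]*\).*|\1|p'
}

# Look up one role assignment by ID anywhere under a subscription.
# Both arguments are placeholders supplied by the caller.
find_role_assignment() {
  subscription_id="$1"
  assignment_id="$2"
  az role assignment list --all \
    --subscription "$subscription_id" \
    --query "[?name=='${assignment_id}']" \
    --output json
}

# Usage (not run here):
#   find_role_assignment "[subscription id]" abcd-123
```

The `scope` field in the result shows where to look in the portal. The full `id` field can also be passed to `az role assignment delete --ids` as an alternative to deleting via the portal.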
|
|
||
| ### PrincipalNotFound | ||
|
|
||
| Example: | ||
|
|
||
| ```shell | ||
| ERROR: {"status":"Failed","error":{"code":"DeploymentFailed","target":"/subscriptions/exxx/providers/Microsoft.Resources/deployments/main","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-deployment-operations for usage details.","details":[{"code":"PrincipalNotFound","message":"Principal xxx does not exist in the directory xxx. Check that you have the correct principal ID. If you are creating this principal and then immediately assigning a role, this error might be related to a replication delay. In this case, set the role assignment principalType property to a value, such as ServicePrincipal, User, or Group. See https://aka.ms/docs-principaltype"}... | ||
| ``` | ||
|
|
||
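| Race condition: the managed identity is not created in time for the resources that depend on it. Solution: rerun the command. As the error message suggests, setting `principalType` on the role assignment may also avoid the delay. | ||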
|
|
||
| ### The client does not have permission | ||
|
|
||
| ```shell | ||
| {"code": "InvalidTemplateDeployment", "message": "Deployment failed with multiple errors: 'Authorization failed for template resource 'xxx' of type 'Microsoft.Authorization/roleAssignments'. The client 'xxx' with object id 'xxx' does not have permission to perform action 'Microsoft.Authorization/roleAssignments/write' at scope '/subscriptions/xxx/providers/Microsoft.Authorization/roleAssignments/xxx'... | ||
| ``` | ||
|
|
||
| Request Owner role on subscriptions via PIM. | ||
|
|
||
| ## Front door | ||
|
|
||
| ### Error 504 | ||
|
|
||
| When an environment is freshly created, accessing the app via front door may result in a blank page and 504 HTTP error. | ||
|
|
||
| This is because the private link between front door and the container app environment must be manually approved: | ||
|
|
||
| - Navigate to the container app environment, Settings, Networking, Private Endpoints | ||
| - It should show "1 Private Endpoint". Click on it. | ||
| - You should see a connection with Connection State = "Pending" | ||
| - Click on the connection name (a long ID in black, not the blue private endpoint link) | ||
| - Click "✔️ Approve" at the top | ||
| - Wait a few minutes until Connection State shows Approved | ||
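The portal steps above may also be possible from the Azure CLI. The sketch below uses `az network private-endpoint-connection list` and `approve`; whether these commands support container app environments depends on the installed CLI version, so treat that as an assumption to verify. The function names and the description text are hypothetical.

```shell
# List the IDs of pending private endpoint connections on a resource.
# $1: full resource ID of the container app environment (placeholder).
pending_private_endpoint_connections() {
  az network private-endpoint-connection list --id "$1" \
    --query "[?properties.privateLinkServiceConnectionState.status=='Pending'].id" \
    --output tsv
}

# Approve one connection by its full connection ID.
approve_private_endpoint_connection() {
  az network private-endpoint-connection approve --id "$1" \
    --description "Approved for Front Door"
}

# Usage (not run here):
#   pending_private_endpoint_connections "[container app environment id]"
#   approve_private_endpoint_connection "[connection id]"
```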
|
|
||
| ### Private link not created | ||
|
|
||
| When the first origin is created, Front Door creates a single private link to the container app environment. The private link automatically creates a private endpoint associated with the container app environment. Origins added later reuse the same link. | ||
|
|
||
| If the private endpoint is deleted, for example because the container app environment is deleted, the private link is gone and the origins are silently orphaned. When the container app environment is recreated, Azure will not recreate the private link, even if the apps and origins are redeployed. | ||
|
|
||
| All the deployed apps show a blank page and 504 HTTP error. | ||
|
|
||
| The solution is to delete all the origins that point to this container app environment. When the first origin is re-added, the private link is created again. Recreate the other origins and they will use the same link. | ||
|
|
||
| ### Unable to write state file to blob storage | ||
|
|
||
| When Terraform is first run for an environment, the pipeline tries to create a state file in blob storage. Sometimes you will get an error like this: | ||
|
|
||
| Example: | ||
|
|
||
| ```shell | ||
| Failed to get existing workspaces: containers.Client#ListBlobs: Failure sending request: StatusCode=0 -- Original Error: Get "https://salungrcpreprodtfstate.blob.core.windows.net/terraform-state?comp=list&prefix=preprod.tfstateenv%3A&restype=container": dial tcp: lookup salungrcpreprodtfstate.blob.core.windows.net on *.*.*.*:53: no such host | ||
| ``` | ||
|
|
||
| To check whether the blob storage is reachable, log in to the VDI machine and run `nslookup` against the storage account: | ||
|
|
||
| ```shell | ||
| $ nslookup salungrcpreprodtfstate.blob.core.windows.net | ||
| Server: UnKnown | ||
| Address: _._._._ | ||
|
|
||
| Non-authoritative answer: | ||
| Name: salungrcpreprodtfstate.privatelink.blob.core.windows.net | ||
| Address: _._._._ | ||
| Aliases: salungrcpreprodtfstate.blob.core.windows.net | ||
| ``` | ||
|
|
||
| In the above example, it was discovered that the pipeline was running on the wrong ADO managed pool: `private-pool-dev-uks` instead of `private-pool-prod-uks`. | ||
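The DNS check can be scripted so it is easy to repeat from different machines or pools. This is a minimal sketch with hypothetical function names; it assumes `nslookup` prints a `Name:` line when the lookup succeeds, as in the output above.

```shell
# Reads nslookup output on stdin; succeeds if it contains a resolved name.
nslookup_resolved() {
  grep -q '^Name:'
}

# Check whether a storage account's blob endpoint resolves from this machine.
# $1: storage account name, for example salungrcpreprodtfstate.
check_state_storage_dns() {
  if nslookup "$1.blob.core.windows.net" 2>&1 | nslookup_resolved; then
    echo "$1: resolves"
  else
    echo "$1: does not resolve from this machine" >&2
    return 1
  fi
}

# Usage (not run here):
#   check_state_storage_dns salungrcpreprodtfstate
```

A failure here matches the `no such host` error above and suggests the machine or agent pool cannot see the DNS records for the storage account.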
|
|
||
| ## Smoke Testing | ||
|
|
||
| ### Smoke test failing with 404 or timeout | ||
|
|
||
| The smoke test verifies the deployed application is accessible and serving the correct version. | ||
|
|
||
| **Common causes:** | ||
|
|
||
| 1. **Apex domain misconfiguration** | ||
| - Production uses apex domain (`manage-breast-screening.nhs.uk`) | ||
| - Other environments use subdomain (`{env}.manage-breast-screening.nhs.uk`) | ||
| - Ensure `use_apex_domain = true` is set in `infrastructure/environments/prod/variables.tfvars` | ||
|
|
||
| 2. **Front Door not approved** | ||
| - See [Error 504](#error-504) for private link approval steps | ||
|
|
||
| 3. **Container app not ready** | ||
| - The test waits up to 5 minutes for the app to become available | ||
| - Check container app logs in Azure Portal | ||
|
|
||
| 4. **Wrong SHA deployed** | ||
| - Verify the correct docker image tag was used in deployment | ||
| - Check the `/sha` endpoint manually from AVD | ||
|
|
||
| **Script location:** `scripts/bash/container_app_smoke_test.sh` | ||
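For manual checks from AVD, the domain rule and the `/sha` check described above can be sketched as follows. This is not the contents of `container_app_smoke_test.sh`; the function names are hypothetical, and the sketch assumes `/sha` returns the deployed commit SHA in the response body.

```shell
# Domain taken from the list above: prod uses the apex domain,
# every other environment uses a subdomain.
APP_DOMAIN="manage-breast-screening.nhs.uk"

smoke_test_host() {
  if [ "$1" = "prod" ]; then
    echo "$APP_DOMAIN"
  else
    echo "$1.$APP_DOMAIN"
  fi
}

# Poll /sha until it returns the expected SHA or the timeout expires.
# $1: host, $2: expected SHA, $3: timeout in seconds (default 300).
wait_for_sha() {
  host="$1"; expected_sha="$2"; timeout="${3:-300}"
  deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    actual="$(curl --silent --fail --max-time 10 "https://${host}/sha" || true)"
    case "$actual" in
      *"$expected_sha"*) echo "OK: ${host} serves ${expected_sha}"; return 0 ;;
    esac
    sleep 10
  done
  echo "FAILED: ${host} did not serve ${expected_sha} within ${timeout}s" >&2
  return 1
}

# Usage (not run here):
#   wait_for_sha "$(smoke_test_host review)" "[expected sha]"
```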
|
|
||
| ### InsufficientCoreQuota | ||
|
|
||
| - Error code: `InsufficientCoreQuota` | ||
| - Cores needed: 4 | ||
| - Current limit: 0 | ||
| - SKU family: `standardDSv4Family` | ||
| - Region: `uksouth` | ||
|
|
||
| This is a quota limit, not a permissions, configuration, or Azure DevOps problem: | ||
| - The subscription currently has zero cores approved for DSv4 VMs in UK South. | ||
| - Managed DevOps Pools try to allocate at least 4 cores. | ||
| - Azure blocks the request before any VM is created. | ||
|
|
||
| The long-term fix is to request a quota increase. Follow the link provided in the Azure error message: | ||
|
|
||
| [Azure Portal](https://portal.azure.com/#view/Microsoft_Azure_Support/NewSupportRequestV3Blade/issueType/quota/%E2%80%A6) | ||
|
|
||
| Request: | ||
| - Region: UK South | ||
| - SKU family: Standard DSv4 | ||
| - Requested cores: at least 8 (ask for headroom rather than the minimum of 4) | ||
| - Reason: “Azure DevOps Managed DevOps Pool – build agents” |
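Before and after raising the request, the current limit can be checked from the CLI. The sketch below uses `az vm list-usage`; the function names are hypothetical, and the family name is assumed to match the `SKU family` value from the error.

```shell
# Show used and approved cores for one VM family in a region.
# $1: region, for example uksouth. $2: family, for example standardDSv4Family.
vm_family_quota() {
  az vm list-usage --location "$1" \
    --query "[?name.value=='$2'].{family:name.value, used:currentValue, limit:limit}" \
    --output table
}

# Succeeds when the remaining quota (limit - used) covers the cores needed.
has_quota_headroom() {
  limit="$1"; used="$2"; needed="$3"
  [ $(( limit - used )) -ge "$needed" ]
}

# Usage (not run here):
#   vm_family_quota uksouth standardDSv4Family
```

With the values from the error above (limit 0, 4 cores needed), `has_quota_headroom 0 0 4` fails. It succeeds once the limit reaches the requested 8.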
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,54 @@ | ||
| # Subscription Quota requirements | ||
|
|
||
| New subscriptions created in Azure often have limitations. The steps below help avoid deployment problems. | ||
|
|
||
| ## Step 1 | ||
|
|
||
| The following resource providers need to be registered: | ||
|
|
||
| - Microsoft.Authorization | ||
| - Microsoft.AzureTerraform | ||
| - Microsoft.Billing | ||
| - Microsoft.ChangeSafety | ||
| - Microsoft.ClassicSubscription | ||
| - Microsoft.Commerce | ||
| - Microsoft.Compute | ||
| - Microsoft.ComputeSchedule | ||
| - Microsoft.Consumption | ||
| - Microsoft.ContainerService | ||
| - Microsoft.CostManagement | ||
| - Microsoft.DesktopVirtualization | ||
| - Microsoft.DevCenter | ||
| - Microsoft.DevOpsInfrastructure | ||
| - Microsoft.Diagnostics | ||
| - Microsoft.Features | ||
| - Microsoft.GuestConfiguration | ||
| - Microsoft.Insights | ||
| - Microsoft.KeyVault | ||
| - Microsoft.ManagedIdentity | ||
| - Microsoft.MarketplaceOrdering | ||
| - Microsoft.Network | ||
| - Microsoft.PolicyInsights | ||
| - Microsoft.Portal | ||
| - Microsoft.Quota | ||
| - Microsoft.ResourceGraph | ||
| - Microsoft.ResourceIntelligence | ||
| - Microsoft.ResourceNotifications | ||
| - Microsoft.Resources | ||
| - Microsoft.Security | ||
| - Microsoft.SerialConsole | ||
| - Microsoft.Storage | ||
| - Microsoft.Support | ||
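Registering this many providers by hand is error-prone, so it can be scripted with `az provider register`. This is a minimal sketch with hypothetical function names; it assumes the target subscription has already been selected with `az account set`.

```shell
# Register each resource provider namespace passed as an argument.
register_providers() {
  for namespace in "$@"; do
    az provider register --namespace "$namespace"
  done
}

# Print the registration state of one provider, for example "Registered".
provider_state() {
  az provider show --namespace "$1" --query registrationState --output tsv
}

# Usage (not run here):
#   register_providers Microsoft.Compute Microsoft.Network Microsoft.Storage
#   provider_state Microsoft.Compute
```

Registration is asynchronous, so `provider_state` may report `Registering` for a while before it shows `Registered`.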
|
|
||
| ## Step 2 | ||
|
|
||
| The following quotas need to be increased. Raise a ticket with Azure support to request the increases. This list was used for all the new subscriptions. At the time of writing, Microsoft is likely to ask for a business justification for a quota increase, although this may not be the case in the future. | ||
|
|
||
| | Subscription Name | Subscription ID | Environment | Region | Alternative Region | Specify AZ | AZ / Zonal Deployment Notes | Azure Service | SKU | Alternative SKU | Unit | Oct-25 | Nov-25 | Dec-25 | Jan-26 | Feb-26 | Mar-26 | | ||
| |--------------------------------------|-------------------------------------------|-------------|----------|--------------------|------------|-----------------------------|---------------|--------------------|------------------|-------|--------|--------|--------|--------|--------|--------| | ||
| | Lung Cancer Risk Check - Non-live hub | ****** | Non Live | UK South | N/A | N/A | Regional deployment | Compute | Standard_D2ads_v5 | N/A | Units | 4 | 4 | 4 | 4 | 4 | 4 | | ||
| | Lung Cancer Risk Check - Live hub | ****** | Live | UK South | N/A | N/A | Regional deployment | Compute | Standard_D2ads_v5 | N/A | Units | 4 | 4 | 4 | 4 | 4 | 4 | | ||
| | Lung Cancer Risk Check - Dev | ****** | Dev | UK South | N/A | N/A | Regional deployment | Compute | B_Standard_B1ms | N/A | Units | 1 | 1 | 1 | 1 | 1 | 1 | | ||
| | Lung Cancer Risk Check - Review | ****** | Review | UK South | N/A | N/A | Regional deployment | Compute | B_Standard_B1ms | N/A | Units | 1 | 1 | 1 | 1 | 1 | 1 | | ||
| | Lung Cancer Risk Check - Prod | ****** | Prod | UK South | N/A | N/A | Regional deployment | Compute | GP_Standard_D2ds_v5| N/A | Units | 3 | 3 | 3 | 3 | 3 | 3 | | ||
| | Lung Cancer Risk Check - Preprod | ****** | Preprod | UK South | N/A | N/A | Regional deployment | Compute | GP_Standard_D2ds_v5| N/A | Units | 3 | 3 | 3 | 3 | 3 | 3 | |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,3 @@ | ||
| AZURE_SUBSCRIPTION="Digital Screening DToS - Sandbox" | ||
| AZURE_SUBSCRIPTION="Lung Cancer Risk Check - Non-live hub" | ||
| BOOTSTRAP=hub | ||
| HUB_TYPE=nonlive |
Take this out