Commit 3a17d47

DTOSS-11302: Application health check
To ensure that the application is available, we need to turn on the application health check in Application Insights. This involves creating a RESTful health check endpoint on both web app containers, and creating a monitor in Application Insights that tests that the web app container is up.
1 parent fc626f2 commit 3a17d47

7 files changed: +239 -6 lines changed

infrastructure/modules/app-service-plan/variables.tf

Lines changed: 2 additions & 2 deletions
@@ -203,9 +203,9 @@ variable "alert_cpu_threshold" {
 }

 variable "alert_memory_threshold" {
-  type = number
+  type = number
   description = "If alerting is enabled this will control what the memory threshold will be, default will be 80."
-  default = 80
+  default = 80
 }

 variable "alert_window_size" {
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+# Application Insights availability test
+
+Deploy an [Azure Application Insights availability test](https://learn.microsoft.com/en-us/azure/azure-monitor/app/availability?tabs=standard) to continuously monitor the accessibility of a web endpoint. The test periodically sends requests to a specified URL from multiple Azure locations and records response times, status codes, and any failures.
+
+The availability test runs automatically at the configured frequency once deployed and provides visibility into uptime, latency, and dependency issues through [Application Insights](https://learn.microsoft.com/en-us/azure/azure-monitor/app/usage?tabs=users) and Azure Monitor.
+Integrates with the module.
+
+See also the following ADO template step that can be used to verify endpoint health as part of a release pipeline: app-insights-availability-check.yaml.
+
+
+## Terraform documentation
+For the list of inputs, outputs, resources... check the [terraform module documentation](tfdocs.md).
+
+## Usage
+Create the Application Insights availability test:
+```hcl
+module "azurerm_application_insights_standard_web_test" {
+  source = "../dtos-devops-templates/infrastructure/modules/application-insights-availability-test"
+
+  name                    = "${var.app_short_name}-web-${var.environment}"
+  resource_group_name     = var.resource_group_name_infra
+  location                = var.location
+  application_insights_id = data.azurerm_log_analytics_workspace.audit[0].id
+  target_url              = "${module.container-apps[0].external_url}healthcheck/"
+}
+```
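The optional inputs documented for this module (`frequency`, `timeout`, `geo_locations`) can be overridden in the same call. The following is a minimal sketch and is not taken from the commit: the module label `availability_test_slow_poll`, the `var.application_insights_id` variable, and the chosen values are assumptions for illustration.

```hcl
# Illustrative sketch only: the label, values and referenced variables are assumptions.
module "availability_test_slow_poll" {
  source = "../dtos-devops-templates/infrastructure/modules/application-insights-availability-test"

  name                    = "${var.app_short_name}-web-${var.environment}"
  resource_group_name     = var.resource_group_name_infra
  location                = var.location
  application_insights_id = var.application_insights_id # hypothetical variable
  target_url              = "${module.container-apps[0].external_url}healthcheck/"

  # Optional overrides; the module defaults are 300, 30 and three test locations.
  frequency     = 600 # seconds; the module accepts 300, 600 or 900
  timeout       = 60  # seconds
  geo_locations = ["emea-ru-msa-edge", "emea-se-sto-edge"]
}
```

Polling less often reduces the number of synthetic requests hitting the endpoint, at the cost of slower detection of an outage.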
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+resource "azurerm_application_insights_standard_web_test" "this" {
+  name = var.name
+
+  location                = var.location
+  resource_group_name     = var.resource_group_name
+  application_insights_id = var.application_insights_id
+
+  frequency = var.frequency
+  timeout   = var.timeout
+
+  request {
+    url = var.target_url
+  }
+
+  geo_locations = var.geo_locations
+}
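An availability test only records results. To be notified when the endpoint goes down, a metric alert can be attached to the Application Insights component. The sketch below is an assumption and not part of this commit: the resource label `availability`, the `var.action_group_id` input, the severity and the 90% threshold are all hypothetical, and it assumes the alert lives in the same module as the web test so that it can reference `azurerm_application_insights_standard_web_test.this`.

```hcl
# Hypothetical sketch: alert when average availability drops below 90%.
resource "azurerm_monitor_metric_alert" "availability" {
  name                = "${var.name}-availability"
  resource_group_name = var.resource_group_name
  scopes              = [var.application_insights_id]
  description         = "Availability of ${var.target_url} dropped below 90%"
  severity            = 1
  frequency           = "PT1M"
  window_size         = "PT5M"

  criteria {
    metric_namespace = "Microsoft.Insights/components"
    metric_name      = "availabilityResults/availabilityPercentage"
    aggregation      = "Average"
    operator         = "LessThan"
    threshold        = 90

    # Restrict the alert to results produced by this web test only.
    dimension {
      name     = "availabilityResult/name"
      operator = "Include"
      values   = [azurerm_application_insights_standard_web_test.this.name]
    }
  }

  action {
    action_group_id = var.action_group_id # hypothetical input, not defined in this module
  }
}
```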
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+# Module documentation
+
+## Required Inputs
+
+The following input variables are required:
+
+### <a name="input_application_insights_id"></a> [application\_insights\_id](#input\_application\_insights\_id)
+
+Description: The Application Insights resource id to associate the availability test with
+
+Type: `string`
+
+### <a name="input_name"></a> [name](#input\_name)
+
+Description: Name of the availability test, must be unique for the used application insights instance
+
+Type: `string`
+
+### <a name="input_resource_group_name"></a> [resource\_group\_name](#input\_resource\_group\_name)
+
+Description: The name of the resource group in which to create the availability test.
+
+Type: `string`
+
+## Optional Inputs
+
+The following input variables are optional (have default values):
+
+### <a name="input_frequency"></a> [frequency](#input\_frequency)
+
+Description: Frequency of test in seconds
+
+Type: `number`
+
+Default: `300`
+
+### <a name="input_geo_locations"></a> [geo\_locations](#input\_geo\_locations)
+
+Description: List of Azure test locations (provider-specific location strings for UK and Ireland)
+
+Type: `list(string)`
+
+Default:
+
+```json
+[
+  "emea-ru-msa-edge",
+  "emea-se-sto-edge",
+  "emea-gb-db3-azr"
+]
+```
+
+### <a name="input_location"></a> [location](#input\_location)
+
+Description: The location/region where the availability test is deployed (must match App Insights location)
+
+Type: `string`
+
+Default: `"UK South"`
+
+### <a name="input_target_url"></a> [target\_url](#input\_target\_url)
+
+Description: The target URL for the restful endpoint to hit to validate the application is available
+
+Type: `string`
+
+Default: `""`
+
+### <a name="input_timeout"></a> [timeout](#input\_timeout)
+
+Description: Timeout in seconds
+
+Type: `number`
+
+Default: `30`
+
+
+## Resources
+
+The following resources are used by this module:
+
+- [azurerm_application_insights_standard_web_test.this](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/application_insights_standard_web_test) (resource)
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+variable "name" {
+  type        = string
+  description = "Name of the availability test, must be unique for the used application insights instance"
+}
+
+variable "resource_group_name" {
+  description = "The name of the resource group in which to create the availability test."
+  type        = string
+  validation {
+    condition     = can(regex("^[a-zA-Z0-9_-]{1,255}$", var.resource_group_name))
+    error_message = "The resource group name must be between 1 and 255 characters and can only contain alphanumeric characters, hyphens, and underscores."
+  }
+}
+
+variable "location" {
+  type        = string
+  description = "The location/region where the availability test is deployed (must match App Insights location)"
+  default     = "UK South"
+}
+
+variable "application_insights_id" {
+  type        = string
+  description = "The Application Insights resource id to associate the availability test with"
+}
+
+variable "frequency" {
+  type    = number
+  default = 300
+  validation {
+    condition     = contains([300, 600, 900], var.frequency)
+    error_message = "The frequency must be one of: 300, 600 or 900."
+  }
+  description = "Frequency of test in seconds, defaults to 300."
+}
+
+variable "timeout" {
+  type        = number
+  default     = 30
+  description = "Timeout in seconds, defaults to 30."
+}
+
+variable "geo_locations" {
+  type        = list(string)
+  default     = ["emea-ru-msa-edge", "emea-se-sto-edge", "emea-gb-db3-azr"]
+  description = "List of Azure test locations (provider-specific location strings for UK and Ireland)"
+}
+
+variable "target_url" {
+  type        = string
+  description = "The target URL for the restful endpoint to hit to validate the application is available"
+}
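Because `target_url` carries no validation, a malformed value would only surface once the availability test starts failing. One possible hardening, shown here as a sketch and not as part of the commit, is to reject values without an HTTP(S) scheme. The regular expression and the error message are assumptions.

```hcl
variable "target_url" {
  type        = string
  description = "The target URL for the restful endpoint to hit to validate the application is available"

  # Assumed rule: require an explicit http:// or https:// scheme.
  validation {
    condition     = can(regex("^https?://", var.target_url))
    error_message = "The target_url must start with http:// or https://."
  }
}
```

With this in place, a value such as `"example.org/healthcheck/"` would be rejected at plan time instead of producing a test that can never succeed.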

infrastructure/modules/function-app/alerts.tf

Lines changed: 4 additions & 4 deletions
@@ -4,7 +4,7 @@ resource "azurerm_monitor_metric_alert" "function_4xx" {

   name = "${azurerm_linux_function_app.function_app.name}-4xx-errors"
   resource_group_name = var.resource_group_name_monitoring != null ? var.resource_group_name_monitoring : var.resource_group_name
-  scopes = [azurerm_linux_function_app.function_app.id] # Point to your function app
+  scopes = [azurerm_linux_function_app.function_app.id] # Point to your function app
   description = "Action will be triggered when 4xx errors exceed ${var.alert_4xx_threshold}"
   window_size = var.alert_window_size
   frequency = local.alert_frequency
@@ -13,7 +13,7 @@ resource "azurerm_monitor_metric_alert" "function_4xx" {
   criteria {
     metric_namespace = "Microsoft.Web/sites"
     metric_name = "Http4xx"
-    aggregation = "Total" # Count total 4xx errors
+    aggregation = "Total" # Count total 4xx errors
     operator = "GreaterThan"
     threshold = var.alert_4xx_threshold
   }
@@ -35,7 +35,7 @@ resource "azurerm_monitor_metric_alert" "function_5xx" {

   name = "${azurerm_linux_function_app.function_app.name}-5xx-errors"
   resource_group_name = var.resource_group_name_monitoring != null ? var.resource_group_name_monitoring : var.resource_group_name
-  scopes = [azurerm_linux_function_app.function_app.id] # Point to your function app
+  scopes = [azurerm_linux_function_app.function_app.id] # Point to your function app
   description = "Action will be triggered when 5xx errors exceed ${var.alert_5xx_threshold}"
   window_size = var.alert_window_size
   frequency = local.alert_frequency
@@ -44,7 +44,7 @@ resource "azurerm_monitor_metric_alert" "function_5xx" {
   criteria {
     metric_namespace = "Microsoft.Web/sites"
     metric_name = "Http5xx"
-    aggregation = "Total" # Count total 5xx errors
+    aggregation = "Total" # Count total 5xx errors
     operator = "GreaterThan"
     threshold = var.alert_5xx_threshold
   }
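Both alerts take their evaluation interval from `local.alert_frequency`, whose definition is not part of this diff. As a purely hypothetical sketch of how such a local could derive the interval from `var.alert_window_size`, a lookup table is one option. The map name `alert_frequency_map`, every value in it, and the fallback are assumptions and may not match the module's actual implementation.

```hcl
locals {
  # Hypothetical mapping from alert window size to evaluation frequency.
  alert_frequency_map = {
    PT1M  = "PT1M"
    PT5M  = "PT1M"
    PT15M = "PT5M"
    PT30M = "PT5M"
    PT1H  = "PT15M"
    PT6H  = "PT30M"
    PT12H = "PT1H"
  }

  # Fall back to a 5 minute interval for any window size not listed above.
  alert_frequency = lookup(local.alert_frequency_map, var.alert_window_size, "PT5M")
}
```

Keeping the frequency shorter than or equal to the window means each evaluation always has at least one full window of data to aggregate.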

infrastructure/modules/function-app/tfdocs.md

Lines changed: 58 additions & 0 deletions
@@ -122,6 +122,38 @@ Type: `any`

 Default: `null`

+### <a name="input_action_group_id"></a> [action\_group\_id](#input\_action\_group\_id)
+
+Description: The ID of the Action Group to use for alerts.
+
+Type: `string`
+
+Default: `null`
+
+### <a name="input_alert_4xx_threshold"></a> [alert\_4xx\_threshold](#input\_alert\_4xx\_threshold)
+
+Description: The threshold for 4xx errors to trigger the alert.
+
+Type: `number`
+
+Default: `10`
+
+### <a name="input_alert_5xx_threshold"></a> [alert\_5xx\_threshold](#input\_alert\_5xx\_threshold)
+
+Description: The threshold for 5xx errors to trigger the alert.
+
+Type: `number`
+
+Default: `10`
+
+### <a name="input_alert_window_size"></a> [alert\_window\_size](#input\_alert\_window\_size)
+
+Description: The period of time that is used to monitor alert activity e.g. PT1M, PT5M, PT15M, PT30M, PT1H, PT6H, PT12H. The interval between checks is adjusted accordingly.
+
+Type: `string`
+
+Default: `"PT5M"`
+
 ### <a name="input_always_on"></a> [always\_on](#input\_always\_on)

 Description: Should the Function App be always on. Override standard default.
@@ -168,6 +200,14 @@ Default:
 ]
 ```

+### <a name="input_enable_alerting"></a> [enable\_alerting](#input\_enable\_alerting)
+
+Description: Whether monitoring and alerting is enabled for the App Service Plan.
+
+Type: `bool`
+
+Default: `false`
+
 ### <a name="input_entra_id_group_ids"></a> [entra\_id\_group\_ids](#input\_entra\_id\_group\_ids)

 Description: n/a
@@ -311,6 +351,22 @@ list(object({

 Default: `[]`

+### <a name="input_resource_group_name_monitoring"></a> [resource\_group\_name\_monitoring](#input\_resource\_group\_name\_monitoring)
+
+Description: The name of the resource group in which to create the Monitoring resources for the App Service Plan. Changing this forces a new resource to be created.
+
+Type: `string`
+
+Default: `null`
+
+### <a name="input_severity"></a> [severity](#input\_severity)
+
+Description: Severity of the alert. 0 = Critical, 1 = Error, 2 = Warning, 3 = Informational, 4 = Verbose. Default is 3.
+
+Type: `number`
+
+Default: `3`
+
 ### <a name="input_storage_account_access_key"></a> [storage\_account\_access\_key](#input\_storage\_account\_access\_key)

 Description: The Storage Account Primary Access Key.
@@ -402,3 +458,5 @@ The following resources are used by this module:

 - [azuread_group_member.function_app](https://registry.terraform.io/providers/hashicorp/azuread/latest/docs/resources/group_member) (resource)
 - [azurerm_linux_function_app.function_app](https://registry.terraform.io/providers/hashicorp/azuread/latest/docs/resources/linux_function_app) (resource)
+- [azurerm_monitor_metric_alert.function_4xx](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/monitor_metric_alert) (resource)
+- [azurerm_monitor_metric_alert.function_5xx](https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/monitor_metric_alert) (resource)
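The alerting inputs documented above default to a disabled state (`enable_alerting = false`), so a caller has to opt in. The following sketch shows what that might look like. It is not taken from the commit: the module path, the `azurerm_monitor_action_group.example` resource, the `var.resource_group_name_monitoring` variable and all threshold values are assumptions, and the module's other required inputs are deliberately omitted.

```hcl
# Hypothetical caller: only the alerting-related inputs are shown.
module "function_app" {
  source = "../dtos-devops-templates/infrastructure/modules/function-app" # assumed path

  # ... other required inputs of the function-app module omitted ...

  enable_alerting                = true
  action_group_id                = azurerm_monitor_action_group.example.id # hypothetical resource
  resource_group_name_monitoring = var.resource_group_name_monitoring     # hypothetical variable
  alert_window_size              = "PT15M"
  alert_4xx_threshold            = 25
  alert_5xx_threshold            = 5
  severity                       = 2 # Warning
}
```

A lower threshold for 5xx than for 4xx reflects that server errors usually indicate a fault in the function itself, whereas some level of client errors is often expected.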
