|
1 |
| -# terraform datadog SQL Server monitoring |
2 | 1 |
|
3 |
| -## Getting Started |
| 2 | + |
4 | 3 |
|
5 |
| -Pre-commit: |
| 4 | +[//]: # (This file is generated. Do not edit, module description can be added by editing / creating module_description.md) |
| 5 | + |
| 6 | +# Terraform module for Datadog SQL Server |
| 7 | + |
| 8 | +This module requires the [SQL Server integration](https://docs.datadoghq.com/integrations/sqlserver/?tab=host) to be configured. |
| 9 | +It provides basic SQL Server monitoring: lock waits, blocked processes, and connectivity. |
| 10 | +It's best to also use Datadog's APM instrumentation to understand how the application uses the database. |
| 11 | +There's an upcoming feature in Datadog to fully support deep dive database monitoring. |
| 12 | + |
| 13 | +This module is part of a larger suite of modules that provide alerts in Datadog. |
| 14 | +Other modules can be found on the [Terraform Registry](https://registry.terraform.io/search/modules?namespace=kabisa&provider=datadog) |
| 15 | + |
| 16 | +We have two base modules we use to standardise development of our Monitor Modules: |
| 17 | +- [generic monitor](https://github.com/kabisa/terraform-datadog-generic-monitor) Used in 90% of our alerts |
| 18 | +- [service check monitor](https://github.com/kabisa/terraform-datadog-service-check-monitor) |
| 19 | + |
| 20 | +Modules are generated with this tool: https://github.com/kabisa/datadog-terraform-generator |
| 21 | + |
| 22 | +# Example Usage |
| 23 | + |
| 24 | +```terraform |
| 25 | +module "sql_server" { |
| 26 | + source = "kabisa/sql-server/datadog" |
| 27 | +
|
| 28 | + notification_channel = "[email protected]" |
| 29 | + service = "SQL Server" |
| 30 | + env = "prd" |
| 31 | + alert_env = "prd" |
| 32 | + filter_str = "role:sqlserver" |
| 33 | + service_check_include_tags = ["role:sqlserver"] |
| 34 | +} |
| 35 | +``` |
| 36 | + |
| 37 | +Monitors: |
| 38 | +* [Terraform module for Datadog Sql Server](#terraform-module-for-datadog-sql-server) |
| 39 | + * [Connections](#connections) |
| 40 | + * [Page Life Expectancy](#page-life-expectancy) |
| 41 | + * [Can Connect](#can-connect) |
| 42 | + * [Buffer Cache Hit Ratio](#buffer-cache-hit-ratio) |
| 43 | + * [Database State](#database-state) |
| 44 | + * [Lock Waits](#lock-waits) |
| 45 | + * [Batches Compiled Percent](#batches-compiled-percent) |
| 46 | + * [Procs Blocked](#procs-blocked) |
| 47 | + * [Module Variables](#module-variables) |
| 48 | + |
| 49 | +# Getting started developing |
| 50 | +[pre-commit](http://pre-commit.com/) is used for Terraform linting and validation. |
| 51 | + |
| 52 | +Steps: |
6 | 53 | - Install [pre-commit](http://pre-commit.com/). E.g. `brew install pre-commit`.
|
7 | 54 | - Run `pre-commit install` in this repo. (Every time you clone a repo with pre-commit enabled you will need to run the `pre-commit install` command.)
|
8 | 55 | - That’s it! Now every time you commit a code change (`.tf` file), the hooks listed under `hooks:` in `.pre-commit-config.yaml` will execute.
|
| 56 | + |
| 57 | +## Connections |
| 58 | + |
| 59 | +Query: |
| 60 | +```terraform |
| 61 | +avg(last_30m):max:sqlserver.stats.connections{tag:xxx} by {host} >= 500 |
| 62 | +``` |
| 63 | + |
| 64 | +| variable | default | required | description | |
| 65 | +|-------------------------------|----------|----------|----------------------------------| |
| 66 | +| connections_enabled | True | No | | |
| 67 | +| connections_warning | 400 | No | | |
| 68 | +| connections_critical | 500 | No | | |
| 69 | +| connections_evaluation_period | last_30m | No | | |
| 70 | +| connections_note | "" | No | | |
| 71 | +| connections_docs | "" | No | | |
| 72 | +| connections_filter_override | "" | No | | |
| 73 | +| connections_alerting_enabled | True | No | | |
| 74 | +| connections_priority | 3 | No | Number from 1 (high) to 5 (low). | |
| 75 | + |
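The defaults above can be overridden per monitor. The following is a minimal sketch, assuming the threshold variables accept numbers and the evaluation period accepts a Datadog time-window string such as `last_15m`. The notification address and the chosen values are placeholders, not recommendations.

```terraform
module "sql_server" {
  source = "kabisa/sql-server/datadog"

  # Required module variables (values are placeholders)
  notification_channel = "mail@example.com"
  service              = "SQL Server"
  env                  = "prd"
  alert_env            = "prd"
  filter_str           = "role:sqlserver"

  # Raise the connection thresholds for a busier server.
  # The query uses ">=", so warning should stay below critical.
  connections_warning           = 800
  connections_critical          = 1000
  connections_evaluation_period = "last_15m" # assumed format, mirrors the default "last_30m"
}
```

With these values the generated query would compare the average of the last 15 minutes against 1000 instead of 500.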
| 76 | + |
| 77 | +## Page Life Expectancy |
| 78 | + |
| 79 | +When this metric is low, pages stay cached only briefly and are often read from disk. Consider allocating more memory. |
| 80 | + |
| 81 | +Query: |
| 82 | +```terraform |
| 83 | +avg(last_1d):min:sqlserver.buffer.page_life_expectancy{tag:xxx} by {host} < 900 |
| 84 | +``` |
| 85 | + |
| 86 | +| variable | default | required | description | |
| 87 | +|----------------------------------------|------------------------------------------|----------|----------------------------------| |
| 88 | +| page_life_expectancy_enabled | True | No | | |
| 89 | +| page_life_expectancy_warning | 1800 | No | | |
| 90 | +| page_life_expectancy_critical | 900 | No | | |
| 91 | +| page_life_expectancy_evaluation_period | last_1d | No | | |
| 92 | +| page_life_expectancy_note | "" | No | | |
| 93 | +| page_life_expectancy_docs | When this metric is low, pages are not being cached for a short time and often read from disk. Consider allocating more memory. | No | | |
| 94 | +| page_life_expectancy_filter_override | "" | No | | |
| 95 | +| page_life_expectancy_alerting_enabled | True | No | | |
| 96 | +| page_life_expectancy_priority | 4 | No | Number from 1 (high) to 5 (low). | |
| 97 | + |
| 98 | + |
| 99 | +## Can Connect |
| 100 | + |
| 101 | +| variable | default | required | description | |
| 102 | +|------------------------------|----------|----------|--------------| |
| 103 | +| can_connect_enabled | True | No | | |
| 104 | +| can_connect_alerting_enabled | True | No | | |
| 105 | +| can_connect_warning | 1 | No | | |
| 106 | +| can_connect_critical | 1 | No | | |
| 107 | +| can_connect_priority | 1 | No | | |
| 108 | +| can_connect_docs | "" | No | | |
| 109 | +| can_connect_note | "" | No | | |
| 110 | + |
| 111 | + |
| 112 | +## Buffer Cache Hit Ratio |
| 113 | + |
| 114 | +When this metric is low, pages are often read from disk. Consider allocating more memory. |
| 115 | + |
| 116 | +Query: |
| 117 | +```terraform |
| 118 | +avg(last_1d):min:sqlserver.buffer.cache_hit_ratio{tag:xxx} by {host} * 100 < 75 |
| 119 | +``` |
| 120 | + |
| 121 | +| variable | default | required | description | |
| 122 | +|------------------------------------------|------------------------------------------|----------|----------------------------------| |
| 123 | +| buffer_cache_hit_ratio_enabled | True | No | | |
| 124 | +| buffer_cache_hit_ratio_warning | 90 | No | | |
| 125 | +| buffer_cache_hit_ratio_critical | 75 | No | | |
| 126 | +| buffer_cache_hit_ratio_evaluation_period | last_1d | No | | |
| 127 | +| buffer_cache_hit_ratio_note | "" | No | | |
| 128 | +| buffer_cache_hit_ratio_docs | When this metric is low, pages are often read from disk. Consider allocating more memory. | No | | |
| 129 | +| buffer_cache_hit_ratio_filter_override | "" | No | | |
| 130 | +| buffer_cache_hit_ratio_alerting_enabled | True | No | | |
| 131 | +| buffer_cache_hit_ratio_priority | 4 | No | Number from 1 (high) to 5 (low). | |
| 132 | + |
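Because the query multiplies the ratio by 100, the thresholds are expressed as percentages, and because the comparison is `<`, the warning value has to be higher than the critical value. A hedged sketch of stricter thresholds, with placeholder values for the required variables:

```terraform
module "sql_server" {
  source = "kabisa/sql-server/datadog"

  # Required module variables (values are placeholders)
  notification_channel = "mail@example.com"
  service              = "SQL Server"
  env                  = "prd"
  alert_env            = "prd"
  filter_str           = "role:sqlserver"

  # Percentages, not fractions: warn below 95%, go critical below 85%.
  buffer_cache_hit_ratio_warning  = 95
  buffer_cache_hit_ratio_critical = 85
}
```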
| 133 | + |
| 134 | +## Database State |
| 135 | + |
| 136 | +Query: |
| 137 | +```terraform |
| 138 | +max(last_5m):max:sqlserver.database.state{tag:xxx} by {host,database,database_state_desc} >= 5 |
| 139 | +``` |
| 140 | + |
| 141 | +| variable | default | required | description | |
| 142 | +|----------------------------------|----------|----------|----------------------------------| |
| 143 | +| database_state_enabled | True | No | | |
| 144 | +| database_state_warning | 1 | No | | |
| 145 | +| database_state_critical | 5 | No | | |
| 146 | +| database_state_evaluation_period | last_5m | No | | |
| 147 | +| database_state_note | "" | No | | |
| 148 | +| database_state_docs | "" | No | | |
| 149 | +| database_state_filter_override | "" | No | | |
| 150 | +| database_state_alerting_enabled | True | No | | |
| 151 | +| database_state_priority | 1 | No | Number from 1 (high) to 5 (low). | |
| 152 | + |
| 153 | + |
| 154 | +## Lock Waits |
| 155 | + |
| 156 | +A high number of lock waits per second is caused by lock contention. Try reducing lock contention by using more fine-grained locking in the queries. |
| 157 | + |
| 158 | +Query: |
| 159 | +```terraform |
| 160 | +avg(last_30m):max:sqlserver.stats.lock_waits{tag:xxx} by {host} > 20 |
| 161 | +``` |
| 162 | + |
| 163 | +| variable | default | required | description | |
| 164 | +|------------------------------|------------------------------------------|----------|----------------------------------| |
| 165 | +| lock_waits_enabled | True | No | | |
| 166 | +| lock_waits_warning | 10 | No | | |
| 167 | +| lock_waits_critical | 20 | No | | |
| 168 | +| lock_waits_evaluation_period | last_30m | No | | |
| 169 | +| lock_waits_note | "" | No | | |
| 170 | +| lock_waits_docs | High numbers of lock waits per second is caused by lock contention. Try reducing lock contention by using more fine grained locking in the queries. | No | | |
| 171 | +| lock_waits_filter_override | "" | No | | |
| 172 | +| lock_waits_alerting_enabled | True | No | | |
| 173 | +| lock_waits_priority | 4 | No | Number from 1 (high) to 5 (low). | |
| 174 | + |
| 175 | + |
| 176 | +## Batches Compiled Percent |
| 177 | + |
| 178 | +When this metric is high, a lot of queries need to be recompiled. Consider parameterizing more queries by using stored procedures, using forced parameterization or allocating more memory. |
| 179 | + |
| 180 | +Query: |
| 181 | +```terraform |
| 182 | +avg(last_1d):(max:sqlserver.stats.sql_compilations{tag:xxx} by {host} / max:sqlserver.stats.batch_requests{tag:xxx} by {host}) * 100 >= 20 |
| 183 | +``` |
| 184 | + |
| 185 | +| variable | default | required | description | |
| 186 | +|--------------------------------------------|------------------------------------------|----------|----------------------------------| |
| 187 | +| batches_compiled_percent_enabled | True | No | | |
| 188 | +| batches_compiled_percent_warning | 10 | No | | |
| 189 | +| batches_compiled_percent_critical | 20 | No | | |
| 190 | +| batches_compiled_percent_evaluation_period | last_1d | No | | |
| 191 | +| batches_compiled_percent_note | "" | No | | |
| 192 | +| batches_compiled_percent_docs | When this metric is high, a lot of queries need to be recompiled. Consider parameterizing more queries by using stored procedures, using forced parameterization or allocating more memory. | No | | |
| 193 | +| batches_compiled_percent_filter_override | "" | No | | |
| 194 | +| batches_compiled_percent_alerting_enabled | True | No | | |
| 195 | +| batches_compiled_percent_priority | 4 | No | Number from 1 (high) to 5 (low). | |
| 196 | + |
| 197 | + |
| 198 | +## Procs Blocked |
| 199 | + |
| 200 | +A high number of blocked processes can indicate deadlocks. Check for deadlocks by investigating which queries are waiting for locks to be released. |
| 201 | + |
| 202 | +Query: |
| 203 | +```terraform |
| 204 | +avg(last_10m):max:sqlserver.stats.procs_blocked{tag:xxx} by {host} >= 1 |
| 205 | +``` |
| 206 | + |
| 207 | +| variable | default | required | description | |
| 208 | +|---------------------------------|------------------------------------------|----------|----------------------------------| |
| 209 | +| procs_blocked_enabled | True | No | | |
| 210 | +| procs_blocked_warning | None | No | | |
| 211 | +| procs_blocked_critical | 1 | No | | |
| 212 | +| procs_blocked_evaluation_period | last_10m | No | | |
| 213 | +| procs_blocked_note | "" | No | | |
| 214 | +| procs_blocked_docs | High number of procs blocked can indicate deadlocks. Check for deadlocks by investigating which queries are waiting for locks to be released. | No | | |
| 215 | +| procs_blocked_filter_override | "" | No | | |
| 216 | +| procs_blocked_alerting_enabled | True | No | | |
| 217 | +| procs_blocked_priority | 3 | No | Number from 1 (high) to 5 (low). | |
| 218 | + |
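Each monitor has an `_enabled` and an `_alerting_enabled` switch. The sketch below assumes that `_enabled = false` skips creating the monitor and that `_alerting_enabled = false` keeps the monitor but stops it from notifying. That reading is inferred from the variable names and is not confirmed by this README. It also assumes `procs_blocked_warning` accepts a number.

```terraform
module "sql_server" {
  source = "kabisa/sql-server/datadog"

  # Required module variables (values are placeholders)
  notification_channel = "mail@example.com"
  service              = "SQL Server"
  env                  = "prd"
  alert_env            = "prd"
  filter_str           = "role:sqlserver"

  # Keep the Procs Blocked monitor visible in Datadog, but silence it
  # (assumed behaviour of the *_alerting_enabled switch).
  procs_blocked_alerting_enabled = false

  # Add a warning level below the default critical threshold of 1.
  procs_blocked_warning = 0.5

  # Do not create the Lock Waits monitor at all
  # (assumed behaviour of the *_enabled switch).
  lock_waits_enabled = false
}
```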
| 219 | + |
| 220 | +## Module Variables |
| 221 | + |
| 222 | +| variable | default | required | description | |
| 223 | +|----------------------------|----------|----------|--------------| |
| 224 | +| env | | Yes | | |
| 225 | +| alert_env | | Yes | | |
| 226 | +| filter_str | | Yes | | |
| 227 | +| service | | Yes | | |
| 228 | +| notification_channel | | Yes | | |
| 229 | +| additional_tags | [] | No | | |
| 230 | +| name_prefix | "" | No | | |
| 231 | +| name_suffix | "" | No | | |
| 232 | +| locked | True | No | | |
| 233 | +| service_check_include_tags | None | No | | |
| 234 | +| service_check_exclude_tags | None | No | | |
| 235 | + |
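The optional module-level variables apply to all monitors at once. The following sketch assumes `additional_tags` is a list of strings, `name_prefix` is a plain string, and `locked` is a boolean, which is what the defaults in the table suggest. The tag and prefix values are hypothetical.

```terraform
module "sql_server" {
  source = "kabisa/sql-server/datadog"

  # Required module variables (values are placeholders)
  notification_channel = "mail@example.com"
  service              = "SQL Server"
  env                  = "prd"
  alert_env            = "prd"
  filter_str           = "role:sqlserver"

  # Optional module-wide settings (hypothetical values)
  additional_tags = ["team:dba"]
  name_prefix     = "DBA"
  locked          = false

  # Tags used by the Can Connect service check, as in the example usage
  service_check_include_tags = ["role:sqlserver"]
}
```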
| 236 | + |