|
1 |
| -# terraform datadog SQL Server monitoring |
2 | 1 |
|
3 |
| -## Getting Started |
| 2 | + |
4 | 3 |
|
5 |
| -Pre-commit: |
| 4 | +[//]: # (This file is generated. Do not edit, module description can be added by editing / creating module_description.md) |
| 5 | + |
| 6 | +# Terraform module for Datadog SQL Server |
| 7 | + |
| 8 | +This module requires the [SQL Server integration](https://docs.datadoghq.com/integrations/sqlserver/?tab=host) to be configured. |
| 9 | +It provides basic SQL Server monitoring: locks, blocked processes, and connectivity. |
| 10 | +It's best to also use Datadog's APM instrumentation to understand the way the application is using the database. |
| 11 | +There's an upcoming feature in Datadog to fully support deep dive database monitoring. |
| 12 | + |
| 13 | +Monitors: |
| 14 | +* [Terraform module for Datadog SQL Server](#terraform-module-for-datadog-sql-server) |
| 15 | + * [Connections](#connections) |
| 16 | + * [Page Life Expectancy](#page-life-expectancy) |
| 17 | + * [Can Connect](#can-connect) |
| 18 | + * [Buffer Cache Hit Ratio](#buffer-cache-hit-ratio) |
| 19 | + * [Database State](#database-state) |
| 20 | + * [Lock Waits](#lock-waits) |
| 21 | + * [Batches Compiled Percent](#batches-compiled-percent) |
| 22 | + * [Procs Blocked](#procs-blocked) |
| 23 | + * [Module Variables](#module-variables) |
| 24 | + |
| 25 | +# Getting started developing |
| 26 | +[pre-commit](http://pre-commit.com/) is used for Terraform linting and validation. |
| 27 | + |
| 28 | +Steps: |
6 | 29 | - Install [pre-commit](http://pre-commit.com/). E.g. `brew install pre-commit`.
|
7 | 30 | - Run `pre-commit install` in this repo. (Every time you clone a repo with pre-commit enabled, you need to run `pre-commit install` again.)
|
8 | 31 | - That’s it! Now every time you commit a code change (`.tf` file), the hooks listed under `hooks:` in `.pre-commit-config.yaml` will execute.
|
| 32 | + |
| 33 | +## Connections |
| 34 | + |
| 35 | +Query: |
| 36 | +```terraform |
| 37 | +avg(last_30m):max:sqlserver.stats.connections{tag:xxx} by {host} >= 500 |
| 38 | +``` |
| 39 | + |
| 40 | +| variable | default | required | description | |
| 41 | +|-------------------------------|----------|----------|----------------------------------| |
| 42 | +| connections_enabled | True | No | | |
| 43 | +| connections_warning | 400 | No | | |
| 44 | +| connections_critical | 500 | No | | |
| 45 | +| connections_evaluation_period | last_30m | No | | |
| 46 | +| connections_note | "" | No | | |
| 47 | +| connections_docs | "" | No | | |
| 48 | +| connections_filter_override | "" | No | | |
| 49 | +| connections_alerting_enabled | True | No | | |
| 50 | +| connections_priority | 3 | No | Number from 1 (high) to 5 (low). | |
| 51 | + |
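
The thresholds in the table above can presumably be overridden when calling the module. The following is a minimal sketch only: the module label, the `source` path, and every value are hypothetical placeholders, and it assumes the module builds the monitor query from these variables.

```terraform
# Hypothetical invocation: the source path and all values are placeholders.
module "sqlserver_monitoring" {
  source = "./modules/datadog-sqlserver" # assumed local path

  # Required module variables (see "Module Variables")
  env                  = "prd"
  alert_env            = "prd"
  service              = "ExampleService"
  filter_str           = "tag:xxx"
  notification_channel = "@example-notify-handle" # assumed handle format

  # Tighter Connections thresholds than the defaults (400 / 500, last_30m)
  connections_warning           = 300
  connections_critical          = 400
  connections_evaluation_period = "last_15m"
  connections_priority          = 2
}
```

Under that assumption, the generated query would use `last_15m` as the evaluation window and `>= 400` as the critical condition.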
| 52 | + |
| 53 | +## Page Life Expectancy |
| 54 | + |
| 55 | +When this metric is low, pages stay in the cache only briefly and are often read from disk. Consider allocating more memory. |
| 56 | + |
| 57 | +Query: |
| 58 | +```terraform |
| 59 | +avg(last_1d):min:sqlserver.buffer.page_life_expectancy{tag:xxx} by {host} < 900 |
| 60 | +``` |
| 61 | + |
| 62 | +| variable | default | required | description | |
| 63 | +|----------------------------------------|------------------------------------------|----------|----------------------------------| |
| 64 | +| page_life_expectancy_enabled | True | No | | |
| 65 | +| page_life_expectancy_warning | 1800 | No | | |
| 66 | +| page_life_expectancy_critical | 900 | No | | |
| 67 | +| page_life_expectancy_evaluation_period | last_1d | No | | |
| 68 | +| page_life_expectancy_note | "" | No | | |
| 69 | +| page_life_expectancy_docs | When this metric is low, pages are not being cached for a short time and often read from disk. Consider allocating more memory. | No | | |
| 70 | +| page_life_expectancy_filter_override | "" | No | | |
| 71 | +| page_life_expectancy_alerting_enabled | True | No | | |
| 72 | +| page_life_expectancy_priority | 4 | No | Number from 1 (high) to 5 (low). | |
| 73 | + |
| 74 | + |
| 75 | +## Can Connect |
| 76 | + |
| 77 | +| variable | default | required | description | |
| 78 | +|------------------------------|----------|----------|--------------| |
| 79 | +| can_connect_enabled | True | No | | |
| 80 | +| can_connect_alerting_enabled | True | No | | |
| 81 | +| can_connect_warning | 1 | No | | |
| 82 | +| can_connect_critical | 1 | No | | |
| 83 | +| can_connect_priority | 1 | No | | |
| 84 | +| can_connect_docs | "" | No | | |
| 85 | +| can_connect_note | "" | No | | |
| 86 | + |
| 87 | + |
| 88 | +## Buffer Cache Hit Ratio |
| 89 | + |
| 90 | +When this metric is low, pages are often read from disk. Consider allocating more memory. |
| 91 | + |
| 92 | +Query: |
| 93 | +```terraform |
| 94 | +avg(last_1d):min:sqlserver.buffer.cache_hit_ratio{tag:xxx} by {host} * 100 < 75 |
| 95 | +``` |
| 96 | + |
| 97 | +| variable | default | required | description | |
| 98 | +|------------------------------------------|------------------------------------------|----------|----------------------------------| |
| 99 | +| buffer_cache_hit_ratio_enabled | True | No | | |
| 100 | +| buffer_cache_hit_ratio_warning | 90 | No | | |
| 101 | +| buffer_cache_hit_ratio_critical | 75 | No | | |
| 102 | +| buffer_cache_hit_ratio_evaluation_period | last_1d | No | | |
| 103 | +| buffer_cache_hit_ratio_note | "" | No | | |
| 104 | +| buffer_cache_hit_ratio_docs | When this metric is low, pages are often read from disk. Consider allocating more memory. | No | | |
| 105 | +| buffer_cache_hit_ratio_filter_override | "" | No | | |
| 106 | +| buffer_cache_hit_ratio_alerting_enabled | True | No | | |
| 107 | +| buffer_cache_hit_ratio_priority | 4 | No | Number from 1 (high) to 5 (low). | |
| 108 | + |
| 109 | + |
| 110 | +## Database State |
| 111 | + |
| 112 | +Query: |
| 113 | +```terraform |
| 114 | +max(last_5m):max:sqlserver.database.state{tag:xxx} by {host,database,database_state_desc} >= 5 |
| 115 | +``` |
| 116 | + |
| 117 | +| variable | default | required | description | |
| 118 | +|----------------------------------|----------|----------|----------------------------------| |
| 119 | +| database_state_enabled | True | No | | |
| 120 | +| database_state_warning | 1 | No | | |
| 121 | +| database_state_critical | 5 | No | | |
| 122 | +| database_state_evaluation_period | last_5m | No | | |
| 123 | +| database_state_note | "" | No | | |
| 124 | +| database_state_docs | "" | No | | |
| 125 | +| database_state_filter_override | "" | No | | |
| 126 | +| database_state_alerting_enabled | True | No | | |
| 127 | +| database_state_priority | 1 | No | Number from 1 (high) to 5 (low). | |
| 128 | + |
| 129 | + |
| 130 | +## Lock Waits |
| 131 | + |
| 132 | +A high number of lock waits per second is caused by lock contention. Try reducing lock contention by using more fine-grained locking in the queries. |
| 133 | + |
| 134 | +Query: |
| 135 | +```terraform |
| 136 | +avg(last_30m):max:sqlserver.stats.lock_waits{tag:xxx} by {host} > 20 |
| 137 | +``` |
| 138 | + |
| 139 | +| variable | default | required | description | |
| 140 | +|------------------------------|------------------------------------------|----------|----------------------------------| |
| 141 | +| lock_waits_enabled | True | No | | |
| 142 | +| lock_waits_warning | 10 | No | | |
| 143 | +| lock_waits_critical | 20 | No | | |
| 144 | +| lock_waits_evaluation_period | last_30m | No | | |
| 145 | +| lock_waits_note | "" | No | | |
| 146 | +| lock_waits_docs | High numbers of lock waits per second is caused by lock contention. Try reducing lock contention by using more fine grained locking in the queries. | No | | |
| 147 | +| lock_waits_filter_override | "" | No | | |
| 148 | +| lock_waits_alerting_enabled | True | No | | |
| 149 | +| lock_waits_priority | 4 | No | Number from 1 (high) to 5 (low). | |
| 150 | + |
| 151 | + |
| 152 | +## Batches Compiled Percent |
| 153 | + |
| 154 | +When this metric is high, a lot of queries need to be recompiled. Consider parameterizing more queries by using stored procedures, using forced parameterization or allocating more memory. |
| 155 | + |
| 156 | +Query: |
| 157 | +```terraform |
| 158 | +avg(last_1d):(max:sqlserver.stats.sql_compilations{tag:xxx} by {host} / max:sqlserver.stats.batch_requests{tag:xxx} by {host}) * 100 >= 20 |
| 159 | +``` |
| 160 | + |
| 161 | +| variable | default | required | description | |
| 162 | +|--------------------------------------------|------------------------------------------|----------|----------------------------------| |
| 163 | +| batches_compiled_percent_enabled | True | No | | |
| 164 | +| batches_compiled_percent_warning | 10 | No | | |
| 165 | +| batches_compiled_percent_critical | 20 | No | | |
| 166 | +| batches_compiled_percent_evaluation_period | last_1d | No | | |
| 167 | +| batches_compiled_percent_note | "" | No | | |
| 168 | +| batches_compiled_percent_docs | When this metric is high, a lot of queries need to be recompiled. Consider parameterizing more queries by using stored procedures, using forced parameterization or allocating more memory. | No | | |
| 169 | +| batches_compiled_percent_filter_override | "" | No | | |
| 170 | +| batches_compiled_percent_alerting_enabled | True | No | | |
| 171 | +| batches_compiled_percent_priority | 4 | No | Number from 1 (high) to 5 (low). | |
| 172 | + |
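
To make the percentage in the query above concrete, here is a small worked example written as Terraform locals. The metric values are invented for illustration, and the sketch assumes the warning threshold is compared with the same `>=` operator as the critical one.

```terraform
locals {
  # Hypothetical sample values, not real measurements
  sql_compilations = 150  # compilations per second
  batch_requests   = 1000 # batch requests per second

  # Same arithmetic as the monitor query: compilations / batch requests * 100
  batches_compiled_percent = (local.sql_compilations / local.batch_requests) * 100

  # Defaults from the table above
  warning_threshold  = 10
  critical_threshold = 20

  # Roughly 15 percent: above the warning threshold, below the critical one
  state = (
    local.batches_compiled_percent >= local.critical_threshold ? "alert" :
    local.batches_compiled_percent >= local.warning_threshold ? "warn" : "ok"
  )
}

output "batches_compiled_state" {
  value = local.state
}
```

With these sample numbers about 15% of batches trigger a compilation, so the monitor would sit in the warning range without reaching the critical threshold of 20.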
| 173 | + |
| 174 | +## Procs Blocked |
| 175 | + |
| 176 | +A high number of blocked processes can indicate deadlocks. Check for deadlocks by investigating which queries are waiting for locks to be released. |
| 177 | + |
| 178 | +Query: |
| 179 | +```terraform |
| 180 | +avg(last_10m):max:sqlserver.stats.procs_blocked{tag:xxx} by {host} >= 1 |
| 181 | +``` |
| 182 | + |
| 183 | +| variable | default | required | description | |
| 184 | +|---------------------------------|------------------------------------------|----------|----------------------------------| |
| 185 | +| procs_blocked_enabled | True | No | | |
| 186 | +| procs_blocked_warning | None | No | | |
| 187 | +| procs_blocked_critical | 1 | No | | |
| 188 | +| procs_blocked_evaluation_period | last_10m | No | | |
| 189 | +| procs_blocked_note | "" | No | | |
| 190 | +| procs_blocked_docs | High number of procs blocked can indicate deadlocks. Check for deadlocks by investigating which queries are waiting for locks to be released. | No | | |
| 191 | +| procs_blocked_filter_override | "" | No | | |
| 192 | +| procs_blocked_alerting_enabled | True | No | | |
| 193 | +| procs_blocked_priority | 3 | No | Number from 1 (high) to 5 (low). | |
| 194 | + |
| 195 | + |
| 196 | +## Module Variables |
| 197 | + |
| 198 | +| variable | default | required | description | |
| 199 | +|----------------------------|----------|----------|--------------| |
| 200 | +| env | | Yes | | |
| 201 | +| alert_env | | Yes | | |
| 202 | +| filter_str | | Yes | | |
| 203 | +| service | | Yes | | |
| 204 | +| notification_channel | | Yes | | |
| 205 | +| additional_tags | [] | No | | |
| 206 | +| name_prefix | "" | No | | |
| 207 | +| name_suffix | "" | No | | |
| 208 | +| locked | True | No | | |
| 209 | +| service_check_include_tags | None | No | | |
| 210 | +| service_check_exclude_tags | None | No | | |
| 211 | + |
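
A minimal sketch of calling the module with the required variables from the table above plus two optional ones. The module label, the `source` path, and all values are placeholders chosen for illustration, not taken from this repository. The value formats (for example the notification handle and the tag strings) are assumptions.

```terraform
# Hypothetical invocation: the source path and all values are placeholders.
module "sqlserver_monitoring" {
  source = "./modules/datadog-sqlserver" # assumed local path, adjust to where the module lives

  # Required variables (no defaults)
  env                  = "prd"
  alert_env            = "prd"
  service              = "ExampleService"
  filter_str           = "tag:xxx"                # same placeholder filter as in the queries
  notification_channel = "@example-notify-handle" # assumed Datadog notification handle format

  # Optional variables, shown with non-default values
  additional_tags = ["team:example"]
  name_prefix     = "example"
}
```

All per-monitor variables keep their documented defaults unless they are set explicitly in the same block.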
| 212 | + |