Commit f8a2b31

Add Windows Active Directory Mixin (#1105)
* initial
* updated readme + alerts
* checkpoint
* mixin finished
* added alerts
* added lint
* update readme
* added alerts panel
* Capitalization Changes
* readme update
* readme update
* Extraneous replication panel removal
* lint fix
* stefan 2/n
* Vitaly feedback 1/n
* updated links
* hide ad alerts behind feature flag
* Update windows-mixin/config.libsonnet

---------

Co-authored-by: v-zhuravlev <[email protected]>
1 parent 7f961aa commit f8a2b31

13 files changed: +824 -45 lines changed


windows-active-directory-mixin/.lint

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+exclusions:
+  template-job-rule:
+    reason: "Prometheus datasource variable is being named as prometheus_datasource now while linter expects 'datasource'"
+  panel-datasource-rule:
+    reason: "Loki datasource variable is being named as loki_datasource now while linter expects 'datasource'"
+  template-datasource-rule:
+    reason: "Based on new convention we are using variable names prometheus_datasource and loki_datasource whereas linter expects 'datasource'"
+  template-instance-rule:
+    reason: "Based on new convention we are using variable names prometheus_datasource and loki_datasource whereas linter expects 'datasource'"
+  target-job-rule:
+    reason: "mixtool upgrade made this rule stricter. TODO: Fix errors and remove the warning exclusion"
+  panel-title-description-rule:
+    reason: "Not required for logs volume"
+  panel-units-rule:
+    reason: "Logs volume has no unit"
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+JSONNET_FMT := jsonnetfmt -n 2 --max-blank-lines 1 --string-style s --comment-style s
+
+.PHONY: all
+all: build dashboards_out prometheus_alerts.yaml
+
+vendor: jsonnetfile.json
+	jb install
+
+.PHONY: build
+build: vendor
+
+.PHONY: fmt
+fmt:
+	find . -name 'vendor' -prune -o -name '*.libsonnet' -print -o -name '*.jsonnet' -print | \
+		xargs -n 1 -- $(JSONNET_FMT) -i
+
+.PHONY: lint
+lint: build
+	find . -name 'vendor' -prune -o -name '*.libsonnet' -print -o -name '*.jsonnet' -print | \
+		while read f; do \
+			$(JSONNET_FMT) "$$f" | diff -u "$$f" -; \
+		done
+	mixtool lint mixin.libsonnet
+
+dashboards_out: mixin.libsonnet config.libsonnet $(wildcard dashboards/*)
+	@mkdir -p dashboards_out
+	mixtool generate dashboards mixin.libsonnet -d dashboards_out
+
+prometheus_alerts.yaml: mixin.libsonnet alerts/*.libsonnet
+	mixtool generate alerts mixin.libsonnet -a prometheus_alerts.yaml
+
+.PHONY: clean
+clean:
+	rm -rf dashboards_out prometheus_alerts.yaml
Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
+# Windows Active Directory mixin
+The Windows Active Directory mixin is a set of configurable Grafana dashboards and alerts.
+
+The Windows Active Directory mixin contains the following dashboards:
+
+- Windows Active Directory overview
+- Windows logs
+
+and the following alerts:
+
+- WindowsActiveDirectoryHighPendingReplicationOperations
+- WindowsActiveDirectoryHighReplicationSyncRequestFailures
+- WindowsActiveDirectoryHighPasswordChanges
+- WindowsActiveDirectoryMetricsDown
+
+## Windows Active Directory overview
+The Windows Active Directory overview dashboard provides details on alerts, LDAP operations and requests, bind operations, replication traffic, and database operations.
+![Windows Active Directory overview dashboard (LDAP)](https://storage.googleapis.com/grafanalabs-integration-assets/windows-active-directory/screenshots/windows_active_directory_overview_1.png)
+![Windows Active Directory overview dashboard (database)](https://storage.googleapis.com/grafanalabs-integration-assets/windows-active-directory/screenshots/window_active_directory_overview_2.png)
+
+## Windows logs
+The Windows logs dashboard provides details on incoming Windows application, security, and system logs.
+![Windows logs dashboard](https://storage.googleapis.com/grafanalabs-integration-assets/windows-active-directory/screenshots/windows_active_directory_logs.png)
+
+Windows logs are enabled by default in `config.libsonnet`. To disable them, set `enableLokiLogs` to `false`, then run `make` again to regenerate the dashboards:
+
+```
+{
+  _config+:: {
+    enableLokiLogs: false,
+  },
+}
+```
+
+For the selectors to work with the Windows logs ingested into your logs datasource, include `instance` and `job` labels in the [scrape configs](https://grafana.com/docs/loki/latest/clients/promtail/configuration/#scrape_configs) that match the labels on the ingested metrics.
+
+```yaml
+  - job_name: integrations/windows-exporter-application
+    windows_events:
+      use_incoming_timestamp: true
+      eventlog_name: 'Application'
+      bookmark_path: "./bookmarks-app.xml"
+      xpath_query: '*'
+      locale: 1033
+      labels:
+        job: integrations/windows_exporter
+        instance: '<your-instance-name>' # must match instance used in windows_exporter
+    relabel_configs:
+      - source_labels: ['computer']
+        target_label: 'agent_hostname'
+    pipeline_stages:
+      - json:
+          expressions:
+            source: source
+            level: levelText
+      - labels:
+          source:
+          level:
+  - job_name: integrations/windows-exporter-system
+    windows_events:
+      use_incoming_timestamp: true
+      bookmark_path: "./bookmarks-sys.xml"
+      eventlog_name: "System"
+      xpath_query: '*'
+      locale: 1033
+      # - 1033 to force English language
+      # - 0 to use default Windows locale
+      labels:
+        job: integrations/windows_exporter
+        instance: '<your-instance-name>' # must match instance used in windows_exporter
+    relabel_configs:
+      - source_labels: ['computer']
+        target_label: 'agent_hostname'
+    pipeline_stages:
+      - json:
+          expressions:
+            source: source
+            level: levelText
+      - labels:
+          source:
+          level:
+  - job_name: integrations/windows-exporter-security
+    windows_events:
+      use_incoming_timestamp: true
+      bookmark_path: "./bookmarks-sec.xml" # separate bookmark file so it does not clash with the System job
+      eventlog_name: "Security"
+      xpath_query: '*'
+      locale: 1033
+      # - 1033 to force English language
+      # - 0 to use default Windows locale
+      labels:
+        job: integrations/windows_exporter
+        instance: '<your-instance-name>' # must match instance used in windows_exporter
+    relabel_configs:
+      - source_labels: ['computer']
+        target_label: 'agent_hostname'
+    pipeline_stages:
+      - json:
+          expressions:
+            source: source
+            level: levelText
+      - labels:
+          source:
+          level:
+
+```
+
+## Alerts overview
+- WindowsActiveDirectoryHighPendingReplicationOperations: There is a high number of pending replication operations in Active Directory. A high number of pending operations sustained over a period of time can indicate a problem with replication.
+- WindowsActiveDirectoryHighReplicationSyncRequestFailures: There are a number of replication synchronization request failures. These can cause authentication failures, outdated information being propagated across domain controllers, and potentially data loss or inconsistencies.
+- WindowsActiveDirectoryHighPasswordChanges: There is a high number of password changes. This may indicate unauthorized changes or attacks.
+- WindowsActiveDirectoryMetricsDown: Windows Active Directory metrics are down.
+
+Default thresholds can be configured in `config.libsonnet`.
+
+```js
+{
+  _config+:: {
+    // alerts thresholds
+    alertsHighPendingReplicationOperations: 50,  // count
+    alertsHighReplicationSyncRequestFailures: 0,  // count
+    alertsHighPasswordChanges: 25,  // count
+    alertsMetricsDownJobName: 'integrations/windows',
+  }
+}
+```
+
+## Install tools
+```bash
+go install github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@latest
+go install github.com/monitoring-mixins/mixtool/cmd/mixtool@latest
+```
+
+For linting and formatting, you also need `jsonnetfmt` installed. If you
+have a working Go development environment, it's easiest to run the following:
+
+```bash
+go install github.com/google/go-jsonnet/cmd/jsonnetfmt@latest
+```
+
+The files in `dashboards_out` need to be imported
+into your Grafana server. The exact details depend on your environment.
+
+`prometheus_alerts.yaml` needs to be imported into Prometheus.
+
+## Generate dashboards and alerts
+Edit `config.libsonnet` if required, then build the JSON dashboard files for Grafana and the Prometheus alerts file:
+
+```bash
+make
+```
+
+For more advanced uses of mixins, see
+https://github.com/monitoring-mixins/docs.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+import 'github.com/grafana/grafonnet/gen/grafonnet-v10.0.0/main.libsonnet'
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+{
+  "version": 1,
+  "dependencies": [
+    {
+      "source": {
+        "git": {
+          "remote": "https://github.com/grafana/jsonnet-libs.git",
+          "subdir": "windows-observ-lib"
+        }
+      },
+      "version": "master"
+    }
+  ],
+  "legacyImports": true
+}
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+local windowsobservlib = import '../windows-observ-lib/main.libsonnet';
+local alerts = import './alerts/alerts.libsonnet';
+local g = import './g.libsonnet';
+local var = g.dashboard.variable;
+local activedirectorymixin =
+  windowsobservlib.new(
+    filteringSelector='job=~"integrations/windows_exporter"',
+    uid='active-directory',
+    groupLabels=['job'],
+    instanceLabels=['instance'],
+  )
+
+  {
+    config+: {
+      enableADDashboard: true,
+    },
+  }
+
+  {
+    grafana+: {
+      local link = g.dashboard.link,
+      links: {
+        otherDashboards:
+          link.dashboards.new('All Windows Active Directory dashboards', activedirectorymixin.config.dashboardTags)
+          + link.dashboards.options.withIncludeVars(true)
+          + link.dashboards.options.withKeepTime(true)
+          + link.dashboards.options.withAsDropdown(true),
+      },
+      variables+: {
+        datasources+: {
+          loki+: var.datasource.withRegex('Loki|.+logs'),
+          prometheus+: var.datasource.withRegex('Prometheus|Cortex|Mimir|grafanacloud-.+-prom'),
+        },
+      },
+
+    },
+  };
+
+local activedirectorydashboards = ['activedirectory', 'logs'];
+local selectedDashboards = {
+  [key]: activedirectorymixin.grafana.dashboards[key]
+  for key in activedirectorydashboards
+  if key in activedirectorymixin.grafana.dashboards
+};
+
+{
+  grafanaDashboards+:: selectedDashboards,
+  prometheusAlerts+:: activedirectorymixin.prometheus.alerts,
+  prometheusRules+:: activedirectorymixin.prometheus.recordingRules,
+}
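
Review note: the `selectedDashboards` comprehension in this file keeps only the dashboards named in `activedirectorydashboards` that the library actually exposes, so a missing key is skipped instead of raising an error. A minimal standalone sketch of that pattern follows. The `dashboards` object, its titles, and the `fleet` and `missing` keys are hypothetical stand-ins, not the real `windows-observ-lib` output.

```jsonnet
// Hypothetical stand-in for activedirectorymixin.grafana.dashboards.
local dashboards = {
  activedirectory: { title: 'Windows Active Directory overview' },
  logs: { title: 'Windows logs' },
  fleet: { title: 'Some other dashboard' },  // present, but not requested
};

// Keys we want to publish; 'missing' is not defined above on purpose.
local wanted = ['activedirectory', 'logs', 'missing'];

// Object comprehension: one field per requested key that exists.
// The `if key in dashboards` guard avoids indexing a field that is absent.
{
  [key]: dashboards[key]
  for key in wanted
  if key in dashboards
}
```

Evaluating the sketch yields an object with only the `activedirectory` and `logs` fields. Without the guard, indexing `dashboards['missing']` would fail at evaluation time.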

windows-mixin/config.libsonnet

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@
     alertsCPUThresholdWarning: '90',
     alertMemoryUsageThresholdCritical: '90',
     alertDiskUsageThresholdCritical: '90',
+    enableADDashboard: false,
     // set to false to disable logs dashboard and logs annotations
     enableLokiLogs: true,
   },
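
Review note: the flag defaults to `false` here, and the Active Directory mixin turns it on with a `config+:` override. The sketch below illustrates why a late override like that can change values computed elsewhere in the object: Jsonnet resolves `self` against the final, merged object. The `lib` object and its `alertCount` field are hypothetical and only mimic the shape of the real library.

```jsonnet
// Hypothetical library object with the flag off by default.
local lib = {
  config: {
    enableADDashboard: false,
    enableLokiLogs: true,
  },
  // Late-bound: `self.config` is looked up in the final merged object,
  // so overrides applied later are visible here.
  alertCount: if self.config.enableADDashboard then 4 else 0,
};

// `config+:` merges into the existing config instead of replacing it,
// so enableLokiLogs keeps its default.
lib {
  config+: {
    enableADDashboard: true,
  },
}
```

Evaluating this gives `alertCount: 4` with `enableLokiLogs` still `true`. Using `config:` instead of `config+:` would drop every other default in the config object.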

windows-observ-lib/alerts.libsonnet

Lines changed: 71 additions & 2 deletions
@@ -1,6 +1,75 @@
 {
   new(this): {
-
+    local ADAlerts = [
+      {
+        alert: 'WindowsActiveDirectoryHighPendingReplicationOperations',
+        expr: |||
+          windows_ad_replication_pending_operations{%(filteringSelector)s} >= %(alertsHighPendingReplicationOperations)s
+        ||| % this.config,
+        'for': '10m',
+        labels: {
+          severity: 'warning',
+        },
+        annotations: {
+          summary: 'There is a high number of pending replication operations in Active Directory. A high number of pending operations sustained over a period of time can indicate a problem with replication.',
+          description:
+            (
+              'The number of pending replication operations on {{$labels.instance}} is {{ printf "%%.0f" $value }} which is above the threshold of %(alertsHighPendingReplicationOperations)s.'
+            ) % this.config,
+        },
+      },
+      {
+        alert: 'WindowsActiveDirectoryHighReplicationSyncRequestFailures',
+        expr: |||
+          increase(windows_ad_replication_sync_requests_schema_mismatch_failure_total{%(filteringSelector)s}[5m]) > %(alertsHighReplicationSyncRequestFailures)s
+        ||| % this.config,
+        'for': '5m',
+        labels: {
+          severity: 'critical',
+        },
+        annotations: {
+          summary: 'There are a number of replication synchronization request failures. These can cause authentication failures, outdated information being propagated across domain controllers, and potentially data loss or inconsistencies.',
+          description:
+            (
+              'The number of replication sync request failures on {{$labels.instance}} is {{ printf "%%.0f" $value }} which is above the threshold of %(alertsHighReplicationSyncRequestFailures)s.'
+            ) % this.config,
+        },
+      },
+      {
+        alert: 'WindowsActiveDirectoryHighPasswordChanges',
+        expr: |||
+          increase(windows_ad_sam_password_changes_total{%(filteringSelector)s}[5m]) > %(alertsHighPasswordChanges)s
+        ||| % this.config,
+        'for': '5m',
+        labels: {
+          severity: 'warning',
+        },
+        annotations: {
+          summary: 'There is a high number of password changes. This may indicate unauthorized changes or attacks.',
+          description:
+            (
+              'The number of password changes on {{$labels.instance}} is {{ printf "%%.0f" $value }} which is greater than the threshold of %(alertsHighPasswordChanges)s'
+            ) % this.config,
+        },
+      },
+      {
+        alert: 'WindowsActiveDirectoryMetricsDown',
+        expr: |||
+          up{job="%(alertsMetricsDownJobName)s"} == 0
+        ||| % this.config,
+        'for': '5m',
+        labels: {
+          severity: 'critical',
+        },
+        annotations: {
+          summary: 'Windows Active Directory metrics are down.',
+          description:
+            (
+              'There are no available metrics for Windows Active Directory integration from instance {{$labels.instance}}.'
+            ) % this.config,
+        },
+      },
+    ],
     groups: [
       {
         name: 'windows-alerts-' + this.config.uid,
@@ -120,7 +189,7 @@
               ||| % this.config,
             },
           },
-        ],
+        ] + if this.config.enableADDashboard then ADAlerts else [],
       },
     ],
   },
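
Review note: two mechanisms in this hunk are easy to miss. The rule list is extended with `ADAlerts` only when `enableADDashboard` is true, and every expression and description is built with Jsonnet's `%` string formatting, where `%(name)s` pulls a field from the config object and `%%` produces a literal `%` for the Prometheus template. A minimal sketch under simplified assumptions follows. The `config` values, the `baseAlerts` list, and the `ExampleBaseAlert` rule are hypothetical; only the password-change alert name and metric come from the diff.

```jsonnet
// Hypothetical config; the real one is assembled by windows-observ-lib.
local config = {
  filteringSelector: 'job=~"integrations/windows_exporter"',
  alertsHighPasswordChanges: 25,
  enableADDashboard: true,
};

// Hypothetical stand-in for the alerts that already existed in the group.
local baseAlerts = [
  {
    alert: 'ExampleBaseAlert',
    expr: 'up{%(filteringSelector)s} == 0' % config,
  },
];

local adAlerts = [
  {
    alert: 'WindowsActiveDirectoryHighPasswordChanges',
    // %(name)s is replaced by the matching config field (numbers are stringified).
    expr: 'increase(windows_ad_sam_password_changes_total{%(filteringSelector)s}[5m]) > %(alertsHighPasswordChanges)s' % config,
    annotations: {
      // %% is emitted as a single %, so Prometheus receives printf "%.0f".
      description: 'Value is {{ printf "%%.0f" $value }}, threshold is %(alertsHighPasswordChanges)s.' % config,
    },
  },
];

{
  // With the flag off, the AD alerts are left out entirely.
  rules: baseAlerts + (if config.enableADDashboard then adAlerts else []),
}
```

Setting `enableADDashboard` to `false` in the sketch leaves only `ExampleBaseAlert` in `rules`, which mirrors how the existing Windows mixin stays unchanged unless the flag is enabled.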
