 ## Overview

-The `configurations` role configures alerts, integrations, and monitoring settings for your AxonOps deployment. This role manages metric alerts, backup configurations, service checks, integration with notification services (Slack, PagerDuty), log alerts, and custom dashboards.
+The `configurations` role configures alerts, integrations, and monitoring settings for your AxonOps deployment. This
+role manages metric alerts, backup configurations, service checks, integration with notification services (Slack,
+PagerDuty), log alerts, and custom dashboards.

 ## Requirements

@@ -14,20 +16,20 @@ The `configurations` role configures alerts, integrations, and monitoring settin

 ### Required Variables

-| Variable | Description | Example |
-|----------|-------------|---------|
-| `org` | Organization name in AxonOps | `mycompany` |
+| Variable  | Description                          | Example              |
+|-----------|--------------------------------------|----------------------|
+| `org`     | Organization name in AxonOps         | `mycompany`          |
 | `cluster` | Cluster name to configure alerts for | `production-cluster` |

 **Note**: These variables can also be set via environment variables `AXONOPS_ORG` and `AXONOPS_CLUSTER`.

 ### Optional Feature Flags

-| Variable | Description | Default |
-|----------|-------------|---------|
-| `adaptive_repair` | Configuration for adaptive repair settings | undefined |
-| `agent_disconnection_tolerance` | Agent disconnection tolerance settings | undefined |
-| `human_readableid` | Human-readable ID configuration | undefined |
+| Variable                        | Description                                | Default   |
+|---------------------------------|--------------------------------------------|-----------|
+| `adaptive_repair`               | Configuration for adaptive repair settings | undefined |
+| `agent_disconnection_tolerance` | Agent disconnection tolerance settings     | undefined |
+| `human_readableid`              | Human-readable ID configuration            | undefined |

 ## Dependencies

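The note about `AXONOPS_ORG` and `AXONOPS_CLUSTER` describes a fallback, which this small Python sketch illustrates. It assumes that a variable set explicitly on the play takes precedence over the environment variable. The role's actual precedence is not shown here, and `resolve_target` is a hypothetical helper, not part of the role.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_target(
    role_vars: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (org, cluster), falling back to AXONOPS_ORG / AXONOPS_CLUSTER.

    Assumption: a variable set explicitly on the play wins over the environment.
    """
    env = os.environ if env is None else env
    org = role_vars.get("org") or env.get("AXONOPS_ORG")
    cluster = role_vars.get("cluster") or env.get("AXONOPS_CLUSTER")
    missing = [name for name, value in (("org", org), ("cluster", cluster)) if not value]
    if missing:
        raise ValueError("missing required variable(s): " + ", ".join(missing))
    return org, cluster


if __name__ == "__main__":
    # Cluster comes from the play variables, org from the (simulated) environment.
    print(resolve_target({"cluster": "production-cluster"}, {"AXONOPS_ORG": "mycompany"}))
```

Passing the environment as a mapping keeps the sketch testable. Omitting it reads the real `os.environ`.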
@@ -124,6 +126,7 @@ This role requires a running AxonOps Server with API access.
   roles:
     - role: axonops.axonops.configurations
 ```
+
 ## Details playbook

 ### Adaptive Repair Configuration
@@ -132,7 +135,9 @@ The Adaptive Repair feature can be configured by setting the `adaptive_repair` v
 no need for files in the `config` directory.

 This allows you to enable or disable adaptive repair settings for your cluster.
+
 #### List of Parameters
+
 | Parameter | Description | Type | Default |
 |-----------------------|----------------------------------------------------------------------------------|---------|---------|
 | `enabled` | Enable or disable adaptive repair | boolean | `true` |
@@ -175,7 +180,9 @@ This allows you to enable or disable adaptive repair settings for your cluster.
 ```

 #### Set GC Grace Threshold
-Set the GC grace period. AxonOps will ignore tables that have a `gc_grace_seconds` value lower than the specified threshold.
+
+Set the GC grace threshold. AxonOps ignores tables whose `gc_grace_seconds` value is lower than the specified
+threshold.
 The default is `86400` seconds (1 day).

 ```yaml
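The threshold rule can be expressed as a simple filter over a table-to-`gc_grace_seconds` mapping. This sketch illustrates the documented behaviour and is not AxonOps code. The `tables_considered` helper and the table names are hypothetical. Tables whose value equals the threshold are assumed to be kept, because the text only says that lower values are ignored.

```python
from typing import Dict, List

# Documented default threshold: 86400 seconds (1 day).
DEFAULT_GC_GRACE_THRESHOLD = 86400


def tables_considered(
    gc_grace_by_table: Dict[str, int],
    threshold: int = DEFAULT_GC_GRACE_THRESHOLD,
) -> List[str]:
    """Return the tables that adaptive repair would still consider, sorted by name.

    Tables whose gc_grace_seconds is strictly lower than the threshold are skipped.
    """
    return sorted(
        table for table, gc_grace in gc_grace_by_table.items() if gc_grace >= threshold
    )
```

For example, with the default threshold a table using `gc_grace_seconds: 3600` would be skipped, while one using the Cassandra default of 864000 would not.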
@@ -195,6 +202,7 @@ The default is `86400` seconds (1 day).
 #### Set Table Parallelism

 It is recommended to set this value to at least the number of tables in the cluster.
+
 ```yaml
 - name: Set Table Parallelism for Adaptive Repair
   hosts: localhost
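A pre-flight check for the table parallelism recommendation could compare the configured value with the number of distinct tables. The role variable that holds this setting is not shown in this excerpt, so this sketch takes the value as a plain argument. `parallelism_warnings` is a hypothetical helper.

```python
from typing import Iterable, List


def parallelism_warnings(table_parallelism: int, tables: Iterable[str]) -> List[str]:
    """Return a warning when the value is below the number of distinct tables."""
    table_count = len(set(tables))  # duplicates in the input are counted once
    if table_parallelism < table_count:
        return [
            f"table parallelism {table_parallelism} is lower than the "
            f"{table_count} tables in the cluster"
        ]
    return []
```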
@@ -218,8 +226,8 @@ It is suggested to keep this value at least as the number of table in the cluste
     org: mycompany
     cluster: production-cluster
     adaptive_repair:
-      enabled: true
-      segmentretries: 10
+      enabled: true
+      segmentretries: 10

   roles:
     - role: axonops.axonops.configurations
@@ -228,6 +236,7 @@ It is suggested to keep this value at least as the number of table in the cluste
 #### Set Segment Target Size

 The value must be a number from 16 to 10240.
+
 ```yaml
 - name: Set Segment Target Size for Adaptive Repair
   hosts: localhost
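The accepted range is 16 to 10240, so a playbook-side check can reject out-of-range values before the role calls the AxonOps API. This sketch assumes both bounds are inclusive and that only integers are accepted. The unit of the value is not stated in this section, so none is implied. `validate_segment_target_size` is hypothetical.

```python
SEGMENT_TARGET_SIZE_MIN = 16
SEGMENT_TARGET_SIZE_MAX = 10240


def validate_segment_target_size(value: int) -> int:
    """Return value unchanged, or raise ValueError if it is outside the documented range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"segment target size must be an integer, got {value!r}")
    if not SEGMENT_TARGET_SIZE_MIN <= value <= SEGMENT_TARGET_SIZE_MAX:
        raise ValueError(
            f"segment target size must be between {SEGMENT_TARGET_SIZE_MIN} and "
            f"{SEGMENT_TARGET_SIZE_MAX}, got {value}"
        )
    return value
```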
@@ -243,6 +252,7 @@ Number from 16 to 10240
 ```

 #### Exclude Tables from Adaptive Repair
+
 List of tables to exclude from adaptive repair. The accepted format is a list of strings in the form "keyspace.table".
 To exclude an entire keyspace, use "keyspace.*".
 The default is an empty list.
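This sketch shows one way to match a fully qualified table name against the two accepted entry forms, `"keyspace.table"` and `"keyspace.*"`. It only illustrates the documented format. AxonOps' own matching may differ, for example in case sensitivity. `is_excluded` is a hypothetical helper.

```python
from typing import Iterable


def is_excluded(table: str, excluded: Iterable[str]) -> bool:
    """Return True if "keyspace.table" matches an entry of the exclusion list.

    Entries are either an exact "keyspace.table" or "keyspace.*" for a whole keyspace.
    """
    keyspace, _, _ = table.partition(".")
    for entry in excluded:
        if entry == table:
            return True
        if entry.endswith(".*") and entry[:-2] == keyspace:
            return True
    return False
```

The keyspace comparison is exact, so `"system_auth.*"` does not match tables in a keyspace named `system_authx`.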
@@ -256,15 +266,16 @@ The default is an empty list.
     adaptive_repair:
       enabled: true
       excludedtables:
-        - "system.peers"
-        - "system.local"
+        - "system.peers"
+        - "system.local"


   roles:
     - role: axonops.axonops.configurations
 ```

 #### Set Maximum Segments per Table
+
 Set the maximum number of segments per table to repair in a single repair cycle.
 Having too many segments in a table causes too many repair commands to be sent.

@@ -283,6 +294,7 @@ Having too many segments in a table causes too many repair commands to be sent.
 ```

 #### Set Segment Timeout
+
 Set the timeout for each segment repair operation.
 The value is an integer followed by one of the units `s`, `m`, `h`, `d`, `w`, `M`, or `y`.

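A regular expression can validate the timeout format (an integer plus a single unit letter) and convert it to seconds. The month and year lengths below (30 and 365 days) are assumptions made for illustration. `parse_duration` is a hypothetical helper that the role does not expose. The match is case-sensitive because `m` means minutes and `M` means months.

```python
import re

# Seconds per unit; "M" (30 days) and "y" (365 days) are illustrative assumptions.
_UNIT_SECONDS = {
    "s": 1,
    "m": 60,
    "h": 3600,
    "d": 86400,
    "w": 7 * 86400,
    "M": 30 * 86400,
    "y": 365 * 86400,
}
_DURATION_RE = re.compile(r"([0-9]+)([smhdwMy])")


def parse_duration(value: str) -> int:
    """Convert a duration such as "90m" or "2h" into a number of seconds."""
    match = _DURATION_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(
            f"invalid duration {value!r}: expected an integer followed by "
            "one of s, m, h, d, w, M, y"
        )
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]
```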
@@ -299,25 +311,106 @@ Integer number followed by one of "s, m, h, d, w, M, y"
   roles:
     - role: axonops.axonops.configurations
 ```
+
+### Service Checks
+
+Service checks can be configured by providing a YAML file called `service_checks.yml` in the directory
+`config/[YOUR_ORG_NAME]`
+to make them available for all clusters in the organization, or in `config/[YOUR_ORG_NAME]/[YOUR_CLUSTER_NAME]` to make
+them available for a specific cluster.
+
+The file is optional; if it is not provided, no service checks are configured.
+
+The format of the file is as follows:
+
+```yaml
+axonops_shell_check: []
+
+axonops_tcp_check: []
+```
+
+Both `axonops_shell_check` and `axonops_tcp_check` are optional.
+
+#### List of parameters for `axonops_shell_check`
+
+| Parameter  | Description                         | Type    | Default     |
+|------------|-------------------------------------|---------|-------------|
+| `name`     | Name of the shell check             | String  |             |
+| `present`  | Whether the check is present or not | Boolean | True        |
+| `interval` | How often the check runs            | String  |             |
+| `timeout`  | Timeout for the check               | String  |             |
+| `shell`    | Shell used by the script            | String  | '/bin/bash' |
+| `script`   | Script to run for the check         | String  |             |
+
+Shell checks report their outcome through the exit code of the script:
+
+- `0`: OK
+- `1`: WARNING
+- `2`: CRITICAL
+
+#### Dummy example of `axonops_shell_check`
+
+This is an example of a dummy shell check that always returns CRITICAL:
+
+```yaml
+axonops_shell_check:
+  - name: "Dummy check"
+    present: true
+    interval: "5m"
+    timeout: "10s"
+    script: |
+      #!/bin/bash
+      echo "This is a dummy check"
+      exit 2
+```
+
+#### Example of a shell check to monitor if a Debian/Ubuntu host needs a reboot
+
+This check looks for the presence of the file `/var/run/reboot-required`, which is created by the system when a reboot
+is needed after package installations or updates.
+
+```yaml
+axonops_shell_check:
+  - name: Debian / Ubuntu - Check host needs reboot
+    interval: 12h
+    present: true
+    timeout: 1m
+    script: |-
+      set -euo pipefail
+
+      if [ -f /var/run/reboot-required ]
+      then
+        echo "$(hostname) Reboot required"
+        exit 1
+      else
+        echo "Nothing to do"
+      fi
+```
+
+**Note:** More examples of service checks can be found in the org-level
+[service_checks.yml](../../examples/configurations/config/REPLACE_WITH_ORG_NAME/service_checks.yml) or the cluster-level
+[service_checks.yml](../../examples/configurations/config/REPLACE_WITH_ORG_NAME/REPLACE_WITH_CLUSTER_NAME/service_checks.yml)
+example files.
+
 ## Available Tags

 The role supports granular control through the following tags:

-| Tag | Description |
-|-----|-------------|
-| `metrics` | Configure metric alerts |
-| `backups` | Configure backup settings |
-| `service_checks` | Configure service check alerts |
-| `slack` | Configure Slack integration |
-| `pagerduty_integration` | Configure PagerDuty integration |
-| `adaptive_repair` | Configure adaptive repair settings |
+| Tag                             | Description                             |
+|---------------------------------|-----------------------------------------|
+| `metrics`                       | Configure metric alerts                 |
+| `backups`                       | Configure backup settings               |
+| `service_checks`                | Configure service check alerts          |
+| `slack`                         | Configure Slack integration             |
+| `pagerduty_integration`         | Configure PagerDuty integration         |
+| `adaptive_repair`               | Configure adaptive repair settings      |
 | `agent_disconnection_tolerance` | Configure agent disconnection tolerance |
-| `commitlogs_archive` | Configure commit log archiving |
-| `human_readableid` | Configure human-readable IDs |
-| `log_alerts` | Configure log-based alerts |
-| `logcollector` | Configure log collector |
-| `dashboards` | Import custom dashboards |
-| `routes` | Configure alert routing rules |
+| `commitlogs_archive`            | Configure commit log archiving          |
+| `human_readableid`              | Configure human-readable IDs            |
+| `log_alerts`                    | Configure log-based alerts              |
+| `logcollector`                  | Configure log collector                 |
+| `dashboards`                    | Import custom dashboards                |
+| `routes`                        | Configure alert routing rules           |

 ## Tasks Overview

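Before a script is added to `service_checks.yml`, it can help to run it locally and confirm which outcome it produces. This sketch runs the script with `<shell> -c` and maps the exit code using the 0/1/2 outcome list from the service checks section. It is not how the AxonOps agent executes checks. Reporting timeouts and any other exit code as `UNKNOWN` is an assumption. `run_shell_check` is hypothetical.

```python
import subprocess

# Outcome codes documented for axonops_shell_check scripts.
OUTCOMES = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def run_shell_check(script: str, shell: str = "/bin/bash", timeout_s: float = 10.0) -> str:
    """Run a check script locally and translate its exit code into an outcome name.

    Assumption: timeouts and exit codes other than 0, 1 and 2 map to "UNKNOWN".
    """
    try:
        result = subprocess.run(
            [shell, "-c", script],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "UNKNOWN"
    return OUTCOMES.get(result.returncode, "UNKNOWN")
```

For example, `run_shell_check('echo "This is a dummy check"; exit 2')` exercises the dummy check above and is expected to return `CRITICAL`.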
@@ -359,11 +452,13 @@ The role performs the following tasks based on the enabled tags:
 - **API Access**: Ensure you have proper API credentials configured for the AxonOps Server
 - **Organization and Cluster**: The `org` and `cluster` variables must match existing entries in your AxonOps deployment
 - **Idempotency**: The role is designed to be idempotent and can be run multiple times safely
-- **Configuration Files**: Alert definitions can be customized by providing your own configuration files in the appropriate directories
+- **Configuration Files**: Alert definitions can be customized by providing your own configuration files in the
+  appropriate directories

 ## Additional Resources

 For more information about AxonOps alerts and configuration, see:
+
 - [ALERTS.md](../../ALERTS.md) in the repository root
 - [AxonOps Documentation](https://docs.axonops.com/)

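Once `service_checks.yml` has been parsed into Python data (for example with a YAML library), its `axonops_shell_check` entries can be checked against the parameter table for shell checks. This sketch fills in the documented defaults for `present` and `shell`. It makes two further assumptions: every other parameter is required, and unknown keys are rejected. `axonops_tcp_check` entries are not covered because their parameters are not listed in this section. `normalize_shell_checks` is hypothetical.

```python
from typing import Any, Dict, List

# Defaults taken from the axonops_shell_check parameter table.
SHELL_CHECK_DEFAULTS: Dict[str, Any] = {"present": True, "shell": "/bin/bash"}
# Assumption: parameters without a documented default are required.
SHELL_CHECK_REQUIRED = ("name", "interval", "timeout", "script")
SHELL_CHECK_KNOWN = set(SHELL_CHECK_REQUIRED) | set(SHELL_CHECK_DEFAULTS)


def normalize_shell_checks(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Validate axonops_shell_check entries and apply the documented defaults."""
    checks = config.get("axonops_shell_check") or []  # the key itself is optional
    normalized = []
    for index, check in enumerate(checks):
        missing = [key for key in SHELL_CHECK_REQUIRED if key not in check]
        unknown = sorted(set(check) - SHELL_CHECK_KNOWN)
        if missing or unknown:
            raise ValueError(
                f"axonops_shell_check[{index}]: missing {missing}, unknown {unknown}"
            )
        # Explicit values in the entry override the defaults.
        normalized.append({**SHELL_CHECK_DEFAULTS, **check})
    return normalized
```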