|
| 1 | +# InfluxDB mixin |
| 2 | + |
| 3 | +The InfluxDB mixin is a set of configurable Grafana dashboards and alerts. |
| 4 | + |
| 5 | +The InfluxDB mixin contains the following dashboards: |
| 6 | + |
| 7 | +- InfluxDB cluster overview |
| 8 | +- InfluxDB instance overview |
| 9 | +- InfluxDB logs overview |
| 10 | + |
| 11 | +and the following alerts: |
| 12 | + |
| 13 | +- InfluxDBWarningTaskSchedulerHighFailureRate |
| 14 | +- InfluxDBCriticalTaskSchedulerHighFailureRate |
| 15 | +- InfluxDBHighBusyWorkerPercentage |
| 16 | +- InfluxDBHighHeapMemoryUsage |
| 17 | +- InfluxDBHighAverageAPIRequestLatency |
| 18 | +- InfluxDBSlowAverageIQLExecutionTime |
| 19 | + |
| 20 | +## InfluxDB cluster overview |
| 21 | + |
| 22 | +The InfluxDB cluster overview dashboard provides details on the cluster's performance and highlights top instances. It covers InfluxDB performance and integration health, including Golang performance, query/request load, and task scheduler activity. |
| 23 | + |
| 24 | + |
| 25 | + |
| 26 | + |
| 27 | + |
| 28 | +## InfluxDB instance overview |
| 29 | + |
| 30 | +The InfluxDB instance overview dashboard provides details on one or more instances, including instance configuration stats, Golang performance, query/request load, and task scheduler activity. |
| 31 | + |
| 32 | + |
| 33 | + |
| 34 | + |
| 35 | + |
| 36 | + |
| 37 | +## InfluxDB logs overview |
| 38 | + |
| 39 | +The InfluxDB logs overview dashboard shows incoming InfluxDB logs, which can be filtered by level, service, engine, and a custom regex. |
| 40 | + |
| 41 | + |
| 42 | + |
| 43 | +InfluxDB system logs are enabled by default in `config.libsonnet`. To disable them, set `enableLokiLogs` to `false`, then run `make` again to regenerate the dashboards: |
| 44 | + |
| 45 | +``` |
| 46 | +{ |
| 47 | + _config+:: { |
| 48 | + enableLokiLogs: false, |
| 49 | + }, |
| 50 | +} |
| 51 | +``` |
| 52 | + |
| 53 | +For the log selectors to work properly with InfluxDB logs ingested into your logs datasource, include the `instance`, `job`, and `influxdb_cluster` labels in the [scrape_configs](https://grafana.com/docs/loki/latest/clients/promtail/configuration/#scrape_configs), with values that match the labels on the ingested metrics. |
| 54 | + |
| 55 | +```yaml |
| 56 | +scrape_configs: |
| 57 | + - job_name: integrations/influxdb |
| 58 | + static_configs: |
| 59 | + - targets: [localhost] |
| 60 | + labels: |
| 61 | + job: integrations/influxdb |
| 62 | + influxdb_cluster: "<your-cluster-name>" |
| 63 | + instance: "<your-instance-name>" |
| 64 | + __path__: /var/log/influxdb/influxdb.log |
| 65 | + pipeline_stages: |
| 66 | + - multiline: |
| 67 | + firstline: 'ts=\d{4}' |
| 68 | + - regex: |
| 69 | + expression: 'ts=(\S+) lvl=(?P<level>\w+) msg=.* log_id=(\S+) (service=(?P<service>\S+) ){0,1}(engine=(?P<engine>\S*) ){0,1}.*$' |
| 70 | + - labels: |
| 71 | + level: |
| 72 | + service: |
| 73 | + engine: |
| 74 | +``` |
| 75 | + |
| 76 | +## Alerts overview |
| 77 | + |
| 78 | +- InfluxDBWarningTaskSchedulerHighFailureRate: Automated data processing tasks are failing at a high rate. |
| 79 | +- InfluxDBCriticalTaskSchedulerHighFailureRate: Automated data processing tasks are failing at a critical rate. |
| 80 | +- InfluxDBHighBusyWorkerPercentage: There is a high percentage of busy workers. |
| 81 | +- InfluxDBHighHeapMemoryUsage: There is a high amount of heap memory being used. |
| 82 | +- InfluxDBHighAverageAPIRequestLatency: Average API request latency is too high. High latency will negatively affect system performance, degrading data availability and precision. |
| 83 | +- InfluxDBSlowAverageIQLExecutionTime: InfluxQL execution times are too slow. Slow query execution times will negatively affect system performance, degrading data availability and precision. |
| 84 | + |
| 85 | +Default thresholds can be configured in `config.libsonnet`. |
| 86 | + |
| 87 | +```js |
| 88 | +{ |
| 89 | + _config+:: { |
| 90 | + alertsWarningTaskSchedulerHighFailureRate: 25, // % |
| 91 | + alertsCriticalTaskSchedulerHighFailureRate: 50, // % |
| 92 | + alertsWarningHighBusyWorkerPercentage: 80, // % |
| 93 | + alertsWarningHighHeapMemoryUsage: 80, // % |
| 94 | + alertsWarningHighAverageAPIRequestLatency: 0.1, // count |
| 95 | + alertsWarningSlowAverageIQLExecutionTime: 0.1, // count |
| 96 | + }, |
| 97 | +} |
| 98 | +``` |
| 99 | + |
| 100 | +## Install tools |
| 101 | + |
| 102 | +```bash |
| 103 | +go install github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@latest |
| 104 | +go install github.com/monitoring-mixins/mixtool/cmd/mixtool@latest |
| 105 | +``` |
| 106 | + |
| 107 | +For linting and formatting, you also need `jsonnetfmt` installed. If you |
| 108 | +have a working Go development environment, the easiest way is to run the following: |
| 109 | + |
| 110 | +```bash |
| 111 | +go install github.com/google/go-jsonnet/cmd/jsonnetfmt@latest |
| 112 | +``` |
| 113 | + |
| 114 | +The files in `dashboards_out` need to be imported |
| 115 | +into your Grafana server. The exact details will depend on your environment. |
| 116 | + |
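One possible way to import the generated dashboards is Grafana's file-based dashboard provisioning. The fragment below is a minimal sketch, not the mixin's prescribed method: the provider name and the `path` are hypothetical, and it assumes the JSON files from `dashboards_out` have been copied to that directory on the Grafana host.

```yaml
# Hypothetical provisioning file, e.g. placed under Grafana's
# provisioning/dashboards directory.
apiVersion: 1
providers:
  - name: influxdb-mixin        # assumed provider name
    type: file
    options:
      # Assumed location of the copied dashboards_out/*.json files
      path: /var/lib/grafana/dashboards/influxdb-mixin
```

Environments that manage dashboards through the Grafana UI or HTTP API can import the same JSON files there instead.
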
| 117 | +`prometheus_alerts.yaml` needs to be imported into Prometheus. |
| 118 | + |
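For a self-managed Prometheus server, importing the alerts typically means referencing the generated file under `rule_files` in the Prometheus configuration. This is a sketch under that assumption; the path below is hypothetical and depends on where you copy `prometheus_alerts.yaml`.

```yaml
# Fragment of prometheus.yml; merge with your existing configuration.
rule_files:
  # Assumed location of the generated alert rules
  - /etc/prometheus/rules/prometheus_alerts.yaml
```

Running `promtool check rules` against the file before reloading Prometheus can catch syntax errors early.
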
| 119 | +## Generate dashboards and alerts |
| 120 | + |
| 121 | +Edit `config.libsonnet` if required and then build JSON dashboard files for Grafana: |
| 122 | + |
| 123 | +```bash |
| 124 | +make |
| 125 | +``` |
| 126 | + |
| 127 | +For more advanced uses of mixins, see |
| 128 | +https://github.com/monitoring-mixins/docs. |