| 1 | +# Presto mixin |
| 2 | + |
| 3 | +The Presto mixin is a set of configurable Grafana dashboards and alerts. |
| 4 | + |
| 5 | +The Presto mixin contains the following dashboards: |
| 6 | + |
| 7 | +- Presto overview |
| 8 | +- Presto coordinator |
| 9 | +- Presto worker |
| 10 | +- Presto logs |
| 11 | + |
| 12 | +and the following alerts: |
| 13 | + |
| 14 | +- PrestoHighInsufficientResources |
| 15 | +- PrestoHighTaskFailuresWarning |
| 16 | +- PrestoHighTaskFailuresCritical |
| 17 | +- PrestoHighQueuedTaskCount |
| 18 | +- PrestoHighBlockedNodes |
| 19 | +- PrestoHighFailedQueriesWarning |
| 20 | +- PrestoHighFailedQueriesCritical |
| 21 | + |
| 22 | +## Presto overview |
| 23 | + |
| 24 | +The Presto overview dashboard provides details on integration status/alerts, workers/coordinators, error failures, data throughput, blocked nodes, and distributed bytes. |
| 25 | + |
| 26 | + |
| 27 | + |
| 28 | +## Presto coordinator overview |
| 29 | + |
| 30 | +The Presto coordinator overview dashboard provides details on various query counts and rates, query execution time, CPU time consumed, CPU input throughput, error failures, JVM metrics, and memory pool information. |
| 31 | + |
| 32 | + |
| 33 | + |
| 34 | +## Presto worker overview |
| 35 | + |
| 36 | +The Presto worker overview dashboard provides details on various task rates, pool sizes, output positions, data throughput, JVM metrics, and memory pool information. |
| 37 | + |
| 38 | + |
| 39 | + |
| 40 | +## Presto logs |
| 41 | + |
| 42 | +The Presto logs dashboard provides details on incoming system logs. |
| 43 | + |
| 44 | + |
| 45 | +Presto system logs are enabled by default in `config.libsonnet`. To remove them, set `enableLokiLogs` to `false`, then run `make` again to regenerate the dashboards: |
| 46 | + |
| 47 | +```js |
| 48 | +{ |
| 49 | + _config+:: { |
| 50 | + enableLokiLogs: false, |
| 51 | + }, |
| 52 | +} |
| 53 | +``` |
| 54 | + |
| 55 | +For the dashboard selectors to work with system logs ingested into your logs datasource, add `instance`, `job`, and `presto_cluster` labels to the [scrape configs](https://grafana.com/docs/loki/latest/clients/promtail/configuration/#scrape_configs) that match the labels on your ingested metrics. |
| 56 | + |
| 57 | +```yaml |
| 58 | +scrape_configs: |
| 59 | + - job_name: integrations/presto |
| 60 | + static_configs: |
| 61 | + - targets: [localhost] |
| 62 | + labels: |
| 63 | + job: integrations/presto |
| 64 | + instance: "<your-instance-name>" |
| 65 | + presto_cluster: "<your-cluster-name>" |
| 66 | + __path__: /var/presto/logs/*.log |
| 67 | + pipeline_stages: |
| 68 | + - multiline: |
| 69 | + firstline: '\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}' |
| 70 | + - regex: |
| 71 | + expression: '\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z\s+(?P<level>\w+)(?P<message>.+)' |
| 72 | + - labels: |
| 73 | + level: |
| 74 | +``` |
| 75 | + |
| 76 | +## Alerts overview |
| 77 | + |
| 78 | +- PrestoHighInsufficientResources: The number of failures caused by insufficient resources is increasing, causing saturation in the system. |
| 79 | +- PrestoHighTaskFailuresWarning: The number of failing tasks is increasing. This might affect query processing and could result in incomplete or incorrect results. |
| 80 | +- PrestoHighTaskFailuresCritical: The number of failing tasks has reached a critical level. This might affect query processing and could result in incomplete or incorrect results. |
| 81 | +- PrestoHighQueuedTaskCount: The number of queued tasks is increasing. A high number of queued tasks can lead to increased query latencies and degraded system performance. |
| 82 | +- PrestoHighBlockedNodes: The number of nodes blocked due to memory restrictions is increasing. Blocked nodes can cause performance degradation and resource starvation. |
| 83 | +- PrestoHighFailedQueriesWarning: The number of failing queries is increasing. Failed queries can prevent users from accessing data, disrupt analytics processes, and might indicate underlying issues with the system or data. |
| 84 | +- PrestoHighFailedQueriesCritical: The number of failing queries has reached a critical level. Failed queries can prevent users from accessing data, disrupt analytics processes, and might indicate underlying issues with the system or data. |
| 85 | + |
| 86 | +Default thresholds can be configured in `config.libsonnet`. |
| 87 | + |
| 88 | +```js |
| 89 | +{ |
| 90 | + _config+:: { |
| 91 | + |
| 92 | + // alerts thresholds |
| 93 | + alertsHighInsufficientResourceErrors: 0, // count |
| 94 | + alertsHighTaskFailuresWarning: 0, // count |
| 95 | + alertsHighTaskFailuresCritical: 30, // percent |
| 96 | + alertsHighQueuedTaskCount: 5, // count |
| 97 | + alertsHighBlockedNodesCount: 0, // count |
| 98 | + alertsHighFailedQueryCountWarning: 0, // count |
| 99 | + alertsHighFailedQueryCountCritical: 30, // percent |
| 100 | + } |
| 101 | +} |
| 102 | +``` |
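If you would rather not edit `config.libsonnet` directly, the thresholds can likely be overridden from a separate file. The following is a minimal sketch, not part of this README: it assumes the mixin's entry point is named `mixin.libsonnet` (the usual convention for monitoring mixins), that the mixin reads its settings from the `_config` key as in the `enableLokiLogs` example earlier, and it uses a hypothetical file name `custom.libsonnet`. The threshold values are illustrative only.

```jsonnet
// custom.libsonnet (hypothetical file name)
// Assumption: the mixin entry point is `mixin.libsonnet`; adjust the import
// path to match your checkout.
(import 'mixin.libsonnet') + {
  _config+:: {
    // Fire PrestoHighQueuedTaskCount only above 10 queued tasks (default: 5).
    alertsHighQueuedTaskCount: 10,  // count
    // Require a higher failure share before the critical query alert fires (default: 30).
    alertsHighFailedQueryCountCritical: 50,  // percent
  },
}
```

Because `_config+::` merges into the existing hidden field rather than replacing it, any threshold not listed here keeps its default value.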
| 103 | + |
| 104 | +## Install tools |
| 105 | + |
| 106 | +```bash |
| 107 | +go install github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb@latest |
| 108 | +go install github.com/monitoring-mixins/mixtool/cmd/mixtool@latest |
| 109 | +``` |
| 110 | + |
| 111 | +For linting and formatting, you also need `jsonnetfmt` installed. If you |
| 112 | +have a working Go development environment, it's easiest to run the following: |
| 113 | + |
| 114 | +```bash |
| 115 | +go install github.com/google/go-jsonnet/cmd/jsonnetfmt@latest |
| 116 | +``` |
| 117 | + |
| 118 | +The files in `dashboards_out` need to be imported |
| 119 | +into your Grafana server. The exact steps depend on your environment. |
| 120 | + |
| 121 | +`prometheus_alerts.yaml` needs to be imported into Prometheus. |
| 122 | + |
| 123 | +## Generate dashboards and alerts |
| 124 | + |
| 125 | +Edit `config.libsonnet` if required and then build JSON dashboard files for Grafana: |
| 126 | + |
| 127 | +```bash |
| 128 | +make |
| 129 | +``` |
| 130 | + |
| 131 | +For more advanced uses of mixins, see |
| 132 | +https://github.com/monitoring-mixins/docs. |