Commit 4c758d6

xeniape and razvan authored
chore: add test and docs for metrics (#764)
* chore: add test and docs for metrics
* Update tests/templates/kuttl/smoke/check-metrics.py

Co-authored-by: Razvan-Daniel Mihai <[email protected]>

* adjust smoke_aws tests
* fix metrics test and move to commons folder

---------

Co-authored-by: Razvan-Daniel Mihai <[email protected]>
1 parent 54f80b5 commit 4c758d6

7 files changed

+125
-0
lines changed

docs/modules/trino/pages/usage-guide/monitoring.adoc

Lines changed: 35 additions & 0 deletions
@@ -3,3 +3,38 @@
 
 The managed Trino instances are automatically configured to export Prometheus metrics.
 See xref:operators:monitoring.adoc[] for more details.
+
+== Metrics
+
+Trino automatically exposes built-in Prometheus metrics on coordinators and workers. The metrics are available on the `http` (`8080/metrics`) or
+`https` (`8443/metrics`) port, depending on the TLS settings.
+
+The following `ServiceMonitor` example demonstrates how the metrics could be scraped using the https://prometheus-operator.dev/[Prometheus Operator].
+
+[source,yaml]
+----
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: scrape-label
+spec:
+  endpoints:
+    - port: https # or http
+      scheme: https # or http
+      path: /metrics
+      basicAuth: # <1>
+        username:
+          name: trino-user-secret
+          key: username
+        password:
+          name: trino-user-secret
+          key: password
+  jobLabel: app.kubernetes.io/instance
+  namespaceSelector:
+    any: true
+  selector:
+    matchLabels:
+      prometheus.io/scrape: "true"
+----
+
+<1> Add user information if Trino is configured to use authentication
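
The `/metrics` endpoint described above returns the Prometheus text exposition format. The test script in this commit checks for a metric with a plain substring match on the response body. As a minimal sketch of a stricter check, the following parses the metric names out of such a payload. The function name `metric_names` and the sample payload are hypothetical and only serve as illustration; the one metric name taken from the commit is `io_airlift_node_name_NodeInfo_StartTime`.

```python
def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line looks like "<name>{<labels>} <value>" or "<name> <value>".
        name = line.split("{", 1)[0].split(None, 1)[0]
        names.add(name)
    return names


# Hypothetical sample payload, not captured from a real Trino instance.
SAMPLE = """\
# TYPE io_airlift_node_name_NodeInfo_StartTime gauge
io_airlift_node_name_NodeInfo_StartTime 1.7e9
example_requests_total{method="GET"} 42
"""

print("io_airlift_node_name_NodeInfo_StartTime" in metric_names(SAMPLE))
```

Compared with a substring test, this would not match a name that only appears inside a comment line or as a prefix of a longer metric name. Whether that extra strictness is worth it for a smoke test is a judgment call.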
tests/templates/kuttl/commons/check-metrics.py

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+#!/usr/bin/env python3
+import argparse
+import requests
+import time
+
+
+def print_request_error_and_sleep(message, err, retry_count):
+    print("[" + str(retry_count) + "] " + message, err)
+    time.sleep(5)
+
+
+def try_get(url):
+    retries = 3
+    for i in range(retries):
+        try:
+            if "coordinator" in url:
+                r = requests.get(
+                    url,
+                    timeout=5,
+                    headers={"x-trino-user": "admin"},
+                    auth=("admin", "admin"),
+                    verify=False,
+                )
+            else:
+                r = requests.get(
+                    url, timeout=5, headers={"x-trino-user": "admin"}, verify=False
+                )
+            r.raise_for_status()
+            return r
+        except requests.exceptions.HTTPError as errh:
+            print_request_error_and_sleep("Http Error: ", errh, i)
+        except requests.exceptions.ConnectionError as errc:
+            print_request_error_and_sleep("Error Connecting: ", errc, i)
+        except requests.exceptions.Timeout as errt:
+            print_request_error_and_sleep("Timeout Error: ", errt, i)
+        except requests.exceptions.RequestException as err:
+            print_request_error_and_sleep("Error: ", err, i)
+
+    exit(-1)
+
+
+def check_monitoring(hosts):
+    for host in hosts:
+        # test for the jmx exporter metrics
+        url = "http://" + host + ":8081/metrics"
+        response = try_get(url)
+
+        if not response.ok:
+            print("Error for [" + url + "]: could not access monitoring")
+            exit(-1)
+
+        # test for the native metrics
+        url = "https://" + host + ":8443/metrics"
+        response = try_get(url)
+
+        if response.ok:
+            # arbitrary metric was chosen to test if metrics are present in the response
+            if "io_airlift_node_name_NodeInfo_StartTime" in response.text:
+                continue
+            else:
+                print("Error for [" + url + "]: missing metrics")
+                exit(-1)
+        else:
+            print("Error for [" + url + "]: could not access monitoring")
+            exit(-1)
+
+
+if __name__ == "__main__":
+    all_args = argparse.ArgumentParser(description="Test Trino metrics.")
+    all_args.add_argument(
+        "-n", "--namespace", help="The namespace to run in", required=True
+    )
+    args = vars(all_args.parse_args())
+    namespace = args["namespace"]
+
+    host_coordinator_0 = f"trino-coordinator-default-0.trino-coordinator-default.{namespace}.svc.cluster.local"
+    host_worker_0 = (
+        f"trino-worker-default-0.trino-worker-default.{namespace}.svc.cluster.local"
+    )
+
+    hosts = [host_coordinator_0, host_worker_0]
+
+    check_monitoring(hosts)
+
+    print("Test check-metrics.py succeeded!")
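
The retry loop in `try_get` above is tied to `requests` and ends the process once the retries are exhausted. As a rough sketch of the same bounded-retry idea in a form that can be exercised without a cluster, the helper below takes the fetch operation as a callable. The names `retry`, `flaky`, and the `sleep` parameter are hypothetical and not part of the commit. Unlike the script, this version returns `None` instead of exiting and does not sleep after the final failed attempt.

```python
import time


def retry(fetch, retries=3, delay=5.0, sleep=time.sleep):
    """Call fetch() up to `retries` times; return its result, or None if every attempt raises."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as err:  # the script above catches requests exceptions specifically
            print(f"[{attempt}] attempt failed: {err}")
            # Wait only if another attempt follows.
            if attempt < retries - 1:
                sleep(delay)
    return None


# Hypothetical stand-in for an endpoint that becomes ready on the third call.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("not ready yet")
    return "ok"


result = retry(flaky, delay=0.0)
print(result, calls["n"])
```

Passing the fetch operation and the sleep function in as arguments keeps the retry policy separate from the HTTP call, so the policy can be checked with a fake like `flaky` rather than a live Trino pod.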

tests/templates/kuttl/smoke/21-assert.yaml

Lines changed: 1 addition & 0 deletions
@@ -6,3 +6,4 @@ commands:
   - script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-active-workers.py -u admin -p admin -n $NAMESPACE -w 1
   - script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-opa.py -n $NAMESPACE
   - script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-s3.py -n $NAMESPACE
+  - script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-metrics.py -n $NAMESPACE

tests/templates/kuttl/smoke/21-copy-scripts.yaml

Lines changed: 1 addition & 0 deletions
@@ -5,3 +5,4 @@ commands:
   - script: kubectl cp -n $NAMESPACE ./check-active-workers.py trino-test-helper-0:/tmp || true
   - script: kubectl cp -n $NAMESPACE ./check-opa.py trino-test-helper-0:/tmp || true
   - script: kubectl cp -n $NAMESPACE ./check-s3.py trino-test-helper-0:/tmp || true
+  - script: kubectl cp -n $NAMESPACE ../../../../templates/kuttl/commons/check-metrics.py trino-test-helper-0:/tmp || true

tests/templates/kuttl/smoke/31-assert.yaml

Lines changed: 1 addition & 0 deletions
@@ -6,3 +6,4 @@ commands:
   - script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-active-workers.py -u admin -p admin -n $NAMESPACE -w 2
   - script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-opa.py -n $NAMESPACE
   - script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-s3.py -n $NAMESPACE
+  - script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-metrics.py -n $NAMESPACE

tests/templates/kuttl/smoke_aws/21-assert.yaml

Lines changed: 1 addition & 0 deletions
@@ -6,3 +6,4 @@ commands:
   - script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-active-workers.py -u admin -p admin -n $NAMESPACE -w 1
   - script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-opa.py -n $NAMESPACE
   - script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-s3.py -n $NAMESPACE
+  - script: kubectl exec -n $NAMESPACE trino-test-helper-0 -- python /tmp/check-metrics.py -n $NAMESPACE

tests/templates/kuttl/smoke_aws/21-copy-scripts.yaml

Lines changed: 1 addition & 0 deletions
@@ -5,3 +5,4 @@ commands:
   - script: kubectl cp -n $NAMESPACE ./check-active-workers.py trino-test-helper-0:/tmp || true
   - script: kubectl cp -n $NAMESPACE ./check-opa.py trino-test-helper-0:/tmp || true
   - script: kubectl cp -n $NAMESPACE ./check-s3.py trino-test-helper-0:/tmp || true
+  - script: kubectl cp -n $NAMESPACE ../../../../templates/kuttl/commons/check-metrics.py trino-test-helper-0:/tmp || true
