Every assigned key-value pair has a unique time series. The use of many unbound attributes that are assigned to metrics can greatly increase the number of time series that are created.

You can use the following measures when Prometheus consumes a lot of disk:

* *Check the time series database (TSDB) status using the Prometheus HTTP API* for more information about which labels are creating the most time series data. Doing so requires cluster administrator privileges.

* *Check the number of scrape samples* that are being collected.

* *Reduce the number of unique time series that are created* by reducing the number of unbound attributes that are assigned to user-defined metrics.
. In the *Administrator* perspective, navigate to *Observe* -> *Metrics*.

. Enter a Prometheus Query Language (PromQL) query in the *Expression* field.
The following example queries help to identify high cardinality metrics that might result in high disk space consumption:
+
* By running the following query, you can identify the ten jobs that have the highest number of scrape samples:
+
[source,text]
----
topk(10, max by(namespace, job) (topk by(namespace, job) (1, scrape_samples_post_metric_relabeling)))
----
+
* By running the following query, you can pinpoint time series churn by identifying the ten jobs that have created the most time series data in the last hour:
+
[source,text]
----
topk(10, sum by(namespace, job) (sum_over_time(scrape_series_added[1h])))
----
. Investigate the number of unbound label values assigned to metrics with higher than expected scrape sample counts:
+
* *If the metrics relate to a user-defined project*, review the metrics key-value pairs assigned to your workload. These are implemented through Prometheus client libraries at the application level. Try to limit the number of unbound attributes referenced in your labels.
+
* *If the metrics relate to a core {product-title} project*, create a Red Hat support case on the link:https://access.redhat.com/[Red Hat Customer Portal].
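As an aside, the effect of an unbound attribute on series counts can be sketched outside of Prometheus entirely. The following is a hypothetical Python illustration (label names and values invented, not taken from any real workload) of how each distinct label set becomes its own time series:

```python
# A minimal illustration of why unbound label values inflate time series
# counts: each distinct set of label key-value pairs is its own series.
bounded_series = set()
unbound_series = set()

for user_id in range(1000):  # hypothetical traffic from 1000 distinct users
    method = "GET" if user_id % 2 == 0 else "POST"
    # Bounded attribute: "method" takes values from a small, fixed set.
    bounded_series.add(frozenset({("method", method)}))
    # Unbound attribute: "user_id" grows with the number of users.
    unbound_series.add(frozenset({("user_id", str(user_id))}))

print(len(bounded_series), len(unbound_series))  # 2 1000
```

Labeling by `method` yields two series no matter how much traffic arrives, while labeling by `user_id` yields one series per user, which is why identifiers like user IDs should not be used as label values.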
. Review the TSDB status using the Prometheus HTTP API by following these steps when logged in as a
ifndef::openshift-dedicated,openshift-rosa[]
cluster administrator:
endif::openshift-dedicated,openshift-rosa[]
ifdef::openshift-dedicated,openshift-rosa[]
`dedicated-admin`:
endif::openshift-dedicated,openshift-rosa[]
.. Get the Prometheus API route URL by running the following command:
+
[source,terminal]
----
$ HOST=$(oc -n openshift-monitoring get route prometheus-k8s -ojsonpath={.spec.host})
----
.. Extract an authentication token by running the following command:
+
[source,terminal]
----
$ TOKEN=$(oc whoami -t)
----
.. Query the TSDB status for Prometheus by running the following command:
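The original command for this substep is not present in this excerpt; the following is a sketch of the request, assuming the standard Prometheus `/api/v1/status/tsdb` endpoint and the `HOST` and `TOKEN` variables set in the previous substeps:
+
[source,terminal]
----
$ curl "https://$HOST/api/v1/status/tsdb" -H "Authorization: Bearer $TOKEN"
----
+
In the JSON response, fields such as `seriesCountByMetricName` and `labelValueCountByLabelName` indicate which metrics and labels are contributing the most time series.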
* xref:../monitoring/accessing-third-party-monitoring-apis.adoc#about-accessing-monitoring-web-service-apis_accessing-monitoring-apis-by-using-the-cli[Accessing monitoring APIs by using the CLI]
* xref:../monitoring/configuring-the-monitoring-stack.adoc#setting-scrape-sample-and-label-limits-for-user-defined-projects_configuring-the-monitoring-stack[Setting a scrape sample limit for user-defined projects]
* xref:../support/getting-support.adoc#support-submitting-a-case_getting-support[Submitting a support case]