| 1 | +// Module included in the following assemblies: |
| 2 | +// |
| 3 | +// * hosted_control_planes/hcp-managing.adoc |
| 4 | + |
| 5 | +:_content-type: PROCEDURE |
| 6 | +[id="hosted-control-planes-metrics-sets_{context}"] |
| 7 | += Configuring metrics sets for hosted control planes |
| 8 | + |
| 9 | +Hosted control planes for Red Hat {product-title} creates `ServiceMonitor` resources in each control plane namespace that allow a Prometheus stack to gather metrics from the control planes. The `ServiceMonitor` resources use metrics relabelings to define which metrics are included in or excluded from the output of a particular component, such as etcd or the Kubernetes API server. The number of metrics that the control planes produce directly affects the resource requirements of the monitoring stack that gathers them. |
| 10 | + |
| 11 | +Instead of producing a fixed number of metrics that apply to all situations, you can configure a metrics set that identifies a set of metrics to produce for each control plane. The following metrics sets are supported: |
| 12 | + |
| 13 | +* `Telemetry`: This set includes the metrics that are needed for telemetry. It is the default set and the smallest set of metrics. |
| 14 | +* `SRE`: This set includes the necessary metrics to produce alerts and allow the troubleshooting of control plane components. |
| 15 | +* `All`: This set includes all of the metrics that are produced by standalone {product-title} control plane components. |
| 16 | + |
| 17 | +To configure a metrics set, set the `METRICS_SET` environment variable in the HyperShift Operator deployment by entering the following command: |
| 18 | + |
| 19 | +[source,terminal] |
| 20 | +---- |
| 21 | +$ oc set env -n hypershift deployment/operator METRICS_SET=All |
| 22 | +---- |
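The following is a minimal sketch, not part of the product tooling, of a wrapper that only accepts the three metrics sets listed above and prints the matching command instead of running it. The function name `metrics_set_command` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the `oc set env` command for a documented
# metrics set, and reject any other value.
metrics_set_command() {
  case "$1" in
    Telemetry|SRE|All)
      echo "oc set env -n hypershift deployment/operator METRICS_SET=$1"
      ;;
    *)
      echo "unsupported metrics set: $1" >&2
      return 1
      ;;
  esac
}

metrics_set_command SRE
# prints "oc set env -n hypershift deployment/operator METRICS_SET=SRE"
```

Printing the command rather than running it keeps the sketch free of side effects. Review the output before you run it against a cluster.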
| 23 | + |
| 24 | +[#hosted-control-planes-sre-metrics-set] |
| 25 | +== Configuring the SRE metrics set |
| 26 | + |
| 27 | +When you specify the `SRE` metrics set, the HyperShift Operator looks for a config map named `sre-metric-set` with a single key: `config`. The value of the `config` key must contain a set of `RelabelConfigs` that are organized by control plane component. |
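As a rough sketch, a config map of this shape could be created from a local file. The file name `sre-metric-set.yaml` and the `hypershift` namespace are assumptions for illustration. The sketch also assumes that the config map must be in the same namespace as the HyperShift Operator, so confirm the namespace for your installation.

```shell
#!/usr/bin/env bash
# Assumed local file name; it holds the relabel configs keyed by component.
config_file="sre-metric-set.yaml"

# Write a small configuration with a single component and a single rule.
cat > "$config_file" <<'EOF'
kubeControllerManager:
  - action: "drop"
    regex: "etcd_(debugging|disk|request|server).*"
    sourceLabels: ["__name__"]
EOF

# Assumption: the config map belongs in the HyperShift Operator namespace.
# The file content is stored under the single key named "config".
if command -v oc >/dev/null 2>&1; then
  oc create configmap sre-metric-set -n hypershift \
    --from-file=config="$config_file"
else
  echo "oc not found; wrote $config_file only"
fi
```

The `--from-file=config=<path>` form stores the whole file as the value of the `config` key, which matches the single-key layout described above.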
| 28 | + |
| 29 | +You can specify the following components: |
| 30 | + |
| 31 | +* `etcd` |
| 32 | +* `kubeAPIServer` |
| 33 | +* `kubeControllerManager` |
| 34 | +* `openshiftAPIServer` |
| 35 | +* `openshiftControllerManager` |
| 36 | +* `openshiftRouteControllerManager` |
| 37 | +* `cvo` |
| 38 | +* `olm` |
| 39 | +* `catalogOperator` |
| 40 | +* `registryOperator` |
| 41 | +* `nodeTuningOperator` |
| 42 | +* `controlPlaneOperator` |
| 43 | +* `hostedClusterConfigOperator` |
| 44 | +
|
| 45 | +The following example shows a configuration of the `SRE` metrics set: |
| 46 | + |
| 47 | +[source,yaml] |
| 48 | +---- |
| 49 | +kubeAPIServer: |
| 50 | +  - action: "drop" |
| 51 | +    regex: "etcd_(debugging|disk|server).*" |
| 52 | +    sourceLabels: ["__name__"] |
| 53 | +  - action: "drop" |
| 54 | +    regex: "apiserver_admission_controller_admission_latencies_seconds_.*" |
| 55 | +    sourceLabels: ["__name__"] |
| 56 | +  - action: "drop" |
| 57 | +    regex: "apiserver_admission_step_admission_latencies_seconds_.*" |
| 58 | +    sourceLabels: ["__name__"] |
| 59 | +  - action: "drop" |
| 60 | +    regex: "scheduler_(e2e_scheduling_latency_microseconds|scheduling_algorithm_predicate_evaluation|scheduling_algorithm_priority_evaluation|scheduling_algorithm_preemption_evaluation|scheduling_algorithm_latency_microseconds|binding_latency_microseconds|scheduling_latency_seconds)" |
| 61 | +    sourceLabels: ["__name__"] |
| 62 | +  - action: "drop" |
| 63 | +    regex: "apiserver_(request_count|request_latencies|request_latencies_summary|dropped_requests|storage_data_key_generation_latencies_microseconds|storage_transformation_failures_total|storage_transformation_latencies_microseconds|proxy_tunnel_sync_latency_secs)" |
| 64 | +    sourceLabels: ["__name__"] |
| 65 | +  - action: "drop" |
| 66 | +    regex: "docker_(operations|operations_latency_microseconds|operations_errors|operations_timeout)" |
| 67 | +    sourceLabels: ["__name__"] |
| 68 | +  - action: "drop" |
| 69 | +    regex: "reflector_(items_per_list|items_per_watch|list_duration_seconds|lists_total|short_watches_total|watch_duration_seconds|watches_total)" |
| 70 | +    sourceLabels: ["__name__"] |
| 71 | +  - action: "drop" |
| 72 | +    regex: "etcd_(helper_cache_hit_count|helper_cache_miss_count|helper_cache_entry_count|request_cache_get_latencies_summary|request_cache_add_latencies_summary|request_latencies_summary)" |
| 73 | +    sourceLabels: ["__name__"] |
| 74 | +  - action: "drop" |
| 75 | +    regex: "transformation_(transformation_latencies_microseconds|failures_total)" |
| 76 | +    sourceLabels: ["__name__"] |
| 77 | +  - action: "drop" |
| 78 | +    regex: "network_plugin_operations_latency_microseconds|sync_proxy_rules_latency_microseconds|rest_client_request_latency_seconds" |
| 79 | +    sourceLabels: ["__name__"] |
| 80 | +  - action: "drop" |
| 81 | +    regex: "apiserver_request_duration_seconds_bucket;(0.15|0.25|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75|2.5|3|3.5|4.5|6|7|8|9|15|25|30|50)" |
| 82 | +    sourceLabels: ["__name__", "le"] |
| 83 | +kubeControllerManager: |
| 84 | +  - action: "drop" |
| 85 | +    regex: "etcd_(debugging|disk|request|server).*" |
| 86 | +    sourceLabels: ["__name__"] |
| 87 | +  - action: "drop" |
| 88 | +    regex: "rest_client_request_latency_seconds_(bucket|count|sum)" |
| 89 | +    sourceLabels: ["__name__"] |
| 90 | +  - action: "drop" |
| 91 | +    regex: "root_ca_cert_publisher_sync_duration_seconds_(bucket|count|sum)" |
| 92 | +    sourceLabels: ["__name__"] |
| 93 | +openshiftAPIServer: |
| 94 | +  - action: "drop" |
| 95 | +    regex: "etcd_(debugging|disk|server).*" |
| 96 | +    sourceLabels: ["__name__"] |
| 97 | +  - action: "drop" |
| 98 | +    regex: "apiserver_admission_controller_admission_latencies_seconds_.*" |
| 99 | +    sourceLabels: ["__name__"] |
| 100 | +  - action: "drop" |
| 101 | +    regex: "apiserver_admission_step_admission_latencies_seconds_.*" |
| 102 | +    sourceLabels: ["__name__"] |
| 103 | +  - action: "drop" |
| 104 | +    regex: "apiserver_request_duration_seconds_bucket;(0.15|0.25|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75|2.5|3|3.5|4.5|6|7|8|9|15|25|30|50)" |
| 105 | +    sourceLabels: ["__name__", "le"] |
| 106 | +openshiftControllerManager: |
| 107 | +  - action: "drop" |
| 108 | +    regex: "etcd_(debugging|disk|request|server).*" |
| 109 | +    sourceLabels: ["__name__"] |
| 110 | +openshiftRouteControllerManager: |
| 111 | +  - action: "drop" |
| 112 | +    regex: "etcd_(debugging|disk|request|server).*" |
| 113 | +    sourceLabels: ["__name__"] |
| 114 | +olm: |
| 115 | +  - action: "drop" |
| 116 | +    regex: "etcd_(debugging|disk|server).*" |
| 117 | +    sourceLabels: ["__name__"] |
| 118 | +catalogOperator: |
| 119 | +  - action: "drop" |
| 120 | +    regex: "etcd_(debugging|disk|server).*" |
| 121 | +    sourceLabels: ["__name__"] |
| 122 | +cvo: |
| 123 | +  - action: "drop" |
| 124 | +    regex: "etcd_(debugging|disk|server).*" |
| 125 | +    sourceLabels: ["__name__"] |
| 126 | +---- |
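The rules in the preceding example that list two source labels can be harder to read than the others. The following sketch emulates how such a rule is evaluated. It relies on two documented Prometheus relabeling behaviors: the values of the source labels are joined with the default `;` separator, and the regular expression must match the whole joined string. The helper name `would_drop` is hypothetical, and `grep -E` stands in for the RE2 engine that Prometheus uses, which is an approximation that holds for this pattern.

```shell
#!/usr/bin/env bash
# Regex copied from the kubeAPIServer rule with sourceLabels ["__name__", "le"].
regex='apiserver_request_duration_seconds_bucket;(0.15|0.25|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75|2.5|3|3.5|4.5|6|7|8|9|15|25|30|50)'

# Hypothetical helper: would_drop METRIC_NAME LE_VALUE
# Joins the two label values with ";" and tests for a full-string match.
would_drop() {
  local joined="$1;$2"
  # -x requires the whole line to match, like the anchored relabeling regex.
  if printf '%s\n' "$joined" | grep -Eqx -- "$regex"; then
    echo "drop"
  else
    echo "keep"
  fi
}

would_drop apiserver_request_duration_seconds_bucket 0.15   # prints "drop"
would_drop apiserver_request_duration_seconds_bucket 0.1    # prints "keep"
```

In effect, the rule thins out the histogram: the buckets listed in the regular expression are dropped, and the remaining `le` boundaries are kept.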