| 1 | +--- |
| 2 | +title: Metrics |
| 3 | +--- |
| 4 | + |
| 5 | +import Label from '@site/src/components/Label'; |
| 6 | + |
| 7 | +K3s provides metrics for monitoring the health and performance of the cluster. |
| 8 | + |
| 9 | +Most metrics are provided by individual components. See the following component-specific documentation for more information: |
| 10 | +* [CoreDNS metrics](https://coredns.io/plugins/metrics/) |
| 11 | +* [etcd metrics](https://etcd.io/docs/v3.5/metrics/) |
| 12 | + |
| 13 | +Additional metrics may be provided by other components. Consult the upstream project documentation for any components not listed above. |
| 14 | + |
| 15 | +## Supervisor Metrics |
| 16 | + |
| 17 | +When K3s is started with `supervisor-metrics: true`, metrics are exposed by the K3s process and can be accessed via the `/metrics` endpoint on each node at port `6443`: |
| 18 | + |
| 19 | +```sh |
| 20 | +kubectl get --server https://NODENAME:6443 --raw /metrics |
| 21 | +``` |
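The same request can be scripted. The following is a minimal sketch, assuming `kubectl` is on the `PATH` and already configured with credentials for the cluster; the helper names `metrics_command`, `fetch_metrics`, and `k3s_lines` are hypothetical and not part of K3s.

```python
import subprocess


def metrics_command(node, port=6443):
    """Build the kubectl invocation shown above for a given node name."""
    return ["kubectl", "get", "--server", f"https://{node}:{port}", "--raw", "/metrics"]


def fetch_metrics(node):
    """Run kubectl and return the raw Prometheus text exposition output."""
    result = subprocess.run(
        metrics_command(node), check=True, capture_output=True, text=True
    )
    return result.stdout


def k3s_lines(text):
    """Keep only sample lines for metrics whose names start with `k3s_`."""
    return [line for line in text.splitlines() if line.startswith("k3s_")]


if __name__ == "__main__":
    # NODENAME is a placeholder; replace it with a real node name.
    for line in k3s_lines(fetch_metrics("NODENAME")):
        print(line)
```

Filtering on the `k3s_` prefix is only a convenience for reading the output by hand; a Prometheus server would normally scrape the endpoint directly.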
| 22 | + |
| 23 | +Metrics exposed by the K3s supervisor process include: |
| 24 | +* K3s Cluster Management Metrics |
| 25 | +* [Lasso controller metrics](https://github.com/rancher/lasso/blob/main/README.md#lasso-controller) |
| 26 | +* [Kubernetes client and workqueue metrics](https://github.com/kubernetes/client-go/blob/master/README.md) |
| 27 | +* [Kubernetes Node Metrics](https://kubernetes.io/docs/reference/instrumentation/node-metrics/) |
| 28 | +* [Kubernetes Component Metrics](https://kubernetes.io/docs/reference/instrumentation/metrics/) |
| 29 | +* [Go runtime metrics](https://pkg.go.dev/runtime/metrics#hdr-Supported_metrics) |
| 30 | +* If the K3s embedded registry is enabled, [Spegel metrics](https://spegel.dev/docs/metrics/) and [libp2p metrics](https://github.com/libp2p/go-libp2p/blob/master/README.md) |
| 31 | + |
| 32 | +K3s runs all Kubernetes components in the main K3s process. |
| 33 | +Since Kubernetes uses a single Prometheus metric registry per process, metrics for all components are available via all exposed metrics endpoints. |
| 34 | +If you scrape all the individual metrics endpoints, you may find that you are collecting duplicate metrics. |
| 35 | +It is only necessary to scrape a single K3s metric endpoint in order to get metrics for all embedded Kubernetes components. |
| 36 | + |
| 37 | +## K3s Cluster Management Metrics |
| 38 | + |
| 39 | +### k3s_certificate_expiration_seconds |
| 40 | + |
| 41 | +Remaining lifetime in seconds of the certificate, labeled by certificate subject and usages. |
| 42 | +- Type: Gauge |
| 43 | +- Labels: <Label>subject</Label> <Label>usage</Label> |
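Because the value is a remaining lifetime in seconds, it can be compared directly against a renewal threshold. The sketch below is illustrative only: the sample text and its label values are invented, the 7-day threshold is an arbitrary choice, and the parser is deliberately naive (it does not handle escaped quotes inside label values).

```python
import re

# Hypothetical /metrics excerpt; subjects, usages, and values are invented.
SAMPLE = '''\
# TYPE k3s_certificate_expiration_seconds gauge
k3s_certificate_expiration_seconds{subject="CN=example-server",usage="ServerAuth"} 2592000
k3s_certificate_expiration_seconds{subject="CN=example-client",usage="ClientAuth"} 432000
'''

LINE = re.compile(
    r'^k3s_certificate_expiration_seconds\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')
SECONDS_PER_DAY = 86400


def expiring_certificates(text, threshold_days=7.0):
    """Return (subject, usage, days_left) for certificates below the threshold."""
    results = []
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip comments and unrelated metrics
        labels = dict(LABEL.findall(match.group("labels")))
        days_left = float(match.group("value")) / SECONDS_PER_DAY
        if days_left < threshold_days:
            results.append((labels.get("subject"), labels.get("usage"), days_left))
    return results


if __name__ == "__main__":
    for subject, usage, days in expiring_certificates(SAMPLE):
        print(f"{subject} ({usage}) expires in {days:.1f} days")
```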
| 44 | + |
| 45 | +### k3s_loadbalancer_server_connections |
| 46 | + |
| 47 | +Count of current connections to each loadbalancer server, labeled by loadbalancer name and server address. |
| 48 | +- Type: Gauge |
| 49 | +- Labels: <Label>name</Label> <Label>server</Label> |
| 50 | + |
| 51 | +### k3s_loadbalancer_server_health |
| 52 | + |
| 53 | +Current health state of loadbalancer backend servers, labeled by loadbalancer name and server address. |
| 54 | +The state is an enum: 0=INVALID, 1=FAILED, 2=STANDBY, 3=UNCHECKED, 4=RECOVERING, 5=HEALTHY, 6=PREFERRED, 7=ACTIVE. |
| 55 | +- Type: Gauge |
| 56 | +- Labels: <Label>name</Label> <Label>server</Label> |
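When reading this gauge outside of a dashboard, it can help to translate the numeric value back into its state name. The mapping below is copied from the enum listed above; the `ServerHealth` class and `describe_health` helper are hypothetical names used only for this sketch.

```python
from enum import IntEnum


class ServerHealth(IntEnum):
    """State values as documented for k3s_loadbalancer_server_health."""
    INVALID = 0
    FAILED = 1
    STANDBY = 2
    UNCHECKED = 3
    RECOVERING = 4
    HEALTHY = 5
    PREFERRED = 6
    ACTIVE = 7


def describe_health(value):
    """Translate a gauge sample (Prometheus reports floats) into a state name."""
    try:
        return ServerHealth(int(value)).name
    except (ValueError, OverflowError):
        # Value outside the documented range, or NaN/Inf.
        return f"UNKNOWN({value})"


if __name__ == "__main__":
    for sample in (7.0, 1.0, 42.0):
        print(sample, describe_health(sample))
```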
| 57 | + |
| 58 | +### k3s_loadbalancer_dial_duration_seconds |
| 59 | + |
| 60 | +Time in seconds taken to dial a connection to a backend server, labeled by loadbalancer name and success/failure status. |
| 61 | +- Type: Histogram |
| 62 | +- Labels: <Label>name</Label> <Label>status</Label> |
| 63 | + |
| 64 | +### k3s_etcd_snapshot_save_duration_seconds |
| 65 | + |
| 66 | +Total time in seconds taken to complete the etcd snapshot process, labeled by success/failure status. |
| 67 | +- Type: Histogram |
| 68 | +- Labels: <Label>status</Label> |
| 69 | + |
| 70 | +### k3s_etcd_snapshot_save_local_duration_seconds |
| 71 | + |
| 72 | +Total time in seconds taken to save a local snapshot file, labeled by success/failure status. |
| 73 | +- Type: Histogram |
| 74 | +- Labels: <Label>status</Label> |
| 75 | + |
| 76 | +### k3s_etcd_snapshot_save_s3_duration_seconds |
| 77 | + |
| 78 | +Total time in seconds taken to upload a snapshot file to S3, labeled by success/failure status. |
| 79 | +- Type: Histogram |
| 80 | +- Labels: <Label>status</Label> |
| 81 | + |
| 82 | +### k3s_etcd_snapshot_reconcile_duration_seconds |
| 83 | + |
| 84 | +Total time in seconds taken to sync the list of etcd snapshots, labeled by success/failure status. |
| 85 | +- Type: Histogram |
| 86 | +- Labels: <Label>status</Label> |
| 87 | + |
| 88 | +### k3s_etcd_snapshot_reconcile_local_duration_seconds |
| 89 | + |
| 90 | +Total time in seconds taken to list local snapshot files, labeled by success/failure status. |
| 91 | +- Type: Histogram |
| 92 | +- Labels: <Label>status</Label> |
| 93 | + |
| 94 | +### k3s_etcd_snapshot_reconcile_s3_duration_seconds |
| 95 | + |
| 96 | +Total time in seconds taken to list S3 snapshot files, labeled by success/failure status. |
| 97 | +- Type: Histogram |
| 98 | +- Labels: <Label>status</Label> |
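Prometheus histograms are exposed as `_bucket`, `_sum`, and `_count` series, so a mean duration per status can be derived by dividing `_sum` by `_count`. The sketch below assumes that exposition format; the sample values and the `status` label values (`success`, `error`) are invented for illustration and may not match what K3s actually emits.

```python
import re

# Hypothetical /metrics excerpt; numbers and status values are invented.
SAMPLE = '''\
k3s_etcd_snapshot_save_duration_seconds_sum{status="success"} 12.5
k3s_etcd_snapshot_save_duration_seconds_count{status="success"} 5
k3s_etcd_snapshot_save_duration_seconds_sum{status="error"} 3.0
k3s_etcd_snapshot_save_duration_seconds_count{status="error"} 1
'''

SERIES = re.compile(
    r'^(?P<name>\w+)_(?P<kind>sum|count)\{status="(?P<status>[^"]*)"\}\s+(?P<value>\S+)$'
)


def mean_durations(text, metric):
    """Return {status: mean seconds} for one histogram, computed as sum / count."""
    totals = {}
    for line in text.splitlines():
        match = SERIES.match(line)
        if not match or match.group("name") != metric:
            continue
        per_status = totals.setdefault(match.group("status"), {})
        per_status[match.group("kind")] = float(match.group("value"))
    return {
        status: parts["sum"] / parts["count"]
        for status, parts in totals.items()
        if parts.get("count") and "sum" in parts  # skip empty or partial series
    }


if __name__ == "__main__":
    print(mean_durations(SAMPLE, "k3s_etcd_snapshot_save_duration_seconds"))
```

The same function applies to any of the `status`-labeled histograms in this section by passing a different metric name. In a Prometheus deployment, the equivalent calculation is usually done over a time window with `rate()` rather than on raw cumulative totals.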