Commit c0a2c1e

Merge pull request #46630 from lahinson/etcd-ssd-recs-3717
OSDOCS-3717 update etcd recommendations for scalability and performance
2 parents c5eb78d + 25224af

1 file changed: modules/recommended-etcd-practices.adoc (+36 additions, -43 deletions)
[id="recommended-etcd-practices_{context}"]
= Recommended etcd practices

For large and dense clusters, etcd can suffer from poor performance if the keyspace grows too large and exceeds the space quota. Periodically maintain and defragment etcd to free up space in the data store. Monitor Prometheus for etcd metrics and defragment etcd when required; otherwise, etcd can raise a cluster-wide alarm that puts the cluster into a maintenance mode that accepts only key reads and deletes.

.Monitor these key metrics:
* `etcd_server_quota_backend_bytes`, which is the current quota limit
* `etcd_mvcc_db_total_size_in_use_in_bytes`, which indicates the actual database usage after a history compaction
* `etcd_debugging_mvcc_db_total_size_in_bytes`, which shows the database size, including free space waiting for defragmentation

For more information about defragmenting etcd, see the "Defragmenting etcd data" section.
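As one way to watch quota headroom with the metrics above, you can compare the database size to the quota in a PromQL expression. This is a minimal sketch; the choice of the total-size metric and the 0.95 threshold are illustrative assumptions, not official recommendations:

[source,promql]
----
# Fraction of the etcd backend quota that the database currently occupies.
# Alerting while this value is still below 1.0 leaves time to defragment
# before etcd raises the cluster-wide alarm. The 0.95 threshold is an
# illustrative assumption.
etcd_debugging_mvcc_db_total_size_in_bytes / etcd_server_quota_backend_bytes > 0.95
----
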

Because etcd writes data to disk and persists proposals on disk, its performance depends on disk performance. Slow disks and disk activity from other processes can cause long fsync latencies. Those latencies can cause etcd to miss heartbeats, fail to commit new proposals to the disk on time, and ultimately experience request timeouts and temporary leader loss. Run etcd on machines that are backed by SSD or NVMe disks with low latency and high throughput. Consider single-level cell (SLC) solid-state drives (SSDs), which provide 1 bit per memory cell, are durable and reliable, and are ideal for write-intensive workloads.

Some key metrics to monitor on a deployed {product-title} cluster are the 99th percentile (p99) of etcd disk write ahead log (WAL) duration and the number of etcd leader changes. Use Prometheus to track these metrics.

* The `etcd_disk_wal_fsync_duration_seconds_bucket` metric reports the etcd disk fsync duration.
* The `etcd_server_leader_changes_seen_total` metric reports the leader changes.
* To rule out a slow disk and confirm that the disk is reasonably fast, verify that the 99th percentile of the `etcd_disk_wal_fsync_duration_seconds_bucket` is less than 10 ms.
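
The two checks above can be expressed as PromQL queries. This is a sketch; the 5-minute and 1-hour range windows are assumptions for illustration:

[source,promql]
----
# p99 WAL fsync duration per etcd member over the last 5 minutes;
# on a reasonably fast disk this stays below 10 ms (0.01 s).
histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))

# Leader changes observed in the last hour; a steadily growing count
# suggests disk or network problems.
increase(etcd_server_leader_changes_seen_total[1h])
----
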

To validate the hardware for etcd before or after you create the {product-title} cluster, you can use fio, an I/O benchmarking tool.

.Prerequisites
* Container runtimes such as Podman or Docker are installed on the machine that you're testing.
* Data is written to the `/var/lib/etcd` path.

.Procedure
* Run fio and analyze the results:
+
--
** If you use Podman, run this command:
+
[source,terminal]
----
$ sudo podman run --volume /var/lib/etcd:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf
----

** If you use Docker, run this command:
+
[source,terminal]
----
$ sudo docker run --volume /var/lib/etcd:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf
----
--

The output reports whether the disk is fast enough to host etcd by checking whether the 99th percentile of the fsync metric captured from the run is less than 10 ms.

Because etcd replicates the requests among all the members, its performance strongly depends on network input/output (I/O) latency. High network latencies result in etcd heartbeats taking longer than the election timeout, which results in leader elections that are disruptive to the cluster. A key metric to monitor on a deployed {product-title} cluster is the 99th percentile of etcd network peer latency on each etcd cluster member. Use Prometheus to track the metric.

The `histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[2m]))` query reports the round trip time for etcd to finish replicating the client requests between the members. Ensure that it is less than 50 ms.
