Commit c0a2c1e

Merge pull request #46630 from lahinson/etcd-ssd-recs-3717
OSDOCS-3717 update etcd recommendations for scalability and performance
2 parents c5eb78d + 25224af

1 file changed: modules/recommended-etcd-practices.adoc (+36 additions, -43 deletions)
[id="recommended-etcd-practices_{context}"]
= Recommended etcd practices

For large and dense clusters, etcd can suffer from poor performance if the keyspace grows too large and exceeds the space quota. Periodically maintain and defragment etcd to free up space in the data store. Monitor Prometheus for etcd metrics and defragment etcd when required; otherwise, etcd can raise a cluster-wide alarm that puts the cluster into a maintenance mode that accepts only key reads and deletes.

.Monitor these key metrics:
* `etcd_server_quota_backend_bytes`, which is the current quota limit
* `etcd_mvcc_db_total_size_in_use_in_bytes`, which indicates the actual database usage after a history compaction
* `etcd_debugging_mvcc_db_total_size_in_bytes`, which shows the database size, including free space waiting for defragmentation

For more information about defragmenting etcd, see the "Defragmenting etcd data" section.
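As one way to watch quota headroom with the metrics above, you can compare the database size to the quota in a PromQL expression. This is a minimal sketch; the choice of the total-size metric and the 0.95 threshold are illustrative assumptions, not official recommendations:

[source,promql]
----
# Fraction of the etcd backend quota that the database currently occupies.
# Alerting while this value is still below 1.0 leaves time to defragment
# before etcd raises the cluster-wide alarm. The 0.95 threshold is an
# illustrative assumption.
etcd_debugging_mvcc_db_total_size_in_bytes / etcd_server_quota_backend_bytes > 0.95
----
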

Because etcd writes data to disk and persists proposals on disk, its performance depends on disk performance. Slow disks and disk activity from other processes can cause long fsync latencies. Those latencies can cause etcd to miss heartbeats, fail to commit new proposals to the disk on time, and ultimately experience request timeouts and temporary leader loss. Run etcd on machines that are backed by SSD or NVMe disks with low latency and high throughput. Consider single-level cell (SLC) solid-state drives (SSDs), which provide 1 bit per memory cell, are durable and reliable, and are ideal for write-intensive workloads.

Some key metrics to monitor on a deployed {product-title} cluster are the 99th percentile (p99) of etcd disk write ahead log (WAL) duration and the number of etcd leader changes. Use Prometheus to track these metrics.

* The `etcd_disk_wal_fsync_duration_seconds_bucket` metric reports the etcd disk fsync duration.
* The `etcd_server_leader_changes_seen_total` metric reports the leader changes.
* To rule out a slow disk and confirm that the disk is reasonably fast, verify that the 99th percentile of the `etcd_disk_wal_fsync_duration_seconds_bucket` is less than 10 ms.
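
The two checks above can be expressed as PromQL queries. This is a sketch; the 5-minute and 1-hour range windows are assumptions for illustration:

[source,promql]
----
# p99 WAL fsync duration per etcd member over the last 5 minutes;
# on a reasonably fast disk this stays below 10 ms (0.01 s).
histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))

# Leader changes observed in the last hour; a steadily growing count
# suggests disk or network problems.
increase(etcd_server_leader_changes_seen_total[1h])
----
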

To validate the hardware for etcd before or after you create the {product-title} cluster, you can use fio, an I/O benchmarking tool.

.Prerequisites
* Container runtimes such as Podman or Docker are installed on the machine that you're testing.
* Data is written to the `/var/lib/etcd` path.

.Procedure
* Run fio and analyze the results:
+
--
** If you use Podman, run this command:
+
[source,terminal]
----
$ sudo podman run --volume /var/lib/etcd:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf
----

** If you use Docker, run this command:
+
[source,terminal]
----
$ sudo docker run --volume /var/lib/etcd:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf
----
--

The output reports whether the disk is fast enough to host etcd by checking whether the 99th percentile of the fsync metric captured from the run is less than 10 ms.

Because etcd replicates the requests among all the members, its performance strongly depends on network input/output (I/O) latency. High network latencies result in etcd heartbeats taking longer than the election timeout, which results in leader elections that are disruptive to the cluster. A key metric to monitor on a deployed {product-title} cluster is the 99th percentile of etcd network peer latency on each etcd cluster member. Use Prometheus to track the metric.

The `histogram_quantile(0.99, rate(etcd_network_peer_round_trip_time_seconds_bucket[2m]))` query reports the round trip time for etcd to finish replicating the client requests between the members. Ensure that it is less than 50 ms.
