Merge pull request #55760 from tmalove/etcd-ocpbugs-7283-tlove

jab-rh · web-flow · commit 580644ed06b8 · 2023-02-14T14:48:58.000-05:00
[OCPBUGS-7283]: Revert etcd disk latency to 10ms
diff --git a/modules/recommended-etcd-practices.adoc b/modules/recommended-etcd-practices.adoc
@@ -11,7 +11,7 @@ Although etcd is not particularly I/O intensive, it requires a low latency block
 
 Those latencies can cause etcd to miss heartbeats, fail to commit new proposals to the disk on time, and ultimately experience request timeouts and temporary leader loss. High write latencies also lead to OpenShift API slowness, which affects cluster performance. For these reasons, avoid colocating other workloads on the control-plane nodes.
 
-In terms of latency, run etcd on top of a block device that can write at least 50 IOPS of 8000 bytes sequentially, that is, with a latency of 20 ms. Keep in mind that etcd uses fdatasync to synchronize each write in the WAL. For heavily loaded clusters, sequential 500 IOPS of 8000 bytes (2 ms) are recommended. To measure those numbers, you can use a benchmarking tool, such as fio.
+In terms of latency, run etcd on top of a block device that can write at least 50 IOPS of 8000 bytes sequentially, that is, with a latency of 10 ms. Keep in mind that etcd uses fdatasync to synchronize each write in the WAL. For heavily loaded clusters, sequential 500 IOPS of 8000 bytes (2 ms) are recommended. To measure those numbers, you can use a benchmarking tool, such as fio.
 
 To achieve such performance, run etcd on machines that are backed by SSD or NVMe disks with low latency and high throughput. Consider single-level cell (SLC) solid-state drives (SSDs), which provide 1 bit per memory cell, are durable and reliable, and are ideal for write-intensive workloads.
 
@@ -65,7 +65,7 @@ $ sudo docker run --volume /var/lib/etcd:/var/lib/etcd:Z quay.io/openshift-scale
 ----
 --
 
-The output reports whether the disk is fast enough to host etcd by checking whether the 99th percentile of the fsync metric captured from the run is less than 20 ms. A few of the most important etcd metrics that might be affected by I/O performance are as follows:
+The output reports whether the disk is fast enough to host etcd by checking whether the 99th percentile of the fsync metric captured from the run is less than 10 ms. A few of the most important etcd metrics that might be affected by I/O performance are as follows:
 
 * `etcd_disk_wal_fsync_duration_seconds_bucket` metric reports the etcd WAL fsync duration
 * `etcd_disk_backend_commit_duration_seconds_bucket` metric reports the etcd backend commit latency duration
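
The 99th-percentile check that the updated paragraph describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the nearest-rank percentile method, and the sample values are hypothetical and are not taken from the benchmarking tool or from etcd. Only the 10 ms limit comes from the text in this change.

```python
import math

# Limit from the updated documentation text: p99 fsync latency below 10 ms.
FSYNC_P99_LIMIT_MS = 10.0


def percentile(samples, pct):
    """Return the nearest-rank percentile of a non-empty list of numbers.

    Nearest-rank is an assumption made for this sketch; a real tool
    may interpolate or use histogram buckets instead.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def disk_fast_enough(fsync_ms, limit_ms=FSYNC_P99_LIMIT_MS):
    """True when the 99th percentile of fsync latencies is under the limit."""
    return percentile(fsync_ms, 99) < limit_ms


# Hypothetical fsync latencies in milliseconds, 100 samples each.
fast_disk = [1.2] * 98 + [4.0, 9.5]
slow_disk = [1.2] * 98 + [12.0, 30.0]

print(disk_fast_enough(fast_disk))
print(disk_fast_enough(slow_disk))
```

With 100 samples, the nearest-rank 99th percentile is the 99th smallest value, so a single outlier above the limit does not fail the check, but two do.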