Commit 20e42e0

Merge pull request #47415 from tmalove/etcd-3716-hardware-reco
[OSDOCS-3716]: Add etcd hardware recommendations
2 parents 822c7bf + 2c0238a commit 20e42e0

2 files changed: 18 additions, 0 deletions

modules/recommended-etcd-practices.adoc

Lines changed: 14 additions & 0 deletions
@@ -18,6 +18,20 @@ For more information about defragmenting etcd, see the "Defragmenting etcd data"

 Because etcd writes data to disk and persists proposals on disk, its performance depends on disk performance. Slow disks and disk activity from other processes can cause long fsync latencies. Those latencies can cause etcd to miss heartbeats, not commit new proposals to the disk on time, and ultimately experience request timeouts and temporary leader loss. Run etcd on machines that are backed by SSD or NVMe disks with low latency and high throughput. Consider single-level cell (SLC) solid-state drives (SSDs), which provide 1 bit per memory cell, are durable and reliable, and are ideal for write-intensive workloads.

+The following disk characteristics provide optimal etcd performance:
+
+* Low latency to support fast read operations.
+* High-bandwidth writes for faster compactions and defragmentation.
+* High-bandwidth reads for faster recovery from failures.
+* Solid-state drives (SSDs) at a minimum; NVMe drives are preferred.
+* Server-grade hardware from various manufacturers for increased reliability.
+* RAID 0 technology for increased performance.
+* Dedicated etcd drives. Do not place log files or other heavy workloads on etcd drives.
+
+Avoid NAS or SAN setups and spinning drives. Always benchmark using utilities such as `fio`. Continuously monitor cluster performance as the cluster grows.
+
+IMPORTANT: Avoid using the Network File System (NFS) protocol.
+
 Some key metrics to monitor on a deployed {product-title} cluster are the p99 of the etcd disk write-ahead log (WAL) fsync duration and the number of etcd leader changes. Use Prometheus to track these metrics.

 * The `etcd_disk_wal_fsync_duration_seconds_bucket` metric reports the etcd disk fsync duration.
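
The p99 mentioned in the hunk above is normally derived from the cumulative histogram buckets that `etcd_disk_wal_fsync_duration_seconds_bucket` exposes. The following is a minimal Python sketch of that estimate, using linear interpolation inside the target bucket in the spirit of PromQL's `histogram_quantile`. The function name, the sample bucket counts, and the 10 ms comparison value are illustrative assumptions and are not part of this commit; the 10 ms figure is commonly cited etcd guidance rather than something this change states.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Interpolates linearly inside the bucket that contains the target rank.
    This is a simplified sketch, not Prometheus's actual implementation.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            # For the +Inf bucket (or an empty bucket) fall back to the
            # previous finite upper bound.
            if math.isinf(bound) or in_bucket == 0:
                return prev_bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound

# Hypothetical cumulative counts: (upper bound in seconds, observations <= bound).
sample_buckets = [
    (0.001, 0), (0.002, 50), (0.004, 90),
    (0.008, 98), (0.016, 100), (math.inf, 100),
]

p99 = histogram_quantile(0.99, sample_buckets)
ASSUMED_THRESHOLD_SECONDS = 0.010  # assumption: commonly cited 10 ms guidance
print(f"p99 WAL fsync ~ {p99 * 1000:.1f} ms, within threshold: {p99 < ASSUMED_THRESHOLD_SECONDS}")
```

In a live cluster, Prometheus would typically compute this value server-side with `histogram_quantile` over a rate of the bucket metric; the sketch only shows what that calculation does with the bucket counts.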

scalability_and_performance/recommended-host-practices.adoc

Lines changed: 4 additions & 0 deletions
@@ -29,6 +29,10 @@ include::modules/increasing-aws-flavor-size.adoc[leveloffset=+2]

 include::modules/recommended-etcd-practices.adoc[leveloffset=+1]

+[role="_additional-resources"]
+.Additional resources
+* link:https://access.redhat.com/solutions/4885641[How to use `fio` to check etcd disk performance in {product-title}]
+
 include::modules/etcd-defrag.adoc[leveloffset=+1]

 include::modules/infrastructure-components.adoc[leveloffset=+1]
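
As a rough illustration of the `fio` benchmarking that the added text and the linked article recommend, the Python sketch below builds a write-plus-`fdatasync` job and reads the 99th percentile sync latency from a JSON report. The helper names, the job name `etcd_perf`, the size and block-size values, and the sample report are assumptions for illustration; they follow the widely used etcd disk check rather than anything in this commit, and the JSON path should be confirmed against the output of the installed `fio` version.

```python
import json

def build_fio_command(directory):
    """Return an fio argument list that mimics etcd's small sequential WAL writes.

    Each write is followed by fdatasync, which is the operation etcd waits on.
    """
    return [
        "fio",
        "--rw=write",
        "--ioengine=sync",
        "--fdatasync=1",
        f"--directory={directory}",
        "--size=22m",
        "--bs=2300",
        "--name=etcd_perf",          # hypothetical job name
        "--output-format=json",
    ]

def p99_fdatasync_ms(report):
    """Extract the 99th percentile fdatasync latency in milliseconds.

    Assumes the report layout jobs[0].sync.lat_ns.percentile["99.000000"].
    """
    nanoseconds = report["jobs"][0]["sync"]["lat_ns"]["percentile"]["99.000000"]
    return nanoseconds / 1e6

# Hypothetical, heavily trimmed report used only to exercise the parser.
sample_report = json.loads(
    '{"jobs": [{"sync": {"lat_ns": {"percentile": {"99.000000": 4500000}}}}]}'
)

print(" ".join(build_fio_command("/var/lib/etcd")))
print(f"p99 fdatasync: {p99_fdatasync_ms(sample_report)} ms")
```

The command is only constructed here, not executed. Run the benchmark on the disk that actually backs etcd, and avoid running it against a production member under load.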
