Description
Dear LINBIT Support / DRBD Team,
I am working with DRBD9 in a three-node Debian cluster, and I am experiencing unexpectedly low resynchronization throughput. Although the system uses dedicated 10 Gbps links and local SSD storage, the observed resync performance remains well below expectations, and neither CPU nor disks appear to be saturated.
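For context, the resync throughput figures come from the standard DRBD 9 status commands; a minimal sketch of what I am measuring, assuming a resource named r0:

```
# Per-peer connection state and resync progress (DRBD 9)
drbdadm status r0

# Detailed counters (sent/received data, out-of-sync blocks, resync progress)
drbdsetup status r0 --verbose --statistics
```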
Environment details:
- Cluster size: 3 nodes
- OS: Debian, kernel 6.1.0-33-amd64 (Debian 6.1.133-1, April 2025 build)
- DRBD versions: DRBDADM_VERSION = 9.30.0, DRBD_KERNEL_VERSION = 9.2.13 (0x09020d), DRBDADM_API_VERSION = 2
- Build tag: GIT-hash: 36ea199f38b543b2da92219109c2832e122e5bf9 (2025-01-23)
- Network: dedicated 10 Gbps interconnects; the NICs report full link speed and appear to support RDMA
- Storage: local SSDs without hardware RAID (fio results available on request)
I would greatly appreciate your guidance on how to tune DRBD9 for this type of setup. Specifically, I am interested in practical recommendations for parameters in the net { ... } and disk { ... } sections that significantly affect resync performance, such as buffer sizes, epoch limits, and resync-rate controls.
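To make the question concrete, here is a minimal sketch of the kind of resource configuration I have been experimenting with; the resource name r0, the node sections, and all numeric values are placeholders rather than my actual production settings:

```
# /etc/drbd.d/r0.res -- illustrative sketch only, values are placeholders
resource r0 {
    net {
        max-buffers     8000;   # receive/request buffers
        max-epoch-size  8000;   # write requests per epoch
        sndbuf-size     0;      # 0 = let the kernel auto-tune the socket buffers
        rcvbuf-size     0;
    }
    disk {
        # Dynamic resync controller: try to fill the link without starving
        # application I/O
        c-plan-ahead    20;     # tenths of a second; 0 disables the controller
        c-fill-target   1M;
        c-min-rate      10M;
        c-max-rate      1000M;
        resync-rate     500M;   # fixed rate, only used when the controller is disabled
    }
    on nodeA { ... }
    on nodeB { ... }
    on nodeC { ... }
}
```

I would be glad to know which of these knobs matter most for a 10 Gbps, SSD-backed setup and what value ranges are reasonable.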
I am also seeking advice on the choice of transport. Under what conditions would RDMA provide meaningful throughput or latency benefits compared to TCP for DRBD9, and what kernel modules, user-space libraries (such as rdma-core), and configuration checks are required to confirm RDMA is operating correctly? If RDMA is not recommended in this environment, I would be very grateful for detailed TCP tuning guidance to achieve near line-rate resync performance on 10 Gbps links.
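In case it is useful, this is the kind of verification I have attempted so far; a rough sketch assuming an mlx5-based NIC and the rdma-core tooling, with module names, the interface, and the peer address as placeholders:

```
# Check that the RDMA stack and NIC driver are loaded (module names vary by NIC)
lsmod | grep -E 'rdma|ib_core|mlx5'

# List RDMA devices and port state (rdma-core / iproute2)
ibv_devinfo
rdma link show

# Check whether the DRBD RDMA transport module is available at all
modinfo drbd_transport_rdma 2>/dev/null || echo "drbd_transport_rdma not found"

# Baseline TCP throughput on the replication link
iperf3 -c <peer-ip> -t 30

# TCP socket buffer tuning tried on the 10 GbE path (example values only)
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
```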
To assist in diagnosing the issue, I can provide:
- Sanitized /etc/drbd.d/<resource>.res files
- ethtool -S and ibv_devinfo outputs
- iperf3 and fio benchmark results
- journalctl -u drbd and dmesg logs collected during resync
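For completeness, this is roughly how I intend to capture the resync-time logs and counters; the resource name and interface are placeholders:

```
# Collected on each node while a resync is in progress
drbdadm status r0                > drbd-status.txt
journalctl -u drbd --since "-1h" > drbd-journal.txt
dmesg -T | grep -i drbd          > drbd-dmesg.txt
ethtool -S eth1                  > ethtool-stats.txt
```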
Thank you very much for your time and support. I look forward to your recommendations.
Sincerely,
Fateme Bakhoda