[Performance]: Current default configuration limits Mooncake Transfer Engine's performance over bonded network interfaces #1668

@usernamehaha2022

Describe your performance question

Hello Team,
When transmitting over RDMA, Mooncake's default configuration is MC_SLICE_SIZE=65536, MC_MAX_WR=256, and NUM_QP_PER_EP=2. With larger KV cache blocks, these settings lead to suboptimal performance: for requests averaging several megabytes, a 64 KB slice size splits each request into many slices, and that chunking adds significant CPU overhead.
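To make the chunking cost concrete, here is a minimal sketch of the slice-count arithmetic. The 4 MiB request size is only an illustrative assumption for "several megabytes", and `slice_count` is a hypothetical helper, not a Mooncake function.

```python
def slice_count(request_bytes: int, slice_size: int) -> int:
    """Number of slices needed to cover a request (ceiling division)."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    return -(-request_bytes // slice_size)


# Illustrative request size only; not a measured value from our tests.
request = 4 * 1024 * 1024

default_slices = slice_count(request, 65536)    # default MC_SLICE_SIZE
large_slices = slice_count(request, 1048576)    # MC_SLICE_SIZE raised to 1 MB

print(default_slices, large_slices)
```

Under this assumption the default slice size produces 16 times as many slices per request as the 1 MB setting, each of which has to be built and posted separately.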

In our tests, the bonded NIC has a peak bandwidth of 50 GB/s, but Mooncake only achieved 37 GB/s with the default settings. We therefore tried increasing MC_SLICE_SIZE to 1 MB. However, in a bonded network interface environment this made things worse. Although NUM_QP_PER_EP distributes the QPs evenly across the two sub-devices of the bonded port, a single submitTransferTask no longer triggers its internal pipeline. Instead, it performs just one submitPostSend using the first context (associated with the bonded NIC on the same PCIe bridge). That submitPostSend call uses only one QP for transmission, so the bandwidth of the second sub-device is left entirely unused, and throughput drops to 23 GB/s.

Expected behavior

When an endpoint has multiple QPs, each submitPostSend call should use all QPs for that batch’s slices, so that:

  • Even when a batch is sent in one call (large MC_SLICE_SIZE + default MC_MAX_WR), load is spread across QPs.
  • No single QP is overloaded; aggregate capacity (sum of per-QP max_wr_depth_) is used.

Suggested fix

  • Within a single submitPostSend call, distribute slices across the endpoint's QPs in round-robin order. We have implemented a prototype based on this approach. With round-robin distribution, MC_SLICE_SIZE=1048576, and MC_MAX_WR=256, our transmission performance improved from 23 GB/s to 45 GB/s.
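The idea can be modeled roughly as follows. This is a simplified Python sketch of the scheduling policy only, not our prototype and not the Transfer Engine code: `distribute_round_robin` and `free_depth` are hypothetical names, with `free_depth[i]` standing in for the remaining work-request capacity of QP `i` (its max_wr_depth_ minus outstanding requests).

```python
def distribute_round_robin(slice_ids, free_depth):
    """Assign slices to QPs in round-robin order, skipping full QPs.

    Returns (assigned, leftover): assigned[i] lists the slices posted
    to QP i, and leftover lists slices that did not fit on any QP and
    would have to wait for a later call.
    """
    remaining = list(free_depth)
    n = len(remaining)
    assigned = [[] for _ in range(n)]
    leftover = []
    qp = 0
    for s in slice_ids:
        # Probe at most n QPs, starting from the current cursor.
        for _ in range(n):
            if remaining[qp] > 0:
                break
            qp = (qp + 1) % n
        else:
            # Every QP is full: defer this slice.
            leftover.append(s)
            continue
        assigned[qp].append(s)
        remaining[qp] -= 1
        qp = (qp + 1) % n  # advance so the next slice goes to the next QP
    return assigned, leftover


# Four 1 MB slices over two QPs, each with 256 free work requests.
assigned, leftover = distribute_round_robin(range(4), [256, 256])
print(assigned, leftover)
```

With two QPs mapped to the two sub-devices, each batch is split evenly between them even when the whole batch is submitted in one call, and a batch is only deferred once the combined capacity of all QPs is exhausted.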
