
UCT/IB/RC: add adaptive TX CQ moderation #11305

Open

ndg8743 wants to merge 1 commit into openucx:master from ndg8743:fix/adaptive-tx-cq-moderation



ndg8743 commented Mar 30, 2026

Summary

  • When many RC endpoints each send fewer messages than tx_moderation, no endpoint's unsignaled-send count ever reaches the threshold, so no send is signaled. TX CQ credits are then never reclaimed, and the TX buffer pool is eventually exhausted.
  • Fix uct_rc_iface_tx_moderation() to also check the iface-level cq_available: when CQ credits drop to the tx_moderation threshold, the next send on any endpoint is forced to be signaled.

Closes #1307

ndg8743 force-pushed the fix/adaptive-tx-cq-moderation branch from 6ee59d0 to 4052e51 on March 31, 2026 02:15
ndg8743 (author) commented Apr 2, 2026

CI failure on ASAN new worker 2:

  • Test dc_mlx5/test_uct_perf.envelope/1 hung during the put latency measurement on the dc_mlx5 transport, hit the 15-minute connection timeout watchdog, and was aborted. Performance envelope tests are known to be sensitive to ASAN overhead and CI node load.

Retriggering CI.

ndg8743 force-pushed the fix/adaptive-tx-cq-moderation branch from 0a8c53b to 4052e51 on April 3, 2026 04:44
When many RC endpoints each send fewer messages than the
tx_moderation threshold, no signaled sends are generated and TX CQ
credits are never reclaimed, eventually exhausting the TX buffer pool.

Fix by also checking iface-level cq_available in the moderation
decision. When CQ credits drop to the tx_moderation threshold, force
the next send on any endpoint to be signaled, ensuring timely
completion processing regardless of per-endpoint send counts.

Closes openucx#1307
ndg8743 force-pushed the fix/adaptive-tx-cq-moderation branch from 4052e51 to e1e4199 on April 4, 2026 16:44

Successfully merging this pull request may close this issue: UCT/IB/RC: develop adaptive TX CQ moderation
