UCP/PROTO: Add option to enforce zcopy protocols for early error detection#11280

Open
tvegas1 wants to merge 5 commits into openucx:master from tvegas1:enforce_rma_zcopy

@tvegas1 (Contributor) commented Mar 20, 2026

What?

Add a UCX_TLS_RMA option that lists transports; when a listed transport is found, zcopy protocols are enforced on it.

Why?

In the one-sided operations use case, users need to find out quickly when the system is not operating at full capacity.

How?

For a given proto key, we make sure that at least the protocol of the last size range is a zcopy/offload get or put, where applicable.

  • UCX_TLS_RMA=cuda_ipc (similar for rocm_ipc):
    • intra-node zcopy for cuda memory if cuda_ipc is found on the system; inter-node zcopy is mandated with mnnvl.
  • UCX_TLS_RMA=rc_mlx5,dc_mlx5:
    • intra/inter-node zcopy for host and cuda/rocm memory if rc_mlx5/dc_mlx5 are found on the system; an error is raised if they fail to cover a supported gpu memory type.
  • UCX_TLS_RMA=rc_mlx5,cuda_ipc (similar for rocm_ipc):
    • intra/inter-node zcopy for host and cuda/rocm memory, even without mnnvl.
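
To illustrate the check described under "How?", here is a minimal standalone sketch in C. It is not the code from this PR: the `sketch_*` types and functions are hypothetical stand-ins for the real protocol selection structures, and requiring the last range's transport to appear in the UCX_TLS_RMA list is an assumption about how the option is applied.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical protocol kinds; these are NOT UCX's real enums. */
typedef enum {
    SKETCH_PROTO_BCOPY,
    SKETCH_PROTO_ZCOPY,
    SKETCH_PROTO_OFFLOAD
} sketch_proto_kind_t;

/* Hypothetical entry: one size range selected for a given proto key. */
typedef struct {
    size_t              max_length; /* inclusive upper bound of the range */
    sketch_proto_kind_t kind;       /* protocol kind chosen for the range */
    const char          *tl_name;   /* transport the protocol runs on */
} sketch_range_t;

/* Return true if tl_name is an exact element of a comma-separated list
 * such as "rc_mlx5,cuda_ipc". */
static bool sketch_tl_in_list(const char *list, const char *tl_name)
{
    size_t name_len = strlen(tl_name);
    const char *p   = list;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len      = (end != NULL) ? (size_t)(end - p) : strlen(p);

        if ((len == name_len) && (strncmp(p, tl_name, len) == 0)) {
            return true;
        }
        if (end == NULL) {
            break;
        }
        p = end + 1;
    }
    return false;
}

/* Return true if the last size range uses a zcopy or offload protocol on a
 * transport listed in tls_rma. Matching the transport against the list is
 * an assumption of this sketch, not something stated in the PR. */
static bool sketch_last_range_ok(const sketch_range_t *ranges, size_t count,
                                 const char *tls_rma)
{
    const sketch_range_t *last;

    if (count == 0) {
        return false;
    }

    last = &ranges[count - 1];
    return ((last->kind == SKETCH_PROTO_ZCOPY) ||
            (last->kind == SKETCH_PROTO_OFFLOAD)) &&
           sketch_tl_in_list(tls_rma, last->tl_name);
}
```

With ranges `{bcopy on rc_mlx5, zcopy on cuda_ipc}`, the check passes for the list `rc_mlx5,cuda_ipc` and fails for `rc_mlx5` alone, which is the point at which an early error could be reported.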

@tvegas1 added the "WIP-DNM Work in progress / Do not review" label Mar 20, 2026
@tvegas1 changed the title from "UCP/PROTO: Add RMA ZCOPY flag" to "UCP/PROTO: Add option to enforce zcopy protocols for early error detection" Mar 20, 2026
@guy-ealey-morag (Contributor)

Remember to update the copyright years of all changed files to 2026.
