[Feature] Registering cuMem address without NVLS or MNNVL support

Currently, registering a local cuMem-mapped address in RegisterMemory requires NVLS/MNNVL support ([the code here](https://github.com/microsoft/mscclpp/blob/main/src/gpu_utils.cc#L316) explicitly requires this). Real-world PyTorch and Megatron workloads use the cuMem driver API to manage tensors, so this requirement causes problems when integrating MSCCL++ into PyTorch and Megatron on machines without NVLS/MNNVL support. AFAIK, cuMem should be usable without such specialized hardware. Is coupling cuMem buffer registration to multicast support an intended feature, or an ad-hoc constraint that will be relaxed in a future release?
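To make the request concrete, here is a minimal sketch of the decoupling I have in mind. It is not MSCCL++ code: `DeviceCaps`, `ExportKind`, and `chooseExportKind` are hypothetical names made up for illustration, and the capability flags are assumed to come from device queries elsewhere. The idea is that a cuMem-mapped buffer falls back to a plain shareable handle (the driver API offers a POSIX file descriptor handle type for cuMem allocations) instead of being rejected when NVLS/MNNVL is absent.

```cpp
// Hypothetical capability flags; a real implementation would fill these
// from device/driver queries rather than hard-coding them.
struct DeviceCaps {
  bool nvlsSupported;   // multicast over NVLink; not consulted for registration here
  bool mnnvlSupported;  // multi-node NVLink, where fabric handles are usable
};

// Hypothetical ways to export a local buffer to a peer.
enum class ExportKind { RuntimeIpc, PosixFd, Fabric };

// Picks an export path for a buffer instead of failing on cuMem memory
// when multicast hardware is missing.
ExportKind chooseExportKind(bool isCuMemMapped, const DeviceCaps& caps) {
  if (!isCuMemMapped) return ExportKind::RuntimeIpc;    // e.g. cudaMalloc-backed memory
  if (caps.mnnvlSupported) return ExportKind::Fabric;   // fabric handle when available
  return ExportKind::PosixFd;                           // plain cuMem, no NVLS/MNNVL needed
}
```

With something along these lines, multicast support would only gate multicast features, not whether a cuMem buffer can be registered at all.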

[Feature] Registering cuMem address without NVLS or MNNVL support #714
