Add kCuda backend support for MM+RS stream lowering #5761

nsarka · 2026-01-05T18:28:22Z

No description provided.

greptile-apps · 2026-01-05T18:31:37Z

Greptile Summary

This PR extends the stream parallel type lowering pass to support the kCuda backend for matrix multiplication with reduce-scatter (MM+RS) operations. Previously, only the kNccl backend was supported for this lowering path.

Key Changes

- Added kCuda backend handling in two critical lowering paths (MM+RS and AG+MM algorithms) in stream_parallel_type.cpp
  - Implemented ShareMemHandles for memory handle sharing between P2P operations
  - Added protocol-specific ordering (P2pProtocol::Get vs P2pProtocol::Put) for send/recv operations
  - Implemented deferred wait semantics: wait_recv executes immediately while wait_send is deferred to loop epilogue
  - Both code paths now have consistent structure and error handling
- Refactored protocol selection from if-else chains to switch statements for better code clarity and exhaustive enum handling
- Enhanced test coverage by converting ReduceScatterP2p test to a parameterized test that validates both kCuda and kNccl backends
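
The protocol-specific ordering and the deferred wait can be illustrated with a small standalone sketch. This is not the code from stream_parallel_type.cpp: the `P2pProtocol` enum here is a local stand-in, `LoweredOps` and `lowerCudaP2p` are hypothetical names, and ops are modeled as plain strings rather than host IR nodes. Exactly where the epilogue list is emitted is not stated in the summary, so the sketch only keeps it separate from the in-place ops.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Local stand-in for the P2pProtocol enum named in the summary (assumption).
enum class P2pProtocol { Get, Put };

// Hypothetical container: ops emitted in place vs. ops deferred to the epilogue.
struct LoweredOps {
  std::vector<std::string> body;
  std::vector<std::string> epilogue;
};

// Sketch of the kCuda ordering described above, with ops modeled as strings.
LoweredOps lowerCudaP2p(P2pProtocol protocol) {
  LoweredOps ops;
  ops.body.push_back("share_mem_handles(recv, send)");
  switch (protocol) {
    case P2pProtocol::Get:  // Get: send is issued before recv
      ops.body.push_back("send");
      ops.body.push_back("recv");
      break;
    case P2pProtocol::Put:  // Put: recv is issued before send
      ops.body.push_back("recv");
      ops.body.push_back("send");
      break;
    default:
      throw std::logic_error("unsupported P2P protocol");
  }
  ops.body.push_back("wait(recv)");      // waited on immediately
  ops.epilogue.push_back("wait(send)");  // deferred to the loop epilogue
  return ops;
}
```

Calling `lowerCudaP2p(P2pProtocol::Get)` yields a body that issues `send` before `recv`, while `P2pProtocol::Put` reverses the two. In both cases `wait(send)` lands only in the epilogue list.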

Confidence Score: 5/5

- This PR is safe to merge with minimal risk.
- The implementation follows the established patterns of the existing AG+MM path, keeps the two lowering paths structurally consistent, handles every enum case in switch statements, covers both backends in tests, and adds error handling for unsupported backends.
- No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| csrc/host_ir/pass/stream_parallel_type.cpp | Added kCuda backend support for MM+RS stream lowering with proper P2P communication handling, deferred wait semantics, and refactored protocol selection to use switch statements |
| tests/cpp/test_multidevice_stream_parallel_type.cpp | Converted ReduceScatterP2p test to parameterized test that validates both kCuda and kNccl backends |

Sequence Diagram

sequenceDiagram
    participant Lowering as Stream Parallel Lowering
    participant Backend as Communicator Backend
    participant P2P as P2P Communication
    participant Wait as Wait Handler
    
    Lowering->>Backend: Check communicator_backend
    
    alt kNccl Backend
        Backend->>P2P: StartCoalescing
        P2P->>P2P: Create RECV
        P2P->>P2P: Create SEND
        P2P->>P2P: EndCoalescing
        P2P->>Wait: Wait(end_coalescing)
    else kCuda Backend
        Backend->>P2P: ShareMemHandles(recv, send)
        alt P2pProtocol::Get
            P2P->>P2P: SEND first
            P2P->>P2P: RECV second
        else P2pProtocol::Put
            P2P->>P2P: RECV first
            P2P->>P2P: SEND second
        end
        P2P->>Wait: Wait(recv) - immediate
        P2P->>Wait: Wait(send) - deferred to epilogue
    else Unsupported
        Backend->>Lowering: NVF_THROW
    end
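
Read as code, the diagram amounts to a dispatch on the communicator backend. The following is a minimal sketch under stated assumptions, not the PR's implementation: `CommunicatorBackend` is a local stand-in enum, `kOther` is a made-up enumerator representing any unsupported backend, `lowerP2p` is a hypothetical function, the `send_first` flag abstracts the Get/Put choice, and a standard exception stands in for NVF_THROW.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Local stand-in enum; kOther is hypothetical and represents any backend
// the lowering path does not handle.
enum class CommunicatorBackend { kNccl, kCuda, kOther };

// Returns the op sequence from the diagram as strings.
// send_first models P2pProtocol::Get (true) vs P2pProtocol::Put (false).
std::vector<std::string> lowerP2p(CommunicatorBackend backend, bool send_first) {
  switch (backend) {
    case CommunicatorBackend::kNccl:
      // RECV and SEND are coalesced, then a single wait covers both.
      return {"StartCoalescing", "RECV", "SEND", "EndCoalescing",
              "Wait(end_coalescing)"};
    case CommunicatorBackend::kCuda:
      return {"ShareMemHandles(recv, send)",
              send_first ? "SEND" : "RECV",
              send_first ? "RECV" : "SEND",
              "Wait(recv)",
              "Wait(send) [deferred to epilogue]"};
    default:
      // Stand-in for NVF_THROW in the diagram's "Unsupported" branch.
      throw std::runtime_error("unsupported communicator backend");
  }
}
```

The kNccl branch needs a single wait because the coalesced group completes as a unit, whereas the kCuda branch tracks the two transfers separately, which is what makes deferring only the send wait possible.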

nsarka · 2026-01-05T19:05:11Z

Thanks for the suggestions @wujingyue

wujingyue · 2026-01-05T19:06:42Z

csrc/host_ir/pass/stream_parallel_type.cpp

+              if_sending_to_self->elseBody().pushBack(send);
+              break;
+            }
+            default:


nsarka · 2026-01-05T20:05:00Z

!test

nsarka requested review from samnordmann and wujingyue January 5, 2026 18:28

nsarka self-assigned this Jan 5, 2026

nsarka force-pushed the nsarka/rs-gemm-cuda branch from a250a75 to d5a36ee January 5, 2026 18:28

wujingyue approved these changes Jan 5, 2026

tests/cpp/test_multidevice_stream_parallel_type.cpp (outdated, resolved)

wujingyue reviewed Jan 5, 2026

csrc/host_ir/pass/stream_parallel_type.cpp (outdated, resolved)

nsarka force-pushed the nsarka/rs-gemm-cuda branch from 487bc2b to 162c0c7 January 5, 2026 18:50

nsarka requested a review from wujingyue January 5, 2026 19:05

wujingyue approved these changes Jan 5, 2026

nsarka added 4 commits January 5, 2026 15:04

- f36d639 Add cuda backend to mm+rs stream lowering
- bae57ef Linter
- 1ab5b38 Update
- 49d62fd Updates

nsarka force-pushed the nsarka/rs-gemm-cuda branch from 557cbc4 to 49d62fd January 5, 2026 20:04

nsarka requested a review from wujingyue January 5, 2026 20:04

wujingyue approved these changes Jan 5, 2026

nsarka merged commit 1bbd375 into NVIDIA:main Jan 5, 2026
54 of 56 checks passed
