Review updated until commit 48838a9
PR Reviewer Guide

Here are some key observations to aid the review process:

- 🧪 PR contains tests
- ⚡ Recommended focus areas for review
  - Potential Device Issue: … correct. This fallback behavior should be validated to ensure it handles all multi-device scenarios properly, especially when no communicator is available.
Test failures

- (Medium, 27) NVFuser HostIrJitTest crashes: "Handle not overriden for Allocate" across multiple suites

  | Test Name | A100 | GB200 | H100 | Source |
  |---|---|---|---|---|
  | HostIrJitTest.AllocationDomainReorder | ❌ | ❌ | ❌ | Link |
  | HostIrJitTest.BroadcastTest | ❌ | ❌ | ❌ | Link |
  | HostIrJitTest.Deallocate | ❌ | ❌ | ❌ | Link |
  | HostIrJitTest.DynamicSizedTensorAllocate | ❌ | ❌ | ❌ | Link |
  | HostIrJitTest.LaunchKernel | ❌ | ❌ | ❌ | Link |
  | HostIrJitTest.Linear | ❌ | ❌ | ❌ | Link |
  | HostIrJitTest.Matmul | ❌ | ❌ | ❌ | Link |
  | HostIrJitTest.Permute | ❌ | ❌ | ❌ | Link |
  | HostIrJitTest.Reorder | ❌ | ❌ | ❌ | Link |

- (Medium, 4) nvFuser multidevice communication equality mismatches in test_multidevice_lower_communication.cpp

  | Test Name | A100 (dist.) | GB200 (dist.) | Source |
  |---|---|---|---|
  | LowerGatherTest.InMesh_1_2_OutMesh_0_2_HostIr | ❌ | ❌ | Link |
  | LowerSendRecvTest.InMesh_1_2_OutMesh_0_1_HostIr | ❌ | ❌ | Link |
!test
Greptile Summary

This PR introduces a new `hir::Allocate` node. The core node implementation, dispatch wiring, allocate/deallocate pass, and evaluator handler are well-implemented; the main remaining concern is flagged in the review comments below.

Confidence Score: 3/5
Last reviewed commit: 48838a9
!test
Thanks for the early feedback @wujingyue.

!test
Additional Comments (1)

Previously, only explicitly allocated TVs were tracked. For the memory-leak check to remain accurate, it should only flag TVs that went through an explicit allocation. Consider adding a specific case for `hir::Allocate`:

```cpp
if (auto* alloc = dynamic_cast<hir::Allocate*>(e)) {
  allocated.insert(alloc->in());
}
```

and removing the over-broad check.
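The single-pass leak check described above can be sketched outside nvFuser. This is a minimal model, not the actual pass: `Allocate`, `Deallocate`, `Expr`, and `findLeaks` here are hypothetical stand-ins that track buffers by name.

```cpp
#include <set>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for hir::Allocate / hir::Deallocate:
// each records only the name of the buffer it touches.
struct Allocate { std::string buffer; };
struct Deallocate { std::string buffer; };
using Expr = std::variant<Allocate, Deallocate>;

// Walk the top-level expressions once; a leak is any buffer that was
// explicitly allocated but never deallocated afterward.
std::set<std::string> findLeaks(const std::vector<Expr>& exprs) {
  std::set<std::string> allocated;
  for (const Expr& e : exprs) {
    if (const auto* a = std::get_if<Allocate>(&e)) {
      allocated.insert(a->buffer);  // track only explicit allocations
    } else if (const auto* d = std::get_if<Deallocate>(&e)) {
      allocated.erase(d->buffer);   // a matched free clears the entry
    }
  }
  return allocated;  // whatever remains was never freed
}
```

Because only explicit `Allocate` entries are inserted into the set, buffers that were never explicitly allocated can never be flagged, which mirrors the accuracy concern in the comment above.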
Additional Comments (1)

Both `hir::Allocate` and `kir::Allocate` can reach this code, and the evaluator still contains a `kir::Allocate` handler. If the pass is never meant to be applied to containers that still hold `kir::Allocate` nodes, consider asserting that explicitly:

```cpp
// At the top of insertDeallocations / checkMemoryLeak:
for (auto* expr : hic.topLevelExprs()) {
  NVF_ERROR(
      !expr->isA<kir::Allocate>(),
      "kir::Allocate found in HostIrContainer; use hir::Allocate instead.");
}
```
Additional Comments (1)

The refactored code casts the first input and output to `TensorView` without first checking that they exist and have that type. Since an unchecked cast fails with a less descriptive error, consider validating explicitly:

```cpp
NVF_ERROR(
    !e->inputs().empty() && e->inputs().at(0)->isA<TensorView>(),
    "Communication expression must have a TensorView as its first input");
NVF_ERROR(
    !e->outputs().empty() && e->outputs().at(0)->isA<TensorView>(),
    "Communication expression must have a TensorView as its first output");
TensorView* in = e->input(0)->as<TensorView>();
TensorView* out = e->output(0)->as<TensorView>();
```
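The validate-before-cast pattern above can be sketched in plain C++, independent of nvFuser. `Val`, `TensorView`, and `firstAsTensorView` below are hypothetical stand-ins, not nvFuser APIs; the point is only the shape of the check.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for a Val/TensorView class hierarchy.
struct Val {
  virtual ~Val() = default;
};
struct TensorView : Val {};

// Validate before downcasting: raise a clear, descriptive error when the
// first element is absent or has the wrong type, instead of failing deep
// inside an unchecked cast.
TensorView* firstAsTensorView(const std::vector<Val*>& vals) {
  if (vals.empty() || dynamic_cast<TensorView*>(vals.front()) == nullptr) {
    throw std::invalid_argument(
        "expression must have a TensorView as its first element");
  }
  return static_cast<TensorView*>(vals.front());
}
```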
!test
Creating a new `hir::Allocate` node that always allocates a new tensor. This is required to create new buffers per stream instead of reusing them across streams, which would require synchronization.

I am not modifying `kir::Allocate` handling; that caused errors with MultiDeviceExecutor tests.
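The per-stream allocation policy described above can be illustrated with a small, hypothetical model (`Buffer` and `allocatePerStream` are not nvFuser APIs): each stream receives its own freshly allocated buffer, so no stream ever touches another stream's storage and no cross-stream synchronization is needed.

```cpp
#include <map>
#include <memory>
#include <vector>

// Hypothetical model of a per-stream allocation policy: every stream gets
// its own freshly allocated buffer, rather than all streams sharing one
// buffer, which would force synchronization between them.
struct Buffer {
  std::vector<float> data;
};

std::map<int, std::unique_ptr<Buffer>> allocatePerStream(
    const std::vector<int>& streams, std::size_t numel) {
  std::map<int, std::unique_ptr<Buffer>> buffers;
  for (int s : streams) {
    auto buf = std::make_unique<Buffer>();
    buf->data.resize(numel);  // fresh storage owned exclusively by stream s
    buffers[s] = std::move(buf);
  }
  return buffers;
}
```

The trade-off is higher memory use in exchange for removing inter-stream dependencies, which matches the rationale given in the PR description.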