-
Notifications
You must be signed in to change notification settings - Fork 75
rename to test_multidevice_tutorial #5756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Greptile SummaryRenames the test binary from Key Changes:
Critical Issue: Confidence Score: 2/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Test as Test Runner
participant SimplePipelining as SimplePipelining Test
participant HostIrKernelPipelining as HostIrKernelPipelining Test
participant Macro as SKIP_IF_NOT_ENOUGH_DEVICES
participant Executor as MultiDeviceExecutor
Note over Test: Single GPU Environment
Test->>SimplePipelining: Run test
SimplePipelining->>Macro: Check device count against fusion requirements
alt Has 2+ GPUs
Macro->>SimplePipelining: Continue
SimplePipelining->>Executor: Create MultiDeviceExecutor
Executor->>SimplePipelining: Execute test
else Only 1 GPU
Macro->>SimplePipelining: GTEST_SKIP
SimplePipelining->>Test: Test skipped
end
Test->>HostIrKernelPipelining: Run test
Note over HostIrKernelPipelining: DISABLED_ prefix prevents execution
HostIrKernelPipelining->>Test: Test disabled (not run)
Note right of HostIrKernelPipelining: Underlying segfault bug<br/>remains unaddressed
|
|
!test |
|
Review updated until commit 9386247 Auto-merge Status❌ Internal CI is finished (nvfuser-ci status not found) Description
|
| Relevant files | |||||
|---|---|---|---|---|---|
| Enhancement |
| ||||
| Configuration changes |
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 PR contains tests |
| ⚡ Recommended focus areas for review |
Test naming consistency
|
|
For binary test, we only run those binaries that start with |
| // To do so, we will be using new Host IRs: Stream (a Val), SetStream, ForLoop. | ||
| TEST_F(MultiDeviceTutorial, HostIrKernekPipelining) { | ||
| if (communicator_->size() < 2) { | ||
| GTEST_SKIP() << "Need at least 2 devices to run this test"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you instead call
Line 61 in 7c38ba3
| const auto num_devices = communicator_->size(); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved check to right before runWithInput
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My message must have landed poorly -- I meant the SKIP_IF_NOT_ENOUGH_DEVICES macro:
Line 59 in 7c38ba3
| #define SKIP_IF_NOT_ENOUGH_DEVICES(fusion) \ |
| // Stages. | ||
| TEST_F(MultiDeviceTutorial, SimplePipelining) { | ||
| if (communicator_->size() < 2) { | ||
| GTEST_SKIP() << "Need at least 2 devices to run this test"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't move to right before runWithInput. In this case HostIrEvaluator --> HostIrEvaluator::validate() --> NVF_CHECK_LE(requested_n_gpus, communicator_->size()); where requested_n_gpus = 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then move it before HostIrEvaluator is constructed.
The reason I'm proposing this change is to avoid coupling. If/when the input fusion is changed to require 4 GPUs, SKIP_IF_NOT_ENOUGH_DEVICES will work just fine without any extra change.
I don't think CI runs multidevice_tutorial at all. I think jit.xml should be changed to run all test_multidevice_* |
good point. I renamed to |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 files reviewed, 1 comment
| if (communicator_->size() < 2) { | ||
| GTEST_SKIP() << "Need at least 2 devices to run this test"; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: GPU count check is too late - placed after HostIrEvaluator construction on line 1123-1126. When hic is moved into the evaluator, it's consumed. If the test then skips, the HostIrContainer is already destroyed. Move this check before line 1123.
| if (communicator_->size() < 2) { | |
| GTEST_SKIP() << "Need at least 2 devices to run this test"; | |
| } | |
| if (communicator_->size() < 2) { | |
| GTEST_SKIP() << "Need at least 2 devices to run this test"; | |
| } | |
| auto outputs = hie.runWithInput({{tv0, aten_tv0}, {tv2, aten_tv2}}); |
|
!test |
|
@wujingyue CI script is updated, |
|
!test |
No problem -- disable that test and I'll take a look. |
wujingyue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
undo LGTM
| */ | ||
| // To do so, we will be using new Host IRs: Stream (a Val), SetStream, ForLoop. | ||
| TEST_F(MultiDeviceTutorial, HostIrKernekPipelining) { | ||
| GTEST_SKIP() << "Caught signal 11 (Segmentation fault: address not mapped to " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use GTEST_SKIP to disable. Use DISABLE_ so people can at least https://google.github.io/googletest/advanced.html#temporarily-enabling-disabled-tests. In contrast, there's no way to "unskip" a GTEST_SKIP.
|
!test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 files reviewed, 1 comment
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
|
!test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 files reviewed, 1 comment
Two minor change:
tutorial_multidevicetotest_multidevice_tutorialAfter revision: (in a node with 1 gpu)
Original errs: