add 2d inputs and copy transpose to transpose benchmark #5915
Conversation
Review updated until commit af22dc7

Description

Relevant files: Enhancement
PR Reviewer Guide
Here are some key observations to aid the review process:
🧪 PR contains tests

⚡ Recommended focus areas for review

Parameter Integration: is_copy_transpose and rank parameters are properly integrated into the test framework, but verify that all test combinations (2D/3D × copy/view transpose) are correctly generated and executed without parameter conflicts.
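The combination coverage the reviewer asks about can be checked with a quick sketch. The helper name `_generate_transpose_params` comes from the diff, but the concrete sizes below are assumptions for illustration; the real benchmark draws them from `generate_input_sizes()`:

```python
import itertools

def _generate_transpose_params(sizes_2d=((1024, 1024),),
                               sizes_3d=((64, 128, 256),)):
    """Sketch: yield (size, axes, dims) tuples for 2D and 3D inputs."""
    # Axes are coupled to rank: 2D has a single transpose, 3D has three.
    axes_by_rank = {2: [(0, 1)], 3: [(0, 1), (0, 2), (1, 2)]}
    for dims, sizes in ((2, sizes_2d), (3, sizes_3d)):
        for size, axes in itertools.product(sizes, axes_by_rank[dims]):
            yield (size, axes, dims)

params = list(_generate_transpose_params())
```

Coupling axes to rank in one generator avoids the invalid pairings (e.g. axes `(0, 2)` on a 2D input) that independent `parametrize` decorators would produce.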
Greptile Overview

Greptile Summary: Extended the transpose benchmark to test both copy and view transpose operations, and added 2D input coverage alongside the existing 3D inputs.

Key Changes:
Impact:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as Test Function
    participant Gen as _generate_transpose_params
    participant Fusion as transpose_fusion
    participant Eager as transpose_fwd_fn
    participant Bench as run_benchmark
    Test->>Gen: Request test parameters
    Gen->>Gen: Generate params for dims=2,3
    Gen->>Gen: For each size, axes, dims
    Gen-->>Test: Return (size, axes, dims) tuples
    Test->>Test: Create input tensors (input1, input2)
    Test->>Test: Compute permute_axes from axes
    Test->>Fusion: Define fusion with rank, is_copy_transpose
    Fusion->>Fusion: Define tensors with dynamic rank
    Fusion->>Fusion: Add + Permute + ReLU ops
    alt is_copy_transpose
        Fusion->>Fusion: Apply segment_set(T9) → T10
        Fusion->>Fusion: add_output(T10)
    else view_transpose
        Fusion->>Fusion: add_output(T9)
    end
    opt Validation enabled
        Test->>Eager: Execute eager function
        Eager->>Eager: Add + Transpose + ReLU
        alt is_copy_transpose
            Eager->>Eager: Apply .contiguous()
        end
        Eager-->>Test: Return eager_output
        Test->>Fusion: Validate against eager
    end
    opt Benchmarking enabled
        Test->>Bench: Run nvFuser benchmark
        Bench->>Fusion: Execute fusion
    end
```
1 file reviewed, 1 comment
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
1 file reviewed, no comments
naoyam left a comment:
LGTM
!test
1 file reviewed, no comments
@pytest.mark.parametrize("size,axes,dims", _generate_transpose_params())
@pytest.mark.parametrize("dtype", FLOAT_DTYPES)
@pytest.mark.parametrize("axes", [(0, 1), (0, 2), (1, 2)])
@pytest.mark.parametrize(
Do we need to benchmark view transpose? Should we remove it instead?
I don't know; it's not an expensive benchmark, so I'll just leave it as is in this PR.
Got it. Please work with @xwang233 for dashboard integration.
@pytest.mark.parametrize("executor", DEFAULT_EXECUTORS)
@pytest.mark.parametrize("size", generate_input_sizes(dims=3))
@pytest.mark.parametrize("size,axes,dims", _generate_transpose_params())
IIRC, I used 3D inputs to match the C++ benchmark. If 2D inputs are sufficient for benchmarking, we should remove the 3D benchmarking. This would also simplify the dashboard for this benchmark.
We should keep 3D for the different axes.
Co-authored-by: Priya Mishra <52657555+Priya2698@users.noreply.github.com>
1 file reviewed, 2 comments
benchmarks/python/test_transpose.py
Outdated
# add segmenter set to avoid presegment passes setting the output as a view of the input without any data movement. It leads to pointwise instead of transpose scheduler.
# we can also expose OptimizationPassGuard to python frontend and disable presegmentation passes to enforce output to be contiguous and then transpose scheduler will be used.
Break these long comments into multiple lines for better readability
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
1 file reviewed, no comments
Priya2698 left a comment:
LGTM. If you find that the view transpose variant is not meaningful anymore, please remove it in a future follow-up.
!build
It provides an apples-to-apples comparison and ensures nvFuser is smart enough to detect and use view transpose.
1 file reviewed, no comments
Two extensions to the transpose benchmark in benchmarks/python/test_transpose.py:

(1) Adds coverage for copy vs. view transpose
Previously, we only exercised view transpose, which returns a non-contiguous tensor, so the pointwise scheduler was used. As a result, the transpose scheduler was never actually exercised.
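The copy-vs-view distinction can be illustrated with a minimal strided-tensor model in plain Python (this is a conceptual sketch, not nvFuser or PyTorch code): a view transpose only swaps shape/stride metadata and leaves the data non-contiguous, while a copy transpose materializes the data into a new contiguous buffer, which is what actually moves bytes.

```python
def strides_for(shape):
    # Row-major (contiguous) strides for a given shape.
    s, out = 1, []
    for dim in reversed(shape):
        out.append(s)
        s *= dim
    return tuple(reversed(out))

def view_transpose(shape, strides):
    # Swap dims in metadata only; the data is untouched, so the
    # result is generally non-contiguous.
    return shape[::-1], strides[::-1]

def copy_transpose(data, shape):
    # Materialize the transposed data into a new contiguous buffer.
    rows, cols = shape
    out = [data[r * cols + c] for c in range(cols) for r in range(rows)]
    return out, (cols, rows), strides_for((cols, rows))

data = list(range(6))  # 2x3 tensor [[0, 1, 2], [3, 4, 5]]
v_shape, v_strides = view_transpose((2, 3), strides_for((2, 3)))
c_data, c_shape, c_strides = copy_transpose(data, (2, 3))
```

The view result's strides no longer match the contiguous strides for its shape; the copy result's do. In the benchmark, `.contiguous()` plays the role of `copy_transpose` here, forcing the data movement that the transpose scheduler is meant to handle.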
This PR adds .contiguous() to enforce a contiguous output layout, which triggers a copy-based transpose.

For manually defined fusions, a segment_set was added to the fusion to prevent the pre-segmentation pass (AllocationDomainPass) from changing the transpose output layout, ensuring the copy transpose path is taken.

For view transpose, the output has an allocation domain of (iS11{i0}, iS10{i1}), which is the same as the input's. Final fusion is:

For copy transpose, the output is T6, which has a transposed allocation domain: (iS12{i1}, iS13{i0}). Final fusion is:
(2) Generalizes fusion input ranks to 2D
Previously, fusion inputs were limited to 3D shapes, with roughly 100 test cases per data type. This PR expands coverage to include 2D input shapes as well.
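The "roughly 100 test cases per data type" figure can be grounded with back-of-the-envelope counting. The size counts below are assumptions chosen to match that figure for the old 3D-only setup; the real counts come from generate_input_sizes():

```python
# Assumed shape counts: ~33 3D shapes x 3 axes ≈ the stated ~100 cases.
n_sizes_3d = 33
n_sizes_2d = 33                # assumed: same number of 2D shapes
axes_per_rank = {2: 1, 3: 3}   # 2D: (0,1); 3D: (0,1), (0,2), (1,2)

cases_3d = n_sizes_3d * axes_per_rank[3]
cases_2d = n_sizes_2d * axes_per_rank[2]
total_per_dtype = cases_3d + cases_2d
```

Because 2D inputs admit only one transpose axis pair, adding them grows the suite by about a third rather than doubling it, before the further 2x from the copy/view parameter.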