
Conversation

@liqiangxl
Collaborator

@liqiangxl liqiangxl commented Feb 3, 2026

Two extensions to the transpose benchmark in benchmarks/python/test_transpose.py:

(1) Adds coverage for copy vs. view transpose
Previously, we only exercised view transpose, which returns a non-contiguous tensor and is handled by the pointwise scheduler. As a result, the transpose scheduler was never actually exercised.
This PR adds .contiguous() to enforce a contiguous output layout, which triggers a copy-based transpose.
For manually defined fusions, a segment_set was added so that the pre-segmentation pass (AllocationDomainPass) cannot change the transpose output layout, ensuring the copy-transpose path is taken.
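
To make the intent concrete, the output branch of the manually defined fusion looks roughly like the sketch below (only the fd.ops.segment_set / fd.add_output calls are taken from the diff quoted in the reviewer guide further down; the surrounding add/permute/relu body is elided):

    # Sketch of the fusion's output branch; T9 is the final (transposed) tensor.
    if is_copy_transpose:
        # segment_set keeps the pre-segmentation AllocationDomainPass from turning
        # the output into a view of the input, so the copy path (and hence the
        # transpose scheduler) is exercised.
        T10 = fd.ops.segment_set(T9)
        fd.add_output(T10)
    else:
        # View transpose: the output keeps the input's allocation domain and the
        # pointwise scheduler handles the fusion.
        fd.add_output(T9)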

For view transpose, the output has an allocation domain of (iS11{i0}, iS10{i1}), which is the same as the input's:

T5_g_float[iS10{i1}, iS11{i0}]
 logical domain : (iS10{i1}, iS11{i0})
 allocation domain : (iS11{i0}, iS10{i1})
 contiguity: t t
 loop domain : (iS10{i1}, iS11{i0})

Final fusion is:

Segmented_Fusion{ 
groups: 
  pointwise{0, 1, 2, 3}
edges: 

group details:
g{(pointwise)
group id: 0
inputs:
  T0_g_float[iS0{i0}, iS1{i1}] float
  T1_g_float[iS12{i0}, iS13{i1}] float
outputs:
  T5_g_float[iS10{i1}, iS11{i0}] float


T2_l_float[iS4{i0}, iS5{i1}]
   = T0_g_float[iS0{i0}, iS1{i1}]
   + T1_g_float[iS12{i0}, iS13{i1}];
(0)
T3_l_float[iS7{i1}, iS6{i0}]
   = Set.Permute( T2_l_float[iS4{i0}, iS5{i1}], cache_op=Streaming )
(1)
T4_l_bool[iS8{i1}, iS9{i0}]
   = T3_l_float[iS7{i1}, iS6{i0}]
   > double(0);
(2)
T5_g_float[iS10{i1}, iS11{i0}]
   = where(T4_l_bool[iS8{i1}, iS9{i0}]
  , T3_l_float[iS7{i1}, iS6{i0}]
  , double(0));
(3)
}

} //Segmented_Fusion

For copy transpose, the output is T6, which has a transposed allocation domain of (iS12{i1}, iS13{i0}):

T6_g_float[iS12{i1}, iS13{i0}]
   = SegmenterSet( T5_l_float[iS10{i1}, iS11{i0}] )

T5_l_float[iS10{i1}, iS11{i0}]
 logical domain : (iS10{i1}, iS11{i0})
 contiguity: t t
 loop domain : (iS10{i1}, iS11{i0})
T6_g_float[iS12{i1}, iS13{i0}]
 logical domain : (iS12{i1}, iS13{i0})
 allocation domain : (iS12{i1}, iS13{i0})
 contiguity: t t
 loop domain : (iS12{i1}, iS13{i0})

Final fusion is:

Segmented_Fusion{ 
groups: 
  transpose{0, 1, 2, 3, 4}
edges: 

group details:
g{(transpose)
group id: 0
inputs:
  T0_g_float[iS0{i0}, iS1{i1}] float
  T1_g_float[iS14{i0}, iS15{i1}] float
outputs:
  T6_g_float[iS12{i1}, iS13{i0}] float


T2_l_float[iS4{i0}, iS5{i1}]
   = T0_g_float[iS0{i0}, iS1{i1}]
   + T1_g_float[iS14{i0}, iS15{i1}];
(0)
T3_g_float[iS7{i1}, iS6{i0}]
   = Set.Permute( T2_l_float[iS4{i0}, iS5{i1}], cache_op=Streaming )
(1)
T4_g_bool[iS8{i1}, iS9{i0}]
   = T3_g_float[iS7{i1}, iS6{i0}]
   > double(0);
(2)
T5_g_float[iS10{i1}, iS11{i0}]
   = where(T4_g_bool[iS8{i1}, iS9{i0}]
  , T3_g_float[iS7{i1}, iS6{i0}]
  , double(0));
(3)
T6_g_float[iS12{i1}, iS13{i0}]
   = SegmenterSet( T5_g_float[iS10{i1}, iS11{i0}] )
(4)
}

} //Segmented_Fusion

(2) Generalizes fusion input ranks to include 2D
Previously, fusion inputs were limited to 3D shapes, with roughly 100 test cases per data type. This PR expands coverage to include 2D input shapes as well.
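
The parameter generation described above could be sketched as follows (a hedged sketch: the helper name _generate_transpose_params and the per-rank axis choices come from the review summary below, while drawing the sizes from generate_input_sizes is an assumption):

    # Hypothetical sketch of _generate_transpose_params: yields (size, axes, dims)
    # tuples covering both 2D and 3D inputs.
    def _generate_transpose_params():
        params = []
        for dims in (2, 3):
            # 2D inputs have a single transpose choice; 3D inputs cover all axis pairs.
            axes_choices = [(0, 1)] if dims == 2 else [(0, 1), (0, 2), (1, 2)]
            for size in generate_input_sizes(dims=dims):  # assumed size source
                for axes in axes_choices:
                    params.append((size, axes, dims))
        return params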

@github-actions

github-actions bot commented Feb 3, 2026

Review updated until commit af22dc7

Description

  • Add support for 2D and 3D input tensors in transpose benchmark

  • Implement copy vs view transpose testing with contiguous() calls

  • Add segment_set to fusion definition for copy transpose path

  • Update test parametrization to cover both transpose modes

Changes walkthrough

Relevant files
Enhancement
test_transpose.py
Extend transpose benchmark with 2D inputs and copy transpose

benchmarks/python/test_transpose.py

  • Modified transpose_fusion function to support dynamic rank and copy
    transpose mode
  • Added segment_set logic to enforce copy transpose path when needed
  • Updated transpose_fwd_fn to handle contiguous() calls for copy
    transpose (see the sketch below)
  • Added _generate_transpose_params for 2D/3D parameter generation
  • Enhanced test parametrization with is_copy_transpose parameter
  • Updated both nvFuser and baseline benchmarks to test both modes
  • +65/-22 
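
On the eager/baseline side, the copy-transpose variant of transpose_fwd_fn could look roughly like this (a hedged sketch based on the Add + Transpose + ReLU flow in the sequence diagram below; the argument names are assumptions):

    import torch

    # Hypothetical sketch of transpose_fwd_fn: add, transpose, relu, and optionally
    # materialize a contiguous copy so the baseline matches the copy-transpose fusion.
    def transpose_fwd_fn(input1, input2, axes, is_copy_transpose):
        out = torch.relu(torch.transpose(input1 + input2, axes[0], axes[1]))
        if is_copy_transpose:
            out = out.contiguous()  # forces a real copy instead of a strided view
        return out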

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review
    Parameter Integration

    The new is_copy_transpose and rank parameters are properly integrated into the test framework, but verify that all test combinations (2D/3D × copy/view transpose) are correctly generated and executed without parameter conflicts.

    @pytest.mark.parametrize("size,axes,dims", _generate_transpose_params())
    @pytest.mark.parametrize("dtype", FLOAT_DTYPES)
    @pytest.mark.parametrize(
        "is_copy_transpose",
        [True, False],
        ids=["copy_transpose", "view_transpose"],
    )
    Segment Set Logic

    The segment_set operation is added only for copy transpose to enforce the transpose scheduler. Ensure this logic correctly distinguishes between scenarios where transpose scheduler vs pointwise scheduler should be used, and that the segment_set doesn't interfere with other operations.

    if is_copy_transpose:
        T10 = fd.ops.segment_set(T9)
        fd.add_output(T10)
    else:
        fd.add_output(T9)

    @liqiangxl liqiangxl marked this pull request as ready for review February 3, 2026 15:26
    @liqiangxl liqiangxl requested review from Priya2698 and naoyam February 3, 2026 15:26
    @greptile-apps
    Contributor

    greptile-apps bot commented Feb 3, 2026

    Greptile Overview

    Greptile Summary

    Extended transpose benchmark to test both copy and view transpose operations, and added 2D input coverage alongside existing 3D inputs.

    Key Changes:

    • Added is_copy_transpose parameter to toggle between copy (contiguous) and view (non-contiguous) transpose
    • For copy transpose: added segment_set operation in fusion definition to prevent presegmentation passes from optimizing to view, and .contiguous() call in eager function to materialize copy
    • Generalized input tensor rank from hardcoded 3D to dynamic 2D/3D using rank parameter
    • Introduced _generate_transpose_params() helper to generate test combinations of (size, axes, dims)
    • 2D inputs test only (0,1) axes while 3D inputs test (0,1), (0,2), (1,2) axes

    Impact:

    • Significantly expands test coverage (roughly doubles test cases with copy/view split, plus adds 2D coverage)
    • Ensures transpose scheduler is properly exercised (copy transpose) vs. pointwise scheduler (view transpose)
    • Better aligns benchmark with real-world usage patterns

    Confidence Score: 4/5

    • Safe to merge with minor review comments addressed
    • Code is well-structured with clear separation between copy and view transpose paths. The implementation correctly handles dynamic rank tensors and test parameter generation. Previous review comments about typos and formatting have been noted. No critical logical errors or security issues found.
    • No files require special attention beyond addressing formatting feedback from previous threads

    Important Files Changed

    Filename: benchmarks/python/test_transpose.py
    Overview: Extended transpose benchmark to cover 2D inputs and copy vs. view transpose, adding segment_set to enforce copy transpose path and .contiguous() for materialization

    Sequence Diagram

    sequenceDiagram
        participant Test as Test Function
        participant Gen as _generate_transpose_params
        participant Fusion as transpose_fusion
        participant Eager as transpose_fwd_fn
        participant Bench as run_benchmark
    
        Test->>Gen: Request test parameters
        Gen->>Gen: Generate params for dims=2,3
        Gen->>Gen: For each size, axes, dims
        Gen-->>Test: Return (size, axes, dims) tuples
    
        Test->>Test: Create input tensors (input1, input2)
        Test->>Test: Compute permute_axes from axes
    
        Test->>Fusion: Define fusion with rank, is_copy_transpose
        Fusion->>Fusion: Define tensors with dynamic rank
        Fusion->>Fusion: Add + Permute + ReLU ops
        alt is_copy_transpose
            Fusion->>Fusion: Apply segment_set(T9) → T10
            Fusion->>Fusion: add_output(T10)
        else view_transpose
            Fusion->>Fusion: add_output(T9)
        end
    
        opt Validation enabled
            Test->>Eager: Execute eager function
            Eager->>Eager: Add + Transpose + ReLU
            alt is_copy_transpose
                Eager->>Eager: Apply .contiguous()
            end
            Eager-->>Test: Return eager_output
            Test->>Fusion: Validate against eager
        end
    
        opt Benchmarking enabled
            Test->>Bench: Run nvFuser benchmark
            Bench->>Fusion: Execute fusion
        end
    

    Contributor

    @greptile-apps greptile-apps bot left a comment


    1 file reviewed, 1 comment

    Contributor

    @greptile-apps greptile-apps bot left a comment


    1 file reviewed, no comments

    Collaborator

    @naoyam naoyam left a comment


    LGTM

    @liqiangxl
    Collaborator Author

    !test

    Contributor

    @greptile-apps greptile-apps bot left a comment


    1 file reviewed, no comments


    @pytest.mark.parametrize("size,axes,dims", _generate_transpose_params())
    @pytest.mark.parametrize("dtype", FLOAT_DTYPES)
    @pytest.mark.parametrize("axes", [(0, 1), (0, 2), (1, 2)])
    @pytest.mark.parametrize(
    Collaborator

    @Priya2698 Priya2698 Feb 3, 2026


    Do we need to benchmark view transpose? Should we remove it instead?

    Collaborator Author


    I don't know; it's not an expensive benchmark, so I'll just leave it as is in this PR.

    Collaborator


    Got it. Please work with @xwang233 for dashboard integration.


    @pytest.mark.parametrize("executor", DEFAULT_EXECUTORS)
    @pytest.mark.parametrize("size", generate_input_sizes(dims=3))
    @pytest.mark.parametrize("size,axes,dims", _generate_transpose_params())
    Collaborator


    IIRC, I used 3D inputs to match the C++ benchmark. If 2D inputs are sufficient for benchmarking, we should remove the 3D benchmarking. This should also simplify the dashboard for this benchmark.

    Collaborator Author


    We should keep 3D to cover the different axes.

    Contributor

    @greptile-apps greptile-apps bot left a comment


    1 file reviewed, 2 comments


    Comment on lines 37 to 38
    # add segmenter set to avoid presegment passes setting the output as a view of the input without any data movement. It leads to pointwise instead of transpose scheduler.
    #we can also expose OptimizationPassGuard to python frontend and disable presegmentation passes to enforce output to be contiguous and then transpose scheduler will be used.
    Contributor


    Break these long comments into multiple lines for better readability

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

    Contributor

    @greptile-apps greptile-apps bot left a comment


    1 file reviewed, no comments


    Collaborator

    @Priya2698 Priya2698 left a comment


    LGTM. If you find that the view transpose variant is not meaningful anymore, please remove it in a future follow-up.

    @liqiangxl
    Collaborator Author

    !build

    @liqiangxl
    Collaborator Author

    LGTM. If you find that the view transpose variant is not meaningful anymore, please remove it in a future follow-up.

    It provides an apples-to-apples comparison and ensures nvFuser is smart enough to detect and use view transpose.

    Contributor

    @greptile-apps greptile-apps bot left a comment


    1 file reviewed, no comments


    @liqiangxl liqiangxl merged commit 37d40a5 into main Feb 5, 2026
    18 checks passed
    @liqiangxl liqiangxl deleted the llu/extend_transpose_benchmark branch February 5, 2026 13:42