htyu commented Oct 31, 2024

The epilog loop created by the loop unroller may never run if the unrolled main loop covers all of the original loop's iterations, so pipelining it non-speculatively may not be beneficial. It can also cause correctness issues when combined with the downstream PTXAS optimizer.
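To make the first point concrete, here is a minimal sketch in plain Python (not Triton or MLIR code) of unrolling `for i in range(lb, ub, step)` by a factor with a remainder ("epilog") loop. The helpers `split_for_unroll` and `run_unrolled` are hypothetical, and the sketch assumes a positive step and bounds known up front. Whenever the trip count is a multiple of the unroll factor, the epilog loop executes zero iterations, which is why pipelining it may buy nothing.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UnrollSplit:
    main_iters: int    # iterations of the unrolled main loop (each runs `factor` bodies)
    epilog_iters: int  # leftover original iterations handled by the epilog loop


def split_for_unroll(lb: int, ub: int, step: int, factor: int) -> UnrollSplit:
    # Hypothetical helper; assumes a positive step and unroll factor.
    assert step > 0 and factor > 0
    trip_count = max(0, (ub - lb + step - 1) // step)  # iterations of range(lb, ub, step)
    return UnrollSplit(trip_count // factor, trip_count % factor)


def run_unrolled(lb: int, ub: int, step: int, factor: int) -> Tuple[List[int], UnrollSplit]:
    """Simulate the unrolled main loop followed by the epilog loop; return visited indices."""
    split = split_for_unroll(lb, ub, step, factor)
    visited: List[int] = []
    i = lb
    for _ in range(split.main_iters):
        for k in range(factor):  # loop body replicated `factor` times
            visited.append(i + k * step)
        i += factor * step
    for _ in range(split.epilog_iters):  # zero trips when trip_count % factor == 0
        visited.append(i)
        i += step
    return visited, split
```

For example, `run_unrolled(0, 16, 1, 4)` visits indices 0 through 15 entirely in the main loop and reports `epilog_iters=0`, while `run_unrolled(0, 18, 1, 4)` leaves 2 iterations for the epilog.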

htyu requested a review from ptillet as a code owner on October 31, 2024, 16:49
ThomasRaoux commented:

It can also cause some correctness issue when combined with the downstream PTXAS optimizer.

that's concerning. Do you know why?

htyu (author) commented Oct 31, 2024

Still getting to the bottom of it. The problem goes away with DISABLE_PTXAS_OPT=1.

ThomasRaoux commented:

ah I see, so potentially a ptxas bug :(

htyu (author) commented Oct 31, 2024

Yeah, I think so. I may file a bug with NVIDIA once we are able to give them a repro. For now I'm disabling pipelining for epilog loops, since it may not be profitable anyway.

htyu requested a review from bertmaher on October 31, 2024, 21:39
htyu added a commit to llvm/llvm-project that referenced this pull request on Nov 4, 2024

There is a need to access the resulting epilog loop from the SCF loop
unroller. It is cleaner and more convenient to get it directly from the loop
unroller than to rescan the whole function, as discussed in
triton-lang/triton#5027. I'm changing the result type of
`loopUnrollByFactor` for that.
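The commit above has `loopUnrollByFactor` hand the epilog loop back to its caller. The sketch below illustrates that design in plain Python under loud assumptions: `Loop`, `UnrollResult`, `unroll_by_factor`, `unroll_and_skip_epilog`, and the `"pipeline"` attribute are hypothetical stand-ins, not the MLIR C++ signature or Triton's actual marker, and loop bounds are assumed to be constants. Because the unroller returns the epilog handle, the caller can tag it to be skipped by the pipeliner without rescanning the function.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Loop:
    """Hypothetical stand-in for an scf.for op with constant bounds and an attribute dict."""
    lb: int
    ub: int
    step: int
    attrs: dict = field(default_factory=dict)

    def trip_count(self) -> int:
        # Ceiling division; assumes step > 0 and clamps empty loops to 0.
        return max(0, -(-(self.ub - self.lb) // self.step))


@dataclass
class UnrollResult:
    main: Loop              # unrolled main loop: step scaled by the factor
    epilog: Optional[Loop]  # remainder loop, or None when no remainder exists


def unroll_by_factor(loop: Loop, factor: int) -> UnrollResult:
    """Split `loop` into a main loop and an epilog loop, returning both handles."""
    trips = loop.trip_count()
    split = loop.lb + (trips // factor) * factor * loop.step
    main = Loop(loop.lb, split, loop.step * factor, dict(loop.attrs))
    epilog = None
    if trips % factor:
        epilog = Loop(split, loop.ub, loop.step, dict(loop.attrs))
    return UnrollResult(main, epilog)


def unroll_and_skip_epilog(loop: Loop, factor: int) -> UnrollResult:
    result = unroll_by_factor(loop, factor)
    if result.epilog is not None:
        # Hypothetical marker; no rescan is needed because we already hold the epilog.
        result.epilog.attrs["pipeline"] = False
    return result


def loops_to_pipeline(loops: List[Loop]) -> List[Loop]:
    # A downstream pipeliner would only consider loops not marked as skipped.
    return [lp for lp in loops if lp.attrs.get("pipeline", True)]
```

For instance, `unroll_and_skip_epilog(Loop(0, 18, 1), 4)` yields a 4-trip main loop plus a 2-trip epilog, and `loops_to_pipeline` keeps only the main loop; with `Loop(0, 16, 1)` no epilog is created at all.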
htyu merged commit d2b8659 into triton-lang:main on Nov 5, 2024 (7 checks passed)