
Commit 36fece6

[Windows] Fix '__builtin_clz' on windows (#3312)
Closes #3273

A recent upstream PR (triton-lang/triton#5621) introduced faster logic for `pext_i32`, which is used in the `ReduceOpToLLVM` pattern. The new `pext_i32` relies on the `__builtin_clz` intrinsic, which GCC and Clang provide natively but MSVC does not. The Windows fallback for this intrinsic appears to have been copied incorrectly from [the given source](https://gist.github.com/pps83/3210a2f980fd02bb2ba2e5a1fc4a2ef0#file-ctz_clz-cpp-L44-L55): it omits the final `r ^ 31`, so it returns the index of the highest set bit instead of the number of leading zeros. As a result, lowering `tt.reduce(...)` produced incorrect LLVM IR in some scenarios.

Signed-off-by: dchigarev <[email protected]>
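To illustrate why a wrong `__builtin_clz` corrupts a bit-extract routine, here is a hypothetical, portable software PEXT (parallel bit extract) that walks the set bits of the mask from the most significant end using a leading-zero count. This is a sketch under that assumption, not the upstream `pext_i32` from triton-lang/triton#5621; the names `clz32` and `pext32` are made up for this example.

```cpp
#include <cstdint>

// Hypothetical portable count-leading-zeros, standing in for __builtin_clz.
// Precondition: x != 0 (like __builtin_clz, the result is meaningless for 0).
static int clz32(uint32_t x) {
  int n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}

// Hypothetical software PEXT: for each set bit of `mask`, from most to least
// significant, append the corresponding bit of `x` to the result. The lowest
// set bit of `mask` therefore lands in bit 0 of the result.
static uint32_t pext32(uint32_t x, uint32_t mask) {
  uint32_t result = 0;
  while (mask != 0) {
    int pos = 31 - clz32(mask);                 // index of the highest set bit in mask
    result = (result << 1) | ((x >> pos) & 1u); // append the selected bit of x
    mask &= ~(1u << pos);                       // clear that mask bit and continue
  }
  return result;
}
```

With the pre-fix Windows helper, `31 - clz32(mask)` would evaluate to `31 - index`, i.e. the mirrored bit position, so bits would be read from the wrong end of the word.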
1 parent 4a99671 commit 36fece6


1 file changed: +1 -1 lines changed


lib/Conversion/TritonGPUToLLVM/Utility.cpp

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 static int __builtin_clz(unsigned x) {
   unsigned long r;
   _BitScanReverse(&r, x);
-  return static_cast<int>(r);
+  return static_cast<int>(r ^ 31);
 }
 
 static int __builtin_ctz(unsigned x) {
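The fix works because `_BitScanReverse` reports the bit index `r` of the most significant set bit, while `__builtin_clz` must return `31 - r`. For `r` in `[0, 31]`, XOR with `31` (binary `11111`) flips the low five bits, which equals `31 - r` with no borrow. The sketch below checks this on Linux/GCC by emulating `_BitScanReverse` in portable code; `bitScanReverseEmulated`, `clzBuggy`, `clzFixed`, and `clzReference` are hypothetical names for this example, not MSVC or Triton APIs.

```cpp
#include <cstdint>

// Hypothetical stand-in for MSVC's _BitScanReverse: stores the index of the
// most significant set bit of `x` in *index and returns true, or returns false
// when x == 0 (MSVC leaves *index undefined in that case).
static bool bitScanReverseEmulated(unsigned long* index, uint32_t x) {
  if (x == 0) return false;
  unsigned long i = 31;
  while ((x & (1u << i)) == 0) --i;
  *index = i;
  return true;
}

// Pre-fix behaviour: returns the bit index, not the leading-zero count.
static int clzBuggy(uint32_t x) {
  unsigned long r = 0;
  bitScanReverseEmulated(&r, x);
  return static_cast<int>(r);
}

// Post-fix behaviour: for r in [0, 31], r ^ 31 == 31 - r, the leading-zero count.
static int clzFixed(uint32_t x) {
  unsigned long r = 0;
  bitScanReverseEmulated(&r, x);
  return static_cast<int>(r ^ 31);
}

// Reference leading-zero count for comparison (precondition: x != 0).
static int clzReference(uint32_t x) {
  int n = 0;
  for (uint32_t bit = 0x80000000u; (x & bit) == 0; bit >>= 1) ++n;
  return n;
}
```

Note that `clzBuggy(x) == 31 - clzFixed(x)` for every non-zero `x`, so the old helper was wrong for every input, not only at edge cases. Both versions remain undefined for `x == 0`, matching `__builtin_clz`.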
