Commit f7e6775
authored
[AMD] Pass down atomics memscope through lowering (triton-lang#5580)
# Overview
Atomics in triton have two optional attributes:
1) `sem` -- describing the memory semantics of the operation
2) `scope` -- describing which threads will see the effect of a memory
operation (e.g., GPU, CTA)
Presently, the `scope` is ignored by the AMD backend and defaults to
`agent`-scope in the emitted LLVM (which roughly corresponds to `gpu`
memscope in triton). This is correct (in most cases? maybe not all?), as
this is a "stricter" scope than CTA (and I'm guessing it is rare that
system scope is needed for AMD kernels, so no bugs have shown up). That
being said, emitting atomics at CTA scope can be more efficient since
there can be fewer cache invalidations/barriers.
I think that this is fixable by just passing through the attribute to
the generated `llvm.atomicrmw` op. There are some additional
optimizations potentially possible (e.g., !amdgpu.no.remote.memory,
since Triton doesn't support this today), but it isn't clear to me if
those would have any real impact on end-to-end performance and those
optimizations would be specific to the `sys`-scope that doesn't appear
to be frequently used.
# Testing
I added a lit test to ensure that the generated LLVM instructions have
the correct sem/scope attributes for atomicrmw, but I also ran the
following 386 unit tests locally on an MI300x:
```bash
pytest test/unit/language/test_core.py -k test_atomic_
```
I then locally ran some kernels with the scope set to CTA/SYSTEM to make
sure that they worked.1 parent a3095b3 commit f7e6775
File tree
2 files changed
+81
-6
lines changed- test/Conversion/amd
- third_party/amd/lib/TritonAMDGPUToLLVM
2 files changed
+81
-6
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
209 | 209 | | |
210 | 210 | | |
211 | 211 | | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
Lines changed: 38 additions & 6 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
227 | 227 | | |
228 | 228 | | |
229 | 229 | | |
| 230 | + | |
| 231 | + | |
| 232 | + | |
| 233 | + | |
| 234 | + | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
230 | 253 | | |
231 | 254 | | |
232 | 255 | | |
| |||
601 | 624 | | |
602 | 625 | | |
603 | 626 | | |
| 627 | + | |
| 628 | + | |
| 629 | + | |
| 630 | + | |
604 | 631 | | |
605 | 632 | | |
606 | 633 | | |
| |||
643 | 670 | | |
644 | 671 | | |
645 | 672 | | |
646 | | - | |
| 673 | + | |
647 | 674 | | |
648 | 675 | | |
649 | 676 | | |
| |||
852 | 879 | | |
853 | 880 | | |
854 | 881 | | |
| 882 | + | |
855 | 883 | | |
856 | 884 | | |
| 885 | + | |
| 886 | + | |
| 887 | + | |
| 888 | + | |
857 | 889 | | |
858 | 890 | | |
859 | 891 | | |
| |||
907 | 939 | | |
908 | 940 | | |
909 | 941 | | |
910 | | - | |
911 | | - | |
912 | | - | |
913 | | - | |
914 | | - | |
| 942 | + | |
| 943 | + | |
| 944 | + | |
| 945 | + | |
| 946 | + | |
915 | 947 | | |
916 | 948 | | |
917 | 949 | | |
| |||
0 commit comments