Commit 17852de
authored
[NVPTX] Lower LLVM masked vector loads and stores to PTX (#159387)
This backend support will allow the LoadStoreVectorizer, in certain
cases, to fill in gaps when creating load/store vectors and generate
LLVM masked load/stores
(https://llvm.org/docs/LangRef.html#llvm-masked-store-intrinsics). To
accomplish this, changes are separated into two parts. This first part
has the backend lowering and TTI changes, and a follow up PR will have
the LSV generate these intrinsics:
#159388.
In this backend change, Masked Loads get lowered to PTX with `#pragma
"used_bytes_mask" [mask];`
(https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask).
And Masked Stores get lowered to PTX using the new sink symbol syntax
(https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st).
# TTI Changes
TTI changes are needed because NVPTX only supports masked loads/stores
with _constant_ masks. `ScalarizeMaskedMemIntrin.cpp` is adjusted to
check that the mask is constant and pass that result into the TTI check.
Behavior shouldn't change for non-NVPTX targets, which do not care
whether the mask is variable or constant when determining legality, but
all TTI files that implement these API need to be updated.
# Masked store lowering implementation details
If the masked stores make it to the NVPTX backend without being
scalarized, they are handled by the following:
* `NVPTXISelLowering.cpp` - Sets up a custom operation action and
handles it in lowerMSTORE. Similar handling to normal store vectors,
except we read the mask and place a sentinel register `$noreg` in each
position where the mask reads as false.
For example,
```
t10: v8i1 = BUILD_VECTOR Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>, Constant:i1<-1>, Constant:i1<0>, Constant:i1<0>, Constant:i1<-1>
t11: ch = masked_store<(store unknown-size into %ir.lsr.iv28, align 32, addrspace 1)> t5:1, t5, t7, undef:i64, t10
->
STV_i32_v8 killed %13:int32regs, $noreg, $noreg, killed %16:int32regs, killed %17:int32regs, $noreg, $noreg, killed %20:int32regs, 0, 0, 1, 8, 0, 32, %4:int64regs, 0, debug-location !18 :: (store unknown-size into %ir.lsr.iv28, align 32, addrspace 1);
```
* `NVPTXInstInfo.td` - changes the definition of store vectors to allow
for a mix of sink symbols and registers.
* `NVPXInstPrinter.h/.cpp` - Handles the `$noreg` case by printing "_".
# Masked load lowering implementation details
Masked loads are routed to normal PTX loads, with one difference: a
`#pragma "used_bytes_mask"` is emitted before the load instruction
(https://docs.nvidia.com/cuda/parallel-thread-execution/#pragma-strings-used-bytes-mask).
To accomplish this, a new operand is added to every NVPTXISD Load type
representing this mask.
* `NVPTXISelLowering.h/.cpp` - Masked loads are converted into normal
NVPTXISD loads with a mask operand in two ways. 1) In type legalization
through replaceLoadVector, which is the normal path, and 2) through
LowerMLOAD, to handle the legal vector types
(v2f16/v2bf16/v2i16/v4i8/v2f32) that will not be type legalized. Both
share the same convertMLOADToLoadWithUsedBytesMask helper. Both default
this operand to UINT32_MAX, representing all bytes on. For the latter,
we need a new `NVPTXISD::MLoadV1` type to represent that edge case
because we cannot put the used bytes mask operand on a generic
LoadSDNode.
* `NVPTXISelDAGToDAG.cpp` - Extract used bytes mask from loads, add them
to created machine instructions.
* `NVPTXInstPrinter.h/.cpp` - Print the pragma when the used bytes mask
isn't all ones.
* `NVPTXForwardParams.cpp`, `NVPTXReplaceImageHandles.cpp` - Update
manual indexing of load operands to account for new operand.
* `NVPTXInsrtInfo.td`, `NVPTXIntrinsics.td` - Add the used bytes mask to
the MI definitions.
* `NVPTXTagInvariantLoads.cpp` - Ensure that masked loads also get
tagged as invariant.
Some generic changes that are needed:
* `LegalizeVectorTypes.cpp` - Ensure flags are preserved when splitting
masked loads.
* `SelectionDAGBuilder.cpp` - Preserve `MD_invariant_load` on masked
load SDNode creation1 parent 1a03673 commit 17852de
File tree
36 files changed
+1195
-106
lines changed- llvm
- include/llvm/Analysis
- lib
- Analysis
- CodeGen/SelectionDAG
- Target
- AArch64
- ARM
- Hexagon
- NVPTX
- MCTargetDesc
- RISCV
- VE
- X86
- Transforms/Scalar
- test/CodeGen
- MIR/NVPTX
- NVPTX
36 files changed
+1195
-106
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
842 | 842 | | |
843 | 843 | | |
844 | 844 | | |
| 845 | + | |
| 846 | + | |
| 847 | + | |
| 848 | + | |
| 849 | + | |
| 850 | + | |
845 | 851 | | |
846 | | - | |
847 | | - | |
| 852 | + | |
| 853 | + | |
| 854 | + | |
848 | 855 | | |
849 | | - | |
850 | | - | |
| 856 | + | |
| 857 | + | |
| 858 | + | |
851 | 859 | | |
852 | 860 | | |
853 | 861 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
309 | 309 | | |
310 | 310 | | |
311 | 311 | | |
312 | | - | |
| 312 | + | |
| 313 | + | |
313 | 314 | | |
314 | 315 | | |
315 | 316 | | |
316 | 317 | | |
317 | | - | |
| 318 | + | |
| 319 | + | |
318 | 320 | | |
319 | 321 | | |
320 | 322 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
468 | 468 | | |
469 | 469 | | |
470 | 470 | | |
471 | | - | |
472 | | - | |
| 471 | + | |
| 472 | + | |
| 473 | + | |
| 474 | + | |
473 | 475 | | |
474 | 476 | | |
475 | 477 | | |
476 | | - | |
477 | | - | |
| 478 | + | |
| 479 | + | |
| 480 | + | |
| 481 | + | |
478 | 482 | | |
479 | 483 | | |
480 | 484 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2465 | 2465 | | |
2466 | 2466 | | |
2467 | 2467 | | |
| 2468 | + | |
2468 | 2469 | | |
2469 | 2470 | | |
2470 | 2471 | | |
| |||
2490 | 2491 | | |
2491 | 2492 | | |
2492 | 2493 | | |
2493 | | - | |
2494 | | - | |
2495 | | - | |
| 2494 | + | |
| 2495 | + | |
2496 | 2496 | | |
2497 | 2497 | | |
2498 | 2498 | | |
| |||
2515 | 2515 | | |
2516 | 2516 | | |
2517 | 2517 | | |
2518 | | - | |
2519 | | - | |
| 2518 | + | |
| 2519 | + | |
2520 | 2520 | | |
2521 | 2521 | | |
2522 | 2522 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
5063 | 5063 | | |
5064 | 5064 | | |
5065 | 5065 | | |
| 5066 | + | |
| 5067 | + | |
5066 | 5068 | | |
5067 | 5069 | | |
5068 | 5070 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
323 | 323 | | |
324 | 324 | | |
325 | 325 | | |
326 | | - | |
| 326 | + | |
| 327 | + | |
327 | 328 | | |
328 | 329 | | |
329 | 330 | | |
330 | 331 | | |
331 | | - | |
| 332 | + | |
| 333 | + | |
332 | 334 | | |
333 | 335 | | |
334 | 336 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1125 | 1125 | | |
1126 | 1126 | | |
1127 | 1127 | | |
1128 | | - | |
| 1128 | + | |
| 1129 | + | |
1129 | 1130 | | |
1130 | 1131 | | |
1131 | 1132 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
186 | 186 | | |
187 | 187 | | |
188 | 188 | | |
189 | | - | |
190 | | - | |
191 | | - | |
192 | | - | |
193 | | - | |
194 | | - | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
| 195 | + | |
| 196 | + | |
| 197 | + | |
| 198 | + | |
195 | 199 | | |
196 | 200 | | |
197 | 201 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
343 | 343 | | |
344 | 344 | | |
345 | 345 | | |
346 | | - | |
| 346 | + | |
| 347 | + | |
347 | 348 | | |
348 | 349 | | |
349 | 350 | | |
350 | 351 | | |
351 | 352 | | |
352 | 353 | | |
353 | | - | |
| 354 | + | |
| 355 | + | |
354 | 356 | | |
355 | 357 | | |
356 | 358 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
165 | 165 | | |
166 | 166 | | |
167 | 167 | | |
168 | | - | |
169 | | - | |
170 | | - | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
171 | 172 | | |
172 | 173 | | |
173 | 174 | | |
| |||
0 commit comments