Commit 18a3c34
[SLPVectorizer] Use accurate cost for external users of resize shuffles (llvm#137419)
When implementing the vectorization, we potentially need to add shuffles
for external users. In such cases, we may be shuffling a smaller vector
into a larger vector. When this happens, `ResizeToVF` will just build a
poison-padded identity vector. Then, to build the final shuffle, we
just use the `SK_InsertSubvector` mask.
This is possibly clearer by looking at the included test in
SLPVectorizer/AMDGPU/external-shuffle.ll
In the exit block we have a bunch of shuffles to glue the vectorized
tree to match the `InsertElement` users. `TMP25` holds the result of
resizing the v2i16 vectorized sequence to match the `InsertElement` size
v16i16. Then `TMP26` is the final shuffle which replaces the
`InsertElement` sequence. This is just an insertsubvector.
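
The two masks described above can be sketched in standalone C++. This is only an illustration and not the SLPVectorizer code: `PoisonElt`, `makeResizeMask`, and `makeInsertSubvectorMask` are hypothetical names, and the sketch assumes the usual two-source shuffle numbering, where indices `VF` and above select lanes of the second operand.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a poison (undefined) shuffle lane.
constexpr int PoisonElt = -1;

// Poison-padded identity mask: widens a FromVF-lane vector to ToVF lanes.
std::vector<int> makeResizeMask(int FromVF, int ToVF) {
  assert(FromVF <= ToVF && "can only widen");
  std::vector<int> Mask(ToVF, PoisonElt);
  for (int I = 0; I < FromVF; ++I)
    Mask[I] = I;
  return Mask;
}

// Two-source mask over VF-lane operands: lanes [0, Offset) come from the
// first operand in place, the next SubVF lanes read lanes 0..SubVF-1 of the
// second operand, and the remaining lanes stay poison.
std::vector<int> makeInsertSubvectorMask(int VF, int Offset, int SubVF) {
  assert(Offset + SubVF <= VF && "subvector must fit");
  std::vector<int> Mask(VF, PoisonElt);
  for (int I = 0; I < Offset; ++I)
    Mask[I] = I;
  for (int I = 0; I < SubVF; ++I)
    Mask[Offset + I] = VF + I;
  return Mask;
}
```

Under these assumptions, `makeResizeMask(2, 16)` models the v2i16 to v16i16 resize, and `makeInsertSubvectorMask(16, 4, 2)` yields a mask that starts 0, 1, 2, 3, 16, 17 and is poison in every remaining lane.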
However, when calculating the cost for these shuffles, we aren't
modelling this correctly. `ResizeToVF` will indicate to
`performExtractsShuffleAction` that we cannot use the original mask due
to the resize shuffle. The consequence is that the cost calculation uses
a different shuffle mask than what is ultimately used.
Going back to the included test, we can consider again `TMP26`. Clearly
we can see the shuffle uses a mask {0, 1, 2, 3, 16, 17, poison ..}.
However, we will currently calculate the cost with a mask {0, 1, 2, 3,
20, 21, ...}: we have replaced 16 and 17 with 20 and 21 (Index + Vector
Size). Queries like `BasicTTImpl::improveShuffleKindFromMask` will not
recognize this as an `SK_InsertSubvector` mask, and targets which have
reduced costs for `SK_InsertSubvector` will not accurately calculate the
cost.
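
To make the mismatch concrete, here is a deliberately simplified checker in standalone C++. It is a sketch under assumptions and not the logic LLVM uses: `looksLikeInsertSubvector` is a hypothetical helper that accepts a mask only if the first operand's lanes stay in place and one contiguous run reads lanes 0, 1, ... of the second operand.

```cpp
#include <vector>

// Hypothetical stand-in for a poison (undefined) shuffle lane.
constexpr int PoisonElt = -1;

// Simplified insert-subvector test for a two-source mask whose operands both
// have Mask.size() lanes. Indices >= VF select lanes of the second operand.
bool looksLikeInsertSubvector(const std::vector<int> &Mask) {
  const int VF = static_cast<int>(Mask.size());
  int I = 0;
  // Leading lanes: first operand in place, or poison.
  while (I < VF && (Mask[I] == PoisonElt || Mask[I] == I))
    ++I;
  // Inserted run: must read the second operand starting at its lane 0.
  int SubVF = 0;
  while (I < VF && Mask[I] == VF + SubVF) {
    ++I;
    ++SubVF;
  }
  if (SubVF == 0)
    return false;
  // Trailing lanes: first operand in place, or poison.
  for (; I < VF; ++I)
    if (Mask[I] != PoisonElt && Mask[I] != I)
      return false;
  return true;
}
```

With 16 lanes, this checker accepts the emitted mask that begins 0, 1, 2, 3, 16, 17 and rejects the costed mask that begins 0, 1, 2, 3, 20, 21, because 20 and 21 select lanes 4 and 5 of the second operand rather than lanes 0 and 1.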
(cherry picked from commit c9a87a5)
1 parent a9c93da, commit 18a3c34
5 files changed (+101, -120):
- llvm/lib/Transforms/Vectorize
- llvm/test/Transforms/SLPVectorizer/AMDGPU
- llvm/test/Transforms/SLPVectorizer/X86