Commit 4f2a6ae

LebedevRI authored and memfrob committed
[X86][Costmodel] Load/store i8 Stride=4 VF=2 interleaving costs
While we already model this tuple, the values diverge from reality, so fix them.

The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have: https://godbolt.org/z/KP6nn36zs - for Intels, `Block RThroughput: =4.0`; for Ryzens, `Block RThroughput: <=2.0`. So pick a cost of `4`.

For store we have: https://godbolt.org/z/ov95zhrq6 - for Intels, `Block RThroughput: =4.0`; for Ryzens, `Block RThroughput: <=2.0`. So pick a cost of `4`.

I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110966
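One way to read the cost choice in the message is as taking the worst-case block reciprocal throughput across the relevant scheduling models and rounding it up to an integer. A minimal C++ sketch under that assumption follows; `pickCost` is a hypothetical helper and not part of LLVM, and `2.0` stands in for the Ryzen figure, which is only reported as an upper bound (`<=2.0`).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <initializer_list>

// Hypothetical helper (not an LLVM API): choose one integer cost for a
// table entry as the largest measured block reciprocal throughput,
// rounded up to the next whole number.
inline unsigned pickCost(std::initializer_list<double> RThroughputs) {
  assert(RThroughputs.size() > 0 && "need at least one measurement");
  double Worst = 0.0;
  for (double R : RThroughputs)
    Worst = std::max(Worst, R);
  return static_cast<unsigned>(std::ceil(Worst));
}
```

With the figures quoted in the message, `pickCost({4.0, 2.0})` evaluates to `4` for both the load and the store case.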
1 parent 9070527 commit 4f2a6ae

3 files changed: +4 -4 lines changed


llvm/lib/Target/X86/X86TargetTransformInfo.cpp

Lines changed: 2 additions & 2 deletions
@@ -5094,7 +5094,7 @@ InstructionCost X86TTIImpl::getInterleavedMemoryOpCostAVX2(

 {3, MVT::v8i32, 17}, // (load 24i32 and) deinterleave into 3 x 8i32

-{4, MVT::v2i8, 12}, // (load 8i8 and) deinterleave into 4 x 2i8
+{4, MVT::v2i8, 4}, // (load 8i8 and) deinterleave into 4 x 2i8
 {4, MVT::v4i8, 4}, // (load 16i8 and) deinterleave into 4 x 4i8
 {4, MVT::v8i8, 20}, // (load 32i8 and) deinterleave into 4 x 8i8
 {4, MVT::v16i8, 39}, // (load 64i8 and) deinterleave into 4 x 16i8
@@ -5144,7 +5144,7 @@ InstructionCost X86TTIImpl::getInterleavedMemoryOpCostAVX2(
 {3, MVT::v16i8, 11}, // interleave 3 x 16i8 into 48i8 (and store)
 {3, MVT::v32i8, 13}, // interleave 3 x 32i8 into 96i8 (and store)

-{4, MVT::v2i8, 12}, // interleave 4 x 2i8 into 8i8 (and store)
+{4, MVT::v2i8, 4}, // interleave 4 x 2i8 into 8i8 (and store)
 {4, MVT::v4i8, 9}, // interleave 4 x 4i8 into 16i8 (and store)
 {4, MVT::v8i8, 10}, // interleave 4 x 8i8 into 32i8 (and store)
 {4, MVT::v16i8, 10}, // interleave 4 x 16i8 into 64i8 (and store)

llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ target triple = "x86_64-unknown-linux-gnu"
 ; AVX1: LV: Found an estimated cost of 332 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
 ;
 ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i8, i8* %in0, align 1
-; AVX2: LV: Found an estimated cost of 13 for VF 2 For instruction: %v0 = load i8, i8* %in0, align 1
+; AVX2: LV: Found an estimated cost of 5 for VF 2 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX2: LV: Found an estimated cost of 5 for VF 4 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX2: LV: Found an estimated cost of 21 for VF 8 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX2: LV: Found an estimated cost of 41 for VF 16 For instruction: %v0 = load i8, i8* %in0, align 1
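The new expectation of 5 for VF 2 is consistent with the table cost of 4 plus one unit for the wide memory operation, and the unchanged lines fit the same pattern (4 + 1, 20 + 1, 39 + 2). The C++ sketch below illustrates that reading. `VecTy`, `InterleaveCostEntry`, and `interleavedLoadCost` are hypothetical stand-ins rather than LLVM's actual types or API, and the memory-op counts passed in are assumptions inferred from the test output, not taken from the source.

```cpp
#include <cassert>

// Hypothetical stand-in for the MVT enumerators used in the real table.
enum class VecTy { v2i8, v4i8, v8i8, v16i8 };

// One row of a simplified interleaved-access cost table.
struct InterleaveCostEntry {
  unsigned Factor; // interleave stride
  VecTy Ty;        // type of each deinterleaved member vector
  unsigned Cost;   // shuffle cost for this (Factor, Ty) tuple
};

// Stride-4 i8 load entries as they read after this commit.
static const InterleaveCostEntry LoadTbl[] = {
    {4, VecTy::v2i8, 4},
    {4, VecTy::v4i8, 4},
    {4, VecTy::v8i8, 20},
    {4, VecTy::v16i8, 39},
};

// Returns NumMemOps * MemOpCost + the table's shuffle cost,
// or -1 when the table has no matching entry.
// Assumption: each wide memory operation costs 1 unless stated otherwise.
inline int interleavedLoadCost(unsigned Factor, VecTy Ty, unsigned NumMemOps,
                               unsigned MemOpCost = 1) {
  assert(NumMemOps > 0 && "an interleaved load needs at least one memory op");
  for (const InterleaveCostEntry &E : LoadTbl)
    if (E.Factor == Factor && E.Ty == Ty)
      return static_cast<int>(NumMemOps * MemOpCost + E.Cost);
  return -1;
}
```

Under these assumptions, `interleavedLoadCost(4, VecTy::v2i8, 1)` gives 5 and `interleavedLoadCost(4, VecTy::v16i8, 2)` gives 41, matching the AVX2 lines for VF 2 and VF 16 above.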

llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ target triple = "x86_64-unknown-linux-gnu"
 ; AVX1: LV: Found an estimated cost of 332 for VF 32 For instruction: store i8 %v3, i8* %out3, align 1
 ;
 ; AVX2: LV: Found an estimated cost of 1 for VF 1 For instruction: store i8 %v3, i8* %out3, align 1
-; AVX2: LV: Found an estimated cost of 13 for VF 2 For instruction: store i8 %v3, i8* %out3, align 1
+; AVX2: LV: Found an estimated cost of 5 for VF 2 For instruction: store i8 %v3, i8* %out3, align 1
 ; AVX2: LV: Found an estimated cost of 10 for VF 4 For instruction: store i8 %v3, i8* %out3, align 1
 ; AVX2: LV: Found an estimated cost of 11 for VF 8 For instruction: store i8 %v3, i8* %out3, align 1
 ; AVX2: LV: Found an estimated cost of 12 for VF 16 For instruction: store i8 %v3, i8* %out3, align 1
