Commit 4668857

LebedevRI authored and memfrob committed
[X86][Costmodel] Load/store i8 Stride=4 VF=32 interleaving costs
While we already model this tuple, the load cost diverges from reality, so fix it.

The only sched models for CPUs that support AVX2 but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have: https://godbolt.org/z/zWMhhnPYa - for Intel, `Block RThroughput: =56.0`; for Ryzen, `Block RThroughput: <=24.0`. So pick a cost of `56`.

For store we have: https://godbolt.org/z/vnqqjWx51 - for Intel, `Block RThroughput: =12.0`; for Ryzen, `Block RThroughput: <=4.0`. So pick a cost of `12`.

I'm directly using the shuffling asm that llc produced, without any manual fixups that may be needed to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110971
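The choice of `56` and `12` above reads as taking the worst reported reciprocal throughput across the scheduling models in question. A minimal sketch of that selection, assuming this reading is right; the `FamilyThroughput` record, the `pickCost` helper, and the grouping into two vendor families are hypothetical and are not part of LLVM:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical record: the worst Block RThroughput reported for one family
// of scheduling models (numbers taken from the commit message).
struct FamilyThroughput {
  const char *Family;
  double RThroughput;
};

// Assumption: the table cost is the largest reported reciprocal throughput,
// rounded up to a whole number.
unsigned pickCost(const std::vector<FamilyThroughput> &Reports) {
  assert(!Reports.empty() && "need at least one report");
  auto Worst = std::max_element(
      Reports.begin(), Reports.end(),
      [](const FamilyThroughput &A, const FamilyThroughput &B) {
        return A.RThroughput < B.RThroughput;
      });
  return static_cast<unsigned>(std::ceil(Worst->RThroughput));
}

// Load: Intel reports exactly 56.0, Ryzen at most 24.0.
const std::vector<FamilyThroughput> LoadReports = {
    {"haswell/broadwell/skylake", 56.0}, {"zen1-3", 24.0}};

// Store: Intel reports exactly 12.0, Ryzen at most 4.0.
const std::vector<FamilyThroughput> StoreReports = {
    {"haswell/broadwell/skylake", 12.0}, {"zen1-3", 4.0}};
```

Calling `pickCost(LoadReports)` and `pickCost(StoreReports)` gives the two costs named in the message. Taking the maximum is a conservative choice: the cost is never underestimated on any of the listed models.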
1 parent d1cfd93 commit 4668857

2 files changed: +2 -2 lines changed


llvm/lib/Target/X86/X86TargetTransformInfo.cpp

Lines changed: 1 addition & 1 deletion
@@ -5098,7 +5098,7 @@ InstructionCost X86TTIImpl::getInterleavedMemoryOpCostAVX2(
       {4, MVT::v4i8, 4}, // (load 16i8 and) deinterleave into 4 x 4i8
       {4, MVT::v8i8, 12}, // (load 32i8 and) deinterleave into 4 x 8i8
       {4, MVT::v16i8, 24}, // (load 64i8 and) deinterleave into 4 x 16i8
-      {4, MVT::v32i8, 80}, // (load 128i8 and) deinterleave into 4 x 32i8
+      {4, MVT::v32i8, 56}, // (load 128i8 and) deinterleave into 4 x 32i8
 
       {4, MVT::v2i16, 6}, // (load 8i16 and) deinterleave into 4 x 2i16
       {4, MVT::v4i16, 17}, // (load 16i16 and) deinterleave into 4 x 4i16

llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ target triple = "x86_64-unknown-linux-gnu"
 ; AVX2: LV: Found an estimated cost of 5 for VF 4 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX2: LV: Found an estimated cost of 13 for VF 8 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX2: LV: Found an estimated cost of 26 for VF 16 For instruction: %v0 = load i8, i8* %in0, align 1
-; AVX2: LV: Found an estimated cost of 84 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
+; AVX2: LV: Found an estimated cost of 60 for VF 32 For instruction: %v0 = load i8, i8* %in0, align 1
 ;
 ; AVX512: LV: Found an estimated cost of 1 for VF 1 For instruction: %v0 = load i8, i8* %in0, align 1
 ; AVX512: LV: Found an estimated cost of 5 for VF 2 For instruction: %v0 = load i8, i8* %in0, align 1
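The table entry changes by 24 (80 to 56) and so does the test expectation (84 to 60), which suggests that the reported AVX2 cost is the table's shuffle cost plus one unit per 32-byte load needed to fetch all the interleaved data. The sketch below checks that arithmetic against the four AVX2 lines in the test. It is a standalone model, not the LLVM implementation: the `InterleavedLoadEntry` struct, the `estimatedLoadCost` helper, the 32-byte legal vector width, and the unit cost per wide load are all assumptions inferred from the numbers in this commit.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for one row of the AVX2 interleaved-load table.
struct InterleavedLoadEntry {
  unsigned Stride;      // interleave factor
  unsigned NumElts;     // i8 elements per deinterleaved vector (the VF)
  unsigned ShuffleCost; // cost of the deinterleaving shuffles
};

// Stride-4 i8 rows copied from the diff (post-commit values).
constexpr InterleavedLoadEntry Stride4I8Loads[] = {
    {4, 4, 4}, {4, 8, 12}, {4, 16, 24}, {4, 32, 56}};

// Assumptions: 32-byte (YMM) legal vectors, each wide load costing 1.
constexpr unsigned LegalVectorBytes = 32;
constexpr unsigned CostPerWideLoad = 1;

// Returns the modeled total cost, or nullopt if the tuple is not in the table.
std::optional<unsigned> estimatedLoadCost(unsigned Stride, unsigned VF) {
  assert(Stride != 0 && VF != 0 && "stride and VF must be non-zero");
  for (const InterleavedLoadEntry &E : Stride4I8Loads) {
    if (E.Stride != Stride || E.NumElts != VF)
      continue;
    unsigned TotalBytes = Stride * VF; // i8 elements are one byte each
    // Round up to whole legal-width loads.
    unsigned NumLoads =
        (TotalBytes + LegalVectorBytes - 1) / LegalVectorBytes;
    return NumLoads * CostPerWideLoad + E.ShuffleCost;
  }
  return std::nullopt;
}
```

Under these assumptions, VF 32 needs four 32-byte loads for its 128 bytes, so the modeled cost is 4 + 56 = 60, and the smaller VFs reproduce the unchanged expectations of 5, 13, and 26.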
