
avx register spill generates 20 extra instructions #66837

@fbarchard

Description


Changing the instruction order from:

const __m128i va0 = _mm_broadcastq_epi64(_mm_loadl_epi64((const __m128i*) a0));
const __m256i vxa0 = _mm256_cvtepi8_epi16(va0);
a0 += 8;
const __m128i va1 = _mm_broadcastq_epi64(_mm_loadl_epi64((const __m128i*) a1));
const __m256i vxa1 = _mm256_cvtepi8_epi16(va1);
a1 += 8;
const __m128i va2 = _mm_broadcastq_epi64(_mm_loadl_epi64((const __m128i*) a2));
const __m256i vxa2 = _mm256_cvtepi8_epi16(va2);
a2 += 8;

to this:

const __m128i va0 = _mm_broadcastq_epi64(_mm_loadl_epi64((const __m128i*) a0));
const __m128i va1 = _mm_broadcastq_epi64(_mm_loadl_epi64((const __m128i*) a1));
const __m128i va2 = _mm_broadcastq_epi64(_mm_loadl_epi64((const __m128i*) a2));
const __m256i vxa0 = _mm256_cvtepi8_epi16(va0);
a0 += 8;
const __m256i vxa1 = _mm256_cvtepi8_epi16(va1);
a1 += 8;
const __m256i vxa2 = _mm256_cvtepi8_epi16(va2);
a2 += 8;
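The two snippets compute the same values; only the order of the loads and conversions differs. The portable scalar C model below is a sketch, not XNNPACK code. It mirrors what each row computes: `_mm_loadl_epi64` reads 8 bytes, `_mm_broadcastq_epi64` duplicates them into both 64-bit halves, and `_mm256_cvtepi8_epi16` sign-extends all 16 bytes to int16. The names `vec16_i16`, `load_broadcast`, `cvt_i8_i16`, `step_interleaved` and `step_reordered` are hypothetical. The model says nothing about register allocation. It only shows that the reordering does not change the results or the pointer updates, so the extra instructions are not required by the program's semantics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for one __m256i holding 16 int16 lanes. */
typedef struct {
  int16_t lane[16];
} vec16_i16;

/* Models _mm_broadcastq_epi64(_mm_loadl_epi64(p)): read 8 bytes and
   duplicate them into both 64-bit halves of a 16-byte vector. */
static void load_broadcast(const int8_t *p, int8_t out[16]) {
  memcpy(out, p, 8);
  memcpy(out + 8, p, 8);
}

/* Models _mm256_cvtepi8_epi16: sign-extend each of the 16 bytes to int16. */
static vec16_i16 cvt_i8_i16(const int8_t in[16]) {
  vec16_i16 v;
  for (int i = 0; i < 16; i++) {
    v.lane[i] = (int16_t)in[i];
  }
  return v;
}

/* First ordering: load, convert and advance one row at a time. */
static void step_interleaved(const int8_t **a0, const int8_t **a1,
                             const int8_t **a2, vec16_i16 vx[3]) {
  int8_t va0[16], va1[16], va2[16];
  load_broadcast(*a0, va0); vx[0] = cvt_i8_i16(va0); *a0 += 8;
  load_broadcast(*a1, va1); vx[1] = cvt_i8_i16(va1); *a1 += 8;
  load_broadcast(*a2, va2); vx[2] = cvt_i8_i16(va2); *a2 += 8;
}

/* Second ordering: all three loads first, so va0..va2 are live together,
   then the conversions and pointer increments. */
static void step_reordered(const int8_t **a0, const int8_t **a1,
                           const int8_t **a2, vec16_i16 vx[3]) {
  int8_t va0[16], va1[16], va2[16];
  load_broadcast(*a0, va0);
  load_broadcast(*a1, va1);
  load_broadcast(*a2, va2);
  vx[0] = cvt_i8_i16(va0); *a0 += 8;
  vx[1] = cvt_i8_i16(va1); *a1 += 8;
  vx[2] = cvt_i8_i16(va2); *a2 += 8;
}
```

Calling both functions on the same three rows and comparing the outputs with `memcmp` gives identical results, including for negative bytes such as -128.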

This causes the microkernel to grow from 46 to 60 instructions because of a register spill (of one vector).
The generated code also contains quite a few extra vmovdqa instructions that shuffle values between registers.

The preprocessed source is attached:

xnn_qd8_f32_qc8w_gemm_minmax_ukernel_3x8c8__avx2.txt
