[x86][reg][performance] addsubpd not generated in complex multiplication since LLVM 13 #58139

@p0nce

Consider the following program, which multiplies double-precision complex numbers.

#include <pmmintrin.h>

__m128d _mm_complexmult_pd(__m128d a, __m128d b)
{
    __m128d A;
    A[0] = a[0]*b[0];
    A[1] = a[0]*b[1];

    __m128d B;
    B[0] = a[1]*b[1];
    B[1] = a[1]*b[0];

    return _mm_addsub_pd(A, B);
}

__m128d _mm_complexmult_pd_naive(__m128d a, __m128d b)
{
    __m128d A;
    A[0] = a[0]*b[0] - a[1] * b[1];
    A[1] = a[0]*b[1] + a[1] * b[0];
    return A;
}

The first function uses addsubpd explicitly (via _mm_addsub_pd); the second does not.
In LLVM 12, _mm_complexmult_pd_naive is faster.
In LLVM 13, _mm_complexmult_pd is faster, because addsubpd is no longer generated for the naive version except in simpler cases.
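
For readers unfamiliar with the instruction, here is a minimal scalar sketch of what addsubpd computes and why it matches a complex multiply. It uses plain double arrays instead of __m128d so it builds without SSE3 flags. The names addsub_model and complexmult_model are hypothetical and only illustrate the lane semantics; they are not part of any intrinsic header.

```c
/* Hypothetical scalar model of addsubpd:
   lane 0 is subtracted, lane 1 is added. */
void addsub_model(const double x[2], const double y[2], double out[2])
{
    out[0] = x[0] - y[0];
    out[1] = x[1] + y[1];
}

/* Complex multiply of a = a[0] + i*a[1] and b = b[0] + i*b[1],
   laid out the same way as _mm_complexmult_pd above. */
void complexmult_model(const double a[2], const double b[2], double out[2])
{
    /* Products that involve the real part of a. */
    double A[2] = { a[0] * b[0], a[0] * b[1] };
    /* Products that involve the imaginary part of a. */
    double B[2] = { a[1] * b[1], a[1] * b[0] };

    /* real = A[0] - B[0], imag = A[1] + B[1] */
    addsub_model(A, B, out);
}
```

With a = 1 + 2i and b = 3 + 4i, this yields -5 + 10i, which is also what the naive formula gives. Both source forms describe the same subtract-then-add pattern, which is the pattern LLVM 12 recognized in the naive version.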

Godbolt: https://cpp.godbolt.org/z/sMdh7eG4s
