AArch64: uses store from GP reg where vectorized reg would be better #137086

@MatzeB

Description

Repro: https://godbolt.org/z/9bW58ax69

#include <string.h>
#include <arm_neon.h>
#include <arm_sve.h>

void foobar(uint16_t *p, uint8_t *p2, uint64x2_t vec, int x) {
  vec += (uint64x2_t){3, 4};
  if (x) {
    asm volatile(""); // side-effect so conditional move cannot be used.
    vec |= (uint64x2_t){0x12, 0x34};
    *p2 = vec[1];
    *p = vec[0];
  } else {
    // Commenting out the next two statements improves the code generated for the "then" branch!
    vec ^= (uint64x2_t){0x34, 0x12};
    *p = vec[0];
  }
}

// clang++ -target aarch64-redhat-linux-gnu -O3 -S -o - test.cpp -march=armv9-a+sve2+fp16

https://godbolt.org/z/41MoofTzP

The compiler currently compiles the "then" branch as:

...
       umov    w8, v0.h[0]
       st1     { v0.b }[8], [x1]
       strh    w8, [x0]

while this could be done without going through the w8 general-purpose register:

       st1     { v0.b }[8], [x1]
       str     h0, [x0]
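
For comparison, here is a minimal, hypothetical intrinsics sketch (not part of the original repro) showing that an explicit lane-0 store through the standard NEON intrinsic vst1q_lane_u16 is normally lowered to a direct str h0 / st1 lane store, with no umov round-trip through a general-purpose register:

#include <arm_neon.h>
#include <stdint.h>

// Hypothetical example: store lane 0 of a uint16x8_t directly.
// With clang -O3 for AArch64 this typically lowers to "str h0, [x0]"
// (or an st1 lane store), i.e. no umov into a GP register.
void store_lane0(uint16_t *p, uint16x8_t v) {
  vst1q_lane_u16(p, v, 0);
}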
