missing simplification for llvm.vector.reduce.and on vector of i1 #50603

@zygoloid

Bugzilla Link: 51259
Version: trunk
OS: All
CC: @topperc, @LebedevRI, @RKSimon, @phoebewang, @rotateright

Extended Description

Live demo: https://godbolt.org/z/a6YaahdPP

Example:

[[gnu::weak]] void do_this() {}
[[gnu::weak]] void do_that() {}

void f1(unsigned char const p[8]) {
  if (p[0] != 0x00 & p[1] != 0x00 & p[2] != 0x00 & p[3] != 0x00 & p[4] != 0x00 &
      p[5] != 0x00 & p[6] != 0x00 & p[7] != 0x00) {
    do_this();
  } else {
    do_that();
  }
}

void f2(unsigned const char *p) {
  using T [[gnu::vector_size(8), gnu::aligned(1)]] = unsigned char;
  T same = *(T *)p == (T){0, 0, 0, 0, 0, 0, 0, 0};
  if ((unsigned long)same == 0) {
    do_this();
  } else {
    do_that();
  }
}

This results in the following:

f1(unsigned char const*):                               # @f1(unsigned char const*)
        vmovq   xmm0, qword ptr [rdi]           # xmm0 = mem[0],zero
        vpxor   xmm1, xmm1, xmm1
        vpcmpeqb        xmm0, xmm0, xmm1
        vpmovmskb       eax, xmm0
        not     eax
        cmp     al, -1
        jne     ...

f2(unsigned char const*):                               # @f2(unsigned char const*)
        vmovq   xmm0, qword ptr [rdi]           # xmm0 = mem[0],zero
        vpxor   xmm1, xmm1, xmm1
        vpcmpeqb        xmm0, xmm0, xmm1
        vmovq   rax, xmm0
        test    rax, rax
        je      ...

I think these two functions should produce the same assembly, and the result from f2 looks better to me (though both are the same size). Presumably we'd need to recognize that after vpcmpeqb, each byte lane in xmm0 is either all-zeros or all-ones, so the vector can be tested against zero directly and the vpmovmskb is redundant.
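To make the redundancy argument concrete, here is a small scalar model in portable C++. It is only a sketch and does not reflect how LLVM represents or performs the fold. The helper names (compare_eq_zero, movemask8, f1_style, f2_style, paths_agree, self_check) are hypothetical and introduced purely for illustration. The model emulates the byte-wise compare and the sign-bit gather on the low 8 lanes, then checks that the f1-style test (mask, not, compare with -1) and the f2-style test (whole 64-bit value equals zero) agree for every zero/non-zero pattern of the eight input bytes.

```cpp
#include <cassert>
#include <cstdint>

// Model of vpcmpeqb against zero on 8 bytes: a result lane is 0xFF where the
// input byte is zero and 0x00 otherwise. Lane i occupies bits [8*i, 8*i+7].
std::uint64_t compare_eq_zero(const unsigned char p[8]) {
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i) {
        if (p[i] == 0) {
            bits |= std::uint64_t{0xFF} << (8 * i);
        }
    }
    return bits;
}

// Model of vpmovmskb restricted to the low 8 lanes: collects the top bit of
// each byte lane into bit i of the result.
unsigned movemask8(std::uint64_t bits) {
    unsigned mask = 0;
    for (int i = 0; i < 8; ++i) {
        mask |= static_cast<unsigned>((bits >> (8 * i + 7)) & 1u) << i;
    }
    return mask;
}

// Shape of the f1 code path: movemask, not, then compare the low byte with -1.
// Returns true when do_this() would be called.
bool f1_style(const unsigned char p[8]) {
    unsigned mask = movemask8(compare_eq_zero(p));
    return static_cast<std::uint8_t>(~mask) == 0xFF;
}

// Shape of the f2 code path: test the whole 64-bit compare result for zero.
// Returns true when do_this() would be called.
bool f2_style(const unsigned char p[8]) {
    return compare_eq_zero(p) == 0;
}

// Checks all 256 zero/non-zero patterns. The value 0x5A is an arbitrary
// non-zero stand-in; only zero versus non-zero affects the compare result.
bool paths_agree() {
    for (unsigned pattern = 0; pattern < 256; ++pattern) {
        unsigned char p[8];
        for (int i = 0; i < 8; ++i) {
            p[i] = ((pattern >> i) & 1u) ? 0x5A : 0x00;
        }
        if (f1_style(p) != f2_style(p)) {
            return false;
        }
    }
    return true;
}

// Convenience wrapper that aborts if the two code paths ever disagree.
void self_check() {
    assert(paths_agree());
}
```

Because every lane of the compare result is either 0x00 or 0xFF, the gathered sign bits are all zero exactly when the full 64-bit value is zero, which is why the mask extraction adds no information in this model.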
