Clang picks the vector element width from the pointee's element size when it expands memcpy inline:
#include <string.h>
void foo_64(unsigned long *dst, unsigned long *src)
{
    memcpy(dst, src, 16);
}
void foo_8(unsigned char *dst, unsigned char *src)
{
    memcpy(dst, src, 16);
}

Generated assembly:

foo_64:
    vsetivli zero, 2, e64, m1, ta, ma
    vle64.v v8, (a1)
    vse64.v v8, (a0)
    ret
foo_8:
    vsetivli zero, 16, e8, m1, ta, ma
    vle8.v v8, (a1)
    vse8.v v8, (a0)
    ret

This prevents expansion of the memcpy with 64-bit elements on a CPU that has vector support but no unaligned vector access (e.g. Spacemit K1). If we always used 8-bit (e8) elements for the copy, we wouldn't have this issue. I notice gcc does this.
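
To illustrate why e8 sidesteps the problem, here is a minimal sketch of the natural-alignment condition for a vector element access. It assumes the hardware requires each element to be aligned to its own width (SEW/8 bytes). The helper names `element_aligned` and `alignment_demo` are hypothetical and not part of clang, gcc, or any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if an element of sew_bits bits starting at p
 * is naturally aligned, i.e. the address is a multiple of sew_bits / 8. */
static bool element_aligned(const void *p, unsigned sew_bits)
{
    return ((uintptr_t)p % (sew_bits / 8u)) == 0;
}

/* Hypothetical demo: an address one byte past an 8-byte boundary. */
static void alignment_demo(void)
{
    _Alignas(8) unsigned char buf[32];
    const unsigned char *odd = buf + 1;

    assert(element_aligned(buf, 64));   /* e64 is fine on the aligned base */
    assert(element_aligned(odd, 8));    /* e8 is aligned at any address */
    assert(!element_aligned(odd, 64));  /* e64 is misaligned at buf + 1 */
}
```

Under that assumption, a 16-byte copy done as 16 e8 elements never performs a misaligned element access, while the same copy done as 2 e64 elements does so whenever either address is not a multiple of 8.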