Given this code:
https://godbolt.org/z/G55n584nb
```c
#include <stdint.h>

uint64_t both(uint32_t* x) {
    uint32_t s = 0;
    uint32_t m = 0;
    for (int i = 0; i < 32; i++) {
        s += x[i];          // add reduction
        m += x[i] * x[i];   // multiply-accumulate (MLA) reduction
    }
    return s + ((uint64_t)m << 32);
}
```
If we remove the `s +=` line, the loop is vectorized using MLA. If we remove the `m +=` line, the add reduction is vectorized. With both lines present, however, the loop is not vectorized at all.
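One possible source-level workaround follows from the observation above. Split the loop so that each loop carries only one reduction, which is the shape reported to vectorize. This is a sketch: `both_split` is a hypothetical name, and it has not been checked whether any particular compiler version actually vectorizes both resulting loops. Because the arithmetic is unsigned and wraps modulo 2^32, the result is bit-for-bit identical to `both`.

```c
#include <stdint.h>

/* Hypothetical workaround: one reduction per loop (not verified to
   vectorize on any specific compiler). Returns the same value as both(). */
uint64_t both_split(const uint32_t* x) {
    uint32_t s = 0;
    for (int i = 0; i < 32; i++) {
        s += x[i];              /* add reduction only */
    }
    uint32_t m = 0;
    for (int i = 0; i < 32; i++) {
        m += x[i] * x[i];       /* multiply-accumulate reduction only */
    }
    return s + ((uint64_t)m << 32);
}
```

This costs a second pass over the 32 elements. That should be cheap since the data is already in cache, but it is only worth doing if the split loops really do vectorize on the target.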