[AArch64] Unrolling of loops with vector instructions. (#147420)
This patch permits loops with vector instructions to be unrolled.
Currently, `getUnrollingPreferences()` for AArch64 targets exits early
if it finds a vector instruction in any of the loop's blocks.
This patch relaxes that restriction so that common loops like this one
get a chance to be unrolled:
    void saxpy (float * dst, const float * src, const float a, const int len) {
      float32x4_t * vdst = (float32x4_t *)dst;
      float32x4_t * vsrc = (float32x4_t *)src;
      float32x4_t vk = vdupq_n_f32(a);
      for (int i = 0; i < (len >> 2); i++)
      {
        vdst[i] = vaddq_f32(vdst[i], vmulq_f32(vsrc[i], vk));
      }
    }
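
For illustration, here is a hand-written sketch of what unrolling such a loop by a factor of 2 means at the source level. It is not the compiler's actual output: the unroll factor, the function names `saxpy_rolled` and `saxpy_unrolled2`, and the `v4f` type are assumptions made for this example. `v4f` uses the GCC/Clang vector extension as a portable stand-in for `float32x4_t`, so the sketch also builds on non-Arm hosts.

```c
#include <assert.h>

/* Portable stand-in for float32x4_t (assumption for illustration):
   a 16-byte vector of four floats via the GCC/Clang vector extension. */
typedef float v4f __attribute__((vector_size(16)));

/* Reference version: one vector per iteration, same shape as the
   saxpy loop in the commit message. nvec is the number of vectors. */
static void saxpy_rolled(v4f *vdst, const v4f *vsrc, float a, int nvec) {
    assert(nvec >= 0);
    v4f vk = {a, a, a, a};
    for (int i = 0; i < nvec; i++)
        vdst[i] = vdst[i] + vsrc[i] * vk;
}

/* Hand-unrolled by 2 (hypothetical factor): two vectors per iteration,
   then a remainder loop that handles an odd trip count. */
static void saxpy_unrolled2(v4f *vdst, const v4f *vsrc, float a, int nvec) {
    assert(nvec >= 0);
    v4f vk = {a, a, a, a};
    int i = 0;
    for (; i + 1 < nvec; i += 2) {
        vdst[i]     = vdst[i]     + vsrc[i]     * vk;
        vdst[i + 1] = vdst[i + 1] + vsrc[i + 1] * vk;
    }
    for (; i < nvec; i++)
        vdst[i] = vdst[i] + vsrc[i] * vk;
}
```

Both functions compute the same result. The unrolled form halves the number of loop-control branches and gives the scheduler two independent vector operations per iteration, which is the kind of benefit unrolling is meant to expose.
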
Auto-vectorized loops are still not unrolled, except for those that the
vectorizer did not interleave.
The provided test case shows the enhancement on top of runtime/partial
unrolling, depending on the CPU.
PR: #147420