[Unroll] Further unrolling vector reductions creates dependency chains #108028

@sjoerdmeijer

It looks like our vectorisation strategy introduces in-loop dependencies for a simple reduction like this:

for (int i = 0; i < N; i++) {
    sum += a[i];
}

We generate something like this:

vector.body:
   vecsum1 += a[..]
   vecsum2  = a[..] + a[..]
   vecsum1 += vecsum2
   vecsum2  = a[..] + a[..]
   vecsum1 += vecsum2
end
// adding partial sums

But GCC is generating something more like this:

vector.body:
   vecsum1 += a[i:i+4]
   vecsum2 += a[i+4:i+8]
   vecsum3 += a[i+8:i+12]
   vecsum4 += a[i+12:i+16]
end
// adding partial sums

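Our loop body carries more dependencies: every update to vecsum1 has to wait for the previous one, whereas GCC's four accumulators are independent of each other. This can slow us down.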
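To make the difference concrete, here is a minimal scalar sketch of the two shapes in C. It is only an analogue of the vector code: the function names `sum_single_chain` and `sum_independent` are hypothetical, each `int` stands in for a whole vector, and an optimising compiler is free to rewrite either loop, so the source shape is not guaranteed to survive into the generated code.

```c
#include <stddef.h>

/* Single accumulator: each add must wait for the result of the previous
   one, so the four adds per iteration form one serial dependency chain. */
long long sum_single_chain(const int *a, size_t n) {
    long long sum = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    for (; i < n; i++)          /* remainder elements */
        sum += a[i];
    return sum;
}

/* Four accumulators: the adds within one iteration do not depend on each
   other, so they can be issued in parallel. The partial sums are combined
   once, after the loop. */
long long sum_independent(const int *a, size_t n) {
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    long long sum = (s0 + s1) + (s2 + s3);  /* adding partial sums */
    for (; i < n; i++)          /* remainder elements */
        sum += a[i];
    return sum;
}
```

Both functions return the same value for integer input; the only difference is how many adds in each iteration sit on the loop-carried critical path (four in the first, one per accumulator in the second).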

Here's an AArch64 code example on compiler explorer: https://godbolt.org/z/v1c6hxfGc

I have disabled the interleaver to keep the example concise, but with interleaving enabled things are very similar.
