Making this PR to see the diff better #1

alan-j-hu · 2023-03-25T16:51:55Z

No description provided.

…d A520 (llvm#132246)

Inefficient SVE codegen occurs on at least two in-order cores, Cortex-A510 and Cortex-A520. For example, the simple vector add

```
void foo(float *a, float *b, float *dst, unsigned n) {
  for (unsigned i = 0; i < n; ++i)
    dst[i] = a[i] + b[i];
}
```

vectorizes the inner loop into the following interleaved sequence of instructions:

```
add   x12, x1, x10
ld1b  { z0.b }, p0/z, [x1, x10]
add   x13, x2, x10
ld1b  { z1.b }, p0/z, [x2, x10]
ldr   z2, [x12, #1, mul vl]
ldr   z3, [x13, #1, mul vl]
dech  x11
add   x12, x0, x10
fadd  z0.s, z1.s, z0.s
fadd  z1.s, z3.s, z2.s
st1b  { z0.b }, p0, [x0, x10]
addvl x10, x10, #2
str   z1, [x12, #1, mul vl]
```

By adjusting the target features to prefer fixed-width over scalable vectors when the cost is equal, we get the following vectorized loop:

```
ldp   q0, q3, [x11, #-16]
subs  x13, x13, #8
ldp   q1, q2, [x10, #-16]
add   x10, x10, #32
add   x11, x11, #32
fadd  v0.4s, v1.4s, v0.4s
fadd  v1.4s, v2.4s, v3.4s
stp   q0, q1, [x12, #-16]
add   x12, x12, #32
```

This loop is more efficient on these cores.
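The change only affects how a cost tie between a fixed-width (NEON) and a scalable (SVE) candidate is broken. Since the SVE loop was produced before the change, the sketch assumes that a tie previously went to the scalable candidate. The sketch below is a simplified model, not LLVM's loop-vectorizer code: the `Candidate` struct, the `choose` function and the `prefer_fixed_if_equal_cost` flag are hypothetical names used for illustration, and the costs are arbitrary numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one vectorization candidate. */
typedef struct {
    const char *name;   /* e.g. "fixed 4 x float" or "scalable vscale x 4 x float" */
    bool scalable;      /* true for SVE-style scalable vectors */
    unsigned cost;      /* assumed cost estimate; lower is better */
} Candidate;

/* Picks between a fixed-width and a scalable candidate.
 * prefer_fixed_if_equal_cost models the target feature enabled for
 * Cortex-A510/A520 (the flag name is illustrative, not LLVM's). */
static const Candidate *choose(const Candidate *fixed, const Candidate *scalable,
                               bool prefer_fixed_if_equal_cost)
{
    /* Precondition: one fixed-width and one scalable candidate. */
    assert(!fixed->scalable && scalable->scalable);

    /* A strictly cheaper candidate always wins, regardless of the flag. */
    if (fixed->cost < scalable->cost)
        return fixed;
    if (scalable->cost < fixed->cost)
        return scalable;

    /* Equal cost: the flag only changes this tie-break.
     * Without it, the scalable candidate is kept (assumed prior behaviour). */
    return prefer_fixed_if_equal_cost ? fixed : scalable;
}
```

With two candidates of equal cost, `choose(&neon, &sve, false)` returns the SVE candidate and `choose(&neon, &sve, true)` returns the NEON one. A strictly cheaper scalable candidate still wins even when the flag is set, which matches "prefer fixed over scalable if the cost is equal".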

alan-j-hu added 3 commits March 24, 2023 10:57

Patch release 16 for no naked pointers

a57c3f2

Fix missed function that was bound directly to LLVM C counterpart

7d90561

Patch release 15 for no naked pointers

a9c1028
