I noticed the following patterns while studying a customer codebase:
t1.cpp

    void foo(int *A, int *B, int *LoopBound) {
      for (int k = 0; k < *LoopBound; k++)
        A[k] += B[k];
    }

t2.cpp

    class Base {
    protected:
      int m_totalPhase;
    };

    class Derived : public Base {
    public:
      void foo(int *A, int *B);
    };

    void Derived::foo(int *A, int *B) {
      for (int k = 0; k < m_totalPhase; k++)
        A[k] += B[k];
    }

In both cases, the loop bound is loaded from memory that may alias the stores to A inside the loop, so the load cannot be hoisted out of the loop.
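To illustrate why the load cannot simply be hoisted, here is a minimal sketch in which the bound lives inside A. The driver `run_aliased` and its input values are hypothetical and only serve the illustration; `foo` is the loop from t1.cpp.

```cpp
// Same loop as t1.cpp.
void foo(int *A, int *B, int *LoopBound) {
  for (int k = 0; k < *LoopBound; k++)
    A[k] += B[k];
}

// Hypothetical driver: the loop bound is stored in A[2], so the store to
// A[2] at k == 2 rewrites the bound from 4 to 2 and the loop exits early.
int run_aliased() {
  int A[4] = {1, 1, 4, 1};
  int B[4] = {1, 1, -2, 1};
  foo(A, B, &A[2]);
  // A[3] is never updated. A version that read the bound once (4) before
  // the loop would have run a fourth iteration and changed A[3].
  return A[3];
}
```

Because a call like this is legal, hoisting the load without a runtime check would change the program's behavior. The same reasoning applies to t2.cpp, where `m_totalPhase` is read through `this`.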
clang++ --target=aarch64 -mcpu=cortex-a57 -c -O3 t[1|2].cpp -Rpass-missed=loop-vectorize
t1.cpp:2:3: remark: loop not vectorized [-Rpass-missed=loop-vectorize]
2 | for (int k = 0; k < *LoopBound; k++)
Polly generates a runtime-checked version of the loop, which achieves an effect similar to loop versioning. The load of the loop bound is hoisted out of the loop, which lets the loop vectorizer succeed.
clang++ --target=aarch64 -mcpu=cortex-a57 -c -O3 t[1|2].cpp -Rpass=loop-vectorize -mllvm -polly -mllvm -polly-invariant-load-hoisting -mllvm -polly-process-unprofitable
t1.cpp:2:3: remark: vectorized loop (vectorization width: 4, interleaved count: 4) [-Rpass=loop-vectorize]
2 | for (int k = 0; k < *LoopBound; k++)
t2.cpp:12:3: remark: vectorized loop (vectorization width: 4, interleaved count: 4) [-Rpass=loop-vectorize]
12 | for (int k = 0; k < m_totalPhase; k++)
Both cases are a good fit for LoopVersioningLICM, which could hoist the load of the loop bound in the versioned loop under a no-alias assumption.
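The versioning idea can be sketched by hand at the source level. This is only an illustration of the transformation, not what LoopVersioningLICM or Polly actually emits; `overlaps` and `foo_versioned` are hypothetical names, and the check is deliberately limited to the bound versus the stores to A.

```cpp
#include <cstdint>

// Hypothetical helper: true if the int at p overlaps the range
// [base, base + count). Assumes count > 0.
static bool overlaps(const int *base, int count, const int *p) {
  auto b = reinterpret_cast<std::uintptr_t>(base);
  auto e = b + static_cast<std::uintptr_t>(count) * sizeof(int);
  auto q = reinterpret_cast<std::uintptr_t>(p);
  return q + sizeof(int) > b && q < e;
}

void foo_versioned(int *A, int *B, int *LoopBound) {
  const int N = *LoopBound;  // bound read once, before the loop
  if (N <= 0 || !overlaps(A, N, LoopBound)) {
    // Fast version: the stores to A[0..N) cannot modify *LoopBound, so the
    // bound is loop-invariant and the trip count is known on entry.
    for (int k = 0; k < N; k++)
      A[k] += B[k];
  } else {
    // Fallback: original loop, bound reloaded on every iteration.
    for (int k = 0; k < *LoopBound; k++)
      A[k] += B[k];
  }
}
```

The fast version has a trip count that is fixed on entry, which is the form the loop vectorizer needs; any overlap between A and B would still be left to the vectorizer's own runtime checks.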
I wonder whether there is enough interest in the community to support vectorization of such loops. I am currently working on other projects, so I may not have the bandwidth to pursue upstream fixes myself.