Currently, memset is unrolled and optimized within the loop body instead of being hoisted out of the loop, which is suboptimal when the unroll count is smaller than the loop's trip count: https://godbolt.org/z/dK3a7xjPW.
This also means that with `-fno-unroll-loops` the memsets are not optimized at all.
Previously discussed in #143015:
The optimization happens as a result of unrolling, so it is affected by target-dependent heuristics. It would be legal to do it independently of unrolling by hoisting the memset out of the loop, it's just not implemented. It does work for a plain store, implemented here I believe:
} else if (auto *SI = dyn_cast<StoreInst>(&I)) {
It could be extended to the memset case.
Originally posted by @nikic in #143015
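
To make the requested transformation concrete, here is a minimal source-level sketch of the kind of loop in question. It is a hypothetical reproducer, not necessarily the code behind the Godbolt link, and the function names (`fill_in_loop`, `fill_hoisted`, `same_result`, `check_equivalence`) are made up for illustration. It assumes the memset arguments are loop-invariant and that nothing else in the loop reads or writes the buffer, in which case a single guarded memset produces the same final memory state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical input: every iteration writes the same bytes to the same
// location, so all iterations after the first are redundant.
void fill_in_loop(unsigned char *buf, std::size_t len, int n) {
    for (int i = 0; i < n; ++i)
        std::memset(buf, 0, len);
}

// Hand-written equivalent of what hoisting would yield: one memset,
// guarded so that it only runs if the loop would have run at least once.
void fill_hoisted(unsigned char *buf, std::size_t len, int n) {
    if (n > 0)
        std::memset(buf, 0, len);
}

// Runs both versions on identically initialized buffers and compares them.
bool same_result(int n) {
    unsigned char a[16], b[16];
    std::memset(a, 0xAB, sizeof a);
    std::memset(b, 0xAB, sizeof b);
    fill_in_loop(a, 8, n);
    fill_hoisted(b, 8, n);
    return std::memcmp(a, b, sizeof a) == 0;
}

// Checks the zero-trip, single-trip, and multi-trip cases.
void check_equivalence() {
    assert(same_result(0));
    assert(same_result(1));
    assert(same_result(5));
}
```

The `n > 0` guard matters: with a zero trip count the original loop leaves the buffer untouched, so an unconditional memset would not be equivalent.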