Lower performance when virtual call loop unrolling #8191
-
I have a loop unrolling optimization that brings a 25% performance improvement.
I would like to know how to solve this problem.
Replies: 3 comments 6 replies
-
After profiling, I found that the loop overhead is still high: more than 20%, or even 40%.
-
First, this is not related to the language. Performance questions should go to https://github.com/dotnet/runtime/discussions.
There's no such type of optimization. Neither the C# compiler nor the JIT compiler will do this transformation. A general rule of optimization is to know what's happening. There are several tools that can help you understand how your C# code is compiled:
Loop unrolling is a very low-level optimization. It's very uncommon to see an arbitrary unrolling ratio like 10 or 20; typical ratios are powers of two such as 4, 8, or 16.
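For illustration, here is a minimal sketch of what manual unrolling by a power-of-two factor looks like. The class and method names (`UnrollSketch`, `SumRolled`, `SumUnrolledBy4`) are hypothetical and not taken from the original code; the point is only the shape of the transformation, not a claim about how much faster it runs.

```csharp
using System;

public static class UnrollSketch
{
    // Rolled loop: one condition check and one branch per element.
    public static long SumRolled(int[] data)
    {
        long sum = 0;
        for (int i = 0; i < data.Length; i++)
        {
            sum += data[i];
        }
        return sum;
    }

    // Manually unrolled by 4: one condition check and one branch per four elements.
    public static long SumUnrolledBy4(int[] data)
    {
        long sum = 0;
        int i = 0;
        // Largest multiple of 4 that fits in the array.
        int limit = data.Length - (data.Length % 4);
        for (; i < limit; i += 4)
        {
            sum += data[i];
            sum += data[i + 1];
            sum += data[i + 2];
            sum += data[i + 3];
        }
        // Remainder loop handles the 0 to 3 leftover elements.
        for (; i < data.Length; i++)
        {
            sum += data[i];
        }
        return sum;
    }
}
```

Both methods return the same result for any input; whether the unrolled version is actually faster has to be measured on the real workload.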
-
Yes, I tried it and you are right, but I still don't understand why it regresses.
When using the Visual Studio profiler, the "Self CPU" column of the Step10 method is very low.
In this case, I suspect the actual culprit to be devirtualization and branch prediction of virtual method dispatch. Since your program is an assembly simulator, each iteration of your code calls a different virtual instruction. When the loop is unrolled, every single invocation is treated as a different call site and is optimized separately. However, if your unrolling ratio is too big, the invocation count of each call site may drop below the optimization threshold.
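To make the call-site point concrete, here is a small sketch. All types and names (`Instruction`, `Add`, `Dbl`, `Simulator`) are hypothetical stand-ins for the simulator in the question, and the program has only two instructions to keep it short. In the rolled version a single call site sees every instruction type in turn; in the unrolled version each call site always sees the same type.

```csharp
using System;

// Hypothetical instruction hierarchy standing in for the simulator's instructions.
public abstract class Instruction
{
    public abstract void Execute(ref int acc);
}

public sealed class Add : Instruction
{
    public override void Execute(ref int acc) => acc += 1;
}

public sealed class Dbl : Instruction
{
    public override void Execute(ref int acc) => acc *= 2;
}

public static class Simulator
{
    // Rolled: one virtual call site, whose target type changes on every iteration.
    public static int RunRolled(Instruction[] program, int passes)
    {
        int acc = 0;
        for (int p = 0; p < passes; p++)
        {
            for (int i = 0; i < program.Length; i++)
            {
                program[i].Execute(ref acc);
            }
        }
        return acc;
    }

    // Unrolled to the program length (2 in this sketch): two call sites,
    // each of which always dispatches to the same concrete type.
    public static int RunUnrolledBy2(Instruction[] program, int passes)
    {
        if (program.Length != 2)
        {
            throw new ArgumentException("This sketch assumes a 2-instruction program.");
        }
        int acc = 0;
        for (int p = 0; p < passes; p++)
        {
            program[0].Execute(ref acc); // call site A
            program[1].Execute(ref acc); // call site B
        }
        return acc;
    }
}
```

The two methods compute the same result. Any difference in speed would come from how the runtime and the CPU treat the call sites, which is an assumption to verify with a profiler or a disassembly rather than something this sketch demonstrates.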
I'd expect to see the best performance if the unrolling ratio exactly matches the loop size in your code, which is 10 in the sample. In th…