When using the Visual Studio profiler, the "Self CPU" column for the Step10 method is very low.

In this case, I suspect the actual culprit is devirtualization and branch prediction for virtual method dispatch. Since your program is an assembly simulator, each iteration of the loop calls a different virtual instruction. When the loop is unrolled, every invocation is treated as a separate call site and is optimized separately. However, if the unrolling ratio is too large, the invocation count of each call site may drop below the optimization threshold.
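
To make the call-site argument concrete, here is a minimal sketch, assuming the simulator is written in C# and dispatches instructions through a virtual method. The original code is not shown in this thread, so `Instruction`, `Add`, `Xor`, `RunRolled`, and `RunUnrolled4` are hypothetical names, and the program length of 4 is chosen only for brevity. In the rolled loop a single call site sees every instruction type in turn; in the unrolled version each call site always receives the same instruction.

```csharp
using System;

// Hypothetical instruction set; the original simulator's types are not shown.
abstract class Instruction
{
    public abstract void Execute(ref long acc);
}

sealed class Add : Instruction
{
    private readonly long _value;
    public Add(long value) => _value = value;
    public override void Execute(ref long acc) => acc += _value;
}

sealed class Xor : Instruction
{
    private readonly long _mask;
    public Xor(long mask) => _mask = mask;
    public override void Execute(ref long acc) => acc ^= _mask;
}

static class Simulator
{
    // Rolled loop: ONE call site observes Add, Xor, Add, Xor, ...
    public static long RunRolled(Instruction[] program, int iterations)
    {
        long acc = 0;
        for (int i = 0; i < iterations; i++)
            for (int pc = 0; pc < program.Length; pc++)
                program[pc].Execute(ref acc);
        return acc;
    }

    // Unrolled by the program length (assumed to be 4 here):
    // FOUR call sites, each of which always sees the same receiver.
    public static long RunUnrolled4(Instruction[] program, int iterations)
    {
        if (program.Length != 4)
            throw new ArgumentException("Expected exactly 4 instructions.", nameof(program));

        long acc = 0;
        for (int i = 0; i < iterations; i++)
        {
            program[0].Execute(ref acc);
            program[1].Execute(ref acc);
            program[2].Execute(ref acc);
            program[3].Execute(ref acc);
        }
        return acc;
    }
}

static class Program
{
    static void Main()
    {
        var program = new Instruction[] { new Add(1), new Xor(3), new Add(5), new Xor(7) };
        long rolled = Simulator.RunRolled(program, 1_000_000);
        long unrolled = Simulator.RunUnrolled4(program, 1_000_000);

        // Both variants execute the same instructions in the same order.
        Console.WriteLine($"Same result: {rolled == unrolled}");
    }
}
```

Both methods compute the same value; only the number of call sites differs. Whether the JIT actually devirtualizes or the CPU predicts the unrolled sites better depends on the runtime version and its settings, so any timing difference should be measured rather than assumed.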

I'd expect the best performance when the unrolling ratio exactly matches the loop size in your code, which is 10 in the sample. In th…

Answer selected by tastynoob