
Eliminate unnecessary AsyncLocal allocation if SessionId isn't changed #1453

Merged: 2 commits merged into 5.0.x from session-id-logging-context-quick-fix on Nov 26, 2017

Conversation

@hazzik (Member) commented Nov 24, 2017

This might fix the unnecessary allocations, ~~but I've never checked that~~.

@hazzik (Member, Author) commented Nov 24, 2017

After running a profiler, it seems that this solution isn't viable. In its current state the performance improvement is only 1.53%.

@hazzik (Member, Author) commented Nov 24, 2017

It turns out that CallContext.LogicalSetData/CallContext.LogicalGetData is much more performant than AsyncLocal<T>. Now the performance improvement is 18% compared to the original 5.0.1.

@fredericDelaporte @gliljas @maca88 could you please double check?


```csharp
namespace NHibernate.Impl
{
    public class SessionIdLoggingContext : IDisposable
    {
        private static readonly AsyncLocal<Guid?> _currentSessionId = new AsyncLocal<Guid?>();

#if NETSTANDARD2_0 || NETCOREAPP2_0
```
@hazzik (Member, Author) commented on the #if NETSTANDARD2_0 || NETCOREAPP2_0 line:

.NET Standard and .NET Core do not support CallContext.
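
For illustration, here is a minimal sketch of how the storage could be split per target framework. The class name SessionIdStorage and the "NHibernate.SessionId" key are hypothetical stand-ins, not the actual diff:

```csharp
using System;
using System.Threading;
#if !(NETSTANDARD2_0 || NETCOREAPP2_0)
using System.Runtime.Remoting.Messaging;
#endif

internal static class SessionIdStorage
{
#if NETSTANDARD2_0 || NETCOREAPP2_0
    // .NET Standard / .NET Core: CallContext is unavailable, so fall back to AsyncLocal<T>.
    private static readonly AsyncLocal<Guid?> _currentSessionId = new AsyncLocal<Guid?>();

    public static Guid? SessionId
    {
        get => _currentSessionId.Value;
        set => _currentSessionId.Value = value;
    }
#else
    // Full framework: the Logical* methods profiled faster in this scenario.
    private const string Key = "NHibernate.SessionId"; // hypothetical key name

    public static Guid? SessionId
    {
        get => (Guid?) CallContext.LogicalGetData(Key);
        set => CallContext.LogicalSetData(Key, value);
    }
#endif
}
```

Either branch exposes the same Guid? property, so the split only swaps the storage mechanism, not the semantics.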

@fredericDelaporte (Member) left a comment

I was thinking that CallContext was not suitable for async, but it appears its Logical methods are OK. (According to this, not only were they suitable for our "linear" use case (no tasks launched in parallel on the same session), but since .NET 4.5 they are also suitable for any async use case.)

So the main points to consider are, in my view:

  • Whether we accept introducing those target framework directives for this case, or if we prefer to avoid them as much as possible (and so wait for maca's solution to be released, or for another one).
  • Is it still worth checking whether the value has changed before setting it? The main overhead seems to be having the context contain some data at all, which then gets copied at each context switch anyway.
  • If we put this solution in place, does maca's solution still provide additional significant savings? If the session id is actually responsible for 28.35% of timings as said in #1446, then it looks probable. But if maca's solution does not add significant savings compared to this one, then maybe it should be reverted.
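
Regarding the note above that the Logical methods flow correctly across async calls since .NET 4.5, a small stand-alone sketch (full framework only; the "SessionId" key is arbitrary) of the value flowing across an await and into a child async call:

```csharp
using System;
using System.Runtime.Remoting.Messaging;
using System.Threading.Tasks;

static class LogicalFlowDemo
{
    static void Main() => RunAsync().GetAwaiter().GetResult();

    static async Task RunAsync()
    {
        CallContext.LogicalSetData("SessionId", Guid.NewGuid());

        await Task.Delay(10).ConfigureAwait(false);

        // The value set before the await is still visible after it,
        // even if the continuation runs on another thread pool thread...
        Console.WriteLine(CallContext.LogicalGetData("SessionId"));

        // ...and it is also visible inside child async calls.
        await ChildAsync();
    }

    static async Task ChildAsync()
    {
        await Task.Yield();
        Console.WriteLine(CallContext.LogicalGetData("SessionId"));
    }
}
```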

@hazzik (Member, Author) commented Nov 24, 2017

I want this change to be introduced in a bugfix release (so it can be released sooner).

> Whether we accept introducing those target framework directives for this case, or if we prefer to avoid them as much as possible (and so wait for maca's solution to be released, or for another one).

I introduced them beforehand as I know we're working on the .NET Standard version.

> Is it still worth checking whether the value has changed before setting it?

Yes. We already read the old value in order to store it for later, so there is no extra overhead there. It also saves us the work in the Dispose call. In the first version this check alone was worth about 1.8% of performance. Apparently reading the context is the most expensive operation.
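
A rough sketch of that idea, assuming a simplified shape for SessionIdLoggingContext (member names here are illustrative, not the exact NHibernate source):

```csharp
using System;
using System.Threading;

public class SessionIdLoggingContext : IDisposable
{
    private static readonly AsyncLocal<Guid?> _currentSessionId = new AsyncLocal<Guid?>();

    private readonly Guid? _oldSessionId;
    private readonly bool _hasChanged;

    public SessionIdLoggingContext(Guid sessionId)
    {
        // The old value has to be read anyway so it can be restored on Dispose,
        // so the equality check adds no extra context read.
        _oldSessionId = _currentSessionId.Value;
        _hasChanged = _oldSessionId != sessionId;
        if (_hasChanged)
            _currentSessionId.Value = sessionId;
    }

    public static Guid? SessionId => _currentSessionId.Value;

    public void Dispose()
    {
        // No write happened in the constructor, so there is nothing to restore.
        if (_hasChanged)
            _currentSessionId.Value = _oldSessionId;
    }
}
```

The same guard would apply equally to a CallContext-based storage; only the AsyncLocal variant is shown for brevity.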

> The main overhead seems to be having the context contain some data at all, which then gets copied at each context switch anyway.

Yeah, this overhead is still there, but it's much less than the overhead of AsyncLocal<T>. Someone on the internet noted that CallContext works 5 times faster than AsyncLocal<T>. And unlike AsyncLocal<T>, it does not generate memory traffic (at least in this scenario): the memory consumption is now below v4.1, at 514 269 KB (526 612 072 bytes).
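
The exact ratio depends on runtime and hardware, but a naive single-threaded micro-benchmark along these lines is one way to reproduce the comparison (full framework only, since CallContext is unavailable on .NET Core; the "SessionId" key and iteration count are arbitrary):

```csharp
using System;
using System.Diagnostics;
using System.Runtime.Remoting.Messaging;
using System.Threading;

static class ContextBenchmark
{
    private static readonly AsyncLocal<Guid?> AsyncLocalSlot = new AsyncLocal<Guid?>();
    private const int Iterations = 1000000;

    static void Main()
    {
        var id = Guid.NewGuid();
        Guid? read = null;

        // Set / get / reset with AsyncLocal<T>.
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < Iterations; i++)
        {
            AsyncLocalSlot.Value = id;
            read = AsyncLocalSlot.Value;
            AsyncLocalSlot.Value = null;
        }
        Console.WriteLine($"AsyncLocal<Guid?>:    {sw.ElapsedMilliseconds} ms");

        // Same pattern with the logical call context.
        sw.Restart();
        for (var i = 0; i < Iterations; i++)
        {
            CallContext.LogicalSetData("SessionId", id);
            read = (Guid?) CallContext.LogicalGetData("SessionId");
            CallContext.LogicalSetData("SessionId", null);
        }
        Console.WriteLine($"CallContext.Logical*: {sw.ElapsedMilliseconds} ms");

        GC.KeepAlive(read);
    }
}
```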

> If we put this solution in place, does maca's solution still provide additional significant savings?

Yes, as above: the most expensive operation is reading the context, and we still have it here. As you've noted, it's 18% (here) vs 28% (when completely disabled). So we need to merge these solutions together.

@maca88 (Contributor) commented Nov 24, 2017

@hazzik I did some testing and it's working fine. 👍

@hazzik (Member, Author) commented Nov 25, 2017

Hi @StephenCleary, I've seen your comment on SO about the performance of CallContext vs AsyncLocal. It turns out that in our scenario it adds a very significant footprint: +300 MB allocated and a lot of CPU wasted. The CPU is wasted on dictionary lookups for AsyncLocal. The extra allocations are unexpected, as we're using only one thread (no async).

@hazzik merged commit 15c95da into 5.0.x on Nov 26, 2017
@hazzik deleted the session-id-logging-context-quick-fix branch on November 26, 2017 22:04
@fredericDelaporte changed the title from "Trying to eliminate unnecessary AsyncLocal allocation if SessionId isn't changed" to "Eliminate unnecessary AsyncLocal allocation if SessionId isn't changed" on Nov 28, 2017
@StephenCleary commented
@hazzik From the source code, I just don't see how that's possible.

@hazzik (Member, Author) commented Dec 10, 2017

@StephenCleary most of the CPU is wasted in dictionary operations in ExecutionContext. I don't know where the allocations are coming from. But the fact is that with AsyncLocal<> we got +300 MB allocated and about a 25% CPU regression.

@hazzik (Member, Author) commented Dec 10, 2017

Maybe this copying of the dictionary is responsible: https://referencesource.microsoft.com/#mscorlib/system/threading/executioncontext.cs,683. Also, bear in mind that in this particular case we do not have any context switches, as everything happens synchronously on a single thread.
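
As a crude way to observe that traffic without a profiler, here is a sketch that drives AsyncLocal<Guid?> in a purely synchronous loop and uses Gen0 collection counts as a rough proxy for allocations (the loop size and the proxy itself are arbitrary choices):

```csharp
using System;
using System.Threading;

static class AsyncLocalAllocationProbe
{
    private static readonly AsyncLocal<Guid?> Slot = new AsyncLocal<Guid?>();

    static void Main()
    {
        var id = Guid.NewGuid();
        var gen0Before = GC.CollectionCount(0);

        // Purely synchronous, single-threaded: no awaits, no context switches.
        for (var i = 0; i < 1000000; i++)
        {
            // Every value change updates copy-on-write execution-context state
            // (and boxes the Guid?), so allocations occur even on one thread.
            Slot.Value = id;
            Slot.Value = null;
        }

        var gen0After = GC.CollectionCount(0);
        Console.WriteLine($"Gen0 collections during the loop: {gen0After - gen0Before}");
    }
}
```

A non-zero Gen0 delta in a loop like this would be consistent with the dictionary copy in the referenced executioncontext.cs code path.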
