-
-
Notifications
You must be signed in to change notification settings - Fork 220
fix(logs): race condition in high volume logging scenarios #4428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…entered successfully
@@ -84,13 +91,13 @@ internal LockScope TryEnterLockScope() | |||
|
|||
private void ExitLockScope() | |||
{ | |||
if (Interlocked.CompareExchange(ref _isEngaged, 0, 1) == 1) | |||
Debug.Assert(_event.IsSet); | |||
_event.Reset(); // reset the signaled event to the initial count of 1, so that new 'CounterScope' instances can be entered again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: fix
fixed by performing the "Reset before CompareExchange" when existing in the reverse order than the "CompareExchange before Signal" when entering
@@ -139,7 +146,7 @@ internal LockScope(ScopedCountdownLock lockObj) | |||
internal void Wait() | |||
{ | |||
var lockObj = _lockObj; | |||
lockObj?._event.Wait(); | |||
lockObj?._event.Wait(Timeout.Infinite, CancellationToken.None); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: this is not a behavioral change .. just making the behavior more apparent by "lifting" the method call of the parameterless Wait
method
@@ -49,6 +49,11 @@ internal ScopedCountdownLock() | |||
/// </remarks> | |||
internal CounterScope TryEnterCounterScope() | |||
{ | |||
if (IsEngaged) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: additional change
no longer allowing to Add
new logs to a particular Buffer, while a Flush
has been requested (in this case occurring during the 5
second timeout), which may have previously "dragged out" the Flush by still allowing new Add
operations before the Event is actually set by reaching a count of 0
(while the buffer is not full yet)
this also effects #4425, where - in the case of a terminating unhandled exception - no more Add
operations are admitted and we only wait until all currently in-progress Add
operations have concluded in order to do a "safe" Flush
, which should complete quite quickly, not "unreasonably" dragging out the shutdown
@sentry review |
bugbot review |
Summary
Fix a race condition (especially in high volume structured logging scenarios), causing an
InvalidOperationException
in the underlyingCountdownEvent
.Remarks
When testing/dogfooding 5.14.0 in a private project, I noticed that (particularly in high volume logging scenarios) we may run into a
InvalidOperationException
in theCountdownEvent
:This change is fixing this thread-safety issue in the control-flows involved in Structured Logging.
Changes
EnqueueAndFlush_Parallel
that consistently reproduced both error pathsDebug.Assert
invocations, to document assumptions, that would fail unit tests when violatedScopedCountdownLock
is reset before disengagingCountdownEvent
when currently still setAdditional Change
CounterScope
to enter when aLockScope
is already active (i.e. when the Lock is engaged)CounterScope
when aLockScope
was active, but the counter has not reached0
just yet ... which could have "dragged out" the actual flush as long as new logs are coming in and the event has not reached0
yet, for a buffer that is not full yet (so when Flush is triggered through the5
second timeout)Flush
is forced, e.g. when the Application is terminating unexpectedly, that the Flush-operation can no longer "drag out" as long as new logs come in until the Event actually reaches0
or the buffer does become full