Flash0ver (Member) commented Aug 11, 2025

Summary

Fixes a race condition, surfacing especially in high-volume Structured Logging scenarios, that causes an InvalidOperationException in the underlying CountdownEvent.

Remarks

When testing/dogfooding 5.14.0 in a private project, I noticed that, particularly in high-volume logging scenarios, the CountdownEvent may throw an InvalidOperationException when:

  • decrementing the event's count below zero
  • incrementing the event's count while it is set (signaled)

This change fixes this thread-safety issue in the control flows involved in Structured Logging.
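Both error paths can be reproduced deterministically on a bare System.Threading.CountdownEvent, without any concurrency: once the count has reached 0 (the event is set), another Signal() or AddCount() throws. The following minimal C# sketch illustrates this; the class CountdownEventErrorPaths and its methods are hypothetical and not part of the PR or the Sentry SDK:

```csharp
using System;
using System.Threading;

// Hypothetical helper (not from the PR): reproduces, without any concurrency,
// the two InvalidOperationException paths of System.Threading.CountdownEvent.
public static class CountdownEventErrorPaths
{
    // Returns true if the action throws an InvalidOperationException.
    public static bool ThrowsInvalidOperation(Action action)
    {
        try
        {
            action();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    public static (bool DecrementThrew, bool IncrementThrew, bool TryAddSucceeded) Run()
    {
        using var countdown = new CountdownEvent(1);
        countdown.Signal(); // count 1 -> 0: the event is now set (signaled)

        // Path 1: decrementing the count below zero.
        bool decrementThrew = ThrowsInvalidOperation(() => countdown.Signal());

        // Path 2: incrementing the count while the event is set.
        bool incrementThrew = ThrowsInvalidOperation(() => countdown.AddCount());

        // The non-throwing variant reports the same condition by returning false.
        bool tryAddSucceeded = countdown.TryAddCount();

        return (decrementThrew, incrementThrew, tryAddSucceeded);
    }
}
```

Here Run() reports that both Signal() and AddCount() threw, while TryAddCount() returned false. In the race fixed by this PR, the same states are reached through interleavings of concurrent Add and Flush operations rather than sequentially.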

Changes

  • add benchmark EnqueueAndFlush_Parallel, which consistently reproduced both error paths
  • add more Debug.Assert invocations that document assumptions and fail unit tests when violated
  • fix by making sure the custom ScopedCountdownLock is reset before it is disengaged
    • in the reverse order, we may end up signaling the underlying CountdownEvent while it is still set

Additional Change

  • a new CounterScope can no longer be entered while a LockScope is already active (i.e. while the lock is engaged)
    • previously, a new CounterScope could still be entered while a LockScope was active but the counter had not yet reached 0; for a buffer that is not yet full (i.e. when Flush is triggered by the 5-second timeout), this could "drag out" the actual flush for as long as new logs kept coming in
    • this ensures that when a Flush is forced, e.g. when the application is terminating unexpectedly, the Flush operation can no longer be dragged out by incoming logs until the event actually reaches 0 or the buffer becomes full
    • this does not notably pessimize performance; see the updated benchmark results
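The ordering rules above can be illustrated with a simplified, hypothetical stand-in for ScopedCountdownLock. This is a sketch under assumptions, not the Sentry implementation: the class name ScopedCountdownLockSketch and its bool-returning methods (used instead of CounterScope/LockScope structs) are invented for illustration. Only the rules are taken from this PR: CompareExchange before Signal when entering the lock scope, Reset before CompareExchange when exiting it, and no new counter scopes while the lock is engaged.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical, simplified sketch; NOT the Sentry SDK's ScopedCountdownLock.
public sealed class ScopedCountdownLockSketch : IDisposable
{
    // Starts at 1: this extra count keeps the event unset until a lock scope is entered.
    private readonly CountdownEvent _event = new CountdownEvent(1);
    private int _isEngaged; // 0 = disengaged, 1 = engaged (a lock scope is active)

    public bool IsEngaged => Volatile.Read(ref _isEngaged) == 1;
    public int Count => _event.CurrentCount;

    // Admits a writer (e.g. an Add to a log buffer).
    public bool TryEnterCounterScope()
    {
        if (IsEngaged)
        {
            return false; // no new writers while a flush has engaged the lock
        }
        return _event.TryAddCount(); // returns false (no throw) if the event is set
    }

    public void ExitCounterScope() => _event.Signal();

    // Entering: CompareExchange first, then remove the initial count of 1.
    public bool TryEnterLockScope()
    {
        if (Interlocked.CompareExchange(ref _isEngaged, 1, 0) == 0)
        {
            _event.Signal(); // count now equals the number of in-flight counter scopes
            return true;
        }
        return false;
    }

    // Blocks until every in-flight counter scope has exited (count reaches 0).
    public void WaitForCounterScopes() => _event.Wait(Timeout.Infinite, CancellationToken.None);

    // Exiting: the reverse order, Reset first, then CompareExchange.
    // Clearing the flag first would let another thread engage the lock
    // and Signal() the still-set event, which throws InvalidOperationException.
    public void ExitLockScope()
    {
        Debug.Assert(_event.IsSet);
        _event.Reset(); // back to the initial count of 1
        Interlocked.CompareExchange(ref _isEngaged, 0, 1);
    }

    public void Dispose() => _event.Dispose();
}
```

In this sketch, a flush would call TryEnterLockScope, then WaitForCounterScopes, flush the buffer, and finally ExitLockScope, while a writer calls ExitCounterScope only if TryEnterCounterScope returned true. Since CountdownEvent.Reset is documented as not thread-safe, the sketch calls it only after the event has been set, i.e. once every in-flight counter scope has exited.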

Flash0ver self-assigned this Aug 11, 2025
Flash0ver added the Logs label Aug 11, 2025
@@ -84,13 +91,13 @@ internal LockScope TryEnterLockScope()

private void ExitLockScope()
{
if (Interlocked.CompareExchange(ref _isEngaged, 0, 1) == 1)
Debug.Assert(_event.IsSet);
_event.Reset(); // reset the signaled event to the initial count of 1, so that new 'CounterScope' instances can be entered again

Flash0ver (Member, Author) commented Aug 11, 2025

note: fix

fixed by performing "Reset before CompareExchange" when exiting, i.e. in the reverse order of "CompareExchange before Signal" when entering

@@ -139,7 +146,7 @@ internal LockScope(ScopedCountdownLock lockObj)
internal void Wait()
{
var lockObj = _lockObj;
lockObj?._event.Wait();
lockObj?._event.Wait(Timeout.Infinite, CancellationToken.None);

Flash0ver (Member, Author) commented Aug 11, 2025

note: this is not a behavioral change; it just makes the behavior more apparent by "lifting" the call that the parameterless Wait() overload makes, i.e. Wait(Timeout.Infinite, CancellationToken.None)

@@ -49,6 +49,11 @@ internal ScopedCountdownLock()
/// </remarks>
internal CounterScope TryEnterCounterScope()
{
if (IsEngaged)

Flash0ver (Member, Author) commented Aug 11, 2025

note: additional change

new logs can no longer be added to a particular buffer while a Flush has been requested (here, when the Flush is triggered by the 5-second timeout); previously, new Add operations were still admitted until the event was actually set by its count reaching 0, which could "drag out" the Flush of a buffer that is not yet full

this also affects #4425, where, in the case of a terminating unhandled exception, no more Add operations are admitted and we only wait for the currently in-progress Add operations to conclude before doing a "safe" Flush; this should complete quickly rather than unreasonably dragging out the shutdown

Flash0ver marked this pull request as ready for review August 14, 2025 10:52

Flash0ver (Member, Author) commented:

@sentry review

bruno-garcia (Member) commented:

bugbot review

cursor[bot]

This comment was marked as outdated.

Flash0ver merged commit cbd289b into main Aug 15, 2025
32 checks passed
Flash0ver deleted the fix/logs-event-lock-scope-race-condition branch August 15, 2025 20:12