fix(logs): race condition in high volume logging scenarios #4428

Flash0ver · 2025-08-11T11:59:56Z

Summary

Fix a race condition, most likely to surface in high-volume structured logging scenarios, that causes an InvalidOperationException in the underlying CountdownEvent.

Remarks

When testing/dogfooding 5.14.0 in a private project, I noticed that (particularly in high-volume logging scenarios) we may run into an InvalidOperationException in the CountdownEvent, with one of two causes:

decrementing the event's count below zero
incrementing the event's count when set/signaled
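
Both failure modes can be reproduced in isolation with a bare CountdownEvent. The following is a minimal sketch that uses only System.Threading; the class and method names are illustrative and not part of the SDK.

```csharp
using System;
using System.Threading;

// Illustrative helper, not part of the Sentry SDK.
public static class CountdownEventFailureModes
{
    // Returns true when signalling an already-set event throws.
    public static bool SignalWhenSetThrows()
    {
        using var evt = new CountdownEvent(1);
        evt.Signal(); // count 1 -> 0, the event is now set
        try
        {
            evt.Signal(); // would decrement the count below zero
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // Returns true when incrementing an already-set event throws.
    public static bool AddCountWhenSetThrows()
    {
        using var evt = new CountdownEvent(1);
        evt.Signal(); // count 1 -> 0, the event is now set
        try
        {
            evt.AddCount(); // incrementing while set is not allowed
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }
}
```

In single-threaded code these calls are plain misuse; in the logging pipeline they can only occur when two threads interleave between the flag update and the event update.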

This change fixes that thread-safety issue in the control flows involved in Structured Logging.

Changes

add Benchmark EnqueueAndFlush_Parallel that consistently reproduced both error paths
add more Debug.Assert invocations to document assumptions; a violated assumption now fails the unit tests
fix by making sure the custom ScopedCountdownLock is reset before it is disengaged
- in the reverse order, we could end up signalling the underlying CountdownEvent while it is still set

Additional Change

a new CounterScope can now no longer be entered while a LockScope is active (i.e. while the Lock is engaged)
- previously, a new CounterScope could still be entered while a LockScope was active, as long as the counter had not yet reached 0; for a buffer that was not yet full (i.e. when the Flush is triggered by the 5 second timeout), this could "drag out" the actual flush for as long as new logs kept coming in
- this ensures that a forced Flush, e.g. when the application is terminating unexpectedly, can no longer be "dragged out" by incoming logs until the Event actually reaches 0 or the buffer becomes full
  - see also fix(logs): flush Logger on UnhandledException that IsTerminating #4425
- without notably pessimizing performance, see updated Benchmark results

Flash0ver · 2025-08-11T12:51:17Z

src/Sentry/Threading/ScopedCountdownLock.cs

@@ -84,13 +91,13 @@ internal LockScope TryEnterLockScope()

    private void ExitLockScope()
    {
-        if (Interlocked.CompareExchange(ref _isEngaged, 0, 1) == 1)
+        Debug.Assert(_event.IsSet);
+        _event.Reset(); // reset the signaled event to the initial count of 1, so that new 'CounterScope' instances can be entered again


note: fix

fixed by performing "Reset before CompareExchange" when exiting, which is the reverse of the "CompareExchange before Signal" order used when entering
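
The ordering can be sketched with a simplified model. This is a hypothetical reduction for illustration and not the actual ScopedCountdownLock; the type and member names are assumptions, and it assumes that only the caller that successfully engaged the lock ever disengages it.

```csharp
using System;
using System.Threading;

// Hypothetical, simplified model; not the actual Sentry.Threading.ScopedCountdownLock.
public sealed class SketchCountdownLock : IDisposable
{
    private readonly CountdownEvent _event = new CountdownEvent(1);
    private int _isEngaged; // 0 = disengaged, 1 = engaged

    public int CurrentCount => _event.CurrentCount;

    public bool TryEngage()
    {
        // Entering: CompareExchange first, then Signal.
        if (Interlocked.CompareExchange(ref _isEngaged, 1, 0) != 0)
        {
            return false; // somebody else holds the lock
        }
        _event.Signal(); // remove the initial count of 1
        return true;
    }

    public void Disengage()
    {
        // Exiting: the reverse order, Reset first, then CompareExchange.
        // The count is back at 1 before any other thread can engage again,
        // so the next Signal never hits an event that is still set.
        _event.Reset();
        Interlocked.CompareExchange(ref _isEngaged, 0, 1);
    }

    public void Dispose() => _event.Dispose();
}
```

If the flag were cleared before the Reset, a second thread could win the CompareExchange in TryEngage and call Signal while the count is still 0, which is the first error path described in the PR.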

Flash0ver · 2025-08-11T12:52:53Z

src/Sentry/Threading/ScopedCountdownLock.cs

@@ -139,7 +146,7 @@ internal LockScope(ScopedCountdownLock lockObj)
        internal void Wait()
        {
            var lockObj = _lockObj;
-            lockObj?._event.Wait();
+            lockObj?._event.Wait(Timeout.Infinite, CancellationToken.None);


note: this is not a behavioral change; it just makes the behavior more apparent by "lifting" the call that the parameterless Wait method makes internally

Flash0ver · 2025-08-11T12:55:45Z

src/Sentry/Threading/ScopedCountdownLock.cs

@@ -49,6 +49,11 @@ internal ScopedCountdownLock()
    /// </remarks>
    internal CounterScope TryEnterCounterScope()
    {
+        if (IsEngaged)


note: additional change

new logs can no longer be added to a particular Buffer once a Flush has been requested (in this case by the 5 second timeout); previously, new Add operations were still admitted until the Event was actually set by reaching a count of 0 (while the buffer was not yet full), which could "drag out" the Flush

this also affects #4425, where, in the case of a terminating unhandled exception, no more Add operations are admitted and we only wait until all in-progress Add operations have concluded in order to do a "safe" Flush, which should complete quickly and not "unreasonably" drag out the shutdown
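
A minimal sketch of this admission check, again as a hypothetical simplified model rather than the SDK's code (all names are assumptions): once the flush side has engaged, new entries are rejected, and the event becomes set as soon as the in-progress entries have exited.

```csharp
using System;
using System.Threading;

// Hypothetical, simplified model of the admission check; names are illustrative.
public sealed class SketchAdmissionGate : IDisposable
{
    private readonly CountdownEvent _event = new CountdownEvent(1);
    private int _isEngaged; // 0 = no flush requested, 1 = flush requested

    public bool IsSet => _event.IsSet;

    // Called by a writer before adding a log to the buffer.
    public bool TryEnterCounter()
    {
        if (Volatile.Read(ref _isEngaged) == 1)
        {
            return false; // a flush is pending: reject new Add operations
        }
        return _event.TryAddCount(1); // false if the event is already set
    }

    // Called by a writer after its Add operation has concluded.
    public void ExitCounter() => _event.Signal();

    // Called by the flush side to request exclusive access.
    public bool TryEngage()
    {
        if (Interlocked.CompareExchange(ref _isEngaged, 1, 0) != 0)
        {
            return false;
        }
        _event.Signal(); // remove the initial count of 1
        return true;
    }

    public void Dispose() => _event.Dispose();
}
```

In this sketch a writer that passes the flag check just before the flush engages can still be admitted; that writer only delays the flush by its own Add operation, not by an unbounded stream of later ones.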

Flash0ver · 2025-08-14T10:52:28Z

@sentry review

bruno-garcia · 2025-08-14T14:04:57Z

bugbot review

Flash0ver added 8 commits August 10, 2025 19:51

perf: remove NSubstitute from benchmark for Structured Logs

a4d2607

fix(logs): race condition when flushing Batch-Buffer

17bbed7

ref(logs): pass arguments explicitly to Wait method

089fffc

docs(logs): more precise comments

34c992f

ref(logs): more Debug.Assert

3608899

ref(logs): no longer enter Counter-Scopes when a Lock-Scope has been entered successfully

0c75df7

perf: add Parallel benchmark for Structured Logs

0b7c248

Merge branch 'main' into fix/logs-event-lock-scope-race-condition

43d9772

Flash0ver self-assigned this Aug 11, 2025

Flash0ver added the Logs label Aug 11, 2025

Flash0ver added 2 commits August 11, 2025 14:40

perf(logs): restore previous performance characteristics

19ba13a

docs: CHANGELOG

f6265b0

Flash0ver commented Aug 11, 2025

Flash0ver mentioned this pull request Aug 14, 2025

fix(logs): flush Logger on UnhandledException that IsTerminating #4425

Merged

Flash0ver added 4 commits August 14, 2025 11:47

perf(logs): update Benchmark result

98b0379

Merge branch 'main' into fix/logs-event-lock-scope-race-condition

186fe1e

merge(logs): CHANGELOG

2608bf4

docs(logs): rephrase CHANGELOG

21b4f71

Flash0ver marked this pull request as ready for review August 14, 2025 10:52

Flash0ver requested a review from jamescrosswell as a code owner August 14, 2025 10:52

seer-by-sentry bot reviewed Aug 14, 2025

src/Sentry/Threading/ScopedCountdownLock.cs

This comment was marked as outdated.

Merge branch 'main' into fix/logs-event-lock-scope-race-condition

9e10df7

jamescrosswell approved these changes Aug 15, 2025

docs: rephrase CHANGELOG

b1d5441

Flash0ver merged commit cbd289b into main Aug 15, 2025
32 checks passed

Flash0ver deleted the fix/logs-event-lock-scope-race-condition branch August 15, 2025 20:12
