Add OpenTelemetry error recording with Activity.AddException polyfill #446

Merged
niemyjski merged 5 commits into main from feature/telemetry-errors
Feb 6, 2026
Conversation

@niemyjski
Member

Summary

Addresses #429 - telemetry activities were not consistently marked as errors when exceptions occurred.

  • Polyfill `Activity.AddException` for net8 that matches the .NET 10 API signature and semantics (internal class, won't leak). On .NET 10+ the built-in method takes over automatically via conditional compilation.
  • `SetErrorStatus` extension that both sets `ActivityStatusCode.Error` with a descriptive message and records the exception as an `ActivityEvent` with `exception.type`, `exception.message`, and `exception.stacktrace` tags per OpenTelemetry semantic conventions.
  • Uses `Exception.ToString()` for the stacktrace tag to capture the full exception chain including inner exceptions.
  • Consistent error recording across all activity scopes:
    • Job runner (`RunContinuousAsync`) records errors from `JobResult`, including the exception when present
    • `QueueJobBase` records errors on dequeue, process, and catch activities
    • `WorkItemJob` records errors on deserialization failures, missing handlers, handler failures, and unhandled exceptions
    • `MessageBusBase` records errors on subscriber handler failures
    • `JobWithLockBase` records errors when lock acquisition throws
    • `QueueBase` records errors when queue stats collection fails
    • `CacheLockProvider` records errors when lock acquisition fails (non-cancellation)
  • Lock cancellation log level changed from Trace to Debug for better visibility
  • Tests follow a 3-part naming convention with the AAA pattern, using a single `ThrowingJob` that validates inner exception chain capture
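To make the behaviour described in the bullets above concrete, here is a minimal sketch of how an extension of this kind could be written. It is not the code from this PR: the class name `ActivityErrorSketch` and the helper name `AddExceptionEvent` are hypothetical (a distinct name is used so it never collides with a built-in `Activity.AddException`), and the parameter order of `SetErrorStatus` is assumed from the call site quoted later in this conversation.

```csharp
#nullable enable
using System;
using System.Diagnostics;

// Hypothetical sketch, not the Foundatio source.
internal static class ActivityErrorSketch
{
    // Records the exception as an "exception" event carrying the three
    // tags named by the OpenTelemetry semantic conventions.
    public static Activity AddExceptionEvent(this Activity activity, Exception exception)
    {
        var tags = new ActivityTagsCollection
        {
            { "exception.type", exception.GetType().FullName },
            { "exception.message", exception.Message },
            // ToString() includes the inner exception chain as well.
            { "exception.stacktrace", exception.ToString() }
        };
        return activity.AddEvent(new ActivityEvent("exception", tags: tags));
    }

    // Marks the activity as failed and, when an exception is supplied,
    // also attaches it as an event.
    public static Activity SetErrorStatus(this Activity activity, Exception? exception, string? message = null)
    {
        activity.SetStatus(ActivityStatusCode.Error, message ?? exception?.Message);
        if (exception is not null)
            activity.AddExceptionEvent(exception);
        return activity;
    }
}
```

A caller would typically invoke it null-conditionally from a catch block, for example `activity?.SetErrorStatus(ex, "Error processing item")`, so that nothing happens when no listener has sampled the activity.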

Test plan

  • All 1783 existing tests pass
  • New telemetry tests validate error status, exception event tags, and inner exception stacktrace capture
  • Builds clean on both net8.0 and net10.0 targets with 0 warnings

Addresses #429 - telemetry activities were not consistently marked as errors when exceptions occurred. This adds a net8 polyfill for the .NET 10 Activity.AddException API, a SetErrorStatus extension that both sets ActivityStatusCode.Error and records exception events per OTel semantic conventions, and consistent error recording across all activity scopes (jobs, queues, work items, messaging, locks).

Co-authored-by: Cursor <cursoragent@cursor.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR improves Foundatio’s OpenTelemetry instrumentation so that activities are consistently marked as errors and (when applicable) record exception details as exception events, addressing issue #429.

Changes:

  • Added an Activity.SetErrorStatus(...) extension and an Activity.AddException(...) polyfill for pre-.NET 10 targets.
  • Updated multiple activity scopes (jobs, queues, message bus, locks) to set ActivityStatusCode.Error and record exceptions.
  • Added telemetry-focused tests validating error status and exception event tagging/stacktrace capture.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/Foundatio.Tests/Jobs/JobTests.cs | Adds tests verifying RunContinuousAsync sets activity error status and records exception events. |
| src/Foundatio/Utility/FoundatioDiagnostics.cs | Introduces SetErrorStatus extension and Activity.AddException polyfill (non-.NET10). |
| src/Foundatio/Queues/QueueBase.cs | Records exception + error status when queue stats collection fails. |
| src/Foundatio/Messaging/MessageBusBase.cs | Marks message handling activity as error when subscriber handler throws. |
| src/Foundatio/Lock/CacheLockProvider.cs | Marks lock acquisition activity as error when acquisition fails (non-cancellation) and adjusts cancellation log level. |
| src/Foundatio/Jobs/WorkItemJob/WorkItemJob.cs | Marks work item processing activity as error for multiple failure paths (parse/handler/etc.). |
| src/Foundatio/Jobs/QueueJobBase.cs | Marks dequeue and processing activities as error on exceptions/failed results. |
| src/Foundatio/Jobs/JobWithLockBase.cs | Marks lock acquisition activity as error when GetLockAsync throws. |
| src/Foundatio/Jobs/JobResult.cs | No functional change (file header/BOM adjustment). |
| src/Foundatio/Jobs/IJob.cs | Marks the per-iteration job activity as error when TryRunAsync returns a failure result. |
| src/Foundatio.TestHarness/Jobs/HelloWorldJob.cs | Adds ThrowingJob used by tests to validate inner-exception chain capture. |
Comments suppressed due to low confidence (1)

src/Foundatio/Jobs/QueueJobBase.cs:74

  • dequeueActivity is now created with using var outside the try, which means it won't be disposed until after ProcessAsync completes (since RunAsync awaits it). This changes the span duration from “just dequeue” to “dequeue + full processing”, which can skew telemetry and nesting.

Consider scoping the activity to only the dequeue operation (e.g., keep the using around just DequeueAsync/EnrichDequeueActivity, and dispose it before calling ProcessAsync), while still allowing the catch block to call SetErrorStatus (e.g., via explicit variable + try/catch/finally).

        using var dequeueActivity = StartDequeueActivity();
        try
        {
            queueEntry = await _queue.Value.DequeueAsync(linkedCancellationTokenSource.Token).AnyContext();
            EnrichDequeueActivity(dequeueActivity, queueEntry);
        }
        catch (OperationCanceledException)
        {
            return JobResult.Cancelled;
        }
        catch (Exception ex)
        {
            dequeueActivity?.SetErrorStatus(ex, $"Error trying to dequeue message: {ex.Message}");
            return JobResult.FromException(ex, $"Error trying to dequeue message: {ex.Message}");
        }

        return await ProcessAsync(queueEntry, cancellationToken).AnyContext();
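The scoping suggested in this comment (explicit variable plus try/catch/finally) could look roughly like the following. This is a self-contained sketch under assumptions, not a patch for QueueJobBase: `DequeueScopeSketch`, the `"Sketch.Queue"` source name, the delegate parameters, and the string return values are all hypothetical stand-ins for the real queue and JobResult types.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical sketch: the dequeue span ends before processing begins.
internal static class DequeueScopeSketch
{
    private static readonly ActivitySource Source = new("Sketch.Queue");

    public static async Task<string> RunAsync(
        Func<Task<string>> dequeue,
        Func<string, Task<string>> process)
    {
        string entry;
        // Explicit variable instead of "using var", so disposal is under our control.
        Activity? dequeueActivity = Source.StartActivity("Dequeue");
        try
        {
            entry = await dequeue();
        }
        catch (OperationCanceledException)
        {
            return "cancelled";
        }
        catch (Exception ex)
        {
            // The activity is still alive here, so the error can be recorded.
            dequeueActivity?.SetStatus(ActivityStatusCode.Error,
                $"Error trying to dequeue message: {ex.Message}");
            return "failed";
        }
        finally
        {
            // Stops the span once dequeue has finished, whether or not it threw.
            dequeueActivity?.Dispose();
        }

        // Runs after the dequeue span has already been stopped.
        return await process(entry);
    }
}
```

The trade-off is a little more ceremony than `using var`, in exchange for a span whose duration covers only the dequeue call.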

The job runner already sets error status on the parent activity for any non-success, non-cancelled JobResult. This call was setting it on Activity.Current before the child activity was created, duplicating work the runner handles.

Co-authored-by: Cursor <cursoragent@cursor.com>
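The runner-side rule mentioned in the commit message above (error status for any non-success, non-cancelled result) can be illustrated with a small sketch. `SketchJobResult` and `JobResultSketch.MarkIfFailed` are hypothetical stand-ins, not Foundatio types, and recording the exception event is left out for brevity.

```csharp
#nullable enable
using System;
using System.Diagnostics;

// Hypothetical stand-in for a job result; not the Foundatio JobResult type.
internal sealed record SketchJobResult(bool IsSuccess, bool IsCancelled, string? Message, Exception? Error);

internal static class JobResultSketch
{
    // Marks the activity as an error only for results that are neither
    // successful nor cancelled; all other results leave the status untouched.
    public static void MarkIfFailed(Activity? activity, SketchJobResult result)
    {
        if (result.IsSuccess || result.IsCancelled)
            return;

        activity?.SetStatus(ActivityStatusCode.Error, result.Message ?? result.Error?.Message);
    }
}
```

Doing this once in the runner means individual jobs do not need to set the same status again on `Activity.Current`.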
@niemyjski niemyjski merged commit c2b7daf into main Feb 6, 2026
4 checks passed
@niemyjski niemyjski deleted the feature/telemetry-errors branch February 6, 2026 21:31