Add OpenTelemetry error recording with Activity.AddException polyfill#446
Add OpenTelemetry error recording with Activity.AddException polyfill#446
Conversation
Addresses #429 - telemetry activities were not consistently marked as errors when exceptions occurred. This adds a net8 polyfill for the .NET 10 Activity.AddException API, a SetErrorStatus extension that both sets ActivityStatusCode.Error and records exception events per OTel semantic conventions, and consistent error recording across all activity scopes (jobs, queues, work items, messaging, locks). Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
Pull request overview
This PR improves Foundatio’s OpenTelemetry instrumentation so that activities are consistently marked as errors and (when applicable) record exception details as exception events, addressing issue #429.
Changes:
- Added
Activity.SetErrorStatus(...)extension and aActivity.AddException(...)polyfill for pre-.NET 10 targets. - Updated multiple activity scopes (jobs, queues, message bus, locks) to set
ActivityStatusCode.Errorand record exceptions. - Added telemetry-focused tests validating error status and exception event tagging/stacktrace capture.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| tests/Foundatio.Tests/Jobs/JobTests.cs | Adds tests verifying RunContinuousAsync sets activity error status and records exception events. |
| src/Foundatio/Utility/FoundatioDiagnostics.cs | Introduces SetErrorStatus extension and Activity.AddException polyfill (non-.NET10). |
| src/Foundatio/Queues/QueueBase.cs | Records exception + error status when queue stats collection fails. |
| src/Foundatio/Messaging/MessageBusBase.cs | Marks message handling activity as error when subscriber handler throws. |
| src/Foundatio/Lock/CacheLockProvider.cs | Marks lock acquisition activity as error when acquisition fails (non-cancellation) and adjusts cancellation log level. |
| src/Foundatio/Jobs/WorkItemJob/WorkItemJob.cs | Marks work item processing activity as error for multiple failure paths (parse/handler/etc.). |
| src/Foundatio/Jobs/QueueJobBase.cs | Marks dequeue and processing activities as error on exceptions/failed results. |
| src/Foundatio/Jobs/JobWithLockBase.cs | Marks lock acquisition activity as error when GetLockAsync throws. |
| src/Foundatio/Jobs/JobResult.cs | No functional change (file header/BOM adjustment). |
| src/Foundatio/Jobs/IJob.cs | Marks the per-iteration job activity as error when TryRunAsync returns a failure result. |
| src/Foundatio.TestHarness/Jobs/HelloWorldJob.cs | Adds ThrowingJob used by tests to validate inner-exception chain capture. |
Comments suppressed due to low confidence (1)
src/Foundatio/Jobs/QueueJobBase.cs:74
dequeueActivityis now created withusing varoutside thetry, which means it won't be disposed until afterProcessAsynccompletes (sinceRunAsyncawaits it). This changes the span duration from “just dequeue” to “dequeue + full processing”, which can skew telemetry and nesting.
Consider scoping the activity to only the dequeue operation (e.g., keep the using around just DequeueAsync/EnrichDequeueActivity, and dispose it before calling ProcessAsync), while still allowing the catch block to call SetErrorStatus (e.g., via explicit variable + try/catch/finally).
using var dequeueActivity = StartDequeueActivity();
try
{
queueEntry = await _queue.Value.DequeueAsync(linkedCancellationTokenSource.Token).AnyContext();
EnrichDequeueActivity(dequeueActivity, queueEntry);
}
catch (OperationCanceledException)
{
return JobResult.Cancelled;
}
catch (Exception ex)
{
dequeueActivity?.SetErrorStatus(ex, $"Error trying to dequeue message: {ex.Message}");
return JobResult.FromException(ex, $"Error trying to dequeue message: {ex.Message}");
}
return await ProcessAsync(queueEntry, cancellationToken).AnyContext();
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The job runner already sets error status on the parent activity for any non-success, non-cancelled JobResult. This call was setting it on Activity.Current before the child activity was created, duplicating work the runner handles. Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Addresses #429 - telemetry activities were not consistently marked as errors when exceptions occurred.
Test plan