[Code Origin] DEBUG-5199 Reduce per-span allocations and locks#8197
[Code Origin] DEBUG-5199 Reduce per-span allocations and locks#8197dudikeleti wants to merge 13 commits intomasterfrom
Conversation
8411be8 to
9b59bc2
Compare
There was a problem hiding this comment.
Pull request overview
Optimize code-origin tag emission on hot paths by batching tag writes and caching PDB-derived strings to reduce allocations and lock contention.
Changes:
- Add
TagsList.BeginTagBatch()/TagBatchto reserve capacity, take the tag lock once, and append multiple tags. - Update code-origin tagging to use the batch append path when
span.Tagsis aTagsList. - Cache PDB sequence point URL/line/column strings during per-assembly cache construction to avoid per-span
ToString()allocations.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tracer/src/Datadog.Trace/Tagging/TagsList.cs | Adds a lock-holding batch helper to append multiple tags with fewer allocations/locks. |
| tracer/src/Datadog.Trace/Debugger/SpanCodeOrigin/SpanCodeOrigin.cs | Uses batched tag appends for code-origin tags and caches formatted PDB values. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8411be87b3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
BenchmarksBenchmark execution time: 2026-02-16 11:49:16 Comparing candidate commit 137fb83 in PR branch Found 10 performance improvements and 10 performance regressions! Performance is the same for 160 metrics, 12 unstable metrics. scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild net472
scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild net6.0
scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild netcoreapp3.1
scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces net6.0
scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest net6.0
scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest netcoreapp3.1
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net472
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net6.0
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces netcoreapp3.1
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSlice net472
scenario:Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync net6.0
scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark netcoreapp3.1
scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatBenchmark net6.0
scenario:Benchmarks.Trace.NLogBenchmark.EnrichedLog net472
scenario:Benchmarks.Trace.RedisBenchmark.SendReceive netcoreapp3.1
scenario:Benchmarks.Trace.SingleSpanAspNetCoreBenchmark.SingleSpanAspNetCore net6.0
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishSpan netcoreapp3.1
|
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing This PR (8197) and master. ✅ No regressions detected - check the details below Full Metrics ComparisonFakeDbCommand
HttpMessageHandler
Comparison explanationExecution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). Duration chartsFakeDbCommand (.NET Framework 4.8)gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8197) - mean (69ms) : 68, 71
master - mean (70ms) : 68, 71
section Bailout
This PR (8197) - mean (73ms) : 72, 75
master - mean (74ms) : 72, 75
section CallTarget+Inlining+NGEN
This PR (8197) - mean (1,059ms) : 967, 1152
master - mean (1,041ms) : 988, 1095
FakeDbCommand (.NET Core 3.1)gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8197) - mean (115ms) : 111, 119
master - mean (116ms) : 112, 119
section Bailout
This PR (8197) - mean (117ms) : 115, 118
master - mean (117ms) : 115, 119
section CallTarget+Inlining+NGEN
This PR (8197) - mean (781ms) : 741, 821
master - mean (779ms) : 723, 835
FakeDbCommand (.NET 6)gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8197) - mean (102ms) : 99, 104
master - mean (102ms) : 99, 105
section Bailout
This PR (8197) - mean (104ms) : 101, 106
master - mean (103ms) : 101, 105
section CallTarget+Inlining+NGEN
This PR (8197) - mean (756ms) : 736, 776
master - mean (770ms) : 746, 793
FakeDbCommand (.NET 8)gantt
title Execution time (ms) FakeDbCommand (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8197) - mean (94ms) : 92, 97
master - mean (95ms) : 92, 97
section Bailout
This PR (8197) - mean (95ms) : 94, 97
master - mean (95ms) : 94, 97
section CallTarget+Inlining+NGEN
This PR (8197) - mean (642ms) : 626, 657
master - mean (637ms) : 623, 651
HttpMessageHandler (.NET Framework 4.8)gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8197) - mean (211ms) : 202, 220
master - mean (200ms) : 195, 206
section Bailout
This PR (8197) - mean (215ms) : 207, 223
master - mean (204ms) : 199, 210
section CallTarget+Inlining+NGEN
This PR (8197) - mean (1,210ms) : 1144, 1277
master - mean (1,176ms) : 1124, 1228
HttpMessageHandler (.NET Core 3.1)gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8197) - mean (316ms) : 304, 329
master - mean (301ms) : 293, 309
section Bailout
This PR (8197) - mean (316ms) : 308, 325
master - mean (301ms) : 292, 309
section CallTarget+Inlining+NGEN
This PR (8197) - mean (1,039ms) : 986, 1091
master - mean (992ms) : 951, 1033
HttpMessageHandler (.NET 6)gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8197) - mean (296ms) : 288, 305
master - mean (285ms) : 276, 293
section Bailout
This PR (8197) - mean (297ms) : 289, 305
master - mean (284ms) : 279, 288
section CallTarget+Inlining+NGEN
This PR (8197) - mean (985ms) : 932, 1039
master - mean (956ms) : 900, 1011
HttpMessageHandler (.NET 8)gantt
title Execution time (ms) HttpMessageHandler (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8197) - mean (307ms) : 297, 317
master - mean (292ms) : 284, 300
section Bailout
This PR (8197) - mean (310ms) : crit, 301, 319
master - mean (293ms) : 286, 300
section CallTarget+Inlining+NGEN
This PR (8197) - mean (995ms) : crit, 888, 1103
master - mean (901ms) : 842, 959
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I see the benchmarks above, but are we able to introduce a unittest that provably shows the reduction in allocations? |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 8 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| internal readonly struct TagBatch : IDisposable | ||
| { | ||
| private readonly List<KeyValuePair<string, string>>? _tags; | ||
|
|
||
| internal TagBatch(List<KeyValuePair<string, string>> tags) | ||
| { | ||
| _tags = tags; | ||
| } |
There was a problem hiding this comment.
TagBatch is a lock-holding struct, but as a normal (readonly) struct it can be copied accidentally (e.g., passed by value), which can lead to double-dispose or disposing a copy and throwing SynchronizationLockException when Monitor.Exit is called incorrectly. Consider making TagBatch a ref struct (and rely on pattern-based Dispose) or otherwise redesigning it to be non-copyable so misuse is prevented by the type system rather than only by comments.
There was a problem hiding this comment.
This does concern me... it's really hard to reason about exactly when these copies might or might not be added
Since we last spoke, I've realised that we can have some simpler alternatives. IN particular, we can create multiple SetTags methods, which side-steps the allocation issue, at the expense of some minor inconvenience/duplication implementation wise 🙂
public void SetTags(KeyValuePair<string, string> tag1, KeyValuePair<string, string> tag2);
public void SetTags(KeyValuePair<string, string> tag1, KeyValuePair<string, string> tag2, KeyValuePair<string, string> tag3);
public void SetTags(KeyValuePair<string, string> tag1, KeyValuePair<string, string> tag2, KeyValuePair<string, string> tag3, KeyValuePair<string, string> tag4);
// I don't know if it's worth adding this too - it's much nicer to use and implement
// but can't work with .NET FX/.NET Standard, so may never be used in practice 🤷♂️
#if NETCOREAPP
public void SetTags(params ReadOnlySpan<KeyValuePair<string, string>> tags);
#endifInternally, these methods would do the same thing as the existing SetTag, but would ensure the capacity is sufficient first
| [Fact] | ||
| public void EntryCodeOriginTagging_ShouldReduceAllocations() | ||
| { | ||
| // 1) the optimized approach allocates less than the baseline | ||
| // 2) the optimized approach stays below a fixed budget to catch regressions | ||
| // | ||
| // This test focuses on allocations only (not CPU). | ||
| const int iterations = 10_000; | ||
| const int rounds = 3; | ||
| const double maxOptimizedRatio = 0.90; | ||
| const long maxOptimizedBytesPerOp = 300; |
There was a problem hiding this comment.
The allocation test includes a hard absolute budget (maxOptimizedBytesPerOp = 300) and a fixed ratio threshold, which can be brittle across runtime versions, CPU architectures, and build configurations (Debug/Release), leading to flaky CI. To reduce flakiness while still catching regressions, consider removing/relaxing the absolute byte cap and basing the assertion only on relative improvement vs. the baseline (or gating the absolute check behind an explicit opt-in like an environment variable / dedicated performance test run).
| [Tag(Trace.Tags.CodeOriginType)] | ||
| public string CodeOriginType { get; set; } | ||
|
|
||
| [Tag(Trace.Tags.CodeOriginFrameIndex)] | ||
| public string CodeOriginFrames0Index { get; set; } | ||
|
|
||
| [Tag(Trace.Tags.CodeOriginFrameMethod)] | ||
| public string CodeOriginFrames0Method { get; set; } | ||
|
|
||
| [Tag(Trace.Tags.CodeOriginFrameType)] | ||
| public string CodeOriginFrames0Type { get; set; } |
There was a problem hiding this comment.
I don't think we should add these to the base WebTags type, as that increases the allocations of all derived types, whereas we will only ever actually add these to the root span tag type, which will be a limited subset of those 🤔
Also, we shouldn't do this until after (or ideally, at the same time) as we enable code origins by default 😄
There was a problem hiding this comment.
Also, we shouldn't do this until after (or ideally, at the same time) as we enable code origins by default 😄
Well… now is that time 😉
There was a problem hiding this comment.
It seems that we do add code origin to all derived types. The only one that is still missing is WCF, but I plan to add it soon
We don't have ASP.NET support yet do we? 🤔 Also, we don't need it in AspNetCoreEndpointTags or AspNetCoreMvcTags as those aren't service-entry spans.
Tbh, the Endpoint and MVC Tags might not need to be WebTags at all 🤔 we should check whether they really need to be or not. I can look into that separately as part of an effort on aspnetcore perf I'm working on atm
Well… now is that time 😉
Fair enough, I really meant in the same PR, but meh 😉
There was a problem hiding this comment.
We don't have ASP.NET support yet do we? 🤔
Also, we don't need it in AspNetCoreEndpointTags or AspNetCoreMvcTags as those aren't service-entry spans.
Got it. Let's start with netcore only in this PR. I'll add the others later.
Fair enough, I really meant in the same PR, but meh 😉
Ah got it.
| internal readonly struct TagBatch : IDisposable | ||
| { | ||
| private readonly List<KeyValuePair<string, string>>? _tags; | ||
|
|
||
| internal TagBatch(List<KeyValuePair<string, string>> tags) | ||
| { | ||
| _tags = tags; | ||
| } |
There was a problem hiding this comment.
This does concern me... it's really hard to reason about exactly when these copies might or might not be added
Since we last spoke, I've realised that we can have some simpler alternatives. IN particular, we can create multiple SetTags methods, which side-steps the allocation issue, at the expense of some minor inconvenience/duplication implementation wise 🙂
public void SetTags(KeyValuePair<string, string> tag1, KeyValuePair<string, string> tag2);
public void SetTags(KeyValuePair<string, string> tag1, KeyValuePair<string, string> tag2, KeyValuePair<string, string> tag3);
public void SetTags(KeyValuePair<string, string> tag1, KeyValuePair<string, string> tag2, KeyValuePair<string, string> tag3, KeyValuePair<string, string> tag4);
// I don't know if it's worth adding this too - it's much nicer to use and implement
// but can't work with .NET FX/.NET Standard, so may never be used in practice 🤷♂️
#if NETCOREAPP
public void SetTags(params ReadOnlySpan<KeyValuePair<string, string>> tags);
#endifInternally, these methods would do the same thing as the existing SetTag, but would ensure the capacity is sufficient first
09dcf7f to
58f2a67
Compare
Make TagBatch.Dispose() a no-op when uninitialized
…Tag helper in TagsList that reserves capacity and preserves replace/remove semantics while setting multiple tags.
137fb83 to
7ce0004
Compare

Summary of changes
_dd.code_origin.*) by batching tag appends inTagsList.TagsList.BeginTagBatch()/TagBatch.SetTag()helper to reserve capacity, take the tag-list lock once, and set multiple tags with the same override semantics asSetTagToString()allocations.Reason for change
ToString()and repeated tag container growth/locking).Implementation details
TagBatchpreservesTagsList.SetTag()semantics. Callers must keep the critical section small and avoid slow/allocating work while holding the lock.Test coverage
SpanCodeOriginTestsEndpointDetectorTests