-
Notifications
You must be signed in to change notification settings - Fork 151
Improve OpenTelemetryTags population to reduce allocation and work
#8042
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| internal static string GetOperationName(OpenTelemetryTags tags) | ||
| { | ||
| var httpRequestMethod = tags.GetTag("http.request.method"); | ||
| if (tags.SpanKind == SpanKinds.Server && !string.IsNullOrEmpty(httpRequestMethod)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For each of these, the method was basically inlined, then reused tags we'd already fetched where possible
| // TODO: This is checking that the tag is being set on the _span_ not on the _activity_, is that really correct? | ||
| // If it's correct, we should probably update the SetTag logic and remove this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just highlighting that we haven't yet copied activity tags to the span yet, so there could be a bug here if we're assuming that we have?
| } | ||
|
|
||
| // Map the OTEL status to error tags | ||
| AgentStatus2Error(activity, span); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inlined this method because it makes it simpler in terms of the tags and a couple of other things
| [Tag(Tags.MessagingSystem)] | ||
| public string MessagingSystem { get; set; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We removed these from OpenTelemetryTags but the service bus integration does actually set them
This comment has been minimized.
This comment has been minimized.
Execution-Time Benchmarks Report ⏱️Execution-time results for samples comparing This PR (8042) and master. ✅ No regressions detected - check the details below Full Metrics ComparisonFakeDbCommand
HttpMessageHandler
Comparison explanationExecution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**. The following thresholds were used for comparing the execution times:
Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph). Duration chartsFakeDbCommand (.NET Framework 4.8)gantt
title Execution time (ms) FakeDbCommand (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8042) - mean (78ms) : 75, 81
master - mean (74ms) : 72, 77
section Bailout
This PR (8042) - mean (82ms) : 79, 85
master - mean (78ms) : 76, 80
section CallTarget+Inlining+NGEN
This PR (8042) - mean (1,091ms) : 1045, 1136
master - mean (1,048ms) : 996, 1100
FakeDbCommand (.NET Core 3.1)gantt
title Execution time (ms) FakeDbCommand (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8042) - mean (121ms) : 116, 126
master - mean (115ms) : 111, 118
section Bailout
This PR (8042) - mean (122ms) : 118, 126
master - mean (116ms) : 113, 119
section CallTarget+Inlining+NGEN
This PR (8042) - mean (793ms) : 732, 855
master - mean (773ms) : 715, 832
FakeDbCommand (.NET 6)gantt
title Execution time (ms) FakeDbCommand (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8042) - mean (107ms) : 104, 111
master - mean (101ms) : 98, 104
section Bailout
This PR (8042) - mean (108ms) : 105, 111
master - mean (102ms) : 100, 105
section CallTarget+Inlining+NGEN
This PR (8042) - mean (759ms) : 686, 831
master - mean (737ms) : 670, 805
FakeDbCommand (.NET 8)gantt
title Execution time (ms) FakeDbCommand (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8042) - mean (106ms) : 103, 109
master - mean (100ms) : 97, 102
section Bailout
This PR (8042) - mean (107ms) : crit, 105, 110
master - mean (101ms) : 99, 103
section CallTarget+Inlining+NGEN
This PR (8042) - mean (708ms) : crit, 687, 729
master - mean (668ms) : 647, 689
HttpMessageHandler (.NET Framework 4.8)gantt
title Execution time (ms) HttpMessageHandler (.NET Framework 4.8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8042) - mean (195ms) : 189, 201
master - mean (193ms) : 188, 198
section Bailout
This PR (8042) - mean (199ms) : 194, 204
master - mean (197ms) : 193, 201
section CallTarget+Inlining+NGEN
This PR (8042) - mean (1,125ms) : 1070, 1179
master - mean (1,122ms) : 1054, 1191
HttpMessageHandler (.NET Core 3.1)gantt
title Execution time (ms) HttpMessageHandler (.NET Core 3.1)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8042) - mean (278ms) : 271, 285
master - mean (276ms) : 269, 282
section Bailout
This PR (8042) - mean (281ms) : 271, 292
master - mean (276ms) : 272, 280
section CallTarget+Inlining+NGEN
This PR (8042) - mean (944ms) : 910, 978
master - mean (931ms) : 890, 972
HttpMessageHandler (.NET 6)gantt
title Execution time (ms) HttpMessageHandler (.NET 6)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8042) - mean (276ms) : 265, 287
master - mean (270ms) : 264, 276
section Bailout
This PR (8042) - mean (274ms) : 267, 282
master - mean (269ms) : 266, 272
section CallTarget+Inlining+NGEN
This PR (8042) - mean (937ms) : 866, 1007
master - mean (929ms) : 884, 973
HttpMessageHandler (.NET 8)gantt
title Execution time (ms) HttpMessageHandler (.NET 8)
dateFormat x
axisFormat %Q
todayMarker off
section Baseline
This PR (8042) - mean (275ms) : 265, 285
master - mean (270ms) : 264, 276
section Bailout
This PR (8042) - mean (274ms) : 266, 282
master - mean (269ms) : 265, 273
section CallTarget+Inlining+NGEN
This PR (8042) - mean (842ms) : 811, 874
master - mean (827ms) : 808, 845
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
## Summary of changes Update `Activity`-based benchmarks to reduce variability and make comparisons easier ## Reason for change The various `Activity`-related benchmarks call the global `Tracer` instance, so we should make sure to configure it with our default benchmarking settings (basically disabling background jobs like telemetry/discovery/remote config) to reduce variation. Also added a "baseline" job for comparison and a version that uses hiearachical IDs instead of W3C IDs. Was a prerequisite for a bunch of other work. ## Implementation details - Configure `Benchmarks.OpenTelemetry.InstrumentedApi` project and `Benchmarks.Trace/ActivityBenchmark` to setup the global tracer with Telemetry etc disabled - Add two extra benchmarks to `Benchmarks.Trace/ActivityBenchmark` - `StartStopWithChild_Baseline`, which is the same as `StartStopWithChild` but without the DD integration - `StartStopWithChild_Hierarchical`, which is the same as `StartStopWithChild` but uses hierarchical ID format - Don't run either of these in CI for now (to avoid extra load), just for local comparisons - Simplify the benchmark to just do explicit duck typing (which is closer to what we do normally anyway, and removes a bunch of code) ## Test coverage Running these benchmarks locally shows the improvements we need to make, and highlights that we clearly have a bug with hierarchical IDs 😬 | Method | Runtime | Mean | Error | StdDev | Gen0 | Gen1 | Allocated | | ------------------------------- | -------------------- | ----------: | ----------: | ----------: | -----: | -----: | --------: | | StartStopWithChild_Baseline | .NET 6.0 | 672.9 ns | 23.23 ns | 66.66 ns | 0.0038 | - | 1.09 KB | | StartStopWithChild_Hierarchical | .NET 6.0 | 35,478.1 ns | 1,035.96 ns | 3,021.94 ns | - | - | 10.47 KB | | StartStopWithChild | .NET 6.0 | 3,681.7 ns | 50.23 ns | 44.53 ns | 0.0153 | - | 4.87 KB | | StartStopWithChild_Baseline | .NET 8.0 | 570.8 ns | 16.96 ns | 49.22 ns | 0.0029 | - | 1.09 KB | | StartStopWithChild_Hierarchical | .NET 8.0 | 32,478.5 ns | 1,215.04 ns | 3,466.57 ns | - | - | 10.48 KB | | StartStopWithChild | .NET 8.0 | 3,021.4 ns | 59.24 ns | 163.16 ns | 0.0153 | - | 4.77 KB | | StartStopWithChild_Baseline | .NET Core 3.1 | 752.6 ns | 15.06 ns | 30.09 ns | 0.0038 | - | 1.19 KB | | StartStopWithChild_Hierarchical | .NET Core 3.1 | 34,170.1 ns | 572.80 ns | 478.31 ns | - | - | 10.75 KB | | StartStopWithChild | .NET Core 3.1 | 4,905.0 ns | 31.63 ns | 29.58 ns | 0.0153 | - | 5.05 KB | | StartStopWithChild_Baseline | .NET Framework 4.7.2 | 770.7 ns | 6.03 ns | 5.64 ns | 0.2012 | - | 1.24 KB | | StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 37,386.2 ns | 542.73 ns | 453.21 ns | 1.8921 | 0.1221 | 11.77 KB | | StartStopWithChild | .NET Framework 4.7.2 | 5,884.5 ns | 64.54 ns | 60.37 ns | 0.8621 | - | 5.3 KB | ## Other details https://datadoghq.atlassian.net/browse/LANGPLAT-915 Part of a stack working to improve OTel performance - #8036 👈 - #8037 - #8038 - #8039 - #8040 - #8041 - #8042
1c3bd83 to
268a7e2
Compare
0719f68 to
05a7274
Compare
## Summary of changes Reduce allocation of `ActivityHandlerCommon` by removing `string` concatenation ## Reason for change The `ActivityHandlerCommon.ActivityStarted` and `ActivityStopped` methods need to store and retrieve `Activity` instances from a `ConcurrentDictionary<>`. Today they're doing that be concatenating the `Activity`'s `TraceId` and `SpanID`, or by using it's `ID`. All that concatenation causes a bunch of allocation, so instead introduce a simple `struct` to use as the key instead ## Implementation details Introduce `ActivityKey`, which is essentially `internal readonly record struct ActivityKey(string TraceId, string SpanId`, and use that for all of the dictionary lookups. Which avoids all the string concatenation allocations. ## Test coverage Added some unit tests for `ActivityKey`, by mostly covered by existing integration tests for correctness. Benchmarks show a significant improvement over [the previous results](#8036), particularly for Hierachical IDs which clearly were buggy | Method | Runtime | Mean | StdDev | Gen0 | Allocated | Compared to #8036 | | ------------------------------- | -------------------- | -------: | --------: | -----: | --------: | ----------------: | | StartStopWithChild_Hierarchical | .NET 6.0 | 4.217 us | 0.3227 us | 0.0153 | 4.09 KB | -6.38 KB | | StartStopWithChild_Hierarchical | .NET 8.0 | 3.413 us | 0.2505 us | 0.0076 | 4.09 KB | -6.39 KB | | StartStopWithChild_Hierarchical | .NET Core 3.1 | 5.676 us | 0.4636 us | 0.0153 | 4.32 KB | -6.43 KB | | StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 6.813 us | 0.4969 us | 0.7324 | 4.53 KB | -7.24 KB | | | | | | | | | | StartStopWithChild | .NET 6.0 | 4.105 us | 0.2677 us | 0.0153 | 4.3 KB | -0.57 KB | | StartStopWithChild | .NET 8.0 | 3.475 us | 0.1570 us | 0.0114 | 4.2 KB | -0.57 KB | | StartStopWithChild | .NET Core 3.1 | 5.647 us | 0.3129 us | 0.0153 | 4.48 KB | -0.57 KB | | StartStopWithChild | .NET Framework 4.7.2 | 6.842 us | 0.2992 us | 0.7629 | 4.69 KB | -0.61 KB | ## Other details https://datadoghq.atlassian.net/browse/LANGPLAT-915 Part of a stack working to improve OTel performance - #8036 - #8037 👈 - #8038 - #8039 - #8040 - #8041 - #8042
268a7e2 to
5e799b5
Compare
05a7274 to
7d277c0
Compare
anna-git
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 small comments
| if (traceContext is not null | ||
| && traceContext.Environment is null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this a different behavior than span.Context.TraceContext?.Environment is null , I mean before if traceContext was null it would be true, and now it wouldn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, that's true, but in the code for span.SetTag we then check if the TraceContext is null, and if it is, we don't change anything 😄 (technically we write an error log, but in this case I don't think that adds much value, and should never happen)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this a different behavior...?
Good catch, though! I'd say this was a small oversight in the previous code. There was no point in trying to set env without a TraceContext.
| if (span.GetTag("otel.status_code") is null) | ||
| if (tags.OtelStatusCode is null) | ||
| { | ||
| if (activity6 is not null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: activity6 could be declared closer to usage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the activities are created using as because they're originally structs, and this causes boxing, so I think it's best if we continue to keep them all together at the top?
I have ideas about removing all that boxing later, but I'll have to see how big of a deal that is later 😄
## Summary of changes Fix incorrect nullable annotations on `Activity` duck types ## Reason for change While working on other performance things, noticed that the nullable annotations often declared non-nullability when they actually could be null. A particularly confusing part are the `TraceId` and `SpanId` values in `IW3CActivity`. These were marked non-nullable because when you call `Activity.Start()` then these will always be non-null, but _only_ if you're using W3C IDs. If you're using hierarchical IDs ([the default in <.NET 5](https://learn.microsoft.com/en-us/dotnet/core/compatibility/core-libraries/5.0/default-activityidformat-changed)) then these values _will_ be null. As a side note, I suspect this explains the "we saw errors about these being null in error tracking but don't understand why" scenarios 😄 Also, I think we should rename `IW3CActivity` to `IActivity3` instead. W3C _implies_ that it's a W3C activity, but that's not necessarily the case, and is essentially the source of the above confusion I think. `IActivity3` would then be consistently named with `IActivity5` and `IActivity6` we also currently have. ## Implementation details Add nullable annotations to values that _can_ be null, and fix the fallout (mostly in `ActivityKey`) ## Test coverage Covered by existing tests sufficiently I think ## Other details https://datadoghq.atlassian.net/browse/LANGPLAT-915 Part of a stack working to improve OTel performance - #8036 - #8037 👈 - #8038 - #8039 - #8040 - #8041 - #8042
## Summary of changes Avoids allocating a closure in .NET Core if we can avoid it ## Reason for change .NET Core's `ConcurrentDictionary.GetOrAdd()` method allows providing a "state" object which we can pass to the `GetOrAdd` method. Using this method avoids allocating a closure every time the method is hit, and we can pass the state using a value tuple to avoid additional allocation there ## Implementation details `#if`/`#else` to glory ## Test coverage Functionality is covered by existing tests, benchmarks show an incremental improvement over #8037 for .NET Core, as expected: | Method | Runtime | Mean | StdDev | Gen0 | Allocated | Compared to #8037 | | ------------------------------- | -------------------- | -------: | --------: | -----: | --------: | ----------------: | | StartStopWithChild_Hierarchical | .NET 6.0 | 3.746 us | 0.1128 us | 0.0114 | 3.81 KB | -0.28 KB | | StartStopWithChild_Hierarchical | .NET 8.0 | 2.759 us | 0.0374 us | 0.0114 | 3.81 KB | -0.28 KB | | StartStopWithChild_Hierarchical | .NET Core 3.1 | 4.762 us | 0.0584 us | 0.0153 | 4.04 KB | -0.28 KB | | StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 5.651 us | 0.0717 us | 0.7324 | 4.54 KB | 0.01 KB (noise) | | | | | | | | | | StartStopWithChild | .NET 6.0 | 3.607 us | 0.0508 us | 0.0153 | 4.02 KB | -0.28 KB | | StartStopWithChild | .NET 8.0 | 2.921 us | 0.0617 us | 0.0114 | 3.91 KB | -0.29 KB | | StartStopWithChild | .NET Core 3.1 | 4.922 us | 0.0407 us | 0.0153 | 4.2 KB | -0.28 KB | | StartStopWithChild | .NET Framework 4.7.2 | 6.008 us | 0.0979 us | 0.7629 | 4.69 KB | 0 | ## Other details https://datadoghq.atlassian.net/browse/LANGPLAT-915 Part of a stack working to improve OTel performance - #8036 - #8037 - #8038 - #8039 👈 - #8040 - #8041 - #8042
333d0e2 to
343b895
Compare
6d3f6ca to
a3be7e1
Compare
## Summary of changes Reduce allocation of `ActivityHandlerCommon` by removing `string` concatenation ## Reason for change The `ActivityHandlerCommon.ActivityStarted` and `ActivityStopped` methods need to store and retrieve `Activity` instances from a `ConcurrentDictionary<>`. Today they're doing that be concatenating the `Activity`'s `TraceId` and `SpanID`, or by using it's `ID`. All that concatenation causes a bunch of allocation, so instead introduce a simple `struct` to use as the key instead ## Implementation details Introduce `ActivityKey`, which is essentially `internal readonly record struct ActivityKey(string TraceId, string SpanId`, and use that for all of the dictionary lookups. Which avoids all the string concatenation allocations. ## Test coverage Added some unit tests for `ActivityKey`, by mostly covered by existing integration tests for correctness. Benchmarks show a significant improvement over [the previous results](#8036), particularly for Hierachical IDs which clearly were buggy | Method | Runtime | Mean | StdDev | Gen0 | Allocated | Compared to #8036 | | ------------------------------- | -------------------- | -------: | --------: | -----: | --------: | ----------------: | | StartStopWithChild_Hierarchical | .NET 6.0 | 4.217 us | 0.3227 us | 0.0153 | 4.09 KB | -6.38 KB | | StartStopWithChild_Hierarchical | .NET 8.0 | 3.413 us | 0.2505 us | 0.0076 | 4.09 KB | -6.39 KB | | StartStopWithChild_Hierarchical | .NET Core 3.1 | 5.676 us | 0.4636 us | 0.0153 | 4.32 KB | -6.43 KB | | StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 6.813 us | 0.4969 us | 0.7324 | 4.53 KB | -7.24 KB | | | | | | | | | | StartStopWithChild | .NET 6.0 | 4.105 us | 0.2677 us | 0.0153 | 4.3 KB | -0.57 KB | | StartStopWithChild | .NET 8.0 | 3.475 us | 0.1570 us | 0.0114 | 4.2 KB | -0.57 KB | | StartStopWithChild | .NET Core 3.1 | 5.647 us | 0.3129 us | 0.0153 | 4.48 KB | -0.57 KB | | StartStopWithChild | .NET Framework 4.7.2 | 6.842 us | 0.2992 us | 0.7629 | 4.69 KB | -0.61 KB | ## Other details https://datadoghq.atlassian.net/browse/LANGPLAT-915 Part of a stack working to improve OTel performance - #8036 - #8037 👈 - #8038 - #8039 - #8040 - #8041 - #8042
a3be7e1 to
b6392a3
Compare
## Summary of changes Various optimizations in the `Activity` handling code to reduce allocations and execution time ## Reason for change Some properties are triksy, as they do a bunch of allocation, so we should avoid them if we can. The changes in here _look_ bigger than they are diff-wise, I've added comments to aid review. ## Implementation details - Don't call `ParentId` until we definitely need it. - This property does a bunch of allocation to generate a "valid" value, so we should avoid it if we can. This makes the conditions a bit harder to read, but delays calling `ParentId` until we're sure we don't have something better already - Avoid calling `Tracer.Instance.ActiveScope?.Span` until we know we need it (very minor optimisations but why not 🤷♂️) - Extract `StopActivitySlow` to a separate method, as we _shouldn't_ hit this now, so should help the JIT out with things like code size etc of the calling method (conjecture, not tested, but I think it's better from code understanding PoV too) - Simplify `ShouldIgnoreByOperationName` which also improves execution time 10x from ~30us to ~3us. ## Test coverage Functionally covered by existing tests. Benchmarks compared to #8039 show improvments. Allocations listed below, but execution time is also improved: I re-ran the numbers after making the updates, and the benchmarks are actually better: <details><summary>Benchmarks for original PR</summary> <p> | Method | Runtime | Mean | StdDev | Gen0 | Allocated | Compared to #8039 | | ------------------------------- | -------------------- | -------: | --------: | -----: | --------: | ----------------: | | StartStopWithChild_Hierarchical | .NET 6.0 | 3.410 us | 0.0658 us | 0.0153 | 3.79 KB | -0.02 KB | | StartStopWithChild_Hierarchical | .NET 8.0 | 2.582 us | 0.0443 us | 0.0114 | 3.79 KB | -0.02 KB | | StartStopWithChild_Hierarchical | .NET Core 3.1 | 4.272 us | 0.0731 us | 0.0153 | 4.02 KB | -0.02 KB | | StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 5.165 us | 0.1245 us | 0.7324 | 4.51 KB | -0.03 KB | | | | | | | | | | StartStopWithChild | .NET 6.0 | 3.312 us | 0.0266 us | 0.0153 | 3.78 KB | -0.04 KB | | StartStopWithChild | .NET 8.0 | 2.648 us | 0.0306 us | 0.0114 | 3.78 KB | -0.13 KB | | StartStopWithChild | .NET Core 3.1 | 4.344 us | 0.0555 us | 0.0076 | 3.97 KB | -0.23 KB | | StartStopWithChild | .NET Framework 4.7.2 | 5.234 us | 0.1568 us | 0.7095 | 4.39 KB | -0.30 KB | </p> </details> | Method | Runtime | Mean | StdDev | Gen0 | Allocated | Alloc Ratio | | ------------------------------- | -------------------- | -------: | --------: | -----: | --------: | ----------: | | StartStopWithChild_Hierarchical | .NET 6.0 | 3.770 us | 0.3101 us | 0.0076 | 3.58 KB | -0.23 KB | | StartStopWithChild_Hierarchical | .NET 8.0 | 3.039 us | 0.2377 us | 0.0076 | 3.58 KB | -0.23 KB | | StartStopWithChild_Hierarchical | .NET Core 3.1 | 4.490 us | 0.1135 us | 0.0076 | 3.8 KB | -0.22 KB | | StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 5.303 us | 0.2469 us | 0.6943 | 4.3 KB | -0.24 KB | | | | | | | | | | StartStopWithChild | .NET 6.0 | 3.386 us | 0.0971 us | 0.0114 | 3.59 KB | -0.43 KB | | StartStopWithChild | .NET 8.0 | 2.661 us | 0.0218 us | 0.0114 | 3.59 KB | -0.32 KB | | StartStopWithChild | .NET Core 3.1 | 4.540 us | 0.1625 us | 0.0076 | 3.78 KB | -0.42 KB | | StartStopWithChild | .NET Framework 4.7.2 | 5.563 us | 0.1946 us | 0.6790 | 4.2 KB | -0.49 KB | ## Other details https://datadoghq.atlassian.net/browse/LANGPLAT-915 Part of a stack working to improve OTel performance - #8036 - #8037 - #8038 - #8039 - #8040 👈 - #8041 - #8042
876d552 to
1630be2
Compare
## Summary of changes Update code that enumerates properties of `IActivity` duck types to use allocation-free enumeration ## Reason for change Our duck types all use the public APIs of `Activity` to grab tags, events, and links. These types are generally implemented internally as custom linked list objects, but are exposed as `IEnumerable<>` types. The internal types implement a `struct`-based enumerator, but as the types aren't exposed directly, the compiler can't use those and ends up allocating. That's... annoying. To work around that, this adds a "hack" that OTel were [actually using themselves](https://github.com/open-telemetry/opentelemetry-dotnet/blob/73bff75ef653f81fe6877299435b21131be36dc0/src/OpenTelemetry/Internal/EnumerationHelper.cs#L58) (and [which has been discussed elsewhere](https://www.macrosssoftware.com/2020/07/13/enumerator-performance-surprises/)), of building a dynamic method that finds the `GetEnumerator` type and uses that. We can't use it everywhere because we don't have the types available (e.g. links and events), but we _can_ use it for tags. There's essentially a trade-off of startup time vs runtime allocation, which is worth it in this case IMO. Additionally, the `Activity` implementation uses an empty array as the return value when there _are_ no tags/links/events, so we can avoid the enumerator allocation in those cases by specifically looking for an empty array and bailing if so. ## Implementation details I chose to add the `EnumerationHelper` as an explicitly `Activity`-related thing for now, but we can consider exposing this more widely if we get confidence. I also avoided the concurrent dictionary approach that is the "default" for this, and instead added an additional helper to read a static field that we populate once. We have to wait until we have a "real" activity to do the population, so that we get the right type, which is a bit annoying, but it works... ## Test coverage Added a couple of unit tests for the result and confirmed it works as expected. Benchmarks vs #8040 shw a nice allocation improvement: | Method | Runtime | Mean | StdDev | Gen0 | Gen1 | Allocated | Compared to #8040 | | ------------------------------- | -------------------- | ---------: | --------: | -----: | -----: | --------: | ----------------: | | StartStopWithChild_Hierarchical | .NET 6.0 | 3,248.6 ns | 28.71 ns | 0.0114 | 0.0038 | 3.6 KB | -0.19 KB | | StartStopWithChild_Hierarchical | .NET 8.0 | 2,651.1 ns | 80.80 ns | 0.0114 | - | 3.6 KB | -0.19 KB | | StartStopWithChild_Hierarchical | .NET Core 3.1 | 4,482.2 ns | 263.51 ns | 0.0153 | - | 3.83 KB | -0.19 KB | | StartStopWithChild_Hierarchical | .NET Framework 4.7.2 | 4,915.8 ns | 63.81 ns | 0.7019 | - | 4.33 KB | -0.18 KB | | StartStopWithChild | .NET 6.0 | 3,176.8 ns | 37.73 ns | 0.0114 | - | 3.59 KB | -0.19 KB | | StartStopWithChild | .NET 8.0 | 2,598.0 ns | 40.05 ns | 0.0114 | - | 3.59 KB | -0.19 KB | | StartStopWithChild | .NET Core 3.1 | 4,260.7 ns | 59.98 ns | 0.0076 | - | 3.78 KB | -0.19 KB | | StartStopWithChild | .NET Framework 4.7.2 | 5,097.7 ns | 46.98 ns | 0.6790 | - | 4.2 KB | -0.19 KB | ## Other details https://datadoghq.atlassian.net/browse/LANGPLAT-915 Part of a stack working to improve OTel performance - #8036 - #8037 - #8038 - #8039 - #8040 - #8041 👈 - #8042 --------- Co-authored-by: Lucas Pimentel <lucas.pimentel@datadoghq.com>
a0bbc40 to
ab631ca
Compare
BenchmarksBenchmark execution time: 2026-01-13 16:30:22 Comparing candidate commit babfcae in PR branch Found 10 performance improvements and 5 performance regressions! Performance is the same for 158 metrics, 13 unstable metrics. scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild net472
scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild net6.0
scenario:Benchmarks.Trace.ActivityBenchmark.StartStopWithChild netcoreapp3.1
scenario:Benchmarks.Trace.AgentWriterBenchmark.WriteAndFlushEnrichedTraces net6.0
scenario:Benchmarks.Trace.AspNetCoreBenchmark.SendRequest net6.0
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces net472
scenario:Benchmarks.Trace.CIVisibilityProtocolWriterBenchmark.WriteAndFlushEnrichedTraces netcoreapp3.1
scenario:Benchmarks.Trace.CharSliceBenchmark.OptimizedCharSliceWithPool net6.0
scenario:Benchmarks.Trace.CharSliceBenchmark.OriginalCharSlice net6.0
scenario:Benchmarks.Trace.ElasticsearchBenchmark.CallElasticsearchAsync netcoreapp3.1
scenario:Benchmarks.Trace.GraphQLBenchmark.ExecuteAsync net6.0
scenario:Benchmarks.Trace.Iast.StringAspectsBenchmark.StringConcatAspectBenchmark netcoreapp3.1
scenario:Benchmarks.Trace.SpanBenchmark.StartFinishTwoScopes net6.0
|
ab631ca to
0986f90
Compare
…them (e.g. the activity is not ignored
… reduce allocations
0986f90 to
7d3962f
Compare
bouwkast
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one comment on the version that you also noted on, I'm not entirely sure if what we had was correct or what it currently is is correct
Summary of changes
Improve performance around the "population" of tags from an
Activityinto aSpanReason for change
Currently we do a lot of allocation in the
OtlpHelpers.AgentConvertSpanmethod. TheOpenTelemetryTagsobject also has a lot of properties on it that we never set, and are only used for reading tags in theOperationMapper, even though many may never even be read. This increases the size of the tags object.This PR aims to do various optimizations around the "close activity" paths:
OpenTelemetryTagsobject by removing properties only used byOperationMapperOpenTelemetryTagsfor values that we explictly set inOtlpHelpers.AgentConvertSpanImplementation details
OpenTelemetryTags, even though we may throw it away if the activity is ignored, so that's an easy win.AgentConvertSpanis only everOpenTelemetryTagsbased on the call sites, so we can interact with the tags object directly where posisble.span.SetTagand favouringtags.SetTagwhen we know it's not a "special" tag.OperationMapperto use theGetTag()API, and cache the values at various points. The resulting code is uglier, but overall means we reduce ~100 bytes off every span, so I think it's worth itTest coverage
Functionality is covered by existing, this is just a refactoring, and there's good tests for
OperationMappercurrently. Benchmarks vs #8041 show a nice allocation improvement, at the expense of slower execution (this was local though, so not 100% on that, will see what the CI results show). If there is a significant slow down, I'll try to isolate it, but I think the allocation improvments are probably worth it either wayOther details
Added a couple of extra tests that were missing from the previous PR in the stack too
https://datadoghq.atlassian.net/browse/LANGPLAT-915
Part of a stack working to improve OTel performance
Activitybenchmarks for stability #8036ActivityKeyto remove string concatenation #8037Activityduck types #8038ActivityMappingById.GetOrAdd#8039Activityobjects #8041OpenTelemetryTagspopulation to reduce allocation and work #8042 👈