Use Volatile instead of Interlocked where appropriate#6051
Use Volatile instead of Interlocked where appropriate#6051pentp wants to merge 11 commits intoopen-telemetry:mainfrom
Conversation
| case AggregationType.LongGauge: | ||
| { | ||
| Interlocked.Exchange(ref this.runningValue.AsLong, number); | ||
| Volatile.Write(ref this.runningValue.AsLong, number); |
There was a problem hiding this comment.
curious, if this shows improvement in the metric stress tests?
There was a problem hiding this comment.
@pentp If there is a reasonable improvement in performance, I will spend time analyzing the complete metrics SDK to understand how these changes affect it.
| } | ||
|
|
||
| Interlocked.Exchange(ref this.isCriticalSectionOccupied, 0); | ||
| Volatile.Write(ref this.isCriticalSectionOccupied, 0); |
There was a problem hiding this comment.
We need to consider the memory ordering guarantees of Volatile.Write. With Interlocked methods, the read/writes would not be moved before or after a given Interlocked method.
With volatile writes, read/writes that happen after a given Volatile.Write method can be moved before that Volatile.Write method. We need to evaluate if that affects the correctness of our code. There are some write operations that we do after releasing the locks (for exemplar and MetricPoint updates):
- Call
OnCollectedfor Exemplars which resets the internal measurement state - Update
MetricStatustoCollectPendingwhen updatingMetricPoints
There was a problem hiding this comment.
I went over all uses of Interlocked and also checked the code flow after these lock releases. I now found a few places where the current code incorrectly relies on the preceding interlocked operation for memory ordering, for example in case of MetricPoint.UpdateWithExemplar the order of operations is currently:
Interlocked.Exchange(ref this.runningValue.AsLong, number); // full fence
this.UpdateExemplar(number, tags, offerExemplar); // could run arbitrary lock-free code, though in practice uses locks
this.MetricPointStatus = MetricPointStatus.CollectPending; // no memory ordering guarantees, could become observable before exemplar updatesWith this PR:
Volatile.Write(ref this.runningValue.AsLong, number); // release
this.UpdateExemplar(number, tags, offerExemplar);
Volatile.Write(ref this.status, (byte)MetricPointStatus.CollectPending); // release, guarantees all exemplar updates become observable before|
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
Codecov ReportAttention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files@@ Coverage Diff @@
## main #6051 +/- ##
==========================================
- Coverage 86.61% 86.48% -0.13%
==========================================
Files 258 259 +1
Lines 11795 11880 +85
==========================================
+ Hits 10216 10275 +59
- Misses 1579 1605 +26
Flags with carried forward coverage won't be shown. Click here to find out more.
|
| /// <summary> | ||
| /// Represents a metric data point. | ||
| /// </summary> | ||
| [StructLayout(LayoutKind.Auto)] |
There was a problem hiding this comment.
Using auto-layout together with the field type change for the two enums below reduces the struct size from 72 bytes to 64 bytes.
|
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
|
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
|
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
|
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
|
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
|
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
|
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
|
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
|
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
|
This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or pushing will instruct the bot to automatically remove the label. This bot runs once per day. |
|
This PR has been idle for quite some time and is currently adding to our maintenance overhead. I will go ahead and close it for now. When you are ready to address the follow-ups, please feel free to reopen the PR for review. Thank you for your contributions! |
Volatile reads/writes are atomic and have acquire/release semantics, but are for the most part as fast as regular reads/writes. Any interlocked operation is at least 30-40 CPU cycles and needs exclusive cache line ownership, which is especially bad for reads.
Split off from #6048.