-
Notifications
You must be signed in to change notification settings - Fork 272
refactor: simplify telemetry Stopwatch API #2253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -411,12 +411,6 @@ const ( | |
| resultTypeFailure = "failure" | ||
| ) | ||
|
|
||
| var ( | ||
| // Pre-allocated result attributes for use with PrecomputeAttrs. | ||
| Success = attribute.String(resultAttr, resultTypeSuccess) | ||
| Failure = attribute.String(resultAttr, resultTypeFailure) | ||
| ) | ||
|
|
||
| func (t Stopwatch) Success(ctx context.Context, total int64, kv ...attribute.KeyValue) { | ||
| t.end(ctx, resultTypeSuccess, total, kv...) | ||
| } | ||
|
|
@@ -429,22 +423,9 @@ func (t Stopwatch) end(ctx context.Context, result string, total int64, kv ...at | |
| kv = append(kv, attribute.KeyValue{Key: resultAttr, Value: attribute.StringValue(result)}) | ||
| kv = append(t.kv, kv...) | ||
| opt := metric.WithAttributeSet(attribute.NewSet(kv...)) | ||
| t.RecordRaw(ctx, total, opt) | ||
| } | ||
|
|
||
| // PrecomputeAttrs builds a reusable MeasurementOption from the given attribute | ||
| // key-values. The option must include all attributes (including "result"). | ||
| // Use with Stopwatch.Record to avoid per-call attribute allocation. | ||
| func PrecomputeAttrs(kv ...attribute.KeyValue) metric.MeasurementOption { | ||
| return metric.WithAttributeSet(attribute.NewSet(kv...)) | ||
| } | ||
|
|
||
| // RecordRaw records an operation using a precomputed attribute option, it does | ||
| // not include any previous attributes passed at Begin(). Zero-allocation | ||
| // alternative to Success/Failure for hot paths. | ||
| func (t Stopwatch) RecordRaw(ctx context.Context, total int64, precomputedAttrs metric.MeasurementOption) { | ||
| amount := time.Since(t.start).Milliseconds() | ||
| t.histogram.Record(ctx, amount, precomputedAttrs) | ||
| t.sum.Add(ctx, total, precomputedAttrs) | ||
| t.count.Add(ctx, 1, precomputedAttrs) | ||
| t.histogram.Record(ctx, amount, opt) | ||
| t.sum.Add(ctx, total, opt) | ||
|
Comment on lines
427
to
+429
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. π‘ The new end() method calls metric.WithAttributes(kv...) three separate times β once each for histogram.Record, sum.Add, and count.Add β causing the OTel SDK to independently sort and deduplicate the key-value slice (via attribute.NewSet) three times per recording instead of once. The fix is to compute opt := metric.WithAttributeSet(attribute.NewSet(kv...)) once before the three calls and reuse it. Extended reasoning...What the bug is and how it manifests In the refactored Stopwatch.end() method (meters.go lines 427-429), metric.WithAttributes(kv...) is called three independent times β once for t.histogram.Record, once for t.sum.Add, and once for t.count.Add. The OTel Go SDK's metric.WithAttributes creates an attributeOption that wraps raw key-values; when each instrument processes its MeasurementOption, it internally calls attribute.NewSet(attrs...) to sort and deduplicate the slice, allocating a distinct attribute.Set each time. The result is 3x attribute-set construction per end() invocation instead of 1x. The specific code path that triggers it Every call to FullFetchChunker.Slice and StreamingChunker.Slice invokes either timer.Success or timer.Failure, both of which delegate to end(). Additionally, the remote-read inner timer (fetchSW / fetchTimer) goes through the same path. These are block-IO hot paths executed on every chunk read. Why existing code does not prevent it The previous end() computed the attribute set once: opt := metric.WithAttributeSet(attribute.NewSet(kv...)) and then called RecordRaw(ctx, total, opt), which passed that single pre-built MeasurementOption to all three instrument calls. RecordRaw was explicitly documented as a zero-allocation alternative for hot paths. This PR removes RecordRaw and PrecomputeAttrs to simplify the API, but the simplified end() does not preserve the single-allocation property. What the impact is Each attribute-set construction for a typical 2-4 KV slice requires sorting, deduplication, and a heap allocation. At 3x per recording on the chunker hot path, this is measurable under high block-IO throughput. The deleted benchmark (BenchmarkChunkerSlice_CacheHit) was specifically designed to expose this cost; its removal means the regression is now invisible to CI. How to fix it Compute the option once before the three instrument calls inside end(): opt := metric.WithAttributeSet(attribute.NewSet(kv...)) This restores 1x attribute-set construction per recording without requiring any change to callers or re-adding RecordRaw/PrecomputeAttrs. Step-by-step proof
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed β same as above, end() now computes the attribute set once. |
||
| t.count.Add(ctx, 1, opt) | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
π‘ The benchmark
chunk_bench_test.gowas deleted with the stated justification that it 'relied on the removed PrecomputeAttrs API', but this is factually incorrect β the file never imported thetelemetrypackage and never calledPrecomputeAttrs,RecordRaw, or referencedchunkerAttrs. All four APIs it used (NewFullFetchChunker,cache.setIsCached,chunker.Slice,chunker.Close) still exist unchanged. Consider restoring the benchmark, especially since this PR changes the Slice cache-hit hot-path from a zero-allocation precomputedMeasurementOptionto threemetric.WithAttributescalls per invocation.Extended reasoning...
What the bug is
The PR description states chunk_bench_test.go was deleted because it "relied on the removed PrecomputeAttrs API." This is factually incorrect. Examining the deleted file's content from the diff shows its imports are exclusively:
context,path/filepath,testing,go.opentelemetry.io/otel/sdk/metric, andgithub.com/e2b-dev/infra/packages/orchestrator/pkg/sandbox/block/metrics. Thetelemetrypackage is never imported. The functionsPrecomputeAttrs,RecordRaw, and thechunkerAttrsvariable are never referenced anywhere in the file.The specific code path
The benchmark exercises four operations on
FullFetchChunker:NewFullFetchChunker(cbCacheSize, cbBlockSize, nil, filepath.Join(b.TempDir(), "cache"), m)β signature is unchanged in the new codechunker.cache.setIsCached(0, cbCacheSize)βcachefield andsetIsCachedmethod still exist (same package, unexported but accessible)chunker.Slice(ctx, off, cbChunkSize)β still exists with the same signaturechunker.Close()β still existsThe benchmark would compile and run correctly against the refactored code without any modifications.
Why existing code doesn't prevent this
The PR's stated justification is simply a factual error. The benchmark comment even says it tests "timer.Success with attribute construction" β which is precisely what the new code path does (inline
attribute.String(pullType, pullTypeLocal)construction insidetimer.Success). The benchmark was more relevant after this change than before, not less.What the impact is
The PR introduces a performance characteristic change in the
FullFetchChunker.Slicecache-hit path: previously a single precomputedmetric.MeasurementOption(oneattribute.NewSetcomputed once at init time, zero per-call allocation) was used; now threemetric.WithAttributes(kv...)calls are made perSliceinvocation (each callingattribute.NewSetat call time). The benchmark explicitly measures "the full FullFetchChunker.Slice hot path on a cache hit: bitmap check + mmap slice return + OTEL timer.Success with attribute construction." Its deletion removes the only automated way to detect or quantify this allocation change and any future regressions on this hot path.How to fix it
Restore
chunk_bench_test.goβ it requires zero changes to compile and run against the new code.Step-by-step proof
context,path/filepath,testing,sdkmetric,blockmetrics. Notelemetryimport. βPrecomputeAttrs: not found. Search forRecordRaw: not found. Search forchunkerAttrs: not found. βNewFullFetchChunkerin the newchunk.go(line ~72): signaturefunc NewFullFetchChunker(size, blockSize int64, base storage.SeekableReader, cachePath string, metrics metrics.Metrics) (*FullFetchChunker, error)β identical to what the benchmark calls. βcachefield onFullFetchChunker(line ~63 in new code):cache *Cacheβ still present.setIsCachedon*Cache: called in the new code at multiple points, still exists. βgo test -bench=BenchmarkChunkerSlice_CacheHit ./packages/orchestrator/pkg/sandbox/block/unchanged.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed β benchmark restored in follow-up commit. It compiles against the refactored code without modifications.