[WIP] sdk/log: self observability: batch log processor metrics #7124

mahendrabishnoi2 · 2025-08-04T06:33:02Z

Adds support for the following metrics for sdk/log#BatchProcessor:

otel.sdk.processor.log.processed
otel.sdk.processor.log.queue.size
otel.sdk.processor.log.queue.capacity

These metrics are defined at https://github.com/open-telemetry/semantic-conventions/blob/v1.36.0/docs/otel/sdk-metrics.md and are experimental. Because of this, they use the global MeterProvider and are gated behind the OTEL_GO_X_SELF_OBSERVABILITY feature flag.

TODO:

Tests
Clarification on Shutdown method semantics for the metric registration callback in the case of an early return and a direct exporter shutdown (out of time)

Observability Implementation Checklist

Based on the project Observability guidelines, ensure the following are completed:

Environment Variable Activation

Observability features are disabled by default
Features are activated through the OTEL_GO_X_OBSERVABILITY environment variable
Use consistent pattern with x.Observability.Enabled() check
Follow established experimental feature pattern
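
A minimal sketch of the activation check, using only the standard library. The helper name `enabledFrom` is hypothetical, and treating only a case-insensitive "true" as enabled is an assumption of this sketch; the real check lives behind x.Observability.Enabled().

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// envKey is the environment variable named in the checklist above.
const envKey = "OTEL_GO_X_OBSERVABILITY"

// enabledFrom reports whether the feature is switched on. It takes the
// lookup function as a parameter so tests can avoid touching the real
// environment. Accepting only a case-insensitive "true" is an assumption.
func enabledFrom(lookup func(string) (string, bool)) bool {
	v, ok := lookup(envKey)
	return ok && strings.EqualFold(strings.TrimSpace(v), "true")
}

func main() {
	if !enabledFrom(os.LookupEnv) {
		fmt.Println("observability disabled: skip all instrumentation setup")
		return
	}
	fmt.Println("observability enabled: set up instruments")
}
```

Because the default (unset) case returns false, nothing is instrumented unless the user opts in.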

Encapsulation

Instrumentation is encapsulated within a dedicated struct (e.g., Instrumentation)
Instrumentation is not mixed into the instrumented component
Instrumentation code is in its own file or package if complex/reused
Instrumentation setup doesn't bloat the main component code

Initialization

Initialization is only done when observability is enabled
Setup is explicit and side-effect free
Return errors from initialization when appropriate
Use the global Meter provider (e.g., otel.GetMeterProvider())
Include proper meter configuration with:
Instrumentation package name is used for the Meter
Instrumentation version (e.g. Version)
Schema URL (e.g. SchemaURL)
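
The encapsulation and initialization items above could look roughly like the following sketch. It deliberately avoids the OpenTelemetry API so that it runs with the standard library alone: `blpStats`, `newBLPStats`, and `Observe` are hypothetical names, and a plain atomic counter stands in for the real instruments created from the global MeterProvider.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// blpStats is a hypothetical stand-in for the dedicated instrumentation
// struct. It tracks the three values the PR exposes: processed count,
// queue size, and queue capacity.
type blpStats struct {
	processed atomic.Int64
	queueLen  func() int64
	queueCap  int64
}

// newBLPStats returns (nil, nil) when observability is disabled, so the
// processor pays nothing beyond a nil check on the hot path.
func newBLPStats(enabled bool, queueLen func() int64, queueCap int64) (*blpStats, error) {
	if !enabled {
		return nil, nil
	}
	if queueLen == nil {
		return nil, errors.New("queueLen callback must not be nil")
	}
	return &blpStats{queueLen: queueLen, queueCap: queueCap}, nil
}

// Processed records n processed log records. It is safe on a nil receiver.
func (s *blpStats) Processed(n int64) {
	if s == nil {
		return
	}
	s.processed.Add(n)
}

// Observe plays the role of the metric callback: it reads the current
// queue size, the fixed capacity, and the processed total.
func (s *blpStats) Observe() (size, capacity, processed int64) {
	if s == nil {
		return 0, 0, 0
	}
	return s.queueLen(), s.queueCap, s.processed.Load()
}

func main() {
	queue := make(chan string, 4)
	queue <- "record"

	s, err := newBLPStats(true, func() int64 { return int64(len(queue)) }, int64(cap(queue)))
	if err != nil {
		fmt.Println("init failed:", err)
		return
	}
	s.Processed(3)
	fmt.Println(s.Observe())
}
```

Keeping the nil-receiver guard inside the methods means the processor code can call `s.Processed(n)` unconditionally, which keeps instrumentation out of the component's control flow.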

Performance

Little to no overhead when observability is disabled
Expensive operations are only executed when observability is enabled
When enabled, instrumentation code paths are optimized to reduce allocation/computation overhead

Attribute and Option Allocation Management

Use sync.Pool for attribute slices and options with dynamic attributes
Pool objects are properly reset before returning to pool
Pools are scoped for maximum efficiency while ensuring correctness
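
As an illustration of the pooling items above, here is a sketch with a hypothetical `attr` type standing in for the real attribute type. The pool stores pointers to slices so that returning a slice to the pool does not itself allocate, and the slice is reset before it goes back.

```go
package main

import (
	"fmt"
	"sync"
)

// attr is a hypothetical key/value pair standing in for an attribute.
type attr struct{ Key, Value string }

// attrPool is scoped to this package and holds *[]attr values.
var attrPool = sync.Pool{
	New: func() any {
		s := make([]attr, 0, 4)
		return &s
	},
}

// withAttrs borrows a slice, fills it with the dynamic attributes, hands
// it to record, then resets it and returns it to the pool. The record
// callback must not retain the slice after it returns.
func withAttrs(component, errType string, record func([]attr)) {
	p := attrPool.Get().(*[]attr)
	defer func() {
		*p = (*p)[:0] // reset length, keep capacity
		attrPool.Put(p)
	}()

	*p = append(*p, attr{"otel.component.name", component})
	if errType != "" {
		*p = append(*p, attr{"error.type", errType})
	}
	record(*p)
}

func main() {
	withAttrs("batching_log_processor/0", "queue_full", func(a []attr) {
		fmt.Println(len(a), a)
	})
}
```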

Caching

Static attribute sets known at compile time are pre-computed and cached
Common attribute combinations use lookup tables/maps

Benchmarking

Benchmarks provided for all instrumentation code
Benchmark scenarios include both enabled and disabled observability
Benchmark results show impact on allocs/op, B/op, and ns/op (use b.ReportAllocs() in benchmarks)
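
A sketch of what an enabled-versus-disabled benchmark pair might look like. The `counter` type is a hypothetical stand-in for the instrumentation, where a nil pointer models the disabled case. `testing.Benchmark` is used here only so the example runs as a plain program; in the repository this would be an ordinary `func BenchmarkXxx(b *testing.B)`.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

// counter is a hypothetical instrument; a nil *counter means disabled.
type counter struct{ n atomic.Int64 }

func (c *counter) add(v int64) {
	if c == nil {
		return
	}
	c.n.Add(v)
}

// benchAdd returns a benchmark body that reports allocations, so the
// output includes B/op and allocs/op next to ns/op.
func benchAdd(c *counter) func(*testing.B) {
	return func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			c.add(1)
		}
	}
}

func main() {
	cases := []struct {
		name string
		c    *counter
	}{
		{"disabled", nil},
		{"enabled", &counter{}},
	}
	for _, tc := range cases {
		r := testing.Benchmark(benchAdd(tc.c))
		fmt.Printf("%-8s %d ns/op, %d B/op, %d allocs/op\n",
			tc.name, r.NsPerOp(), r.AllocedBytesPerOp(), r.AllocsPerOp())
	}
}
```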

Error Handling and Robustness

Errors are reported back to caller when possible
Partial failures are handled gracefully
Use partially initialized components when available
Return errors to caller instead of only using otel.Handle()
Use otel.Handle() only when component cannot report error to user
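
One way to read the partial-failure items above is sketched below: every instrument is attempted, failures are joined into a single error, and whatever was created is still returned for the caller to use. `instrument`, `newInstrument`, and `newInstruments` are hypothetical; the real code would create metric instruments instead.

```go
package main

import (
	"errors"
	"fmt"
)

// instrument is a hypothetical placeholder for a metric instrument.
type instrument struct{ name string }

// newInstrument is a hypothetical factory that fails for an empty name.
func newInstrument(name string) (*instrument, error) {
	if name == "" {
		return nil, errors.New("instrument name must not be empty")
	}
	return &instrument{name: name}, nil
}

// instruments groups the three instruments this PR adds.
type instruments struct {
	processed, queueSize, queueCap *instrument
}

// newInstruments tries every instrument, joins any errors, and returns
// the partially initialized struct together with the joined error.
func newInstruments(processedName, sizeName, capName string) (*instruments, error) {
	var (
		inst   instruments
		err, e error
	)
	inst.processed, e = newInstrument(processedName)
	err = errors.Join(err, e)
	inst.queueSize, e = newInstrument(sizeName)
	err = errors.Join(err, e)
	inst.queueCap, e = newInstrument(capName)
	err = errors.Join(err, e)
	return &inst, err
}

func main() {
	inst, err := newInstruments(
		"otel.sdk.processor.log.processed",
		"", // simulate one failing instrument
		"otel.sdk.processor.log.queue.capacity",
	)
	fmt.Println("usable processed instrument:", inst.processed != nil)
	fmt.Println("error reported to caller:", err)
}
```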

Context Propagation

Observability measurements receive the context from the function being measured (don't break context propagation by using context.Background())
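
A small sketch of why the caller's context matters. The `recorder` interface and `fakeRecorder` are hypothetical stand-ins for a metric instrument; the point is only that `exportBatch` forwards the context it was given, so values attached by the caller remain visible to the measurement.

```go
package main

import (
	"context"
	"fmt"
)

type requestIDKey struct{}

// newRequestContext returns a context carrying a request ID, standing in
// for whatever the caller attaches (trace context, baggage, and so on).
func newRequestContext(id string) context.Context {
	return context.WithValue(context.Background(), requestIDKey{}, id)
}

// recorder is a hypothetical stand-in for a metric instrument.
type recorder interface {
	Add(ctx context.Context, n int64)
}

// fakeRecorder remembers the total and the request ID it saw last.
type fakeRecorder struct {
	total  int64
	lastID string
}

func (f *fakeRecorder) Add(ctx context.Context, n int64) {
	f.total += n
	f.lastID, _ = ctx.Value(requestIDKey{}).(string)
}

// exportBatch records the measurement with the caller's ctx rather than
// context.Background(), so propagation is not broken.
func exportBatch(ctx context.Context, r recorder, batch []string) {
	// ... hand the batch to the exporter here ...
	r.Add(ctx, int64(len(batch)))
}

func main() {
	rec := &fakeRecorder{}
	exportBatch(newRequestContext("req-42"), rec, []string{"a", "b"})
	fmt.Println(rec.total, rec.lastID)
}
```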

Semantic Conventions Compliance

All metrics follow OpenTelemetry Semantic Conventions
Use the otelconv convenience package for metric semantic conventions
Component names follow semantic conventions
Use package path scope type as stable identifier for non-standard components
Component names are stable unique identifiers
Use global counter for uniqueness if necessary
Component ID counter is resettable for deterministic testing
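
The component-name items could be sketched as follows. The function names are hypothetical, and the "batching_log_processor/<id>" format is an assumption based on the semantic conventions' component type and name attributes; the PR itself uses a generated counter package for this.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// blpCounter is the process-wide counter that makes component names unique.
var blpCounter atomic.Int64

// nextBLPID returns the next unused ID, starting at 0.
func nextBLPID() int64 { return blpCounter.Add(1) - 1 }

// setBLPCounter overrides the counter and returns the previous value, so
// a test can reset it and restore it afterwards.
func setBLPCounter(v int64) int64 { return blpCounter.Swap(v) }

// componentName builds a stable identifier from the component type and ID.
func componentName(id int64) string {
	return fmt.Sprintf("batching_log_processor/%d", id)
}

func main() {
	fmt.Println(componentName(nextBLPID()))
	fmt.Println(componentName(nextBLPID()))
}
```

In a test, `prev := setBLPCounter(0)` followed by `t.Cleanup(func() { setBLPCounter(prev) })` would keep component names deterministic regardless of test order.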

Testing

Use deterministic testing with isolated state
Restore previous state after tests (t.Cleanup())
Isolate meter provider for testing
Use t.Setenv() for environment variable testing
Reset component ID counter for deterministic component names
Test order doesn't affect results

- setup and teardown for self observability. Includes setting up counters for queue capacity, size, log processed. Also includes callback registration for queue cap and size - doc comments pending - has some todo comments that I plan to address by discussion on PR review

codecov · 2025-08-04T15:19:10Z

Codecov Report

❌ Patch coverage is 80.18868% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.0%. Comparing base (73cbc69) to head (a931308).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sdk/log/batch.go | 51.8% | 8 Missing and 5 partials ⚠️ |
| sdk/log/exporter.go | 0.0% | 8 Missing ⚠️ |

Additional details and impacted files

@@           Coverage Diff           @@
##            main   #7124     +/-   ##
=======================================
- Coverage   86.1%   86.0%   -0.1%     
=======================================
  Files        291     293      +2     
  Lines      25613   25710     +97     
=======================================
+ Hits       22056   22131     +75     
- Misses      3184    3201     +17     
- Partials     373     378      +5

| Files with missing lines | Coverage Δ | |
|---|---|---|
| sdk/log/internal/counter/counter.go | `100.0% <100.0%> (ø)` | |
| sdk/log/internal/observ/batch_log_processor.go | `100.0% <100.0%> (ø)` | |
| sdk/log/exporter.go | `92.5% <0.0%> (-5.3%)` | ⬇️ |
| sdk/log/batch.go | `94.6% <51.8%> (-5.4%)` | ⬇️ |

... and 2 files with indirect coverage changes

MrAlias · 2025-09-15T20:55:32Z

@mahendrabishnoi2 please take a look at #7017. I have updated the issue with a check list of items from our project Observability guidelines that need to be completed in this PR.

MrAlias · 2025-10-07T17:53:41Z

@mahendrabishnoi2 I just wanted to check in with this PR. Are you still able to update this PR or open a new PR to address the instrumentation requirements?

mahendrabishnoi2 · 2025-10-07T18:00:22Z

> @mahendrabishnoi2 I just wanted to check in with this PR. Are you still able to update this PR or open a new PR to address the instrumentation requirements?

Yes @MrAlias. I can finish it in 2-3 days as per the updated guidelines. It would be great if I could get some feedback on the approach I'm taking for this PR. Since we need to record the metric just before logs are passed to the exporter, I have created another wrapper (similar to timeoutExporter or chunkedExporter) to achieve this. I'm just trying to avoid rework.
If this approach seems fine, I can go ahead and update it as per the guidelines and add tests.

mahendrabishnoi2 · 2025-10-12T15:06:11Z

processor_1.txt

goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/log/internal/observ
cpu: Apple M1 Pro
BenchmarkBLP/Processed-8         	1000000000	         0.3302 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3341 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3392 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3316 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3346 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3358 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3397 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3460 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3329 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3326 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3316 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3338 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3357 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3321 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3431 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3568 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3327 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3383 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3320 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3337 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Callback-8                   	  507945	      2348 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  473138	      2474 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  504937	      2375 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  508374	      2265 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  526850	      2268 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  524180	      2300 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  524942	      2295 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  553074	      2240 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  535147	      2276 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  493884	      2467 ns/op	    3362 B/op	      24 allocs/op
PASS
ok  	go.opentelemetry.io/otel/sdk/log/internal/observ	20.151s

goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/log/internal/observ
cpu: Apple M1 Pro
                         │ processor_1.txt │
                         │     sec/op      │
BLP/Processed-8               0.3343n ± 2%
BLP/ProcessedQueueFull-8      0.3337n ± 3%
BLP/Callback-8                 2.298µ ± 7%
geomean                        6.353n

                         │ processor_1.txt │
                         │      B/op       │
BLP/Processed-8               0.000 ± 0%
BLP/ProcessedQueueFull-8      0.000 ± 0%
BLP/Callback-8              3.283Ki ± 0%
geomean                                  ¹
¹ summaries must be >0 to compute geomean

                         │ processor_1.txt │
                         │    allocs/op    │
BLP/Processed-8               0.000 ± 0%
BLP/ProcessedQueueFull-8      0.000 ± 0%
BLP/Callback-8                24.00 ± 0%
geomean                                  ¹
¹ summaries must be >0 to compute geomean

mahendrabishnoi2 added 3 commits August 4, 2025 11:25

- added hook to call configureSelfObservability on BatchProcessor cre…

9de991b

…ation - increment logProcessedCounter on push to exporter or drops from queue - pending: flush methods where we are pushing to exporter

- run make precommit

8d1edbf

pellared mentioned this pull request Aug 11, 2025

Logs SDK observability - batch processor metrics #7017

Open

46 tasks

mahendrabishnoi2 added 6 commits August 17, 2025 11:27

Merge branch 'main' into logs-batch-processor-metrics

ec0b55e

flatten setup for self-observability to make it transparent

385fd9f

add metricsExporter, a wrapper to record successful log processed metric

5c405d7

integrate with metricsExporter when self observability is enabled

1366caa

don't record metric when a log is added to queue, update comment to r…

b2dd092

…eflect what it means for a log to be processed

update CHANGELOG.md and README.md (sdk/log/internal/x/README.md)

4f24dff

mahendrabishnoi2 mentioned this pull request Aug 28, 2025

REQUEST: New membership for mahendrabishnoi2 open-telemetry/community#2955

Closed

6 tasks

mahendrabishnoi2 added 9 commits October 12, 2025 16:10

Merge branch 'main' into logs-batch-processor-metrics

c4bf81a

self observability -> observability

4cbaff1

instrumentation implementation in a separate observ package as per ne…

9d42e5c

…w guidelines

remove the wrapped exporter

3a35f8e

use newly created BLP abstraction for observability

96de62a

re-add metricsExporter with newly created struct (BLP) fo observabili…

001ddb6

…ty and integrate it with processor.

use generated counter package for component names

9371ede

test cases for BLP

f3a214a

make precommit

a931308
