mahendrabishnoi2
Member

@mahendrabishnoi2 mahendrabishnoi2 commented Aug 4, 2025

Fixes #7017

Adds support for the following metrics for sdk/log#BatchProcessor:

  • otel.sdk.processor.log.processed
  • otel.sdk.processor.log.queue.size
  • otel.sdk.processor.log.queue.capacity

These metrics are defined at https://github.com/open-telemetry/semantic-conventions/blob/v1.36.0/docs/otel/sdk-metrics.md and are experimental. For that reason, they use the global MeterProvider and are gated behind the OTEL_GO_X_SELF_OBSERVABILITY feature flag.

TODO:

  • tests
  • clarification on Shutdown method semantics for metric registration callback in case of early return and direct exporter shutdown (out of time).

Observability Implementation Checklist

Based on the project Observability guidelines, ensure the following are completed:

Environment Variable Activation

  • Observability features are disabled by default
  • Features are activated through the OTEL_GO_X_OBSERVABILITY environment variable
  • Use consistent pattern with x.Observability.Enabled() check [1]
  • Follow established experimental feature pattern [2][3]

Encapsulation

  • Instrumentation is encapsulated within a dedicated struct (e.g., Instrumentation)
  • Instrumentation is not mixed into the instrumented component
  • Instrumentation code is in its own file or package if complex/reused
  • Instrumentation setup doesn't bloat the main component code

Initialization

  • Initialization is only done when observability is enabled
  • Setup is explicit and side-effect free
  • Return errors from initialization when appropriate
  • Use the global Meter provider (e.g., otel.GetMeterProvider())
  • Include proper meter configuration with:
      ◦ Instrumentation package name is used for the Meter
      ◦ Instrumentation version (e.g. Version)
      ◦ Schema URL (e.g. SchemaURL)
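
A rough sketch of the encapsulation and initialization items above, using only the standard library. The `Instrumentation` type, its constructor, and the capacity check are hypothetical; the atomic counter stands in for a real metric instrument obtained from the global MeterProvider.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Instrumentation holds everything the instrumented component needs to
// record measurements. Hypothetical type: the counter is a stand-in for
// a real metric instrument.
type Instrumentation struct {
	processed atomic.Int64
}

// NewInstrumentation returns nil (and no error) when observability is
// disabled, so the instrumented component pays only a nil check.
// The capacity validation is an illustrative failure mode that shows an
// error being returned to the caller instead of being swallowed.
func NewInstrumentation(enabled bool, queueCapacity int) (*Instrumentation, error) {
	if !enabled {
		return nil, nil
	}
	if queueCapacity <= 0 {
		return nil, errors.New("observ: queue capacity must be positive")
	}
	return &Instrumentation{}, nil
}

// Processed records n processed log records. It is safe to call on a
// nil receiver, which is the disabled case.
func (i *Instrumentation) Processed(n int64) {
	if i == nil {
		return
	}
	i.processed.Add(n)
}

func main() {
	// Disabled: nothing is built and recording is a no-op.
	off, err := NewInstrumentation(false, 2048)
	fmt.Println(off == nil, err == nil)
	off.Processed(10)

	// Enabled: measurements are recorded.
	on, err := NewInstrumentation(true, 2048)
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	on.Processed(10)
	fmt.Println(on.processed.Load())
}
```

The nil-receiver convention keeps the batch processor free of `if enabled` branches: it always calls `Processed`, and the instrumentation decides whether anything happens.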

Performance

  • Little to no overhead when observability is disabled
  • Expensive operations are only executed when observability is enabled
  • When enabled, instrumentation code paths are optimized to reduce allocation/computation overhead

Attribute and Option Allocation Management

  • Use sync.Pool for attribute slices and options with dynamic attributes
  • Pool objects are properly reset before returning to pool
  • Pools are scoped for maximum efficiency while ensuring correctness
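
The pooling items above might look roughly like the following sketch. The `attr` type is a hypothetical stand-in for an attribute key/value pair, and the attribute keys are used only for illustration; the pattern itself (get, fill, use, reset, put) is the part that matters.

```go
package main

import (
	"fmt"
	"sync"
)

// attr is a hypothetical stand-in for an attribute key/value pair.
type attr struct{ Key, Value string }

// attrPool reuses attribute slices. Storing *[]attr rather than []attr
// avoids an extra allocation for the slice header on every Put.
var attrPool = sync.Pool{
	New: func() any {
		s := make([]attr, 0, 4)
		return &s
	},
}

// record builds a dynamic attribute set for one measurement and passes
// it to emit. emit must not retain the slice after it returns.
func record(componentName, errType string, emit func([]attr)) {
	p := attrPool.Get().(*[]attr)
	defer func() {
		*p = (*p)[:0] // reset the length, keep the capacity
		attrPool.Put(p)
	}()

	*p = append(*p, attr{Key: "otel.component.name", Value: componentName})
	if errType != "" {
		*p = append(*p, attr{Key: "error.type", Value: errType})
	}
	emit(*p)
}

func main() {
	print := func(attrs []attr) { fmt.Println(len(attrs), attrs) }
	record("batching_log_processor/0", "queue_full", print)
	record("batching_log_processor/0", "", print)
}
```

Resetting inside the deferred function guarantees that a slice never goes back to the pool carrying attributes from a previous measurement.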

Caching

  • Static attribute sets known at compile time are pre-computed and cached
  • Common attribute combinations use lookup tables/maps

Benchmarking

  • Benchmarks provided for all instrumentation code
  • Benchmark scenarios include both enabled and disabled observability
  • Benchmark results show impact on allocs/op, B/op, and ns/op (use b.ReportAllocs() in benchmarks)
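
As a hedged illustration of the benchmarking items above, the sketch below measures the same call with instrumentation disabled and enabled and reports allocations. The `inst` type is a hypothetical stand-in; in a real `_test.go` file the two cases would normally be sub-benchmarks run with `go test -bench`, whereas `testing.Benchmark` is used here only so the example runs as a plain program.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"testing"
)

// inst is a hypothetical instrumentation stand-in; nil means disabled.
type inst struct{ processed atomic.Int64 }

// Processed is a no-op on a nil receiver.
func (i *inst) Processed(n int64) {
	if i == nil {
		return
	}
	i.processed.Add(n)
}

// benchProcessed returns a benchmark body for the given state.
func benchProcessed(i *inst) func(*testing.B) {
	return func(b *testing.B) {
		b.ReportAllocs() // include B/op and allocs/op in the result
		b.ResetTimer()
		for n := 0; n < b.N; n++ {
			i.Processed(1)
		}
	}
}

func main() {
	disabled := testing.Benchmark(benchProcessed(nil))
	enabled := testing.Benchmark(benchProcessed(&inst{}))
	fmt.Println("Disabled:", disabled.String(), disabled.MemString())
	fmt.Println("Enabled: ", enabled.String(), enabled.MemString())
}
```

Running both states side by side makes the cost of enabling observability visible, and makes a regression in the disabled path easy to spot.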

Error Handling and Robustness

  • Errors are reported back to caller when possible
  • Partial failures are handled gracefully
  • Use partially initialized components when available
  • Return errors to caller instead of only using otel.Handle()
  • Use otel.Handle() only when component cannot report error to user

Context Propagation

  • Observability measurements receive the context from the function being measured (don't break context propagation by using context.Background())

Semantic Conventions Compliance

  • All metrics follow OpenTelemetry Semantic Conventions
  • Use the otelconv convenience package for metric semantic conventions
  • Component names follow semantic conventions
  • Use package path scope type as stable identifier for non-standard components
  • Component names are stable unique identifiers
  • Use global counter for uniqueness if necessary
  • Component ID counter is resettable for deterministic testing
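
The component-naming items above can be sketched as follows. The function names and the `<type>/<id>` name format are assumptions for illustration, as is the component type string; the sketch only shows a global counter that tests can reset and later restore.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// componentType is an illustrative component type name (an assumption).
const componentType = "batching_log_processor"

// idCounter is the process-wide counter that keeps component names unique.
var idCounter atomic.Int64

// nextID returns the current ID and advances the counter.
func nextID() int64 { return idCounter.Add(1) - 1 }

// setID overwrites the counter and returns the previous value so that a
// test can restore it afterwards.
func setID(v int64) int64 { return idCounter.Swap(v) }

// componentName formats a stable, unique name for one component instance.
func componentName(id int64) string {
	return fmt.Sprintf("%s/%d", componentType, id)
}

func main() {
	prev := setID(0) // deterministic starting point
	defer setID(prev)

	fmt.Println(componentName(nextID()))
	fmt.Println(componentName(nextID()))
}
```

In a test, the `setID(0)` call paired with a restore registered through `t.Cleanup` would give each test the same component names regardless of execution order.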

Testing

  • Use deterministic testing with isolated state
  • Restore previous state after tests (t.Cleanup())
  • Isolate meter provider for testing
  • Use t.Setenv() for environment variable testing
  • Reset component ID counter for deterministic component names
  • Test order doesn't affect results

Footnotes

  1. https://github.com/open-telemetry/opentelemetry-go/blob/e4ab3141123d0811125a69823dbbe4d9ec5a9b8f/exporters/stdout/stdouttrace/internal/observ/instrumentation.go#L101-L103

  2. https://github.com/open-telemetry/opentelemetry-go/blob/e4ab3141123d0811125a69823dbbe4d9ec5a9b8f/exporters/stdout/stdouttrace/internal/x/x.go

  3. https://github.com/open-telemetry/opentelemetry-go/blob/e4ab3141123d0811125a69823dbbe4d9ec5a9b8f/sdk/internal/x/x.go

- setup and teardown for self observability. Includes setting up counters for queue capacity, size, log processed. Also includes callback registration for queue cap and size
- doc comments pending
- has some todo comments that I plan to address by discussion on PR review
…ation

- increment logProcessedCounter on push to exporter or drops from queue
- pending: flush methods where we are pushing to exporter

codecov bot commented Aug 4, 2025

Codecov Report

❌ Patch coverage is 80.18868% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.0%. Comparing base (73cbc69) to head (a931308).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sdk/log/batch.go | 51.8% | 8 Missing and 5 partials ⚠️ |
| sdk/log/exporter.go | 0.0% | 8 Missing ⚠️ |

@@           Coverage Diff           @@
##            main   #7124     +/-   ##
=======================================
- Coverage   86.1%   86.0%   -0.1%     
=======================================
  Files        291     293      +2     
  Lines      25613   25710     +97     
=======================================
+ Hits       22056   22131     +75     
- Misses      3184    3201     +17     
- Partials     373     378      +5     
| Files with missing lines | Coverage Δ |
|---|---|
| sdk/log/internal/counter/counter.go | 100.0% <100.0%> (ø) |
| sdk/log/internal/observ/batch_log_processor.go | 100.0% <100.0%> (ø) |
| sdk/log/exporter.go | 92.5% <0.0%> (-5.3%) ⬇️ |
| sdk/log/batch.go | 94.6% <51.8%> (-5.4%) ⬇️ |

... and 2 files with indirect coverage changes

@MrAlias
Contributor

MrAlias commented Sep 15, 2025

@mahendrabishnoi2 please take a look at #7017. I have updated the issue with a check list of items from our project Observability guidelines that need to be completed in this PR.

@MrAlias
Contributor

MrAlias commented Oct 7, 2025

@mahendrabishnoi2 I just wanted to check in with this PR. Are you still able to update this PR or open a new PR to address the instrumentation requirements?

@mahendrabishnoi2
Member Author

> @mahendrabishnoi2 I just wanted to check in with this PR. Are you still able to update this PR or open a new PR to address the instrumentation requirements?

Yes @MrAlias. I can finish it in 2-3 days, following the updated guidelines. It would be great to get some feedback on the approach I'm taking for this PR. Since we need to record the metric just before logs are passed to the exporter, I have created another wrapper (similar to timeoutExporter or chunkedExporter) to achieve this. I'm just trying to avoid rework.
If this approach seems fine, I can go ahead and update it as per the guidelines and add tests.

@mahendrabishnoi2
Member Author

processor_1.txt
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/log/internal/observ
cpu: Apple M1 Pro
BenchmarkBLP/Processed-8         	1000000000	         0.3302 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3341 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3392 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3316 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3346 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3358 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3397 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3460 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3329 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Processed-8         	1000000000	         0.3326 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3316 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3338 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3357 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3321 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3431 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3568 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3327 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3383 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3320 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/ProcessedQueueFull-8         	1000000000	         0.3337 ns/op	       0 B/op	       0 allocs/op
BenchmarkBLP/Callback-8                   	  507945	      2348 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  473138	      2474 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  504937	      2375 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  508374	      2265 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  526850	      2268 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  524180	      2300 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  524942	      2295 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  553074	      2240 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  535147	      2276 ns/op	    3362 B/op	      24 allocs/op
BenchmarkBLP/Callback-8                   	  493884	      2467 ns/op	    3362 B/op	      24 allocs/op
PASS
ok  	go.opentelemetry.io/otel/sdk/log/internal/observ	20.151s
goos: darwin
goarch: arm64
pkg: go.opentelemetry.io/otel/sdk/log/internal/observ
cpu: Apple M1 Pro
                         │ processor_1.txt │
                         │     sec/op      │
BLP/Processed-8               0.3343n ± 2%
BLP/ProcessedQueueFull-8      0.3337n ± 3%
BLP/Callback-8                 2.298µ ± 7%
geomean                        6.353n

                         │ processor_1.txt │
                         │      B/op       │
BLP/Processed-8               0.000 ± 0%
BLP/ProcessedQueueFull-8      0.000 ± 0%
BLP/Callback-8              3.283Ki ± 0%
geomean                                  ¹
¹ summaries must be >0 to compute geomean

                         │ processor_1.txt │
                         │    allocs/op    │
BLP/Processed-8               0.000 ± 0%
BLP/ProcessedQueueFull-8      0.000 ± 0%
BLP/Callback-8                24.00 ± 0%
geomean                                  ¹
¹ summaries must be >0 to compute geomean

Development

Successfully merging this pull request may close these issues.

Logs SDK observability - batch processor metrics
