Initial attempt at OpenTelemetry support#873
Initial attempt at OpenTelemetry support#873smocherla-brex wants to merge 6 commits intobuchgr:masterfrom
Conversation
| if remote.Transport == nil { | ||
| remote.Transport = http.DefaultTransport | ||
| } |
There was a problem hiding this comment.
It looks like NewTransport does this internally, so we can skip it.
| } | ||
|
|
||
| if c.Otel.Tracing.SampleRate == 0 { | ||
| c.Otel.Tracing.SampleRate = 1.0 // Default to 100% sampling |
There was a problem hiding this comment.
I think typically a user from my experience would expect to get all the traces that're being emitted, now if they have too much data, they can either filter the trace spans out in the otel-collector or override this config.
| c.Otel.Tracing.SampleRate, | ||
| ) | ||
| if err != nil { | ||
| log.Printf("WARNING: Failed to initialize OpenTelemetry: %v", err) |
There was a problem hiding this comment.
nitpick: please use Warning instead of WARNING
|
|
||
| // Initialize OpenTelemetry if enabled | ||
| var otelProvider *otel.TracerProvider | ||
| if c.Otel != nil && c.Otel.Tracing != nil && c.Otel.Tracing.Enabled { |
There was a problem hiding this comment.
It might be worth setting this to an otelTracingEnabled variable and check that instead, since we do it multiple times.
| // Wrap status handler with OTEL tracing if enabled | ||
| if c.Otel != nil && c.Otel.Tracing != nil && c.Otel.Tracing.Enabled { | ||
| statusHandler = otelhttp.NewHandler( | ||
| http.HandlerFunc(statusHandler), | ||
| "status", | ||
| ).ServeHTTP | ||
| } |
There was a problem hiding this comment.
Not sure how useful it would be to trace the status page?
| // TLS is usually not typically required if running in a k8s environment | ||
| otlptracegrpc.WithInsecure(), |
There was a problem hiding this comment.
It might be worth adding a configuration setting to avoid this.
| Name: "otel.tracing.exporter_endpoint", | ||
| Usage: "The OTLP exporter endpoint for OpenTelemetry traces. Can also be set via OTEL_EXPORTER_OTLP_ENDPOINT env var.", | ||
| DefaultText: "empty, will use OTEL_EXPORTER_OTLP_ENDPOINT env var", | ||
| EnvVars: []string{"BAZEL_REMOTE_OTEL_TRACING_EXPORTER_ENDPOINT"}, |
There was a problem hiding this comment.
I don't think this code is checking these environment variables with the BAZEL_REMOTE_ prefix. I'm not sure if it's better to use standard variables, or to use custom ones.
| // Add OTEL interceptors first for complete trace coverage | ||
| if c.Otel != nil && c.Otel.Tracing != nil && c.Otel.Tracing.Enabled { | ||
| streamInterceptors = append(streamInterceptors, | ||
| otelgrpc.StreamServerInterceptor()) |
There was a problem hiding this comment.
These deprecated functions should be avoided.
| // Wrap transport with OTEL instrumentation if enabled | ||
| var transport http.RoundTripper = tr | ||
| if otelEnabled { | ||
| transport = otelhttp.NewTransport(tr) |
There was a problem hiding this comment.
Why not tr = otelhttp.NewTransport(tr) ?
Addresses #684
This PR attempts to add initial support for Opentelemetry using https://github.com/open-telemetry/opentelemetry-go:
We don't have spans for filesystem calls - although we can add them. But I don't think it's needed for that disk/IO metrics can usually help in that situation.
Testing: Some unit tests, I'll share some details once I am able to verify end-to-end after building it here and testing with otel-collector. Below is an example of a trace I generated by running it locally.
for bystream fetch

for remote asset api
