Improve distributed tracing instrumentation coverage #9701

@waveywaves

Feature request

After the OpenCensus to OpenTelemetry migration (#9043), the tracing infrastructure is solid, but instrumentation coverage still has significant gaps. This umbrella issue tracks the work to bring tracing from ~40% coverage to comprehensive end-to-end observability.

Completed

Remaining work

Medium effort:

  • Record errors on trace spans in reconcilers - ~30 error paths across the TaskRun and PipelineRun reconcilers only log errors, without calling span.RecordError() or span.SetStatus() with an Error status code. As a result, traces show every span as OK even when runs fail.
  • Add tracing to resolver framework - git, hub, cluster, and HTTP resolvers have zero OTel instrumentation. When a TaskRun sits in ResolvingTaskRef for 30 seconds, there is no trace visibility into why.
  • Add spans to cancelPipelineRun and timeoutPipelineRun - these standalone functions patch N child resources in sequence with zero trace visibility.

Large effort:

  • Add tracing to entrypoint step execution - The entrypoint binary (cmd/entrypoint/) has zero OTel instrumentation. Step execution (waiting, running commands, collecting results) is a complete gap in traces. Requires trace context injection via pod environment variables, OTel SDK initialization in the entrypoint, and a span export path.
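
One way to picture the "trace context injection via pod environment variables" step is a W3C Trace Context `traceparent` value passed through an environment variable. The sketch below is hand-rolled with the standard library only, to keep it dependency-free. In practice the `propagation.TraceContext` propagator from the OpenTelemetry Go API would likely do the formatting and parsing. The variable name `TRACEPARENT` and the helper names are assumptions, not something this issue specifies.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"os"
	"strings"
)

// traceparentEnv is a hypothetical variable name; the issue does not name one.
const traceparentEnv = "TRACEPARENT"

// formatTraceparent builds a version-00 W3C traceparent value.
// The controller would call something like this when building the pod spec.
func formatTraceparent(traceID [16]byte, spanID [8]byte, sampled bool) string {
	flags := "00"
	if sampled {
		flags = "01"
	}
	return fmt.Sprintf("00-%s-%s-%s",
		hex.EncodeToString(traceID[:]), hex.EncodeToString(spanID[:]), flags)
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n || s != strings.ToLower(s) {
		return false
	}
	_, err := hex.DecodeString(s)
	return err == nil
}

// parseTraceparent validates a version-00 traceparent value and returns the
// hex trace ID, hex parent span ID, and the sampled flag.
// The entrypoint would call something like this at startup.
func parseTraceparent(v string) (traceID, spanID string, sampled bool, err error) {
	parts := strings.Split(v, "-")
	if len(parts) != 4 || parts[0] != "00" {
		return "", "", false, fmt.Errorf("unsupported traceparent %q", v)
	}
	if !isLowerHex(parts[1], 32) || !isLowerHex(parts[2], 16) || !isLowerHex(parts[3], 2) {
		return "", "", false, fmt.Errorf("malformed traceparent %q", v)
	}
	// All-zero trace and span IDs are invalid per the W3C specification.
	if strings.Trim(parts[1], "0") == "" || strings.Trim(parts[2], "0") == "" {
		return "", "", false, fmt.Errorf("all-zero ID in traceparent %q", v)
	}
	flags, _ := hex.DecodeString(parts[3]) // already validated above
	return parts[1], parts[2], flags[0]&0x01 == 0x01, nil
}

func main() {
	// Controller side: serialize the current span context (example IDs).
	traceID := [16]byte{0x4b, 0xf9, 0x2f, 0x35, 0x77, 0xb3, 0x4d, 0xa6,
		0xa3, 0xce, 0x92, 0x9d, 0x0e, 0x0e, 0x47, 0x36}
	spanID := [8]byte{0x00, 0xf0, 0x67, 0xaa, 0x0b, 0xa9, 0x02, 0xb7}
	os.Setenv(traceparentEnv, formatTraceparent(traceID, spanID, true))

	// Entrypoint side: recover the parent context before starting step spans.
	tid, sid, sampled, err := parseTraceparent(os.Getenv(traceparentEnv))
	if err != nil {
		fmt.Println("no usable parent context:", err)
		return
	}
	fmt.Println("trace:", tid, "parent span:", sid, "sampled:", sampled)
}
```

An environment variable is fixed when the pod is created, so this approach can only carry the parent context that exists at pod creation time. The OTel SDK initialization and the span export path that the item also lists are separate pieces not covered here.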

Minor improvements:

  • Add span attributes for outcome (status, failure reason, step count)
  • Link metrics to traces via exemplars
  • Remove duplicate root span pattern (initTracing root + ReconcileKind root)

Use case

As a platform operator running Tekton in production, I want comprehensive distributed traces so that when a PipelineRun takes longer than expected, I can use Jaeger/Tempo to identify exactly which stage (resolution, pod creation, step execution, result extraction) is the bottleneck. Currently, traces show controller-level decisions but are blind to the data plane (step execution) and resolution pipeline.

Related
