scottgerring commented Aug 12, 2025

Fixes #3081, building on the work started by @AaronRM 🤝

Changes

A new retry module added to opentelemetry-sdk

Models the outcomes a failed operation can report (retry / can't retry / throttle), and provides a retry_with_backoff helper that wraps a retryable operation and retries it. The helper relies on experimental_async_runtime for its runtime abstraction, which provides the actual pausing between attempts. It also takes a closure that classifies the error, so the caller can tell the retry mechanism whether a retry is required.
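
For readers skimming the thread, the shape described above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the code in this PR: the names `RetryDecision`, `RetryPolicy`, and their fields are assumptions, and the pause is passed in as a plain `sleep` closure where the real helper would use the async runtime abstraction.

```rust
use std::time::Duration;

/// How a failed attempt should be handled (hypothetical names, not the PR's API).
#[derive(Debug, Clone, PartialEq)]
pub enum RetryDecision {
    /// Transient failure: try again after the computed backoff.
    Retryable,
    /// Permanent failure: give up immediately.
    NonRetryable,
    /// The server asked for a specific wait before the next attempt.
    Throttled(Duration),
}

/// Illustrative policy; the field names are assumptions.
#[derive(Debug, Clone)]
pub struct RetryPolicy {
    pub max_retries: u32,
    pub initial_delay: Duration,
    pub max_delay: Duration,
}

/// Runs `operation` until it succeeds, the classifier says to stop,
/// or the retry budget is spent. `sleep` is injected so the pause
/// mechanism (thread sleep, async timer, test recorder) stays pluggable.
pub fn retry_with_backoff<T, E>(
    policy: &RetryPolicy,
    mut operation: impl FnMut() -> Result<T, E>,
    classify: impl Fn(&E) -> RetryDecision,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut delay = policy.initial_delay;
    let mut retries = 0;
    loop {
        let err = match operation() {
            Ok(value) => return Ok(value),
            Err(err) => err,
        };
        if retries >= policy.max_retries {
            return Err(err);
        }
        match classify(&err) {
            RetryDecision::NonRetryable => return Err(err),
            // Honour the server-provided wait as given.
            RetryDecision::Throttled(wait) => sleep(wait),
            RetryDecision::Retryable => {
                sleep(delay);
                // Double the delay for the next attempt, capped at max_delay.
                delay = delay.saturating_mul(2).min(policy.max_delay);
            }
        }
        retries += 1;
    }
}
```

Passing the classifier as a closure keeps the loop free of protocol knowledge, which is what lets the OTLP-specific mapping live in a different crate.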

A new retry_classification module added to opentelemetry-otlp

This takes the actual error responses that we get back over OTLP and maps them onto the retry model. Because this is OTLP-specific, it belongs here rather than alongside the generic retry code.
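
As a rough illustration of that mapping for the HTTP case, the sketch below classifies a failed response by status code. It is not the module from this PR: `classify_http` and `RetryDecision` are hypothetical names, the retryable statuses (429, 502, 503, 504) follow the OTLP specification, and only the delay-in-seconds form of `Retry-After` is handled.

```rust
use std::time::Duration;

/// Retry model (hypothetical; mirrors the retry / can't retry / throttle split).
#[derive(Debug, PartialEq)]
pub enum RetryDecision {
    Retryable,
    NonRetryable,
    Throttled(Duration),
}

/// Classifies a failed OTLP/HTTP response from its status code and an
/// optional `Retry-After` header value. HTTP-date values of `Retry-After`
/// are not parsed here and fall back to a plain retry.
pub fn classify_http(status: u16, retry_after: Option<&str>) -> RetryDecision {
    match status {
        // Statuses the OTLP specification lists as retryable.
        429 | 502 | 503 | 504 => {
            match retry_after.and_then(|v| v.trim().parse::<u64>().ok()) {
                Some(secs) => RetryDecision::Throttled(Duration::from_secs(secs)),
                None => RetryDecision::Retryable,
            }
        }
        // Every other error status is treated as permanent.
        _ => RetryDecision::NonRetryable,
    }
}
```

A gRPC classifier would follow the same pattern, keyed on status codes and any server-provided retry hint instead of HTTP statuses and headers.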

Retry binding

... happens in each one of the concrete exporters to tie it all together.

Also ...

  • Extended exporter builders to allow the user to customise the default retry policy
  • Added new feature flags experimental-http-retry and experimental-grpc-retry which pull in the experimental-async-runtime dep and set everything up. This way we can get going with this now without having to stabilise the experimental-async-runtime feature.

Open Questions

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

codecov bot commented Aug 12, 2025

Codecov Report

❌ Patch coverage is 74.12077% with 780 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.8%. Comparing base (ad88615) to head (9ba3a06).
⚠️ Report is 222 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| opentelemetry-otlp/src/exporter/http/mod.rs | 67.0% | 162 Missing ⚠️ |
| opentelemetry-sdk/src/metrics/data/mod.rs | 13.4% | 154 Missing ⚠️ |
| opentelemetry-proto/src/transform/metrics.rs | 11.1% | 64 Missing ⚠️ |
| ...-sdk/src/metrics/internal/exponential_histogram.rs | 65.1% | 52 Missing ⚠️ |
| opentelemetry-otlp/src/exporter/tonic/metrics.rs | 0.0% | 50 Missing ⚠️ |
| opentelemetry-otlp/src/exporter/tonic/trace.rs | 0.0% | 48 Missing ⚠️ |
| opentelemetry-otlp/src/exporter/tonic/logs.rs | 0.0% | 46 Missing ⚠️ |
| opentelemetry-otlp/src/exporter/tonic/mod.rs | 73.4% | 42 Missing ⚠️ |
| opentelemetry-sdk/src/metrics/instrument.rs | 88.9% | 29 Missing ⚠️ |
| opentelemetry-sdk/src/logs/logger_provider.rs | 92.0% | 12 Missing ⚠️ |
| ... and 28 more | | |

Additional details and impacted files
@@           Coverage Diff           @@
##            main   #3126     +/-   ##
=======================================
+ Coverage   79.6%   80.8%   +1.2%     
=======================================
  Files        124     128      +4     
  Lines      23174   23090     -84     
=======================================
+ Hits       18456   18676    +220     
+ Misses      4718    4414    -304     

scottgerring force-pushed the feat/retry-logic branch 4 times, most recently from 3847b26 to fb141db, on August 12, 2025 10:22
scottgerring changed the title from "[not ready!] feat: support backoff/retry" to "feat: support backoff/retry in OTLP" on Aug 12, 2025
scottgerring marked this pull request as ready for review on August 19, 2025 14:32
scottgerring requested a review from a team as a code owner on August 19, 2025 14:32
lalitb requested a review from Copilot on September 1, 2025 18:50
Copilot AI left a comment

Pull Request Overview

This PR implements retry logic with exponential backoff and jitter for OTLP exporters to handle transient failures gracefully, addressing issue #3081. The implementation supports both HTTP and gRPC protocols with protocol-specific error classification and server-provided throttling hints.

  • Adds a new retry module to opentelemetry-sdk with configurable retry policies and exponential backoff
  • Implements protocol-specific error classification in opentelemetry-otlp for HTTP and gRPC responses
  • Integrates retry functionality into all OTLP exporters (traces, metrics, logs) for both HTTP and gRPC transports
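
For context on the "exponential backoff and jitter" mentioned above, one common way to compute the delay is "full jitter": pick a random duration between zero and the capped exponential delay. The sketch below assumes that strategy, which may differ from what this PR implements. `backoff_delay` is a hypothetical name, and the random value is supplied by the caller so the example needs no RNG crate.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based) using full jitter:
/// a uniformly chosen duration in [0, min(initial * 2^attempt, max)].
/// `random` is any caller-supplied random u64 (hypothetical parameter);
/// the slight modulo bias is ignored for the sake of the sketch.
pub fn backoff_delay(initial: Duration, max: Duration, attempt: u32, random: u64) -> Duration {
    // Double per attempt, saturating instead of overflowing on large attempts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    let capped = initial.saturating_mul(factor).min(max);
    // Map the random value into [0, capped] using integer nanoseconds.
    let jittered = (random as u128) % (capped.as_nanos() + 1);
    // `jittered` never exceeds `random`, so it fits in a u64.
    Duration::from_nanos(jittered as u64)
}
```

Spreading retries over the whole interval, instead of sleeping for exactly the exponential delay, keeps many clients that failed at the same moment from retrying in lockstep.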

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| opentelemetry-sdk/src/retry.rs | Core retry module with exponential backoff, jitter, and error classification |
| opentelemetry-otlp/src/retry_classification.rs | Protocol-specific error classification for HTTP and gRPC responses |
| opentelemetry-otlp/src/exporter/tonic/*.rs | gRPC exporter integration with retry functionality |
| opentelemetry-otlp/src/exporter/http/*.rs | HTTP exporter integration with retry functionality |
| opentelemetry-otlp/Cargo.toml | Feature flags and dependencies for retry support |

scottgerring force-pushed the feat/retry-logic branch 2 times, most recently from af933a2 to f1636a0, on September 2, 2025 09:11
bantonsson left a comment

I think the HTTP exporters look good now. Love all those red lines.

bantonsson left a comment

👍🏼 for the HTTP code. I can't see a clear way to reuse more of the Tonic code.

lalitb self-assigned this on Sep 16, 2025

lalitb commented Sep 16, 2025

Sorry for the delay. I would like to review this during the week, so I'm assigning it to myself.

Development

Successfully merging this pull request may close these issues.

OTLP Stabilization: Throttling & Retry

4 participants