Add Retry to OTLP Exporter #2727
Closed
Changes from 4 commits (of 8)

Commits
- 9e4a5f4 Fix comment about how to only run unit tests (AaronRM)
- e1c406a Remove extraneous lines from test_with_gzip_compression unit test (AaronRM)
- 12f6f6d Add retry_with_exponential_backoff method; Use in tonic logs client (AaronRM)
- 98469f3 Add tests and comments to retry.rs (AaronRM)
- b2dd5e7 Move retry to opentelemetry-sdk (WIP) (AaronRM)
- 868c44b Merge branch 'main' into aaronm-retry (AaronRM)
- 26bb2c2 Fix build warning (AaronRM)
- 16dd9a5 Scope retry to just logs+tonic (AaronRM)
@@ -0,0 +1,147 @@

```rust
use std::future::Future;
use std::time::{Duration, SystemTime};
use opentelemetry::otel_warn;
use tokio::time::sleep;

pub(crate) struct RetryPolicy {
    pub max_retries: usize,
    pub initial_delay_ms: u64,
    pub max_delay_ms: u64,
    pub jitter_ms: u64,
}

// Generates a random jitter value up to max_jitter
fn generate_jitter(max_jitter: u64) -> u64 {
    let now = SystemTime::now();
    let nanos = now.duration_since(SystemTime::UNIX_EPOCH).unwrap().subsec_nanos();
    nanos as u64 % (max_jitter + 1)
}

// Retries the given operation with exponential backoff and jitter
pub(crate) async fn retry_with_exponential_backoff<F, Fut, T, E>(
    policy: RetryPolicy,
    operation_name: &str,
    mut operation: F,
) -> Result<T, E>
where
    F: FnMut() -> Fut,
    E: std::fmt::Debug,
    Fut: Future<Output = Result<T, E>>,
{
    let mut attempt = 0;
    let mut delay = policy.initial_delay_ms;

    loop {
        match operation().await {
            Ok(result) => return Ok(result), // Return the result if the operation succeeds
            Err(err) if attempt < policy.max_retries => {
                attempt += 1;
                // Log the error and retry after a delay with jitter
                otel_warn!(name: "OtlpRetry", message = format!("Retrying operation {:?} due to error: {:?}", operation_name, err));
                let jitter = generate_jitter(policy.jitter_ms);
                let delay_with_jitter = std::cmp::min(delay + jitter, policy.max_delay_ms);
                sleep(Duration::from_millis(delay_with_jitter)).await;
                delay = std::cmp::min(delay * 2, policy.max_delay_ms); // Exponential backoff
            }
            Err(err) => return Err(err), // Return the error if max retries are reached
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use tokio::time::timeout;
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::time::Duration;

    // Test to ensure generate_jitter returns a value within the expected range
    #[tokio::test]
    async fn test_generate_jitter() {
        let max_jitter = 100;
        let jitter = generate_jitter(max_jitter);
        assert!(jitter <= max_jitter);
    }

    // Test to ensure retry_with_exponential_backoff succeeds on the first attempt
    #[tokio::test]
    async fn test_retry_with_exponential_backoff_success() {
        let policy = RetryPolicy {
            max_retries: 3,
            initial_delay_ms: 100,
            max_delay_ms: 1600,
            jitter_ms: 100,
        };

        let result = retry_with_exponential_backoff(policy, "test_operation", || {
            Box::pin(async { Ok::<_, ()>("success") })
        }).await;

        assert_eq!(result, Ok("success"));
    }

    // Test to ensure retry_with_exponential_backoff retries the operation and eventually succeeds
    #[tokio::test]
    async fn test_retry_with_exponential_backoff_retries() {
        let policy = RetryPolicy {
            max_retries: 3,
            initial_delay_ms: 100,
            max_delay_ms: 1600,
            jitter_ms: 100,
        };

        let attempts = AtomicUsize::new(0);

        let result = retry_with_exponential_backoff(policy, "test_operation", || {
            let attempt = attempts.fetch_add(1, Ordering::SeqCst);
            Box::pin(async move {
                if attempt < 2 {
                    Err::<&str, &str>("error") // Fail the first two attempts
                } else {
                    Ok::<&str, &str>("success") // Succeed on the third attempt
                }
            })
        }).await;

        assert_eq!(result, Ok("success"));
        assert_eq!(attempts.load(Ordering::SeqCst), 3); // Ensure there were 3 attempts
    }

    // Test to ensure retry_with_exponential_backoff fails after max retries
    #[tokio::test]
    async fn test_retry_with_exponential_backoff_failure() {
        let policy = RetryPolicy {
            max_retries: 3,
            initial_delay_ms: 100,
            max_delay_ms: 1600,
            jitter_ms: 100,
        };

        let attempts = AtomicUsize::new(0);

        let result = retry_with_exponential_backoff(policy, "test_operation", || {
            attempts.fetch_add(1, Ordering::SeqCst);
            Box::pin(async { Err::<(), _>("error") }) // Always fail
        }).await;

        assert_eq!(result, Err("error"));
        assert_eq!(attempts.load(Ordering::SeqCst), 4); // Ensure there were 4 attempts (initial + 3 retries)
    }

    // Test to ensure retry_with_exponential_backoff respects the timeout
    #[tokio::test]
    async fn test_retry_with_exponential_backoff_timeout() {
        let policy = RetryPolicy {
            max_retries: 12, // Increase the number of retries
            initial_delay_ms: 100,
            max_delay_ms: 1600,
            jitter_ms: 100,
        };

        let result = timeout(Duration::from_secs(1), retry_with_exponential_backoff(policy, "test_operation", || {
            Box::pin(async { Err::<(), _>("error") }) // Always fail
        })).await;

        assert!(result.is_err()); // Ensure the operation times out
    }
}
```

Member left a suggested change on the `otel_warn!` line; the suggestion body is not shown here.
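The retry loop in the diff doubles the delay after each failed attempt and caps it at `max_delay_ms`. As a rough illustration of that arithmetic, the sketch below computes the sleep schedule with jitter ignored. `backoff_schedule` is a hypothetical helper, not part of the PR, and it uses `saturating_mul` where the diff uses a plain `delay * 2`.

```rust
/// Hypothetical helper (not in the PR): the delays, in milliseconds, that the
/// retry loop would sleep before each retry, with jitter left out.
fn backoff_schedule(initial_delay_ms: u64, max_delay_ms: u64, max_retries: usize) -> Vec<u64> {
    let mut delay = initial_delay_ms;
    let mut schedule = Vec::with_capacity(max_retries);
    for _ in 0..max_retries {
        // Mirrors `min(delay + jitter, max_delay_ms)` with jitter = 0.
        schedule.push(delay.min(max_delay_ms));
        // Mirrors `min(delay * 2, max_delay_ms)`; saturating to avoid overflow.
        delay = delay.saturating_mul(2).min(max_delay_ms);
    }
    schedule
}
```

For the policy used in the tests (`initial_delay_ms: 100`, `max_delay_ms: 1600`), the first four delays are 100, 200, 400 and 800 ms, which already add up to 1500 ms before any jitter. That is consistent with `test_retry_with_exponential_backoff_timeout` expecting the 1-second `timeout` to fire when the operation always fails.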
The `tokio` runtime won't be available here. Should we move this code under `exporter/tonic`? Or, if we want to keep it generic, one option could be to make the delay function configurable as an argument to `retry_with_exponential_backoff`, so that the retry function can be called from both async and blocking code.

The idea was that the retry logic should be generic and used across all the OTLP exporters. I'm going to try that approach first and try to avoid the direct tokio dependency.
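One way the configurable-delay idea from this thread might look is sketched below. It is an assumption about the shape of such an API, not the code that landed in the PR: `retry_with_delay_fn` and its `delay_fn` parameter are hypothetical names, and jitter and logging are left out to keep the sketch short. The caller supplies the sleep, so the function itself names no runtime.

```rust
use std::future::Future;
use std::time::Duration;

/// Same shape as the `RetryPolicy` in the diff, minus `jitter_ms`.
pub struct RetryPolicy {
    pub max_retries: usize,
    pub initial_delay_ms: u64,
    pub max_delay_ms: u64,
}

/// Hypothetical variant of `retry_with_exponential_backoff`: the caller
/// passes `delay_fn`, so no async runtime is referenced here.
pub async fn retry_with_delay_fn<F, Fut, D, DFut, T, E>(
    policy: &RetryPolicy,
    mut operation: F,
    mut delay_fn: D,
) -> Result<T, E>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T, E>>,
    D: FnMut(Duration) -> DFut,
    DFut: Future<Output = ()>,
{
    let mut attempt = 0;
    let mut delay = policy.initial_delay_ms;
    loop {
        match operation().await {
            Ok(value) => return Ok(value),
            Err(_) if attempt < policy.max_retries => {
                attempt += 1;
                // Wait using whatever sleep the caller injected.
                delay_fn(Duration::from_millis(delay.min(policy.max_delay_ms))).await;
                // Exponential backoff, capped at max_delay_ms.
                delay = delay.saturating_mul(2).min(policy.max_delay_ms);
            }
            // Retries exhausted: surface the last error.
            Err(err) => return Err(err),
        }
    }
}
```

Under this sketch, a Tokio-based caller could pass `|d| tokio::time::sleep(d)` as `delay_fn`, while tests could pass a closure that records the requested duration and returns `std::future::ready(())`. Blocking callers would still need some way to drive the returned future, or a separate synchronous variant.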