Skip to content

Conversation

lalitb
Copy link
Member

@lalitb lalitb commented Jul 30, 2025

Fixes #309
Design discussion issue (if applicable) #

Changes

This PR introduces a new retry module and integrates it into the Geneva uploader to handle transient failures gracefully. The implementation provides configurable retry logic with support for retriability checking, while also including TODOs for future enhancements like exponential backoff and jitter.

  1. New Retry Module (retry.rs):

    • RetryConfig struct with configurable max retries and delay
    • retry_with_config() - Simple retry function with fixed delays
    • retry_with_config_and_check() - Advanced retry with retriability checking
    • TODOs for future exponential backoff and jitter support
  2. Uploader Integration (uploader.rs):

    • Added retry_config to GenevaUploaderConfig
    • Implemented is_retriable() method on GenevaUploaderError to intelligently determine which errors should trigger retries
    • Modified upload() method to use retry_with_config_and_check() for automatic retries.
  3. Client Integration (client.rs):

    • Added optional retry_config field to GenevaClientConfig
    • Passes retry configuration to the uploader during initialization
    • Uses default retry configuration if none is provided
  4. Error Retriability Logic: The is_retriable() method implements smart error classification:

    • Network failures (timeouts, connection refused, etc.) are retried as they're often transient
    • Server errors (5xx) and rate limiting (429) trigger retries
    • Config service errors are retried (may involve token refresh)
    • Client errors (parsing, internal logic) don't waste retry attempts
  5. The retry wrapper is designed to have minimal performance impact: When operations succeed on the first attempt (the common case), the retry wrapper adds only:

    • A single function call overhead
    • One error type check (is_retriable() is never called on success)
    • No heap allocations or cloning
    • The main performance cost comes only when retries actually occur (during failures), where the configured delay is intentionally added between attempts.
  6. Unit tests added to test retriable and non-retriable uploads.

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@lalitb lalitb requested a review from a team as a code owner July 30, 2025 21:27
Copy link

codecov bot commented Jul 30, 2025

Codecov Report

❌ Patch coverage is 95.43860% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.3%. Comparing base (e3456b0) to head (e5a913d).
⚠️ Report is 19 commits behind head on main.

Files with missing lines Patch % Lines
.../geneva-uploader/src/ingestion_service/uploader.rs 69.4% 11 Missing ⚠️
...metry-exporter-geneva/geneva-uploader/src/retry.rs 97.6% 6 Missing ⚠️
...metry-exporter-geneva/geneva-uploader/src/bench.rs 0.0% 4 Missing ⚠️
stress/src/geneva_exporter.rs 0.0% 3 Missing ⚠️
...etry-exporter-geneva/geneva-uploader/src/client.rs 0.0% 1 Missing ⚠️
...eneva/geneva-uploader/src/ingestion_service/mod.rs 99.6% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main    #399     +/-   ##
=======================================
+ Coverage   47.6%   51.3%   +3.7%     
=======================================
  Files         69      70      +1     
  Lines       9729   10290    +561     
=======================================
+ Hits        4639    5289    +650     
+ Misses      5090    5001     -89     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@utpilla
Copy link
Contributor

utpilla commented Aug 5, 2025

Changes

This PR introduces a new retry module and integrates it into the Geneva uploader to handle transient failures gracefully.

We don't want this in uploader, right? I thought we wanted retrying, persistence etc. to be handled by the consumers of the uploader such as the exporter.

@lalitb
Copy link
Member Author

lalitb commented Aug 5, 2025

We don't want this in uploader, right? I thought we wanted retrying, persistence etc. to be handled by the consumers of the uploader such as the exporter.

I thought it can be hybrid approach - transient (Network timeouts, 5xx errors) should be still handled by uploader. And for network offline, or bigger failure the consumer can persist and keep retrying. If required, we can provide configuration to disable the retry too, so consumer can decide if they don't want it.

@lalitb
Copy link
Member Author

lalitb commented Sep 2, 2025

closing this for #442

@lalitb lalitb closed this Sep 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Geneva Exporter] Uploader should handle retriable errors

2 participants