fix(async-processor): concurrent exports actually serialised #3028

Merged (5 commits) on Jul 14, 2025

@alexbrt (Contributor) commented Jun 18, 2025

#2685 unintentionally broke parallel exports by awaiting the export() future directly in opentelemetry-sdk/src/trace/span_processor_with_async_runtime.rs, rather than passing it to the runtime for concurrent polling. As a result, OTEL_BSP_MAX_CONCURRENT_EXPORTS became ineffective, serialising all exports and increasing the risk of dropped spans under load.

This PR restores true parallelism by respecting max_concurrent_exports, and adds tests to verify:

  • Exports run in parallel when max_concurrent_exports > 1
  • Exports are serialised when max_concurrent_exports == 1
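The difference between awaiting each export inline and polling several export futures together can be shown without the SDK. The sketch below is an illustration under stated assumptions: it uses a toy busy-polling executor built on the standard library (not Tokio and not `FuturesUnordered`), and `fake_export`, `run_bounded`, `peak_inline` and `peak_bounded` are hypothetical names. Each fake export stays pending for a few polls and records how many exports are in flight at once. Awaiting exports one after another, as the regression did, never has more than one in flight; `run_bounded` admits up to `limit` futures and polls them round-robin, which is the behaviour `max_concurrent_exports` is meant to allow.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type BoxFut = Pin<Box<dyn Future<Output = ()>>>;

/// No-op waker: the toy executors below simply poll in a loop.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Returns Pending `n` times before completing, standing in for export I/O.
struct YieldTimes(u32);
impl Future for YieldTimes {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 == 0 {
            return Poll::Ready(());
        }
        self.0 -= 1;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Records how many exports are running and the peak reached.
#[derive(Default)]
struct Tracker {
    in_flight: Cell<usize>,
    peak: Cell<usize>,
}

/// Hypothetical export: counts itself as in flight until its "I/O" completes.
async fn fake_export(t: Rc<Tracker>) {
    t.in_flight.set(t.in_flight.get() + 1);
    t.peak.set(t.peak.get().max(t.in_flight.get()));
    YieldTimes(3).await;
    t.in_flight.set(t.in_flight.get() - 1);
}

/// Drives one future to completion by busy-polling it.
fn block_on<F: Future>(f: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut f = Box::pin(f);
    loop {
        if let Poll::Ready(v) = f.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

/// Keeps up to `limit` futures in flight (limit must be >= 1), polling round-robin.
fn run_bounded(mut pending: Vec<BoxFut>, limit: usize) {
    assert!(limit > 0, "limit must be at least 1");
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut active: Vec<BoxFut> = Vec::new();
    pending.reverse(); // pop() from the back starts tasks in their original order
    while !pending.is_empty() || !active.is_empty() {
        while active.len() < limit {
            match pending.pop() {
                Some(f) => active.push(f),
                None => break,
            }
        }
        // Poll every active export once and keep only those still pending.
        active.retain_mut(|f| f.as_mut().poll(&mut cx).is_pending());
    }
}

fn peak_inline(n: usize) -> usize {
    let t = Rc::new(Tracker::default());
    let t2 = Rc::clone(&t);
    // Regression pattern: each export is awaited before the next one starts.
    block_on(async move {
        for _ in 0..n {
            fake_export(Rc::clone(&t2)).await;
        }
    });
    t.peak.get()
}

fn peak_bounded(n: usize, limit: usize) -> usize {
    let t = Rc::new(Tracker::default());
    let futs: Vec<BoxFut> = (0..n).map(|_| Box::pin(fake_export(Rc::clone(&t))) as BoxFut).collect();
    run_bounded(futs, limit);
    t.peak.get()
}

fn main() {
    println!("inline await: peak = {}", peak_inline(8));
    println!("limit 1: peak = {}", peak_bounded(8, 1));
    println!("limit 4: peak = {}", peak_bounded(8, 4));
}
```

With `limit` set to 1 the bounded runner behaves like the inline version, which corresponds to the second test case in the list.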

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@alexbrt requested a review from a team as a code owner on June 18, 2025 14:19

linux-foundation-easycla bot commented Jun 18, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

codecov bot commented Jun 18, 2025

Codecov Report

Attention: Patch coverage is 94.73684% with 5 lines in your changes missing coverage. Please review.

Project coverage is 80.2%. Comparing base (5e447d0) to head (8346e30).
Report is 4 commits behind head on main.

Files with missing lines | Patch % | Lines
...sdk/src/trace/span_processor_with_async_runtime.rs | 94.7% | 5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #3028     +/-   ##
=======================================
+ Coverage   80.0%   80.2%   +0.1%     
=======================================
  Files        126     126             
  Lines      21879   21949     +70     
=======================================
+ Hits       17519   17604     +85     
+ Misses      4360    4345     -15     

@alexbrt force-pushed the alxbrt/concurrent-exports branch 2 times, most recently from 290d3be to 06fa8ae on June 18, 2025 14:36
@lalitb requested a review from Copilot on June 18, 2025 20:11

@Copilot (Copilot AI, Contributor) left a comment

Pull Request Overview

This PR restores true parallel exports by scheduling export() futures on the async runtime instead of awaiting them directly, and adds tests to verify behavior under different max_concurrent_exports settings.

  • Refactored export calls to use a static async function with RwLock-wrapped exporter and FuturesUnordered for concurrency.
  • Updated shutdown and resource-setting to acquire a write lock on the exporter.
  • Added tokio-based tests (TrackingExporter) to ensure exports are parallel when allowed and serialized when limited.
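To make the locking scheme concrete, here is a minimal thread-based sketch. It assumes a hypothetical `Exporter` trait whose `export` takes `&self` while `shutdown` takes `&mut self`, mirroring the read/write split the overview describes. It uses `std::sync::RwLock` and OS threads rather than the `tokio::sync::RwLock` and async tasks in the PR, so it illustrates the sharing pattern, not the SDK code: exports hold a read guard and can overlap, while shutdown needs the write guard and is only granted once no export holds a read guard. The hypothetical `BarrierExporter` makes two exports meet at a `Barrier`, so the run only completes if the exports actually overlap.

```rust
use std::sync::{Arc, Barrier, RwLock};
use std::thread;

/// Hypothetical exporter trait: export needs shared access, shutdown exclusive access.
trait Exporter: Send + Sync {
    fn export(&self, batch: Vec<String>) -> usize;
    fn shutdown(&mut self);
}

struct BarrierExporter {
    // Both exports must reach this barrier before either returns,
    // so run_two_exports can only finish if the exports overlap.
    barrier: Barrier,
    shut_down: bool,
}

impl Exporter for BarrierExporter {
    fn export(&self, batch: Vec<String>) -> usize {
        self.barrier.wait();
        batch.len()
    }
    fn shutdown(&mut self) {
        self.shut_down = true;
    }
}

fn run_two_exports(exporter: Arc<RwLock<BarrierExporter>>) -> usize {
    let handles: Vec<_> = (0..2)
        .map(|i| {
            let exporter = Arc::clone(&exporter);
            thread::spawn(move || {
                // Read lock: several exports can hold it at the same time.
                let guard = exporter.read().unwrap();
                guard.export(vec![format!("span-{i}")])
            })
        })
        .collect();
    let exported = handles.into_iter().map(|h| h.join().unwrap()).sum();
    // Write lock: exclusive, so it cannot overlap with any export.
    exporter.write().unwrap().shutdown();
    exported
}

fn main() {
    let exporter = Arc::new(RwLock::new(BarrierExporter {
        barrier: Barrier::new(2),
        shut_down: false,
    }));
    let exported = run_two_exports(Arc::clone(&exporter));
    println!("exported {exported} spans, shut down = {}", exporter.read().unwrap().shut_down);
}
```

The cost this pattern adds is a lock acquisition on every export and shutdown, which is what the later review comments propose removing.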

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
opentelemetry-sdk/src/trace/span_processor_with_async_runtime.rs | Refactor export logic for concurrent scheduling and lock-protect the exporter; update shutdown and set_resource; add concurrency tests.
opentelemetry-sdk/Cargo.toml | Include the rt-tokio feature in experimental_trace_batch_span_processor_with_async_runtime.

@alexbrt force-pushed the alxbrt/concurrent-exports branch from 06fa8ae to 1905b50 on June 18, 2025 20:15
@@ -188,13 +190,19 @@ struct BatchSpanProcessorInternal<E, R> {
     spans: Vec<SpanData>,
     export_tasks: FuturesUnordered<BoxFuture<'static, OTelSdkResult>>,
     runtime: R,
-    exporter: E,
+    exporter: Arc<RwLock<E>>,
(Member)

Going forward, the SpanExporter trait should be redesigned to use immutable references (&self) for all methods. This would allow us to remove the RwLock and use just Arc<E> for sharing the exporter across concurrent tasks - similar to how LogExporter is implemented.
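The suggested redesign can be sketched with a hypothetical trait in which every method takes `&self`, so the processor shares the exporter as a plain `Arc<E>` and each export task simply clones the `Arc`. Any state that shutdown has to change moves behind interior mutability (an `AtomicBool` here). `Exporter`, `CountingExporter` and `export_all` are illustrative names, and threads stand in for async tasks; this is not the actual `SpanExporter` or `LogExporter` API.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical exporter trait with shared-reference methods only.
trait Exporter: Send + Sync + 'static {
    fn export(&self, batch: Vec<String>) -> Result<(), String>;
    fn shutdown(&self) -> Result<(), String>;
}

#[derive(Default)]
struct CountingExporter {
    exported: AtomicUsize,
    shut_down: AtomicBool,
}

impl Exporter for CountingExporter {
    fn export(&self, batch: Vec<String>) -> Result<(), String> {
        if self.shut_down.load(Ordering::Acquire) {
            return Err("exporter already shut down".into());
        }
        self.exported.fetch_add(batch.len(), Ordering::Relaxed);
        Ok(())
    }
    fn shutdown(&self) -> Result<(), String> {
        // swap returns the previous value, so a second shutdown is reported as an error.
        if self.shut_down.swap(true, Ordering::AcqRel) {
            return Err("already shut down".into());
        }
        Ok(())
    }
}

/// Runs each batch on its own thread, sharing the exporter through Arc<E> alone (no lock).
fn export_all<E: Exporter>(exporter: &Arc<E>, batches: Vec<Vec<String>>) -> Vec<Result<(), String>> {
    let handles: Vec<_> = batches
        .into_iter()
        .map(|batch| {
            let exporter = Arc::clone(exporter);
            thread::spawn(move || exporter.export(batch))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let exporter = Arc::new(CountingExporter::default());
    let batches = vec![vec!["a".to_string(), "b".to_string()], vec!["c".to_string()]];
    let results = export_all(&exporter, batches);
    assert!(results.iter().all(|r| r.is_ok()));
    exporter.shutdown().unwrap();
    println!("exported {} spans", exporter.exported.load(Ordering::Relaxed));
}
```

The trade-off is that each exporter implementation must make its own mutable state thread-safe, instead of the processor serialising `&mut self` access through a lock.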

(Contributor Author)

Agreed — my goal was mainly to minimise the scope of the change, but you're right that using a RwLock is... questionable.

@lalitb (Member), Jun 19, 2025

Can you add a TODO comment for this so someone can work on it? I'll also file an issue. Thanks.

@alexbrt requested a review from lalitb on June 19, 2025 13:22
@lalitb self-assigned this on Jun 19, 2025

@lalitb (Member) left a comment

LGTM. Thanks for adding tests for concurrent export.

@cijothomas (Member) left a comment

Thanks for catching this and fixing! Can you add a changelog entry too?

(Concurrent Export support is something we need to support everywhere, and is pending some spec-level discussions. But given this functionality was already there and got removed unintentionally, no problem adding it back)

@alexbrt requested a review from cijothomas on June 19, 2025 22:01

@alexbrt (Contributor Author) commented Jun 24, 2025

@lalitb @cijothomas Thanks again for all the reviews! 🙏 Is there anything else outstanding that needs to be addressed before this can be merged?

@alexbrt requested a review from lalitb on July 1, 2025 17:03

@alexbrt (Contributor Author) commented Jul 11, 2025

Any updates here? 👀

@TommyCpp (Contributor) left a comment

Redesign the SpanExporter trait to use immutable references (&self)

OK to merge this PR as it is, but we should fix this before the next release so we don't cause a performance regression.

// TODO: Redesign the `SpanExporter` trait to use immutable references (`&self`)
// for all methods. This would allow us to remove the `RwLock` and just use `Arc<E>`,
// similar to how `crate::logs::LogExporter` is implemented.
exporter: Arc<RwLock<E>>,
(Contributor)

Hmm, if we need to add a RwLock to support concurrent export, maybe we should consider offering two flavors of the span processor, so users who don't use concurrent export don't have to pay the cost.

@lalitb (Member) left a comment

LGTM. The RwLock (and with it the dependency on tokio/sync) should be removed in a subsequent iteration. I will create a tracking issue for this.

EDIT: issue #3065

@lalitb merged commit 0631070 into open-telemetry:main on Jul 14, 2025
27 checks passed

4 participants