Conversation


@stefanobaghino commented Nov 22, 2025

Fixes #2777

Changes

This follows the approach suggested here by @scottgerring:

The issue is in TonicLogsClient which uses tokio::sync::Mutex (async) but the LogExporter::shutdown_with_timeout() trait method is synchronous. This prevents calling .lock().await, so the current implementation has a TODO and just returns Ok() without actually shutting down the gRPC client.

Comparing with shutdown on TonicMetricsClient, and in particular the type of mutex used for the inner client and locked during shutdown, shows a workable pattern.
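As a rough illustration of that pattern (a hedged sketch, not the crate's actual code; the type and field names below are simplified stand-ins), the inner client sits behind a std::sync::Mutex<Option<...>> so a synchronous shutdown can lock it without an async context and tear the client down by taking it out of the Option:

    // Sketch of the TonicMetricsClient-style shutdown pattern; names are illustrative.
    use std::sync::Mutex;

    // Stand-in for the gRPC client, interceptor, etc. held by the exporter.
    struct ClientInner {
        endpoint: String,
    }

    struct LogsClientSketch {
        // std Mutex (not tokio::sync::Mutex), so a synchronous shutdown can lock it.
        inner: Mutex<Option<ClientInner>>,
    }

    impl LogsClientSketch {
        // Synchronous shutdown: take the inner client so later exports fail fast.
        fn shutdown(&self) -> Result<(), String> {
            match self.inner.lock() {
                Ok(mut guard) => match guard.take() {
                    Some(_client) => Ok(()), // dropping the client releases the channel
                    None => Err("logs exporter is already shut down".to_string()),
                },
                Err(e) => Err(format!("failed to acquire lock: {e:?}")),
            }
        }
    }

With tokio::sync::Mutex this was not possible, because locking it requires .await, which the synchronous shutdown_with_timeout() trait method cannot do.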

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)


linux-foundation-easycla bot commented Nov 22, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.


codecov bot commented Nov 22, 2025

Codecov Report

❌ Patch coverage is 0% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.7%. Comparing base (df412fe) to head (4df15de).

Files with missing lines                            Patch %   Lines
opentelemetry-otlp/src/exporter/tonic/logs.rs       0.0%      22 Missing ⚠️
opentelemetry-otlp/src/exporter/tonic/metrics.rs    0.0%      1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #3255     +/-   ##
=======================================
- Coverage   80.8%   80.7%   -0.1%     
=======================================
  Files        129     129             
  Lines      23203   23212      +9     
=======================================
  Hits       18750   18750             
- Misses      4453    4462      +9     

☔ View full report in Codecov by Sentry.

@stefanobaghino changed the title from "Handle shutdown in logs exporter" to "fix: handle shutdown in logs exporter" on Nov 22, 2025
@stefanobaghino marked this pull request as ready for review on November 22, 2025 at 14:10
@stefanobaghino requested a review from a team as a code owner on November 22, 2025 at 14:10
@scottgerring (Member) left a comment

Thanks! LGTM

     }
     None => Err(tonic::Status::failed_precondition(
-        "exporter is already shut down",
+        "metrics exporter is already shut down",
A project member commented on this change:

I think this is helpful; I don't think the user would necessarily see which failed otherwise

@scottgerring (Member) commented:

@stefanobaghino that was a quick turnaround :D

 use opentelemetry_sdk::error::{OTelSdkError, OTelSdkResult};
 use opentelemetry_sdk::logs::{LogBatch, LogExporter};
-use std::sync::Arc;
+use std::sync::{Arc, Mutex};
A project member asked:

Are there any repercussions of moving to the std Mutex instead of the tokio one inside the async export call?

A project member replied:

I think it's fine, and it's the same pattern used in TonicMetricsClient for the last couple of years, which as far as I can tell we've not had problems with (link to blame).

In the export, we hold the lock only long enough to clone the client and call the interceptor, and, significantly, not across any await points, which is the main contraindication for std mutexes:

    let (mut client, metadata, extensions) = self
        .inner
        .lock()
        .map_err(|e| tonic::Status::internal(format!("Failed to acquire lock: {e:?}")))
        .and_then(|mut inner| match &mut *inner {
            Some(inner) => {
                let (m, e, _) = inner
                    .interceptor
                    .call(Request::new(()))
                    .map_err(|e| {
                        // Convert interceptor errors to tonic::Status for retry classification
                        tonic::Status::internal(format!("interceptor error: {e:?}"))
                    })? // lock released
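
To make the point about lock scope concrete, here is a self-contained sketch of the same discipline (simplified stand-in types, not the crate's API): clone what you need inside a plain block while holding the lock, let the guard drop, and only then await:

    use std::sync::{Arc, Mutex};

    #[derive(Clone)]
    struct GrpcClient; // stand-in for the tonic logs client

    impl GrpcClient {
        async fn send(&self, _batch: Vec<String>) -> Result<(), String> {
            Ok(())
        }
    }

    async fn export(inner: Arc<Mutex<Option<GrpcClient>>>, batch: Vec<String>) -> Result<(), String> {
        // Lock and clone inside a block so the std MutexGuard is dropped
        // before any await point.
        let client = {
            let guard = inner.lock().map_err(|e| format!("lock poisoned: {e:?}"))?;
            match &*guard {
                Some(client) => client.clone(),
                None => return Err("exporter is already shut down".to_string()),
            }
        }; // guard dropped here

        // No lock is held across this await.
        client.send(batch).await
    }

If the guard lived across the awaited send, a std Mutex could block the runtime thread; keeping the critical section synchronous is what makes the swap from tokio's Mutex safe.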

I think if you called shutdown_with_timeout on a single-threaded tokio runtime while the runtime was still trying to do an export, and the export happened to be at exactly the point in the code highlighted above, then you could potentially deadlock it. But this relies on amazing timing, and I think we already warn against doing exactly this:

/// **Warning**: When using tokio's current-thread runtime, `shutdown()`, which
/// is a blocking call, should not be called from your main thread. This can
/// cause deadlock. Instead, call `shutdown()` from a separate thread or use
/// tokio's `spawn_blocking`.
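
A hedged sketch of what that warning asks callers to do on a current-thread runtime (MyProvider and its blocking shutdown() are hypothetical stand-ins, not the SDK's exact API): run the blocking call on tokio's blocking pool instead of the async thread:

    use std::sync::Arc;

    struct MyProvider; // hypothetical provider with a blocking shutdown()

    impl MyProvider {
        fn shutdown(&self) -> Result<(), String> {
            Ok(())
        }
    }

    // Offload the blocking shutdown so the current-thread runtime keeps
    // driving exports and cannot deadlock waiting on itself.
    async fn shutdown_off_thread(provider: Arc<MyProvider>) -> Result<(), String> {
        tokio::task::spawn_blocking(move || provider.shutdown())
            .await
            .map_err(|e| format!("join error: {e}"))?
    }

From plain synchronous code, spawning a std::thread and joining it achieves the same separation.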

I note there are some other suggestions in the linked issue (e.g. spawning a thread) for doing this particular thing, but this feels a bit overwrought.

What are you thinking, @cijothomas?



Development

Successfully merging this pull request may close these issues:

  • OTLP Stabilization: Handle shutdown in OTLP/gRPC
