LikeTheSalad
Contributor

Fixes: open-telemetry/opentelemetry-android#1134

Supersedes #7557

Summary

This change prevents the following crash, which occurs when a gRPC exporter is shut down while OkHttp is still waiting to establish a connection to the server.

java.lang.InterruptedException
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1765)
	at java.util.concurrent.LinkedBlockingDeque.pollFirst(LinkedBlockingDeque.java:515)
	at java.util.concurrent.LinkedBlockingDeque.poll(LinkedBlockingDeque.java:677)
	at okhttp3.internal.connection.FastFallbackExchangeFinder.awaitTcpConnect(FastFallbackExchangeFinder.kt:162)
	at okhttp3.internal.connection.FastFallbackExchangeFinder.find(FastFallbackExchangeFinder.kt:69)
	at okhttp3.internal.connection.RealCall.initExchange$okhttp(RealCall.kt:280)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.kt:32)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:126)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.kt:101)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:126)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.kt:85)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:126)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.kt:74)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:126)
	at io.opentelemetry.exporter.sender.okhttp.internal.RetryInterceptor.intercept(RetryInterceptor.java:96)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.kt:126)
	at okhttp3.internal.connection.RealCall.getResponseWithInterceptorChain$okhttp(RealCall.kt:208)
	at okhttp3.internal.connection.RealCall$AsyncCall.run(RealCall.kt:530)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:651)
	at java.lang.Thread.run(Thread.java:1119)

Description

The crash happens when OkHttp hits an unhandled InterruptedException while waiting for the TCP connection (the awaitTcpConnect frame in the stack trace above). The exception then propagates to the host app.

The proposed solution is to avoid interrupting the thread. Execution then returns from the wait after a timeout and finishes normally, because the call is cancelled before any attempt to shut down the executor.
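The difference between interrupting and not interrupting a blocked worker can be sketched with plain java.util.concurrent types. This is not the code from this PR: the class name, the method name, and the 200 ms wait are assumptions for illustration, and the timed poll only stands in for OkHttp's wait on a connection result.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not the exporter's implementation.
final class GracefulShutdownSketch {

  /** Returns true if the blocked worker observed an InterruptedException. */
  static boolean workerWasInterrupted(boolean interruptOnShutdown) throws InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    LinkedBlockingDeque<String> results = new LinkedBlockingDeque<>();
    CountDownLatch started = new CountDownLatch(1);
    AtomicBoolean interrupted = new AtomicBoolean(false);

    executor.execute(() -> {
      started.countDown();
      try {
        // Stand-in for a timed wait on a connection result.
        results.poll(200, TimeUnit.MILLISECONDS);
      } catch (InterruptedException e) {
        interrupted.set(true);
      }
    });

    started.await(); // make sure the task is running before shutting down
    if (interruptOnShutdown) {
      executor.shutdownNow(); // interrupts the running worker thread
    } else {
      executor.shutdown(); // lets the running task finish on its own
    }
    executor.awaitTermination(2, TimeUnit.SECONDS);
    return interrupted.get();
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("shutdownNow() interrupted worker: " + workerWasInterrupted(true));
    System.out.println("shutdown() interrupted worker: " + workerWasInterrupted(false));
  }
}
```

With shutdown(), the worker simply waits out its timeout and returns, which mirrors the behavior the description relies on.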

Alternative solution

Ask OkHttp to handle a possible java.lang.InterruptedException at that point.

codecov bot commented Aug 13, 2025

Codecov Report

❌ Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 90.01%. Comparing base (4b8be80) to head (a27bf65).
⚠️ Report is 18 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...pentelemetry/sdk/internal/DaemonThreadFactory.java | 92.85% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #7565      +/-   ##
============================================
- Coverage     90.02%   90.01%   -0.01%     
- Complexity     7080     7082       +2     
============================================
  Files           803      803              
  Lines         21417    21429      +12     
  Branches       2086     2087       +1     
============================================
+ Hits          19280    19289       +9     
- Misses         1475     1477       +2     
- Partials        662      663       +1     

@bidetofevil

An alternative to this, if you want to restrict it to the Android use case, is to use an UncaughtExceptionHandler in the app to suppress these exceptions, similar to how the Crash reporting UncaughtExceptionHandler is used. You'll want to wrap the latter with the former, but you get what I mean.
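A minimal sketch of that wrapping idea follows, assuming a hypothetical helper named suppressInterrupts. Only Thread.UncaughtExceptionHandler and the Thread.get/setDefaultUncaughtExceptionHandler methods are real JDK API; everything else is illustrative.

```java
// Hypothetical sketch of an app-level handler that wraps an existing one.
final class SuppressingHandlerSketch {

  /** Swallows InterruptedException and delegates every other throwable. */
  static Thread.UncaughtExceptionHandler suppressInterrupts(
      Thread.UncaughtExceptionHandler delegate) {
    return (thread, error) -> {
      if (error instanceof InterruptedException) {
        return; // suppressed: do not crash the app for this case
      }
      if (delegate != null) {
        // e.g. the crash-reporting handler that was installed earlier
        delegate.uncaughtException(thread, error);
      }
    };
  }

  public static void main(String[] args) {
    // Wrap whatever default handler is currently installed (may be null).
    Thread.UncaughtExceptionHandler previous = Thread.getDefaultUncaughtExceptionHandler();
    Thread.setDefaultUncaughtExceptionHandler(suppressInterrupts(previous));
  }
}
```

Because the wrapper is installed as the process-wide default, it only behaves well if any handler installed later also delegates to it.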

clearInvocations(threadMock, defaultHandler);
IllegalStateException e = new IllegalStateException();
uncaughtExceptionHandler.uncaughtException(threadMock, e);
verify(defaultHandler).uncaughtException(threadMock, e);
Contributor

I think that doing all assertions/verifications in an async block when the main test body doesn't also verify the execution is a code smell.

I think there could be something at the end of the Runnable (AtomicBoolean, CountDownLatch, etc.) that you can then verify in the main thread. That way the reader can easily determine that the runnable completed, which means the assertions all ran and passed.
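One way to picture this suggestion is the sketch below. It is not the test from this PR; the class and method names are hypothetical, and it captures failures in an AtomicReference in addition to the latch so the main thread can tell "finished" apart from "finished and passed".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of verifying async assertions from the main test thread.
final class AsyncAssertionSketch {

  /** Runs the checks on another thread; true only if they finished without throwing. */
  static boolean runAndConfirm(Runnable checks, long timeoutMillis)
      throws InterruptedException {
    CountDownLatch done = new CountDownLatch(1);
    AtomicReference<Throwable> failure = new AtomicReference<>();

    Thread worker = new Thread(() -> {
      try {
        checks.run(); // assertions/verifications live here
      } catch (Throwable t) {
        failure.set(t); // remember the failure for the main thread
      } finally {
        done.countDown(); // signal that the runnable reached its end
      }
    });
    worker.start();

    // The main test body waits, then checks both completion and outcome.
    boolean finished = done.await(timeoutMillis, TimeUnit.MILLISECONDS);
    return finished && failure.get() == null;
  }
}
```

In a real test the returned value would be asserted in the main body, so a silently skipped or failed Runnable fails the test.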

Contributor Author

Got it. I've added some changes to address this, cheers!

Contributor

@breedx-splk breedx-splk left a comment

This looks good to me. I offered one small improvement idea in the test.

@LikeTheSalad
Contributor Author

> An alternative to this, if you want to restrict it to the Android use case, is to use an UncaughtExceptionHandler in the app to suppress these exceptions, similar to how the Crash reporting UncaughtExceptionHandler is used. You'll want to wrap the latter with the former, but you get what I mean.

I see what you mean, that would work too. Where possible, though, I prefer to set the uncaught exception handler on the specific thread that we manage, which keeps the solution targeted to this problem. Setting it on the main thread or as the default handler risks conflicts with other uncaught exception handlers that might be set for other reasons: we can't control the order in which they are set, or whether all of them will be "good citizens" that wrap and delegate to the previous one.
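The per-thread approach can be sketched with a small ThreadFactory. The class name is hypothetical and this is not the DaemonThreadFactory from the SDK; it only uses the standard Thread.setUncaughtExceptionHandler method to show the scoping.

```java
import java.util.concurrent.ThreadFactory;

// Hypothetical sketch: the handler applies only to threads this factory creates.
final class TargetedHandlerThreadFactory implements ThreadFactory {
  private final Thread.UncaughtExceptionHandler handler;

  TargetedHandlerThreadFactory(Thread.UncaughtExceptionHandler handler) {
    this.handler = handler;
  }

  @Override
  public Thread newThread(Runnable runnable) {
    Thread thread = new Thread(runnable);
    thread.setDaemon(true);
    // Scoped to this thread only; the process-wide default handler is untouched.
    thread.setUncaughtExceptionHandler(handler);
    return thread;
  }
}
```

Since nothing global is modified, the ordering of other handlers in the app does not matter for threads created this way.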

Development

Successfully merging this pull request may close these issues.

App crashes after some period of downtime of the otel endpoint