fix NPE when exception message is null #280

Merged
shakuzen merged 2 commits into openzipkin:master from joaovieira-ca:fix-NPE-CancellationException
May 26, 2025

@joaovieira-ca
Contributor

When the exception is an IllegalStateException with no message, the reporter throws an NPE like the one we are seeing in our projects:

java.lang.NullPointerException: Cannot invoke "String.equals(Object)" because the return value of "java.lang.Throwable.getMessage()" is null
	at zipkin2.reporter.internal.AsyncReporter$BoundedAsyncReporter.flush(AsyncReporter.java:294)
	at zipkin2.reporter.internal.AsyncReporter$Flusher.run(AsyncReporter.java:352)
	at java.base/java.lang.Thread.run(Unknown Source)

In our case, it was a java.util.concurrent.CancellationException
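For readers following along, here is a minimal sketch of the failure mode the stack trace describes. It relies on two facts from the JDK: `CancellationException` extends `IllegalStateException`, and its no-argument constructor leaves `getMessage()` as null. The class name, the method names, and the `"closed"` marker string are assumptions made up for this illustration, not a copy of the code in `AsyncReporter`.

```java
import java.util.concurrent.CancellationException;

final class NullSafeMessageCheck {
  // Hypothetical marker string, used only for this illustration.
  static final String CLOSED = "closed";

  // Unsafe: dereferences getMessage(), which is null when no message was supplied.
  static boolean isClosedUnsafe(RuntimeException e) {
    return e instanceof IllegalStateException && e.getMessage().equals(CLOSED);
  }

  // Null-safe: equals is called on the constant, so a null message simply yields false.
  static boolean isClosedSafe(RuntimeException e) {
    return e instanceof IllegalStateException && CLOSED.equals(e.getMessage());
  }

  public static void main(String[] args) {
    RuntimeException cancelled = new CancellationException(); // created without a message

    System.out.println(cancelled instanceof IllegalStateException); // true
    System.out.println(cancelled.getMessage());                     // null
    System.out.println(isClosedSafe(cancelled));                    // false

    try {
      isClosedUnsafe(cancelled);
    } catch (NullPointerException npe) {
      System.out.println("unsafe check threw NullPointerException");
    }
  }
}
```

Putting the constant on the left-hand side of `equals` is one common way to make such a comparison null-safe; `java.util.Objects.equals` would work as well.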

I cannot prove it, but after this error occurs our pods appear to stop exporting spans to Zipkin. We lose all observability after this NPE.

PS: our stack is Java 21 and Spring Boot 3.4.5 with 'io.micrometer:micrometer-tracing-bridge-brave' and 'io.zipkin.reporter2:zipkin-reporter-brave'.

@joaovieira-ca
Contributor Author

Can you release a patch version after this is merged? We are facing this issue in production and would like to update to check if it is going to work well.

@joaovieira-ca force-pushed the fix-NPE-CancellationException branch from 0896ccd to 9f2dbcd May 23, 2025 22:38
Member

@shakuzen left a comment

It seems like a good change to make regardless. Though I am curious what is causing the CancellationException. Is it a timeout on the sender you're using?

@shakuzen merged commit 4b765b3 into openzipkin:master May 26, 2025
3 checks passed
@shakuzen
Member

Are you able to try with 3.5.1-SNAPSHOT to make sure this completely fixes the issue you were seeing before we make a new release? Are you able to reproduce the issue outside of production?

@joaovieira-ca
Contributor Author

@shakuzen, sorry for the late reply, I was travelling.

As for the cause of the CancellationException, I don't have proof, as the stack trace does not help. My speculation is that the number of spans being exported was peaking and causing some kind of timeout, and the exporter thread got cancelled. We can see that the thread was AsyncReporter{ZipkinRestTemplate Sender{http://zipkin:9411/api/v2/spans}}

image

Another speculation of mine is that after the NPE, the thread was stopped and not started again, so the service stopped exporting spans. But that is a very wild speculation, as I don't really understand how the exporter behaves under the hood.

Version 3.5.1-SNAPSHOT was added yesterday to the service that was most impacted, and so far we have not seen any instance stop exporting. It may be too soon to tell, but it seems a lot better.

Thank you very much for your fast approval on this PR!

@shakuzen
Member

> the thread was stopped and not started again, so the service stopped exporting spans.

Yes, that's the behavior that will happen with this kind of exception. See this part of the code.
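To illustrate the general mechanism (not the reporter's actual loop), here is a small sketch of a background thread whose work throws an unexpected runtime exception. When the exception escapes `run()`, the thread terminates and no further flush attempts happen. All names here (`FlusherLoopSketch`, `attempts`, `failAt`, `guarded`) are hypothetical, and the failure is simulated with a hand-thrown `NullPointerException`. The `guarded` variant is included only for contrast; it is not the change made in this PR, which fixed the null check itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class FlusherLoopSketch {
  /**
   * Runs up to `iterations` flush attempts on a fresh thread and returns how many
   * were actually attempted. The attempt numbered `failAt` throws a simulated NPE.
   */
  static int attempts(int iterations, int failAt, boolean guarded) {
    AtomicInteger attempts = new AtomicInteger();

    // Hypothetical flush step: fails once, on the attempt numbered failAt.
    Runnable flush = () -> {
      if (attempts.incrementAndGet() == failAt) {
        throw new NullPointerException("simulated failure inside flush");
      }
    };

    Thread flusher = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        if (guarded) {
          try {
            flush.run();
          } catch (RuntimeException e) {
            // Drop this batch and keep the loop alive.
          }
        } else {
          flush.run(); // an exception here escapes run() and ends the thread
        }
      }
    }, "flusher-sketch");

    // Swallow the uncaught exception so the demo output stays quiet.
    flusher.setUncaughtExceptionHandler((thread, error) -> { });
    flusher.start();
    try {
      flusher.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return attempts.get();
  }

  public static void main(String[] args) {
    System.out.println(attempts(5, 2, false)); // 2: the thread died on the second attempt
    System.out.println(attempts(5, 2, true));  // 5: the loop survived the failure
  }
}
```

In the unguarded case nothing restarts the thread, which matches the symptom described above: the application keeps running but spans are no longer exported.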

Thanks for trying it out. I'll try to work on a release today.

@shakuzen
Member

3.5.1 is available in Maven Central now with this fix.
