Fix the NullPointerException when startTime is null #270

Closed
marcosgopen wants to merge 1 commit into jbosstm:main from marcosgopen:coordinatorFix

@marcosgopen
Member

Fix the NullPointerException thrown when startTime is null.

@marcosgopen marcosgopen requested a review from mmusgrov February 24, 2026 14:54
 public LRAData getLRAData() {
     return new LRAData(id, clientId, status, isTopLevel(), isRecovering(),
-            startTime.toInstant(ZoneOffset.UTC).toEpochMilli(),
+            startTime == null ? 0L : startTime.toInstant(ZoneOffset.UTC).toEpochMilli(),
Member

If startTime is null then the action was never started, so filling in the time is misleading; according to the LRAStatus model it doesn't really exist. It's better to track down the circumstances under which this condition can occur and fix it there. Creating the LRA and starting it should effectively be a single action. For example, in LRAService#startLRA the code that creates and starts the LRA should be inside a try/catch block if that is the source of the error. There may be other places to check as well but, off the top of my head, LRAService is the only place where we create LRAs.
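
As a rough illustration of the "create and start as a single action" idea, here is a minimal sketch. The classes `SketchLra` and `SketchLraService` and the `simulateFailure` flag are hypothetical stand-ins invented for this example; they are not the Narayana `LongRunningAction` or `LRAService` code, whose actual structure may differ.

```java
import java.time.LocalDateTime;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an LRA; not the Narayana LongRunningAction.
class SketchLra {
    final String id;
    LocalDateTime startTime; // stays null until begin() succeeds

    SketchLra(String id) {
        this.id = id;
    }

    void begin(boolean fail) {
        if (fail) {
            throw new IllegalStateException("could not start " + id);
        }
        startTime = LocalDateTime.now();
    }
}

// Hypothetical stand-in for the service that creates LRAs.
class SketchLraService {
    private final Map<String, SketchLra> lras = new ConcurrentHashMap<>();

    // Create and start as one unit: the LRA is registered only after begin() succeeded,
    // so a lookup can never observe an LRA that has no start time.
    SketchLra startLRA(String id, boolean simulateFailure) {
        SketchLra lra = new SketchLra(id);
        try {
            lra.begin(simulateFailure);
        } catch (RuntimeException e) {
            // Nothing was registered yet, so there is nothing to clean up.
            throw new IllegalStateException("LRA " + id + " was not started", e);
        }
        lras.put(id, lra);
        return lra;
    }

    boolean contains(String id) {
        return lras.containsKey(id);
    }
}
```

With this ordering, a failed start surfaces as an exception at the point of creation instead of as a half-initialised entry that is discovered later by an unrelated request.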

Member

@mmusgrov mmusgrov Feb 24, 2026

Do you know how you managed to create the LRA without a status and/or a start time?

Member Author

@marcosgopen marcosgopen Feb 25, 2026

I saw the NullPointerException when running the Quarkus microprofile-tck. It occurs with the completionStage. I will reproduce it and link the error message here to better identify the case.

Member

The error message would be reported long after the area of the code where it failed to set a start time. If we can't track the cause then I think it would be better to leave the startTime as null and report it as such in the getLRAData call. Can we at least log what it thinks the status of the LRA is plus logging from any other parts of the code where we think the cause is? I would certainly suggest that the bit of code where we create and then start the LRA is enclosed in a try catch block.
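
One way to report a missing start time as missing, and to log what the coordinator knows at that point, could look like the sketch below. The class `StartTimeReport` and its method are hypothetical and use `java.util.logging` purely for illustration. Whether `LRAData` can carry an absent start time is not established in this thread, so treat the return type as an assumption.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.OptionalLong;
import java.util.logging.Logger;

// Hypothetical helper; not part of the Narayana LongRunningAction.
class StartTimeReport {
    private static final Logger LOG = Logger.getLogger(StartTimeReport.class.getName());

    // Returns the start time as epoch milliseconds, or an empty value when the LRA never started.
    static OptionalLong startMillis(String lraId, String status, LocalDateTime startTime) {
        if (startTime == null) {
            // Log the state we do know, to help trace which code path left startTime unset.
            LOG.warning(() -> "LRA " + lraId + " has no start time; status=" + status);
            return OptionalLong.empty();
        }
        return OptionalLong.of(startTime.toInstant(ZoneOffset.UTC).toEpochMilli());
    }
}
```

Unlike substituting `0L`, an empty value cannot be mistaken for a real timestamp (the Unix epoch), and the warning records the LRA id and status at the moment the inconsistency is observed.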

@marcosgopen
Member Author

marcosgopen commented Feb 25, 2026

I found the reproducer @mmusgrov :

git clone https://github.com/jbosstm/lra-coordinator-quarkus
cd lra-coordinator-quarkus
mvn clean install 
java -jar ../target/quarkus-app/quarkus-run.jar &

Then, in another folder, clone Quarkus, build it (if necessary) and run './mvnw clean verify -pl tcks/microprofile-lra -Dtcks -Dlra.coordinator.url=http://localhost:8080/lra-coordinator'

In the lra-coordinator server log you will see:

....
2026-02-25 09:43:21,563 ERROR [io.quarkus.vertx.http.runtime.QuarkusErrorHandler] (executor-thread-1) HTTP Request to /lra-coordinator failed, error id: f555ded9-999f-479f-b911-729bf56bfbda-1: java.lang.NullPointerException: Cannot invoke "java.time.LocalDateTime.toInstant(java.time.ZoneOffset)" because "this.startTime" is null
	at io.narayana.lra.coordinator.domain.model.LongRunningAction.getLRAData(LongRunningAction.java:145)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:214)
	at java.base/java.util.concurrent.ConcurrentHashMap$ValueSpliterator.forEachRemaining(ConcurrentHashMap.java:3628)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:570)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:560)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:265)
	at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:723)
	at io.narayana.lra.coordinator.domain.service.LRAService.getAll(LRAService.java:129)
	at io.narayana.lra.coordinator.api.Coordinator.getAllLRAs(Coordinator.java:169)
	at io.narayana.lra.coordinator.api.Coordinator_ClientProxy.getAllLRAs(Unknown Source)
	at io.narayana.lra.coordinator.api.Coordinator$quarkusrestinvoker$getAllLRAs_736b44a24bd95dd375aecb1a8fe270b2bdff659a.invoke(Unknown Source)
	at org.jboss.resteasy.reactive.server.handlers.InvocationHandler.handle(InvocationHandler.java:29)
	at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:190)
	at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:147)
	at io.quarkus.vertx.core.runtime.VertxCoreRecorder$15.runWith(VertxCoreRecorder.java:666)
	at org.jboss.threads.EnhancedQueueExecutor$Task.doRunWith(EnhancedQueueExecutor.java:2651)
	at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2630)
	at org.jboss.threads.EnhancedQueueExecutor.runThreadBody(EnhancedQueueExecutor.java:1622)
	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1589)
	at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:11)
	at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:11)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:1474)
...

During the TCK run the error shows up in the log 19 times, but the tests still pass.
If you want, I can investigate why some LRAs are created without the startTime being initialized.
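
The stack trace above shows the exception being thrown from inside the stream that LRAService.getAll uses to map every LRA, so a single LRA without a start time fails the whole listing request. A minimal sketch of that effect follows; `GetAllSketch` and its nested `Lra` class are hypothetical and only mimic the shape of the call chain in the trace.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical reproduction of the failure mode; not the Narayana classes.
class GetAllSketch {
    static final class Lra {
        final LocalDateTime startTime; // may be null if the LRA is in an inconsistent state

        Lra(LocalDateTime startTime) {
            this.startTime = startTime;
        }

        // Dereferences startTime without a null check, like the unpatched getLRAData.
        long startMillis() {
            return startTime.toInstant(ZoneOffset.UTC).toEpochMilli();
        }
    }

    // Maps every LRA in the collection; one bad entry aborts the whole stream.
    static List<Long> getAll(Map<String, Lra> lras) {
        return lras.values().stream()
                .map(Lra::startMillis)
                .collect(Collectors.toList());
    }
}
```

In this sketch, a map holding only started LRAs lists fine, while adding one entry whose startTime is null makes getAll throw a NullPointerException for every caller until that entry is removed or repaired.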

@mmusgrov
Member

mmusgrov commented Feb 25, 2026

If you want, I can investigate why some LRAs are created without the startTime being initialized.

There's definitely a bug somewhere and artificially setting the start time even though the LRA never started is missing an opportunity to track it down and fix it.

If it never actually started then it won't be in any store and the user will never be able to do anything with it.

As a starting point, maybe add some debug logging to LongRunningAction.getLRAData to see what the coordinator actually knows about the LRA; that might shed light on where the bug first occurred. Also check whether it was ever persisted to the store, to exclude the possibility that the save and restore logic is faulty. The save and restore code certainly supports the case where the start time is null, but I'm still interested in which code paths leave the startTime null.

@marcosgopen
Member Author

marcosgopen commented Feb 25, 2026

I am putting this PR on hold because we want to identify exactly what happens in those test cases.
I am testing TckContextTests#testAfterLRAEnlistmentDuringClosingPhase, which (together with other tests) produces the NullPointerException described above.
[Update] It happens at 'lraTestService.waitForRecovery(lra);', so the LRA being recovered is not in a consistent state.

@mmusgrov
Member

mmusgrov commented Feb 25, 2026

I am putting this PR on hold because we want to identify exactly what happens in those test cases. I am testing TckContextTests#testAfterLRAEnlistmentDuringClosingPhase, which (together with other tests) produces the NullPointerException described above. [Update] It happens at 'lraTestService.waitForRecovery(lra);', so the LRA being recovered is not in a consistent state.

I will put aside some time tomorrow to run it in a debugger.

@marcosgopen
Member Author

After cleaning my local ObjectStore and retesting, the issue no longer occurs. I suppose I had a corrupted LRA in the store, so I am closing this PR. Thanks @mmusgrov for verifying it.
