Fix the nullPointer exception when startTime is null#270
Fix the nullPointer exception when startTime is null#270marcosgopen wants to merge 1 commit intojbosstm:mainfrom
Conversation
| public LRAData getLRAData() { | ||
| return new LRAData(id, clientId, status, isTopLevel(), isRecovering(), | ||
| startTime.toInstant(ZoneOffset.UTC).toEpochMilli(), | ||
| startTime == null ? 0L : startTime.toInstant(ZoneOffset.UTC).toEpochMilli(), |
There was a problem hiding this comment.
If startTime is null then the action was never started so filling in the time is misleading and according to the LRAStatus model it doesn't really exist. It's better to track down the circumstances under which this condition can occur and fix it there. The creation of the LRA and starting it should effectively be a single action, for example in LRAService#startLRA the code that creates and starts the LRA should be inside a try catch block if that is the source of the error. There may be other places to check as well but, off the top of my head, LRAService is the only place where we create LRAs.
There was a problem hiding this comment.
Do you know how you managed to create the LRA without a status and/or a start time.
There was a problem hiding this comment.
I could see the nullpointer exception running the quarkus microprofile-tck. It occurs with the completionStage. I will reproduce it and link the error message here to better identify the case
There was a problem hiding this comment.
The error message would be reported long after the area of the code where it failed to set a start time. If we can't track the cause then I think it would be better to leave the startTime as null and report it as such in the getLRAData call. Can we at least log what it thinks the status of the LRA is plus logging from any other parts of the code where we think the cause is? I would certainly suggest that the bit of code where we create and then start the LRA is enclosed in a try catch block.
|
I found the reproducer @mmusgrov : then in another folder clone quarkus, build it (if necessary) and run ' ./mvnw clean verify -pl tcks/microprofile-lra -Dtcks -Dlra.coordinator.url=http://localhost:8080/lra-coordinator' In the lra-coordinator log you will see the server log: During the tcks tests the error log shows up 19 times but the tests are successful actually. |
There's definitely a bug somewhere and artificially setting the start time even though the LRA never started is missing an opportunity to track it down and fix it. If it never actually started then it won't be in any store and the user will never be able to do anything with it. As a starting point maybe add some debug to LongRunningAction.getLRAData to see what the coordinator actually knows about the LRA, that might shed light on where the bug first occured. Debug whether it was ever persisted to the store to exclude the possibility that the save and restore logic is faulty. The save and restore code certainly supports the case where start time can be null but I'm still interested in which code paths leave the startTime null. |
|
So I am putting this PR on hold as we want to better identify what happens exactly in those test cases. |
I will put aside some time tomorrow to run it in a debugger. |
|
After cleaning my local ObjectStore and retesting the issue doesn't occur anymore. I suppose I had a corrupted lra. So I am closing this PR. Thanks @mmusgrov for verifying it. |
Fix the nullPointer exception when startTime is null