@mjameswh
Contributor

What was changed

When multiple Local Activities (LAs) run concurrently (i.e. are started from the same Workflow Activation), Core records LA markers in the order in which the activities completed and queues up lang jobs in that same order, which is the expected behavior.

However, on replay, Core previously queued LA completion jobs in the order in which the LAs were started by the workflow. This could result in nondeterminism errors (NDEs) on replay in the rare case where the workflow schedules other commands on completion of each LA independently (i.e. rather than on completion of all the LAs). See this issue for example.
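
To make the failure mode concrete, here is a minimal toy model in Python. It is not Core's actual code: `run_workflow`, the LA sequence numbers, and the command strings are all hypothetical. The toy workflow schedules a follow-up command as soon as each LA completes, so the command sequence depends on the order in which completion jobs are delivered.

```python
def run_workflow(completion_jobs):
    """Toy workflow: emits one follow-up command per LA completion, in delivery order."""
    commands = []
    for la_seq in completion_jobs:
        # Hypothetical command, e.g. a timer started when this LA resolves.
        commands.append(f"StartTimer(after-la-{la_seq})")
    return commands


# Two LAs started in the same activation: seq 1 first, then seq 2.
start_order = [1, 2]
# Assumption for this example: LA 2 finished first during the original execution.
completion_order = [2, 1]

recorded = run_workflow(completion_order)    # commands as recorded in history
old_replay = run_workflow(start_order)       # jobs queued in start order
new_replay = run_workflow(completion_order)  # jobs queued in marker order

print(recorded == old_replay)  # False: the command sequence diverges, i.e. an NDE
print(recorded == new_replay)  # True
```

If the workflow only issued commands after all LAs had completed, both orderings would yield the same commands, which is why the problem only shows up in the rare case described above.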

The root cause was that, on creation of an LA state machine during replay, we would look ahead for that LA's completion and, if one was found, immediately switch to the ReplayingPreResolved state, which would record the LA completion job.

This PR fixes this behavior as follows:

  1. In replay, LAs are now created in the WaitingResolveFromMarkerLookAhead state (i.e. they can't be "preresolved");
  2. The LA lookahead buffer is now structured as an ordered queue rather than a Map<seq_number, result>;
  3. LA lookaheads are consumed and applied to state machines in their historical order, once all activation commands have been decoded;
  4. Consuming/applying LA completions blocks when we reach a completion marker for which the LA state machine does not yet exist;
  5. If LA completions are left in the queue at the end of the WFT, it means that some LA that should have been executed by the workflow code was not, which is reported as an NDE.
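
Points 2 to 5 can be sketched roughly as follows. This is a simplified Python model under stated assumptions, not the actual implementation: `LookaheadQueue`, `drain`, `assert_empty_at_wft_end`, and `NondeterminismError` are hypothetical names, and LA state machines are reduced to a set of known sequence numbers.

```python
from collections import deque


class NondeterminismError(Exception):
    """Raised when history holds LA completions the workflow never matched."""


class LookaheadQueue:
    """Toy model of an ordered LA completion lookahead buffer (hypothetical API)."""

    def __init__(self, markers):
        # markers: (seq, result) pairs in the order they appear in history.
        self._pending = deque(markers)

    def drain(self, known_machines):
        """Apply completions in historical order.

        Stops (blocks) at the first marker whose LA state machine has not
        been created yet; later markers wait behind it.
        Returns the completion jobs to queue, in order.
        """
        jobs = []
        while self._pending and self._pending[0][0] in known_machines:
            jobs.append(self._pending.popleft())
        return jobs

    def assert_empty_at_wft_end(self):
        """Leftover completions mean an expected LA was never started."""
        if self._pending:
            seqs = [seq for seq, _ in self._pending]
            raise NondeterminismError(f"unmatched LA completion markers: {seqs}")


# History order: LA 2 completed first, then LA 1, then LA 3.
queue = LookaheadQueue([(2, "b"), (1, "a"), (3, "c")])

first = queue.drain({1})      # [] because the head marker (LA 2) has no machine yet
second = queue.drain({1, 2})  # [(2, "b"), (1, "a")], then blocked at LA 3

try:
    queue.assert_empty_at_wft_end()
except NondeterminismError as err:
    print(err)  # unmatched LA completion markers: [3]
```

The relevant design point is that a queue, unlike a map keyed by sequence number, preserves the order of the markers in history, so completions can only ever be applied in that order.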
