fix(workflow): skip un-resumed items during nested batch/loop interrupt-resume #2644
Merged
shentongmartin merged 4 commits into main on Mar 25, 2026
…e nodes
When a sub-workflow inside a batch/loop node interrupts, the resume mechanism uses optsForIndexed (not toResumeIndexes). Add HasIndexedOpts and HasOptsForIndex checks so batch/loop skip un-resumed items correctly.
Also fix tests to use realistic node configurations:
- Lambda nodes now pass WithLambdaType for proper NodeType propagation
- Test configs implement RequireCheckpoint for proper checkpoint enablement
- Loop test mock uses single-entry interrupt state (matching real behavior)
zhuangjie1125
approved these changes
Mar 25, 2026
Problem
When a batch/loop node contains a sub-workflow with an interruptible node, resuming one interrupted item causes the other (un-resumed) items to re-execute unnecessarily. The resume itself succeeds — the functional result is correct — but the re-execution of un-resumed items regenerates their sub-execute-IDs, breaking execution history.
Example:
batch[items: a, b] → sub_workflow → QA_node: both items interrupt. The user resumes item a. Item b should be skipped (it hasn't been resumed yet), but instead it re-runs and re-interrupts, generating a new sub-execute-ID. The execution history for item b's original run is now orphaned.
Root Cause
There are two option-passing paths for resume:
1. Direct path (toResumeIndexes): used when the interrupting node is a direct child of the batch/loop
2. Nested path (optsForIndexed): used when the interrupting node is inside a sub-workflow within the batch/loop
The batch/loop skip logic only checked path (1). When resume came through path (2), un-resumed items weren't recognized as "should skip": they re-executed, re-interrupted, and got new sub-execute-IDs.
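The gap can be illustrated with a minimal Go sketch. The nodeOptions struct and the shouldSkipOld function are hypothetical simplifications written for this illustration, not the project's actual NodeOptions type or skip code; only the field names toResumeIndexes and optsForIndexed come from the description above.

```go
package main

import "fmt"

// nodeOptions is a hypothetical, simplified stand-in for the real NodeOptions.
type nodeOptions struct {
	toResumeIndexes map[int]struct{} // path (1): interrupting node is a direct child
	optsForIndexed  map[int][]string // path (2): options forwarded into a sub-workflow
}

// shouldSkipOld models the pre-fix behaviour described above:
// only path (1) is consulted when deciding whether to skip item i.
func shouldSkipOld(o nodeOptions, i int) bool {
	if len(o.toResumeIndexes) == 0 {
		return false // not recognised as a resume, so every item runs
	}
	_, targeted := o.toResumeIndexes[i]
	return !targeted
}

func main() {
	// Items a (index 0) and b (index 1) both interrupted inside a sub-workflow.
	// The user resumes only a, so the options arrive via path (2).
	o := nodeOptions{optsForIndexed: map[int][]string{0: {"resume QA_node"}}}

	for i, name := range []string{"a", "b"} {
		fmt.Printf("item %s skipped=%v\n", name, shouldSkipOld(o, i))
	}
}
```

Under these assumptions neither item is skipped, so item b runs again even though only item a was resumed.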
Solution
Add HasIndexedOpts() and HasOptsForIndex(i) methods to NodeOptions, and wire them into the batch/loop skip logic as a second check alongside the existing GetResumeIndexes() check.
Key Insight
The resume option chain is built inside-out by walking the interrupt event's NodePath. Each nesting layer peels off one wrapping at runtime:
- WithOptsForNested(inner), unwrapped by GetOptsForNested() → pass to inner Runner
- WithResumeIndex(i, modifier), unwrapped by GetResumeIndexes()[i] → apply modifier
- WithOptsForIndexed(i, inner), unwrapped by GetOptsForIndexed(i) → pass to inner invoke
The fix specifically addresses the "intermediate composite" case: the layer that passes resume options through to a deeper sub-workflow. This works at arbitrary nesting depth.
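As a rough sketch of how the wrap/unwrap chain and the two-path skip check could fit together, consider the simplified Go model below. Everything in it is an assumption made for illustration: the options struct, the lower-case helper names, the shouldSkip method, and the example path (outer batch item 0 → sub-workflow → inner batch item 2 → QA node) are hypothetical and only mirror the names used in this description. The real NodeOptions wiring may differ.

```go
package main

import "fmt"

// options is a hypothetical, simplified model of the resume options held by one layer.
type options struct {
	nested         *options         // wrapped by withOptsForNested
	resumeIndexes  map[int]string   // wrapped by withResumeIndex (index -> modifier)
	optsForIndexed map[int]*options // wrapped by withOptsForIndexed
}

func withOptsForNested(inner *options) *options { return &options{nested: inner} }

func withResumeIndex(i int, modifier string) *options {
	return &options{resumeIndexes: map[int]string{i: modifier}}
}

func withOptsForIndexed(i int, inner *options) *options {
	return &options{optsForIndexed: map[int]*options{i: inner}}
}

// hasIndexedOpts reports whether any item has options to forward to a sub-workflow.
func (o *options) hasIndexedOpts() bool { return o != nil && len(o.optsForIndexed) > 0 }

// hasOptsForIndex reports whether item i specifically has forwarded options.
func (o *options) hasOptsForIndex(i int) bool {
	if o == nil {
		return false
	}
	_, ok := o.optsForIndexed[i]
	return ok
}

// shouldSkip consults both paths: an item is skipped when a resume is in
// flight on either path and the item is targeted by neither.
func (o *options) shouldSkip(i int) bool {
	if o == nil {
		return false
	}
	_, direct := o.resumeIndexes[i]
	resuming := len(o.resumeIndexes) > 0 || o.hasIndexedOpts()
	return resuming && !direct && !o.hasOptsForIndex(i)
}

func main() {
	// Built inside-out, innermost wrapping first.
	opts := withResumeIndex(2, "answer") // inner batch: QA node is a direct child
	opts = withOptsForNested(opts)       // sub-workflow: pass through to inner runner
	opts = withOptsForIndexed(0, opts)   // outer batch: the intermediate composite layer

	for i := 0; i < 2; i++ {
		fmt.Printf("outer item %d skip=%v\n", i, opts.shouldSkip(i))
	}

	// Peel one wrapping per layer on the way back in.
	inner := opts.optsForIndexed[0].nested
	for i := 0; i < 4; i++ {
		fmt.Printf("inner item %d skip=%v\n", i, inner.shouldSkip(i))
	}
}
```

In this model the outer batch skips item 1 because it has no indexed options, while the inner batch skips every item except index 2 through the direct resume-index check. Combining the two checks with a single "resuming and not targeted" condition is a design choice of this sketch.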
Summary
HasIndexedOpts/HasOptsForIndex checks to skip un-resumed items in the nested path