Description
Describe the bug
The workflow action await-agent-restart seems to behave differently depending on whether it is called from a top-level workflow or from a sub-operation (i.e. a sub-workflow).
The difference in behaviour is only observable when the workflow using await-agent-restart is called from another workflow and the action transitions directly to a final state instead of going through an intermediate state.
For instance, the example workflow below is called from another workflow. When the [restarting] state uses the on_success = "successful" transition, the calling workflow never completes. When it uses an intermediate state instead, on_success = "restarted", the calling workflow completes successfully.
The two workflows are shown below, where the restart-tedge-agent-wrapper workflow calls the restart-tedge-agent-internal workflow.
file: restart-tedge-agent-internal.toml
operation = "restart-tedge-agent-internal"
[init]
action = "proceed"
on_success = "restart"
[restart]
background_script = "sudo systemctl restart tedge-agent"
on_exec = "restarting"
[restarting]
action = "await-agent-restart"
# on_success = "restarted" # <=== Result: PASS
on_success = "successful" # <=== Result: FAIL
timeout_second = 30
on_timeout = "failed"
[restarted]
action = "proceed"
on_success = "successful"
[successful]
action = "cleanup"
[failed]
action = "cleanup"
file: restart-tedge-agent-wrapper.toml
operation = "restart-tedge-agent-wrapper"
[init]
action = "proceed"
on_success = "restart"
[restart]
operation = "restart-tedge-agent-internal"
on_exec = "restarting"
[restarting]
action = "await-operation-completion"
on_success = "successful"
[successful]
action = "cleanup"
[failed]
action = "cleanup"
Symptoms
- The restart-tedge-agent-internal workflow successfully completes when called directly (e.g. not from another workflow)
- The restart-tedge-agent-wrapper does not finish/hangs if the restart-tedge-agent-internal workflow uses on_success = "successful" in the [restarting] state.
- The restart-tedge-agent-wrapper completes if the restart-tedge-agent-internal workflow uses on_success = "restarted" in the [restarting] state.
The following shows a snippet of the workflow log when the restart hangs:
----------------------[ restart-tedge-agent-wrapper @ restarting | time=2025-11-04T02:18:42.68615348Z ]----------------------
State: {"@version":"b7e6501165817cde457d08806c7702994c15b65edafb6e574b9224d829ee8e6b","logPath":"/var/log/tedge/agent/workflow-restart-tedge-agent-wrapper-robot-1.log","status":"restarting"}
Action: await sub-operation completion
=> restart-tedge-agent-internal sub-operation is still running
To Reproduce
Reproducing the bug is slightly complicated, so a system test was created to demonstrate the bug.
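For a manual check outside the system test, the wrapper workflow can probably be triggered by publishing an initial command state on the command topic seen in the workflow log. The following is only a sketch: the helper names `build_trigger_command` and `trigger` are hypothetical, and it assumes both workflow files are already installed on the device, the `tedge` CLI is on the PATH, and the command is started with a retained `{"status": "init"}` message.

```python
import json
import subprocess


def build_trigger_command(operation: str, cmd_id: str, device: str = "main") -> list[str]:
    """Build the tedge CLI call that publishes the initial command state.

    The topic layout mirrors the one in the workflow log:
    te/device/main///cmd/<operation>/<cmd_id>
    """
    topic = f"te/device/{device}///cmd/{operation}/{cmd_id}"
    # Assumption: a retained message with status "init" starts the workflow.
    payload = json.dumps({"status": "init"})
    return ["tedge", "mqtt", "pub", "--retain", topic, payload]


def trigger(operation: str, cmd_id: str) -> None:
    """Publish the command (needs the tedge CLI and a running MQTT broker)."""
    subprocess.run(build_trigger_command(operation, cmd_id), check=True)
```

Calling `trigger("restart-tedge-agent-wrapper", "robot-1")` on the device should start the same command as in the log below. Whether the wrapper hangs then depends on which on_success target the [restarting] state of the internal workflow uses.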
Expected behavior
The await-agent-restart action should not require an intermediate state and should behave the same when either being called directly or from another workflow.
Environment (please complete the following information):
| Property | Value |
|---|---|
| OS [incl. version] | Debian GNU/Linux 12 (bookworm) |
| Hardware [incl. revision] | unknown |
| System-Architecture | Linux 5dc411da8849 6.8.0-64-generic #67-Ubuntu SMP PREEMPT_DYNAMIC Sun Jun 15 20:23:40 UTC 2025 aarch64 GNU/Linux |
| thin-edge.io version | tedge 1.6.2~275+g7689e03 |
Additional context
Workflow log
==> /var/log/tedge/agent/workflow-restart-tedge-agent-wrapper-robot-1.log <==
==================================================================
Triggered restart-tedge-agent-wrapper workflow
==================================================================
topic: te/device/main///cmd/restart-tedge-agent-wrapper/robot-1
operation: restart-tedge-agent-wrapper
cmd_id: robot-1
time: 2025-11-04T02:17:42.470574114Z
==================================================================
----------------------[ restart-tedge-agent-wrapper @ init | time=2025-11-04T02:17:42.471326527Z ]----------------------
State: {"@version":"b7e6501165817cde457d08806c7702994c15b65edafb6e574b9224d829ee8e6b","logPath":"/var/log/tedge/agent/workflow-restart-tedge-agent-wrapper-robot-1.log","status":"init"}
Action: move to restart state
=> moving to restart-tedge-agent-wrapper @ restart
----------------------[ restart-tedge-agent-wrapper @ restart | time=2025-11-04T02:17:42.476852168Z ]----------------------
State: {"@version":"b7e6501165817cde457d08806c7702994c15b65edafb6e574b9224d829ee8e6b","logPath":"/var/log/tedge/agent/workflow-restart-tedge-agent-wrapper-robot-1.log","status":"restart"}
Action: execute restart-tedge-agent-internal as sub-operation
=> moving to restart-tedge-agent-wrapper @ restarting
----------------------[ restart-tedge-agent-wrapper @ restarting | time=2025-11-04T02:17:42.48417155Z ]----------------------
State: {"@version":"b7e6501165817cde457d08806c7702994c15b65edafb6e574b9224d829ee8e6b","logPath":"/var/log/tedge/agent/workflow-restart-tedge-agent-wrapper-robot-1.log","status":"restarting"}
Action: await sub-operation completion
----------------------[ restart-tedge-agent-wrapper > restart-tedge-agent-internal @ init | time=2025-11-04T02:17:42.5072824Z ]----------------------
State: {"@version":"2225b1c86aeb227c52a25413683692fb3a09fc6d85ce8c59dc17e1333be08533","logPath":"/var/log/tedge/agent/workflow-restart-tedge-agent-wrapper-robot-1.log","status":"init"}
Action: move to restart state
=> moving to restart-tedge-agent-wrapper > restart-tedge-agent-internal @ restart
----------------------[ restart-tedge-agent-wrapper > restart-tedge-agent-internal @ restart | time=2025-11-04T02:17:42.511401381Z ]----------------------
State: {"@version":"2225b1c86aeb227c52a25413683692fb3a09fc6d85ce8c59dc17e1333be08533","logPath":"/var/log/tedge/agent/workflow-restart-tedge-agent-wrapper-robot-1.log","status":"restart"}
Action: sudo systemctl restart tedge-agent
=> moving to restart-tedge-agent-wrapper > restart-tedge-agent-internal @ restarting
Killed by signal: 15
stderr (EMPTY)
stdout (EMPTY)
----------------------[ restart-tedge-agent-wrapper @ restarting | time=2025-11-04T02:18:42.673433099Z ]----------------------
State: {"@version":"b7e6501165817cde457d08806c7702994c15b65edafb6e574b9224d829ee8e6b","logPath":"/var/log/tedge/agent/workflow-restart-tedge-agent-wrapper-robot-1.log","resumed_at":"1762222662.0","status":"restarting"}
Action: await sub-operation completion
=> restart-tedge-agent-internal sub-operation is still running
----------------------[ restart-tedge-agent-wrapper > restart-tedge-agent-internal @ successful | time=2025-11-04T02:18:42.675430302Z ]----------------------
State: {"@version":"2225b1c86aeb227c52a25413683692fb3a09fc6d85ce8c59dc17e1333be08533","logPath":"/var/log/tedge/agent/workflow-restart-tedge-agent-wrapper-robot-1.log","resumed_at":"1762222662.0","status":"successful"}
Action: wait for the requester to finalize the command
Resuming invoking command te/device/main///cmd/restart-tedge-agent-wrapper/robot-1
----------------------[ restart-tedge-agent-wrapper @ restarting | time=2025-11-04T02:18:42.68615348Z ]----------------------
State: {"@version":"b7e6501165817cde457d08806c7702994c15b65edafb6e574b9224d829ee8e6b","logPath":"/var/log/tedge/agent/workflow-restart-tedge-agent-wrapper-robot-1.log","status":"restarting"}
Action: await sub-operation completion
=> restart-tedge-agent-internal sub-operation is still running
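
The symptom in the log above is that the wrapper still reports the sub-operation as running after the sub-operation has already logged a final state. A small script can spot this pattern in a workflow log. This is a minimal sketch, not part of thin-edge.io: the function name `find_stale_waits` is hypothetical, the regular expressions assume the log layout exactly as shown in this issue, and "successful" and "failed" are assumed to be the only final states.

```python
import re

# Step header, e.g.
# ----------------------[ parent > child @ successful | time=... ]----------------------
HEADER = re.compile(r"^-+\[ (?P<path>.+?) @ (?P<state>\S+) \| time=(?P<time>\S+) \]-+$")
# Message logged by the parent while it waits for a sub-operation
STILL_RUNNING = re.compile(r"^=> (?P<op>\S+) sub-operation is still running$")
# Assumption: these are the only final states of a workflow
TERMINAL_STATES = {"successful", "failed"}


def find_stale_waits(log_text):
    """Return (time, operation) for every 'still running' message that is
    logged after the named operation already reached a final state."""
    finished = set()      # operations that have logged a final state
    current_time = None   # timestamp of the most recent step header
    stale = []
    for raw in log_text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            current_time = header["time"]
            # "parent > child" names the child operation last
            operation = header["path"].split(" > ")[-1]
            if header["state"] in TERMINAL_STATES:
                finished.add(operation)
            continue
        waiting = STILL_RUNNING.match(line)
        if waiting and waiting["op"] in finished:
            stale.append((current_time, waiting["op"]))
    return stale
```

Run against the log in this issue, only the last "still running" message (at 2025-11-04T02:18:42.68615348Z) should be reported, since the earlier one is logged before the sub-operation reaches the successful state.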