@UncleGrumpy
Collaborator

These changes fix several open supervisor issues, as well as a few small bugs discovered in the supervisor and test_supervisor.erl.

  • Implements missing one_for_all restart strategy that is documented in the module.
  • Supervisors now obey the `intensity` and `period` options.
  • Children that fail to restart are retried until the maximum restart intensity is reached within the allowed period.
  • Fixes an edge-case bug when terminating, as well as several typos.
  • Fixes some bugs in the test suite that left unreceived messages in the test environment, where they would be received by any tests added to the end of the list in test_supervisor:test/0.

Closes #1855
Closes #1915
Closes #1957

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

@UncleGrumpy UncleGrumpy force-pushed the supervisor-one_for_all-main branch from e0f3afd to 7b2435d Compare November 4, 2025 18:02
Collaborator

@pguyot pguyot left a comment

Thank you for this work!

I looked at OTP's implementation and we're doing things differently, which may or may not be a good thing :)

OTP's terminate_child/2 is synchronous and I like the asynchronous implementation we currently have. Yet, it introduces some complexity.

I wonder if we should go further and make this entirely state based, i.e. avoid send_after/3 entirely and rely on the timeout feature of gen_server for the general case, or on a plain receive of exit messages for the terminate case. That might make this easier to test, or more pure. Also, be careful when processing messages sent with send_after/3, because we don't know whether the child is still there when they arrive.

@UncleGrumpy UncleGrumpy force-pushed the supervisor-one_for_all-main branch 2 times, most recently from bc92215 to bcce436 Compare November 13, 2025 20:30
Fixes a typo in the init function error return where the atom `mod` was returned rather than the
module's name (`Mod`).

Fixes a typo in the child record `type`, and adds missing parentheses on the `child_type/0` type.

Signed-off-by: Winford <[email protected]>
Renames restart_child/3 to handle_child_exit/3 for clarity. This is an internal callback handler,
and is not related to the exported restart_child/2.

Signed-off-by: Winford <[email protected]>
Moves the internal private functions `handle_child_exit/3` and `should_restart/2` to the same
section as the rest of the internal private functions.

Signed-off-by: Winford <[email protected]>
Fixes some tests to catch the messages from child ping_pong_servers so they will not be received by
later tests. Changes the supervisor started in test_count_children to use a generic supervisor
without a child, rather than using the module's start_link, which always starts a child
ping_pong_server that sends messages back to the test environment. Fixes the final `receive` in
test_ping_ping to match on a new `Pid4` (which would indicate a restart) rather than on `Pid3`, for
which the monitor has just caught the 'DOWN' message.

Signed-off-by: Winford <[email protected]>
@UncleGrumpy UncleGrumpy force-pushed the supervisor-one_for_all-main branch 2 times, most recently from f0b6954 to 0017e5a Compare November 15, 2025 00:24
Adds support for handling restarts with the `one_for_all` strategy, which was documented but not
implemented. Makes the changes necessary to ensure that children are always restarted in the same
order they were originally started, and shut down in reverse order (last child first), conforming
to OTP behavior.

Closes atomvm#1855

Signed-off-by: Winford <[email protected]>
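
The ordering described in the `one_for_all` commit message above can be illustrated with a minimal sketch. This is not the code from the PR: the module name `one_for_all_order_sketch` and the function `restart_plan/1` are hypothetical, and the sketch only shows the two orderings (terminate in reverse start order, restart in original start order), not the actual termination or restart of processes.

```erlang
-module(one_for_all_order_sketch).
-export([restart_plan/1]).

%% Hypothetical helper, for illustration only.
%% ChildIds lists the child ids in the order the children were
%% originally started. The result pairs the termination order
%% (last started child first) with the restart order (original
%% start order).
-spec restart_plan([ChildId :: term()]) ->
    {Terminate :: [term()], Restart :: [term()]}.
restart_plan(ChildIds) ->
    {lists:reverse(ChildIds), ChildIds}.
```

For children started as `a`, `b`, `c`, the plan is to terminate `c`, `b`, `a` and then start `a`, `b`, `c` again.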
@UncleGrumpy UncleGrumpy force-pushed the supervisor-one_for_all-main branch from 0017e5a to 3c8849a Compare November 17, 2025 06:09
-type sup_ref() ::
(Name :: atom())
| {Name :: atom(), Node :: node()}
| {global, Name :: term()}
Collaborator

I believe we don't support these supervisor references (global, via) yet. We shouldn't include them in the types, so dialyzer would catch uses.

Collaborator Author

@UncleGrumpy UncleGrumpy Nov 17, 2025

Good catch.

State = init_state(StartSpec, #state{restart_strategy = Strategy}),
{ok, {#{} = SupSpec, StartSpec}} ->
Strategy = maps:get(strategy, SupSpec, one_for_one),
Intensity = maps:get(intensity, SupSpec, 3),
Collaborator

We have 1 as the default above and 3 here.
Should we have a macro for the default to avoid discrepancies?

Collaborator Author

Good call. I added macros for the default intensity and period values.
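
A rough sketch of what shared default macros could look like follows. The macro names `DEFAULT_INTENSITY` and `DEFAULT_PERIOD`, the module name, and the `flags/1` helper are hypothetical and are not taken from the PR; the values 1 and 5 are the defaults documented for OTP's supervisor flags and are assumed here, since the thread does not state which values were chosen.

```erlang
-module(sup_defaults_sketch).
-export([flags/1]).

%% Hypothetical macro names. The values are an assumption: they
%% follow the defaults documented for OTP supervisor flags.
-define(DEFAULT_INTENSITY, 1).
-define(DEFAULT_PERIOD, 5).

%% Fill in any supervisor flags missing from SupSpec, so every
%% code path reads its defaults from the same two macros.
-spec flags(map()) -> map().
flags(SupSpec) when is_map(SupSpec) ->
    #{
        strategy => maps:get(strategy, SupSpec, one_for_one),
        intensity => maps:get(intensity, SupSpec, ?DEFAULT_INTENSITY),
        period => maps:get(period, SupSpec, ?DEFAULT_PERIOD)
    }.
```

Defining each default once means the map-based and tuple-based init paths cannot drift apart the way the 1 versus 3 values did.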

@UncleGrumpy UncleGrumpy force-pushed the supervisor-one_for_all-main branch from 3c8849a to 11474b5 Compare November 17, 2025 07:48
Adds support for honoring the `intensity` and `period` supervisor options, which prevents endless
crash-and-restart loops caused by misbehaving children.

Adds a test to ensure the settings work as expected on AtomVM and match OTP.

Closes atomvm#1915

Signed-off-by: Winford <[email protected]>
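
The `intensity` and `period` options from the commit message above can be pictured as a sliding window over recent restart timestamps. The following is a minimal sketch under stated assumptions, not the implementation in this PR: the module `restart_window_sketch` and the function `add_restart/4` are hypothetical, timestamps are assumed to be in seconds, and the exact boundary handling may differ from both OTP and AtomVM.

```erlang
-module(restart_window_sketch).
-export([add_restart/4]).

%% Hypothetical helper, for illustration only.
%% Restarts holds the timestamps (in seconds) of earlier restarts.
%% The new restart at Now is recorded, timestamps older than Period
%% seconds are dropped, and the supervisor should give up when more
%% than Intensity restarts remain inside the window.
-spec add_restart(
    Now :: integer(),
    Restarts :: [integer()],
    Intensity :: non_neg_integer(),
    Period :: pos_integer()
) -> {ok, [integer()]} | {shutdown, [integer()]}.
add_restart(Now, Restarts, Intensity, Period) ->
    Recent = [T || T <- [Now | Restarts], Now - T =< Period],
    case length(Recent) > Intensity of
        true -> {shutdown, Recent};
        false -> {ok, Recent}
    end.
```

A caller would typically pass `erlang:monotonic_time(second)` as `Now` and keep the returned list in the supervisor state. With an intensity of 1 and a period of 5, a second restart two seconds after the first yields `shutdown`, while one ten seconds later is allowed.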
Fixes a function clause exception that would crash the supervisor if a child failed to restart.
The supervisor now matches OTP behavior and keeps trying to restart the child until it succeeds,
or until the maximum `intensity` is reached within the allowed `period`.

Closes atomvm#1957

Signed-off-by: Winford <[email protected]>
- Updates some types to match OTP and adds some others, including exported types for
`startchild_ret/0` and `startlink_ret/0`.
- Adds a module doc section listing differences from the OTP supervisor.
- Adds specs to all public functions, and marks callbacks as hidden functions so they are not
included in the published user API docs.
- Adds TODOs where child and supervisor event reports should be logged.

Signed-off-by: Winford <[email protected]>
@UncleGrumpy UncleGrumpy force-pushed the supervisor-one_for_all-main branch from 11474b5 to 51cdbc9 Compare November 17, 2025 08:03