-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Fix multicore_lockout features so that the victim core cannot become stuck in an infinite loop if the lockout attempt times out #2467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The lockout state is controlled by a shared variable. The FIFO is used to begin a lockout and acknowledge it (i.e. multicore_lockout_handshake works as before) but the end of the lockout is now signalled by updating the shared variable. This ensures that timeouts are recognised reliably by the victim core. __wfe and __sev are used to signal updates to the shared variable in order to avoid polling.
|
Does this PR mean that any of the Doxygen at https://github.com/raspberrypi/pico-sdk/blob/develop/src/rp2_common/pico_multicore/include/pico/multicore.h#L418 also needs updating to match, or is that all still "correct"? |
You're right, this did need updating. And, I didn't realise this until now, but the semantics of I thought about whether it was a good idea to push ahead with this behaviour change. I think it is a good idea, because it's simpler, and I don't see the value in knowing that the end of the lockout has been acknowledged by the victim core. The new design avoids the possibility for |
|
|
||
| static mutex_t lockout_mutex; | ||
| static bool lockout_in_progress; | ||
| static io_rw_32 lockout_request_id = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small style point, we would probably use volatile uint32_t explicitly here vs io_rw_32 which is really intended for memory mapped IO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, replaced.
| return rc; | ||
| } | ||
|
|
||
| static uint32_t update_lockout_request_id() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a (void) i'm pretty sure without this it makes at least one of the many GCC or Clang compiler versions unhappy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks great except a few nits
Fixes #2454
This bug can be triggered if the lockout request times out. If the victim CPU core has interrupts disabled for a long time, it will not respond quickly enough to the lockout request, which times out. However, the request is still pending in the FIFO, and when it is eventually handled by the victim CPU core, that core enters an infinite loop, waiting for a LOCKOUT_MAGIC_END message which never arrives.
I considered the solution of always sending a LOCKOUT_MAGIC_END message even on timeout, but this wouldn't work if the FIFO was already full. I could not use a blocking wait, as this would not respect the timeout, and as far as I can tell, there is no hardware support for clearing the FIFO without also resetting the CPU.
In this PR, the lockout state is controlled by a shared variable. The FIFO is used to begin a lockout and acknowledge it as before. But the end of the lockout is now signalled by updating the shared variable. This ensures that the end of the lockout request is recognised reliably by the victim CPU core, regardless of whether the end was caused by a timeout or whether the lockout completed normally. __wfe and __sev are used to signal updates to the shared variable in order to avoid polling.
The semantics of the
multicore_lockout_end_...functions change: they will no longer wait for the lockout to end. This is described further in the updated doxygen and in the comments below.