-
Notifications
You must be signed in to change notification settings - Fork 8.2k
Description
Describe the bug
For the qn9090 (cortex M4, armv7) we are building S2RAM support.
We are testing with a hello-world main, and some shell commands which use pm_state_force to change the state to S2RAM in a handler on the system workq. We use this zephyr repo: https://github.com/Sendrato/zephyr/tree/main-branch-develop, which is based on zephyr 3.6
The assembly of the main-loop looks like:

So r4-r6 are loaded with the proper pointers before the loop starts looping infinitely.
The register values on a cold-boot are correct: all of r4-r6 point to the correct addresses:


After wakeup with S2RAM, the r5 value is changed. So somehow the stacking gets messed up.

With addr2line it turns out that r5 is now pointing to the instruction just after arch_pm_s2ram_resume. I've tried different setups for the code, but bo matter what changes and at which offset/address code is stored, r5 always points to the address after arch_pm_s2ram_resume upon warm boot.
$ /Users/hesselm/Development/toolchains/zephyr-sdk-0.16.3/arm-zephyr-eabi/bin/arm-zephyr-eabi-addr2line -e ./../build-dk6_qn9090/zephyr/zephyr.elf 0x5373
/Users/hesselm/Development/git/sendrato/awareable-fw-repo/zephyr/arch/arm/core/cortex_m/reset.S:120
We are not using CONFIG_PLATFORM_SPECIFIC_INIT, so the first call is to the assembly to disable the MPU
Reverting PR #68860 seems to fix this issue: r5 is correct after resuming s2ram. In this PR the value of lr gets pushed on the stack. I'm not certain which stack, but my guess is that the SP or MSP are not properly setup yet and the original push of all registers in arch_pm_s2ram_suspend gets overwritten. When I try this simple (but not ARM-acceptable) patch on PR #68860, the values in r4-r6 in the main loop are correct:
/*
* Check if reset occurred after suspending to RAM.
*/
- push {lr}
+ mov r5, lr
bl pm_s2ram_mark_check_and_clear
cmp r0, #0x1
- pop {lr}
+ mov lr, r5
beq resume
bx lr
So, my guess is that the pushing/popping of {r4-r12, lr} in the suspend and resume call assume a certain stack, which might be overridden by a warm boot. I think the fix might be in either:
- ensuring no stacking operation is done before resuming the S2RAM. But this is tricky as the proposed approach of
pm_s2ram_mark_check_and_clearis a generic c-function which might compile to some stacking operation - push/pop the registers
{r4-r12, lr}to a dedicated RAM-placeholder (just as the other registers incpu_context_t) instead of the stack.
I think the second approach is the cleanest, but I'm not entirely sure if the above deduction about the (incorrect) stacking is fully correct or makes sense. It also does not make sense to me that only r5 is affected and r4 and r6 are properly pushed/popped.
PR #76412 has the same issue (and has some other errors as well).
To Reproduce
Not sure if this is easily reproduced, if needed i can publish the full test-app and setup we use.
- zephry repo: https://github.com/Sendrato/zephyr/tree/main-branch-develop
- rework hello-world print statement such that the assembly uses r4-r6 as address
- use shell to put SoC into S2RAM.
- wakeup SoC and observe the value of r5 when the main loop is scheduled.
Expected behavior
r5 holds the value of lr, pointing to the next address after arch_pm_s2ram_resume
Impact
For S2RAM: system might respond unpredictable when there is code using r5 without ldr.
Logs and console output
See description
Environment (please complete the following information):
- Zephyr 3.6 with some patches: https://github.com/Sendrato/zephyr/tree/main-branch-develop
- Toolchain: zephyr-sdk-0.16.3
- SoC: qn9090 on the qn9090dk6 board (NXP). But any board supporting S2RAM might have this issue
Additional context
We have ported support for the qn9090 ourselves and tried to merge this into Zephyr, but NXP rejected it.
