Skip to content

Conversation

@ioannisg
Copy link
Member

@ioannisg ioannisg commented Jan 7, 2021

Two fixes here

  • the first fix ensures the MMFAR address is not read when its content is not valid
  • the second fix allows the fault handler to correctly identify some of the MemManage faults as stack overflows, instead of regular CPU exceptions, which facilitates debugging these errors. [without this patch, these stack overflows are still caught but are identified as generic CPU exceptions]

Issue identified when working on #30028

@github-actions github-actions bot added the area: ARM ARM (32-bit) Architecture label Jan 7, 2021
@ioannisg
Copy link
Member Author

ioannisg commented Jan 7, 2021

@chrisc11 FYI, please take a look, if possible.

@ioannisg ioannisg added the bug The issue is a bug, or the PR is fixing a bug label Jan 7, 2021
@ioannisg ioannisg self-assigned this Jan 7, 2021
@ioannisg ioannisg requested review from anangl and carlescufi January 7, 2021 10:26
@ioannisg ioannisg added this to the v2.5.0 milestone Jan 7, 2021
@ioannisg ioannisg force-pushed the arch_arm_cortex_m_fault_handle_stack_overflow_enhancement branch from cf533a9 to c770157 Compare January 12, 2021 17:38
@ioannisg
Copy link
Member Author

@galak @carlescufi pls take a quick look here

Copy link
Contributor

@chrisc11 chrisc11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @ioannisg!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could fold this into 236 and update mmfar with SCB->MMFAR there directly

Copy link
Member Author

@ioannisg ioannisg Jan 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, for the reasons explained above, in the inline comment, I'd rather keep the design as is, because theoretically the MemManage handler could be preempted rigth after reading the MMARVALID flag, and there would be cases where the flag would not be valid any more.

By design in Zephyr the fault handlers have highest configurable priority, but i wanted to have a design that would be robust against changes :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hrm, yeah I guess one could have the MemManage exception at a lower priority than another ISR, get interrupted here, and fault from that ISR. That would be a tricky crash to investigate! :)

When the MMARVALID bit is not set, do not read the MMFAR
register to get the fault address in a MemManage fault.
This change prevents the fault handler to erroneously
assume MMFAR contains a valid address.

Signed-off-by: Ioannis Glaropoulos <[email protected]>
@ioannisg ioannisg force-pushed the arch_arm_cortex_m_fault_handle_stack_overflow_enhancement branch from c770157 to 7aca8ef Compare January 14, 2021 08:10
In rare cases when a thread may overflow its stack, the
core will not report a Stacking Error. This is the case
when a large stack array is created, making the PSP cross
beyond the stack guard; in this case a MemManage fault
won't cause a stacking error (but only a Data Access
Violation error). We fix the fault handling logic so
such errors are reported as stack overflows and not as
generic CPU exceptions.

Signed-off-by: Ioannis Glaropoulos <[email protected]>
@ioannisg ioannisg force-pushed the arch_arm_cortex_m_fault_handle_stack_overflow_enhancement branch from 7aca8ef to db4372d Compare January 14, 2021 08:22
@carlescufi carlescufi merged commit c6c1472 into zephyrproject-rtos:master Jan 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: ARM ARM (32-bit) Architecture bug The issue is a bug, or the PR is fixing a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants