Skip to content

Conversation

@ycsin
Copy link
Member

@ycsin ycsin commented Jun 24, 2025

Note

This is the recreation of #81688

  • Simplified the fatal handling code
  • Always print the content of callee-saved-registers
  • Updated the testcase to test more configurations

@ycsin ycsin force-pushed the pr/riscv-fatal-streamline-2 branch 4 times, most recently from af0176d to 28a2b4a Compare June 25, 2025 06:54
@sonarqubecloud
Copy link

@ycsin ycsin marked this pull request as ready for review June 25, 2025 08:48
@github-actions github-actions bot added Release Notes To be mentioned in the release notes area: Architectures area: RISCV RISCV Architecture (32-bit & 64-bit) platform: nRF Nordic nRFx labels Jun 25, 2025
@github-actions
Copy link

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Aug 25, 2025
@ycsin ycsin force-pushed the pr/riscv-fatal-streamline-2 branch from 60a9e1d to ae89ebd Compare October 21, 2025 19:36
@ycsin
Copy link
Member Author

ycsin commented Oct 21, 2025

ping @fkokosinski @npitre please take a look, thanks!


* RISCV

* :kconfig:option:`CONFIG_EXTRA_EXCEPTION_INFO` has been removed, the ``*csf`` pointer will be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should get deprecated instead of removed, so it will need a note in the release notes

Comment on lines 345 to 351
config EXTRA_EXCEPTION_INFO
bool "Collect extra exception info"
depends on EXCEPTION_DEBUG
help
This option enables the collection of extra information, such as
register state, when a fault occurs. This information can be useful
to collect for post-mortem analysis and debug of issues.
Copy link
Member

@cfriedt cfriedt Oct 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a simple way to preserve functionality such that this is equivalent to CONFIG_EXCEPTION_DEBUG, but this should work

Suggested change
config EXTRA_EXCEPTION_INFO
bool "Collect extra exception info"
depends on EXCEPTION_DEBUG
help
This option enables the collection of extra information, such as
register state, when a fault occurs. This information can be useful
to collect for post-mortem analysis and debug of issues.
config EXTRA_EXCEPTION_INFO
bool "Collect extra exception info [DEPRECATED]"
select DEPRECATED
help
This option is deprecated and should be replaced with CONFIG_EXCEPTION_DEBUG.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, fixed, thank you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not directly remove it if it has no function anymore? It's not really a stable api where we have to do the 2 release deprivation, or?

Copy link
Member

@cfriedt cfriedt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor suggestions for Kconfig options, otherwise, LGTM

@ycsin ycsin force-pushed the pr/riscv-fatal-streamline-2 branch from ae89ebd to 8ce502d Compare October 21, 2025 20:56
@zephyrbot zephyrbot requested a review from cfriedt October 21, 2025 20:58
* :c:func:`bt_ctlr_set_public_addr` is deprecated in favor of using
:c:struct:`bt_hci_cp_vs_write_bd_addr` for setting the public Bluetooth device address.

* :kconfig:option:`CONFIG_EXTRA_EXCEPTION_INFO` is deprecated. Use :kconfig:option:`CONFIG_EXCEPTION_DEBUG` instead.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other archs also use EXTRA_EXCEPTION_INFO, where it is not deprecated

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, edited the wording, thanks.

@ycsin ycsin force-pushed the pr/riscv-fatal-streamline-2 branch from 8ce502d to f061e5a Compare October 22, 2025 17:55
@ycsin
Copy link
Member Author

ycsin commented Oct 23, 2025

ping @fkokosinski @kgugala @tgorochowik would really like to get this merged into 4.3, thanks

fkokosinski
fkokosinski previously approved these changes Oct 23, 2025
maass-hamburg
maass-hamburg previously approved these changes Oct 23, 2025
Copy link

@npitre npitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is wrong with this patch.

STORE_CSF_IN_ESF() retrieves the original s0 value then calls
z_riscv_fault then branches to no_reschedule. There you'll find
code like this:

        /* decrement _current->arch.exception_depth */
        lr t0, ___cpu_t_current_OFFSET(s0)
        lb t1, _thread_offset_to_exception_depth(t0)
        add t1, t1, -1
        sb t1, _thread_offset_to_exception_depth(t0)

But nowhere did you restore s0 with the current CPU pointer.

When CONFIG_EXCEPTION_DEBUG is set you should branch to
might_have_rescheduled instead.

Also the code from label do_fault: ends with tail z_riscv_fatal_error
which implies z_riscv_fatal_error could return and ra already contains the
return address. I'd use a plain branch instruction here to be clearer.

Copy link

@npitre npitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WRT my previous comment...

Even better than jumping to might_have_rescheduled which can be
unavailable if CONFIG_MULTITHREADING is not set, you should simply
pull STORE_CALLEE_SAVED directly into STORE_CSF_IN_ESF as it is
the only user anyway. Then for retrieving the original s0 value you
simply use t0 rather than s0. Problem solved.

Another issue with this patch is the lack of symetry. You have a macro
that implicitly modifies the stack and to undo the stack you have to do it
explicitly. So you should either leave the stack modification outside the
macro, or create another macro to complement STORE_CSF_IN_ESF() to
abstract the stack cleanup.

@ycsin ycsin dismissed stale reviews from maass-hamburg and fkokosinski via 21f73f7 October 23, 2025 17:24
@ycsin ycsin force-pushed the pr/riscv-fatal-streamline-2 branch from f061e5a to 21f73f7 Compare October 23, 2025 17:24
@zephyrbot zephyrbot requested a review from fkokosinski October 23, 2025 17:26
@ycsin
Copy link
Member Author

ycsin commented Oct 23, 2025

@npitre extremely helpful review as always, really appreciate them.
I've updated the patch according to your comments, please take another look, thanks!

@ycsin ycsin force-pushed the pr/riscv-fatal-streamline-2 branch from 21f73f7 to 98c3bf5 Compare October 23, 2025 17:31
@sonarqubecloud
Copy link

Please retry analysis of this Pull-Request directly on SonarQube Cloud

maass-hamburg
maass-hamburg previously approved these changes Oct 23, 2025
Copy link

@npitre npitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ycsin:

If I may suggest an improvement, I'd do this on top:

diff --git a/arch/riscv/core/isr.S b/arch/riscv/core/isr.S
index b6f750a332e..ce1059b984c 100644
--- a/arch/riscv/core/isr.S
+++ b/arch/riscv/core/isr.S
@@ -42,9 +42,15 @@
 	RV_E(	op ra, __struct_arch_esf_ra_OFFSET(sp)	)
 
 #ifdef CONFIG_EXCEPTION_DEBUG
-/* Convenience macro for storing callee saved register [s0 - s11] states. */
-#define STORE_CALLEE_SAVED() \
-	RV_E(	sr s0, ___callee_saved_t_s0_OFFSET(sp)		);\
+/*
+ * Convenience macro for storing callee saved register [s0 - s11] states.
+ * Note: s0 is callee-saved and the exception entry code already saved it
+ * in struct arch_esf so it could be used as the current CPU pointer.
+ * Also stores address of csf to the esf. Clobbers t0.
+ */
+#define STORE_CALLEE_SAVED(esf) \
+		lr t0, __struct_arch_esf_s0_OFFSET(esf);\
+	RV_E(	sr t0, ___callee_saved_t_s0_OFFSET(sp)		);\
 	RV_E(	sr s1, ___callee_saved_t_s1_OFFSET(sp)		);\
 	RV_I(	sr s2, ___callee_saved_t_s2_OFFSET(sp)		);\
 	RV_I(	sr s3, ___callee_saved_t_s3_OFFSET(sp)		);\
@@ -55,7 +61,8 @@
 	RV_I(	sr s8, ___callee_saved_t_s8_OFFSET(sp)		);\
 	RV_I(	sr s9, ___callee_saved_t_s9_OFFSET(sp)		);\
 	RV_I(	sr s10, ___callee_saved_t_s10_OFFSET(sp)	);\
-	RV_I(	sr s11, ___callee_saved_t_s11_OFFSET(sp)	)
+	RV_I(	sr s11, ___callee_saved_t_s11_OFFSET(sp)	);\
+		sr sp __struct_arch_esf_csf_OFFSET(esf)
 #endif /* CONFIG_EXCEPTION_DEBUG */
 
 	.macro get_current_cpu dst
@@ -386,19 +393,12 @@ no_fp:	/* increment _current->arch.exception_depth */
 	 */
 	mv a0, sp
 #ifdef CONFIG_EXCEPTION_DEBUG
-	/* s0 is required in no_reschedule, backup here */
-	mv t0, s0
-	/* Get the s0 value from the esf */
-	lr s0, __struct_arch_esf_s0_OFFSET(sp)
 	/* Make space for callee-saved registers */
 	addi sp, sp, -__callee_saved_t_SIZEOF
-	STORE_CALLEE_SAVED()
-	/* Store address to csf to the esf */
-	sr sp __struct_arch_esf_csf_OFFSET(a0)
-	/* Restore original s0 */
-	mv s0, t0
+	STORE_CALLEE_SAVED(a0)
 
 	call z_riscv_fault
+
 	/* Restore SP if z_riscv_fault returns before jumping to no_reschedule */
 	addi sp, sp, __callee_saved_t_SIZEOF
 
@@ -476,11 +476,8 @@ do_fault:
 1:	mv a1, sp
 
 #ifdef CONFIG_EXCEPTION_DEBUG
-	/* z_riscv_fatal_error isn't supposed to return, so just do whatever is necessary */
-	lr s0, __struct_arch_esf_s0_OFFSET(sp)
 	addi sp, sp, -__callee_saved_t_SIZEOF
-	STORE_CALLEE_SAVED()
-	sr sp __struct_arch_esf_csf_OFFSET(a1)
+	STORE_CALLEE_SAVED(a1)
 #endif /* CONFIG_EXCEPTION_DEBUG */
 	tail z_riscv_fatal_error
 

ycsin added 2 commits October 24, 2025 11:06
`CONFIG_EXTRA_EXCEPTION_INFO` that was added in zephyrproject-rtos#78065 doesn't
seem necessary, as we were already storing and printing the
callee-saved-registers before that. All `CONFIG_EXTRA_EXCEPTION_INFO`
does in RISCV is to add an additional `_callee_saved_t *csf` in the
`struct arch_esf`, which overhead is negligible to what's being enabled
by `CONFIG_EXCEPTION_DEBUG`.

Let's remove `CONFIG_EXTRA_EXCEPTION_INFO`, and have that extra
`_callee_saved_t *csf` in the `struct arch_esf` as long as
`CONFIG_EXCEPTION_DEBUG` is enabled.

TL;DR: it doesn't make sense to not enable `CONFIG_EXTRA_EXCEPTION_INFO`
when `CONFIG_EXCEPTION_DEBUG` is enabled, so let's merge them.

Then, since `*csf` is always available in the `struct arch_esf` when
`CONFIG_EXCEPTION_DEBUG=y`, we can simply rely on that pointer in
`z_riscv_fatal_error()` instead of an additional argument in
`z_riscv_fatal_error_csf()`, rendering the latter redundant and thus
can be removed.

Additionally, save the callee-saved registers before jumping to
to `z_riscv_fault()`, so that callee-saved-registers are printed on
generic CPU exception as well.

Signed-off-by: Yong Cong Sin <[email protected]>
Signed-off-by: Yong Cong Sin <[email protected]>
- Updated the testcase to test `z_riscv_fault()` error handling path.
- The main file is now fully assembly so that we can use precompiler
  guards more easily to test with/without frame pointer enabled.

Signed-off-by: Yong Cong Sin <[email protected]>
Signed-off-by: Yong Cong Sin <[email protected]>
@ycsin
Copy link
Member Author

ycsin commented Oct 24, 2025

If I may suggest an improvement, I'd do this on top:

Neat, applied, thanks!

@ycsin ycsin requested a review from npitre October 24, 2025 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: Architectures area: Boards/SoCs area: RISCV RISCV Architecture (32-bit & 64-bit) area: Tests Issues related to a particular existing or missing test platform: nRF Nordic nRFx Release Notes To be mentioned in the release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants