Description
Background
Currently, when an exception occurs in Linmo, the system provides minimal debugging information before halting. The do_trap() function in arch/riscv/hal.c only prints basic exception details:
[EXCEPTION] code=2 (Illegal instruction), epc=80001234, cause=00000002
This makes kernel development difficult because:
- Limited context: No information about which task caused the exception
- Missing CPU state: Critical CSR values (mtval, mstatus) are not displayed
- No call stack: Cannot see the function call chain leading to the exception
- Incomplete exception messages: The exc_msg[] array has gaps with "Reserved" entries
Exceptions are among the most common problems kernel developers hit during development. Without detailed diagnostic information, debugging turns into time-consuming trial and error.
Proposed Solution
Enhance the exception reporting mechanism to provide comprehensive debugging information that helps kernel developers quickly identify and fix issues. Focus on automatically captured runtime information (stack traces) while keeping kernel image size minimal by moving detailed explanations to documentation.
Goals
- Complete exception message coverage: Provide descriptive messages for all exception types in the array
- Rich context information: Display CPU state, task information, and system state at the time of exception
- Stack trace: Show the function call chain leading to the exception using frame pointers
- Minimal overhead: Only execute when exceptions occur (no runtime cost during normal operation)
- Small image size: Avoid embedding large amounts of explanatory text in kernel code
Detailed Requirements
1. Complete Exception Message Array
The current exc_msg[] array in do_trap() covers indices 0-15, which includes all standard RISC-V exceptions that Linmo might encounter. However, some entries are marked as "Reserved" without meaningful descriptions.
Update the array to provide:
- Clear descriptions for all defined exception types
- Meaningful notes for reserved entries
- Consistency with RISC-V specification terminology
Example:
static const char *exc_msg[] = {
    [0] = "Instruction address misaligned",
    [1] = "Instruction access fault",
    [2] = "Illegal instruction",
    /* ... */
    [10] = "Reserved (future standard use)",
    [11] = "Environment call from M-mode",
    /* ... */
    [14] = "Reserved (future standard use)",
    [15] = "Store/AMO page fault",
};
2. Enhanced Exception Output
When an exception occurs, print comprehensive diagnostic information including:
Required Information:
- Exception name and array index
- Current task ID and entry function address
- Program counter (mepc) - where the exception occurred
- Fault address (mtval) - the address that caused the fault (for memory exceptions)
- Machine status (mstatus) - processor state at exception time
- Stack pointer - current SP value
Example Output:
====================================
KERNEL EXCEPTION
====================================
Exception: Illegal instruction (exc_msg[2])
Task ID: 2
Task Entry: 0x80002000
Program Counter (mepc): 0x80001234
Fault Address (mtval): 0x00000000
Status (mstatus): 0x00001880
Stack Pointer (sp): 0x80010000
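The raw mstatus value in the sample output is easier to interpret once it is split into fields. The following is a minimal sketch, assuming the machine-mode field positions from the RISC-V privileged specification (MIE at bit 3, MPIE at bit 7, MPP at bits 12:11). The type and function names are hypothetical and are not part of Linmo.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions from the RISC-V privileged spec (machine mode). */
#define MSTATUS_MIE_BIT   3u  /* machine interrupt enable */
#define MSTATUS_MPIE_BIT  7u  /* MIE value before the trap was taken */
#define MSTATUS_MPP_SHIFT 11u /* previous privilege mode, 2 bits wide */

/* Hypothetical container for the decoded fields. */
typedef struct {
    unsigned mie;
    unsigned mpie;
    unsigned mpp; /* 0 = U, 1 = S, 3 = M */
} mstatus_fields_t;

static mstatus_fields_t decode_mstatus(uint32_t mstatus)
{
    mstatus_fields_t f;
    f.mie = (mstatus >> MSTATUS_MIE_BIT) & 1u;
    f.mpie = (mstatus >> MSTATUS_MPIE_BIT) & 1u;
    f.mpp = (mstatus >> MSTATUS_MPP_SHIFT) & 3u;
    return f;
}

/* Map the 2-bit MPP value to a short privilege-mode name. */
static const char *mpp_name(unsigned mpp)
{
    assert(mpp <= 3u);
    switch (mpp) {
    case 0: return "U";
    case 1: return "S";
    case 3: return "M";
    default: return "reserved";
    }
}
```

For the sample value 0x00001880 this yields MIE = 0, MPIE = 1 and MPP = M: the trap was taken from machine mode with interrupts enabled beforehand. Printing the decoded fields next to the raw value would cost a few extra bytes of format string, so it may fit better in the debugging guide than in the handler.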
3. Stack Trace
Print the call stack to show the function call chain leading to the exception. This is significantly more useful than generic debugging hints as it shows the actual execution path.
Implementation approach:
- Walk the stack using frame pointer (s0/fp register)
- Print return addresses from each stack frame
- Follow RISC-V ABI stack frame structure
- Requires compilation with -fno-omit-frame-pointer
Stack Frame Structure (GCC frame-pointer layout on RV32; the offsets double to fp - 8 and fp - 16 on RV64):
Higher addresses
+------------------+
| return address | <- fp - 4
+------------------+
| previous fp | <- fp - 8
+------------------+
| local variables |
+------------------+
| ... |
Lower addresses
Example Output:
Stack Trace:
#0: 0x80001234 (exception location)
#1: 0x80001100
#2: 0x80000f00
#3: 0x80000800
Implementation Notes:
- Stack trace depth should be limited (e.g., maximum 8-10 frames) to prevent issues with corrupted stacks
- Invalid frame pointers should be detected and handled gracefully
- Stack bounds validation prevents reading from invalid memory regions
- Common exception causes and debugging strategies will be documented in Documentation/exception-debugging-guide.md
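The walk described in these notes can be tried on a host machine before touching the trap handler. The sketch below simulates a stack as a word array at a made-up base address and follows saved frame pointers through it. It assumes an RV32 layout with 4-byte slots (return address at fp - W, saved fp at fp - 2W, with W = 4). All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FRAMES 8

/* Simulated stack: a word array mapped at a made-up base address. */
typedef struct {
    uint32_t base;       /* lowest simulated address */
    uint32_t words;      /* number of 32-bit words */
    const uint32_t *mem; /* backing storage */
} sim_stack_t;

/* Read one word if addr is aligned and inside the simulated stack. */
static int sim_read(const sim_stack_t *s, uint32_t addr, uint32_t *out)
{
    if ((addr & 3u) != 0 || addr < s->base)
        return 0;
    uint32_t idx = (addr - s->base) / 4u;
    if (idx >= s->words)
        return 0;
    *out = s->mem[idx];
    return 1;
}

/* Walk saved frame pointers and store return addresses into out[].
 * Returns the number of frames recovered. The walk stops on a null fp,
 * an unreadable slot, a saved fp that does not move toward higher
 * addresses (guards against loops), or after MAX_FRAMES entries. */
static size_t walk_frames(const sim_stack_t *s, uint32_t fp,
                          uint32_t out[MAX_FRAMES])
{
    size_t n = 0;
    assert(s != NULL && out != NULL);
    while (fp != 0 && n < MAX_FRAMES) {
        uint32_t ra, prev;
        if (fp < 8u || !sim_read(s, fp - 4u, &ra) ||
            !sim_read(s, fp - 8u, &prev))
            break;
        out[n++] = ra;
        if (prev != 0 && prev <= fp)
            break; /* callers must live at higher addresses */
        fp = prev;
    }
    return n;
}
```

The monotonicity check is a cheap addition to the depth limit: because the stack grows downward, each caller's frame must sit at a higher address than its callee's, so a saved fp that points backward or at itself indicates corruption.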
4. System State Snapshot at Exception
When an exception occurs, include a snapshot of the system state:
Task State:
- List all tasks with their current state (READY/RUNNING/BLOCKED)
- Highlight the task that triggered the exception
Memory State:
- Show heap usage (allocated/free) when available
- Indicate if memory pressure might be a factor
Example Output:
=== System State ===
Tasks:
0: READY (stack: 512/2048)
1: READY (stack: 256/1024)
2: RUNNING (stack: 128/512) <- Exception here
3: BLOCKED (stack: 64/512)
Memory:
Heap: 3456/8192 bytes (42% used)
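The per-task "stack: used/total" figures in the sample output need some way to measure stack usage, which the proposal does not specify. One common technique is to paint each stack with a known pattern at task creation and count untouched words later. The sketch below illustrates that idea only. The pattern value and function names are hypothetical, and Linmo may track stack usage differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u /* hypothetical paint pattern */

/* Paint a whole stack region before the task first runs. */
static void stack_paint(uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < words; i++)
        stack[i] = STACK_FILL;
}

/* Stacks grow downward, so untouched words sit at the low end.
 * Count them and report the peak usage in bytes. */
static size_t stack_peak_bytes(const uint32_t *stack, size_t words)
{
    size_t untouched = 0;
    assert(stack != NULL);
    while (untouched < words && stack[untouched] == STACK_FILL)
        untouched++;
    return (words - untouched) * sizeof(uint32_t);
}
```

A 2048-byte stack whose top 128 words have been written reports 512 bytes, matching the "512/2048" form in the sample output. The result is an estimate of peak usage: a task that happens to store the pattern value at its deepest point would be undercounted.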
5. Implementation Location and Structure
To keep do_trap() clean and maintainable, create separate helper functions:
Recommended approach:
- Create print_exception_context() for CPU state display
- Create print_stack_trace() for stack unwinding
- Create print_system_state() for system snapshot
- All functions should be static and placed in hal.c
- Keep do_trap() focused on dispatching
Implementation Suggestions
Helper Function 1: Print Exception Context
/* Print detailed exception context including CPU state and task info.
 * This provides the essential diagnostic information needed to identify
 * where and in which task the exception occurred.
 * Assumes exc_msg[] is visible here, i.e. defined at file scope.
 */
static void print_exception_context(uint32_t code, uint32_t epc)
{
    uint32_t mtval = read_csr(mtval);
    uint32_t mstatus = read_csr(mstatus);
    uint32_t sp;
    /* sp as seen inside the handler, not at the faulting instruction */
    asm volatile("mv %0, sp" : "=r"(sp));
    printf("\n====================================\n");
    printf("KERNEL EXCEPTION\n");
    printf("====================================\n");
    /* Look up exception name from exc_msg array */
    const char *exc_name = (code < ARRAY_SIZE(exc_msg) && exc_msg[code])
                               ? exc_msg[code]
                               : "Unknown exception";
    printf("Exception: %s (exc_msg[%u])\n", exc_name, code);
    /* Print current task information if kernel is initialized */
    if (kcb && kcb->task_current && kcb->task_current->data) {
        tcb_t *task = (tcb_t *)kcb->task_current->data;
        printf("Task ID: %u\n", task->id);
        printf("Task Entry: %p\n", (void *)task->entry);
    }
    /* Print CPU state registers */
    printf("Program Counter (mepc): 0x%08x\n", epc);
    printf("Fault Address (mtval): 0x%08x\n", mtval);
    printf("Status (mstatus): 0x%08x\n", mstatus);
    printf("Stack Pointer (sp): 0x%08x\n", sp);
    printf("\n");
}
Helper Function 2: Print Stack Trace
/* Helpers used by print_stack_trace(); defined below. */
static inline bool is_valid_stack_pointer(uint32_t addr);
static inline bool read_stack_frame(uint32_t *frame, uint32_t *ret_addr, uint32_t *prev_fp);
static inline bool is_valid_instruction_pointer(uint32_t addr);
/* Print stack trace by walking frame pointers.
 * This shows the function call chain that led to the exception.
 * Requires compilation with -fno-omit-frame-pointer.
 */
static void print_stack_trace(uint32_t epc)
{
    printf("Stack Trace:\n");
    /* Print exception location as frame 0 */
    printf(" #0: 0x%08x (exception location)\n", epc);
    /* Get current frame pointer. This is the frame of this function, so
     * the first entries printed belong to the trap handler path.
     */
    uint32_t fp;
    asm volatile("mv %0, s0" : "=r"(fp));
    /* Walk the stack frames */
    int frame_num = 1;
    const int MAX_FRAMES = 10; /* Limit depth to prevent issues */
    while (frame_num < MAX_FRAMES && fp != 0) {
        /* Validate frame pointer is reasonable:
         * - Must be within valid stack memory region (architecture-dependent)
         * - Must be properly aligned (4-byte alignment for 32-bit systems)
         */
        if (!is_valid_stack_pointer(fp)) {
            printf(" (invalid frame pointer: 0x%08x)\n", fp);
            break;
        }
        /* GCC frame layout on RV32: return address is at fp-4, previous fp
         * at fp-8 (fp-8 and fp-16 on RV64). read_stack_frame() holds the
         * slot offsets.
         */
        uint32_t *frame = (uint32_t *)fp;
        uint32_t return_addr;
        uint32_t prev_fp;
        /* Safe memory access: verify addresses are readable before dereferencing.
         * In a real implementation, this would check MMU permissions or
         * use a fault handler to detect invalid memory access.
         */
        if (!read_stack_frame(frame, &return_addr, &prev_fp)) {
            printf(" (unable to read stack frame at 0x%08x)\n", fp);
            break;
        }
        /* Check for valid return address */
        if (!is_valid_instruction_pointer(return_addr)) {
            printf(" (invalid return address: 0x%08x)\n", return_addr);
            break;
        }
        printf(" #%d: 0x%08x\n", frame_num, return_addr);
        /* Move to previous frame */
        fp = prev_fp;
        frame_num++;
    }
    printf("\n");
}
/* Helper: Validate if address is a reasonable stack pointer.
 * This is architecture and configuration-dependent.
 * Linmo should define appropriate bounds based on its memory layout.
 */
static inline bool is_valid_stack_pointer(uint32_t addr)
{
    /* Define valid stack range based on architecture configuration */
#define STACK_BASE 0x80000000 /* Architecture-specific base */
#define STACK_SIZE 0x10000000 /* Architecture-specific size */
    /* Check alignment (must be 4-byte aligned) */
    if ((addr & 0x3) != 0)
        return false;
    /* Check bounds */
    if (addr < STACK_BASE || addr >= (STACK_BASE + STACK_SIZE))
        return false;
    return true;
}
/* Helper: Safely read return address and previous frame pointer from stack.
 * This function encapsulates the memory access and can be extended
 * to include MMU checks or fault handling as needed.
 */
static inline bool read_stack_frame(uint32_t *frame, uint32_t *ret_addr, uint32_t *prev_fp)
{
    /* In production, this should include:
     * - MMU permission checks
     * - Fault handler integration
     * For now, we only check that the lowest slot read is in bounds.
     */
    if (!is_valid_stack_pointer((uint32_t)(frame - 2)))
        return false;
    *ret_addr = frame[-1]; /* fp - 4: return address (RV32) */
    *prev_fp = frame[-2];  /* fp - 8: previous frame pointer (RV32) */
    return true; /* TODO: Add proper error handling */
}
/* Helper: Validate if address looks like a valid instruction pointer.
 * This is a basic check; more sophisticated validation could include
 * checking against code section boundaries.
 */
static inline bool is_valid_instruction_pointer(uint32_t addr)
{
    /* Must be in executable memory region (typically 0x80000000+) */
    if (addr < 0x80000000)
        return false;
    return true;
}
Helper Function 3: Print System State
/* Print system state snapshot including all tasks and memory usage.
 * This provides context about what the system was doing when exception occurred.
 * Per-task stack usage from the sample output is not printed here; it
 * needs separate stack bookkeeping.
 */
static void print_system_state(void)
{
    printf("=== System State ===\n");
    printf("Tasks:\n");
    /* Iterate through all tasks */
    if (kcb && kcb->tasks) {
        list_node_t *node = kcb->tasks->head->next;
        while (node != kcb->tasks->tail) {
            tcb_t *task = (tcb_t *)node->data;
            if (task) {
                /* Determine state string */
                const char *state_str;
                switch (task->state) {
                case TASK_READY: state_str = "READY"; break;
                case TASK_RUNNING: state_str = "RUNNING"; break;
                case TASK_BLOCKED: state_str = "BLOCKED"; break;
                default: state_str = "UNKNOWN"; break;
                }
                /* Print task info */
                printf(" %u: %-8s", task->id, state_str);
                /* Add marker for exception task */
                if (kcb->task_current && kcb->task_current->data == task) {
                    printf(" <- Exception here");
                }
                printf("\n");
            }
            node = node->next;
        }
    }
    /* Print memory information if available */
    printf("\nMemory:\n");
    /* Heap usage - call appropriate allocator function based on implementation.
     * The NULL test is only meaningful if get_heap_stats() is declared as a
     * weak symbol; an ordinary function always tests as non-NULL.
     */
    if (get_heap_stats) {
        size_t used, total;
        /* total > 0 guards the percentage against division by zero */
        if (get_heap_stats(&used, &total) == 0 && total > 0) {
            printf(" Heap: %zu/%zu bytes (%d%% used)\n", used, total,
                   (int)(used * 100 / total));
        }
    } else {
        printf(" Heap: (allocator not available)\n");
    }
    printf("\n");
}
Modified do_trap() Function
void do_trap(uint32_t cause, uint32_t epc)
{
    /* ... existing exc_msg array definition ... */
    if (MCAUSE_IS_INTERRUPT(cause)) {
        /* Interrupt handling remains unchanged */
        uint32_t int_code = MCAUSE_GET_CODE(cause);
        if (int_code == MCAUSE_MTI) {
            mtimecmp_w(mtimecmp_r() + (F_CPU / F_TIMER));
            dispatcher();
        } else {
            printf("[UNHANDLED INTERRUPT] code=%u, cause=%08x, epc=%08x\n",
                   int_code, cause, epc);
            hal_panic();
        }
    } else {
        /* Enhanced exception handling */
        uint32_t code = MCAUSE_GET_CODE(cause);
        /* Print detailed context */
        print_exception_context(code, epc);
        /* Print stack trace */
        print_stack_trace(epc);
        /* Print system state */
        print_system_state();
        printf("====================================\n");
        hal_panic();
    }
}
Key Implementation Points:
- Keep interrupt path unchanged: Only modify exception handling
- Helper functions are static: No need to expose in header files
- Use existing macros: read_csr(), ARRAY_SIZE(), MCAUSE_GET_CODE()
- Access kernel state safely: Check kcb pointers before dereferencing
- Maintain coding style: Follow Linmo's existing conventions for consistency
- Stack trace requires frame pointer: Add -fno-omit-frame-pointer to compilation flags
- Use configurable bounds: Define stack memory bounds using macros for portability
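The split between the interrupt path and the exception path rests on how mcause is encoded. A minimal host-testable sketch of that decoding follows, assuming RV32 (interrupt flag in bit 31, cause code in the remaining bits, machine timer interrupt code 7 per the RISC-V privileged specification). The SKETCH_ names and classify_trap() are hypothetical stand-ins. Linmo's own MCAUSE_IS_INTERRUPT() and MCAUSE_GET_CODE() macros may be defined differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed RV32 encoding; not copied from Linmo's headers. */
#define SKETCH_MCAUSE_INT_BIT (1u << 31)
#define SKETCH_MCAUSE_MTI     7u /* machine timer interrupt */

typedef enum {
    TRAP_TIMER_IRQ, /* handled: reprogram timer, run dispatcher */
    TRAP_OTHER_IRQ, /* unhandled interrupt: report and panic */
    TRAP_EXCEPTION  /* synchronous exception: full diagnostic dump */
} trap_kind_t;

/* Classify a raw mcause value and return its cause code via *code. */
static trap_kind_t classify_trap(uint32_t mcause, uint32_t *code)
{
    assert(code != NULL);
    *code = mcause & ~SKETCH_MCAUSE_INT_BIT;
    if ((mcause & SKETCH_MCAUSE_INT_BIT) == 0)
        return TRAP_EXCEPTION;
    return (*code == SKETCH_MCAUSE_MTI) ? TRAP_TIMER_IRQ : TRAP_OTHER_IRQ;
}
```

With this encoding, the cause=00000002 value from the Background section classifies as an exception with code 2 (illegal instruction), while 0x80000007 is the timer interrupt that must stay on the fast path.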
Technical Considerations
Memory Overhead:
- Exception message strings (minimal, already exists)
- Stack trace code (~300-400 bytes)
- System state dump code (~300-400 bytes)
- Helper validation functions (~200 bytes)
- No dynamic memory allocation required
- No runtime data structures needed
Performance Impact:
- Zero overhead during normal operation
- Only executes when exception occurs (already fatal)
- Stack walking is fast (simple pointer following)
- Acceptable latency increase in exception path
Compilation Requirements:
- Must compile with -fno-omit-frame-pointer for stack trace functionality
- This reserves s0 as the frame pointer and adds a few instructions to each function prologue/epilogue
- Trade-off is worth it for debugging capabilities
Compatibility:
- No API changes - purely internal enhancement
- No impact on existing code or applications
- Backward compatible with current exception handling
- Stack trace code should gracefully handle missing frame pointers
Testing Strategy
Create test cases to verify enhanced output:
- Deliberately trigger illegal instruction exception
- Test misaligned memory access
- Test access to invalid addresses
- Verify all exception types produce correct output
- Verify task information is correctly displayed
- Verify stack trace shows correct call chain
Suggested test approach:
- Create a test application app/exception_test.c
- Include functions that deliberately trigger each exception type
- Create nested function calls to test stack trace
- Verify output format and completeness
- Can be manually invoked for verification during development
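As a rough starting point for such a test application, the sketch below builds a three-level call chain that ends in an illegal instruction. All names are hypothetical. The noinline attribute and the empty asm statements are GCC/Clang extensions used here to keep each frame on the stack, and the host branch is only a stand-in so the chain can be exercised without trapping.

```c
#include <assert.h>

/* All names here are hypothetical; nothing below is existing Linmo code. */
typedef void (*trigger_fn)(void);

#define NOINLINE __attribute__((noinline))
/* An empty asm after each call stops the compiler from turning it into
 * a tail call, which would drop the caller's frame from the trace. */
#define KEEP_FRAME() __asm__ volatile("" ::: "memory")

#if defined(__riscv)
/* The all-zero encoding is defined to be an illegal instruction. */
static void trigger_illegal_instruction(void)
{
    __asm__ volatile(".word 0x00000000");
}
#else
/* Host stand-in: count calls instead of trapping. */
static int host_trigger_hits;
static void trigger_illegal_instruction(void)
{
    host_trigger_hits++;
}
#endif

static NOINLINE void level1(trigger_fn trigger)
{
    assert(trigger != 0);
    trigger();
    KEEP_FRAME();
}

static NOINLINE void level2(trigger_fn trigger)
{
    level1(trigger);
    KEEP_FRAME();
}

static NOINLINE void level3(trigger_fn trigger)
{
    level2(trigger);
    KEEP_FRAME();
}

/* Entry point a test task could call; on target, the stack trace is
 * expected to contain return addresses inside level1..level3. */
static void exception_test_chain(trigger_fn trigger)
{
    level3(trigger);
}
```

Passing the trigger as a function pointer lets the same chain be reused for other exception types, such as a misaligned access or a load from an unmapped address, by adding further trigger functions.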
Documentation
A separate debugging guide will be created (Documentation/exception-debugging-guide.md) to provide:
- Common causes for each exception type
- Debugging strategies and techniques
- Example scenarios and solutions
- How to interpret stack traces
- Tips for using exception information effectively
This keeps runtime overhead minimal while providing comprehensive reference material for developers.
Implementation Notes
Memory Access Safety:
The stack walking implementation uses helper functions (is_valid_stack_pointer(), read_stack_frame(), is_valid_instruction_pointer()) to encapsulate validation logic. These functions should be expanded based on Linmo's actual memory layout and MMU configuration:
- Define STACK_BASE and STACK_SIZE based on architecture
- Implement proper bounds checking
- Consider adding page fault handling for robust memory access
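One way to tighten the validation helpers is to check addresses against explicit regions instead of a single lower bound. The sketch below is an illustration under assumptions: the region type and function names are hypothetical, the regions would need to come from Linmo's linker script or task control blocks, and the frame check assumes the RV32 layout with two 4-byte slots below fp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open address range [start, end). Hypothetical type. */
typedef struct {
    uint32_t start;
    uint32_t end;
} mem_region_t;

/* True if [addr, addr + len) lies entirely inside the region.
 * Written to avoid overflow in addr + len. */
static bool region_contains(const mem_region_t *r, uint32_t addr, uint32_t len)
{
    assert(r->start <= r->end);
    if (addr < r->start || addr > r->end)
        return false;
    return len <= r->end - addr;
}

/* A frame pointer is usable if it is 4-byte aligned and the two saved
 * slots below it, [fp - 8, fp), are inside the stack region. */
static bool frame_pointer_ok(const mem_region_t *stack, uint32_t fp)
{
    if ((fp & 3u) != 0 || fp < 8u)
        return false;
    return region_contains(stack, fp - 8u, 8u);
}

/* A return address is plausible if it is 2-byte aligned (the minimum
 * with compressed instructions) and points inside the text region. */
static bool return_address_ok(const mem_region_t *text, uint32_t ra)
{
    if ((ra & 1u) != 0)
        return false;
    return region_contains(text, ra, 2u);
}
```

Checking the slots that will actually be read, rather than fp alone, closes the gap where fp sits at the very bottom of the stack and fp - 8 falls outside it.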
Heap Information:
The print_system_state() function expects a get_heap_stats() callback. This should be implemented by the allocator module. If unavailable, the function gracefully falls back to a message indicating the allocator is not available.
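Because the name of an ordinary function never compares equal to NULL, the fallback needs either a weak declaration or an explicit registration step. The sketch below shows the registration variant with a function-pointer hook. It is an assumption about one possible design, not Linmo's allocator interface, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type matching the get_heap_stats(&used, &total)
 * call shape used in the proposal: returns 0 on success. */
typedef int (*heap_stats_fn)(size_t *used, size_t *total);

static heap_stats_fn heap_stats_hook; /* NULL until an allocator registers */

static void heap_stats_register(heap_stats_fn fn)
{
    heap_stats_hook = fn;
}

/* Returns heap usage as a whole percentage, or -1 when no allocator is
 * registered, the query fails, or the reported total is zero. */
static int heap_percent_used(void)
{
    size_t used = 0, total = 0;
    if (!heap_stats_hook || heap_stats_hook(&used, &total) != 0 || total == 0)
        return -1;
    assert(used <= total);
    return (int)(used * 100u / total);
}

/* Example provider reporting the figures from the sample output. */
static int sample_heap_stats(size_t *used, size_t *total)
{
    *used = 3456;
    *total = 8192;
    return 0;
}
```

Before registration, heap_percent_used() returns -1, which maps onto the "allocator not available" message. After heap_stats_register(sample_heap_stats) it returns 42, matching the sample output.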
Related Issues
This enhancement is part of a larger debugging infrastructure effort. A complementary feature for runtime event tracing will be proposed separately:
- Event Tracing System for Kernel Development (#25): records historical system events for analyzing complex timing and interaction issues
These features are independent but work together to provide comprehensive debugging capabilities.
Acceptance Criteria
- All exception types (exc_msg[0-15]) have meaningful descriptions
- Exception output includes task ID, PC, mtval, mstatus, and SP
- Stack trace shows function call chain (when compiled with frame pointer)
- Stack bounds validation prevents invalid memory access
- System state snapshot shows all tasks and memory usage
- Output is well-formatted and easy to read
- No performance impact during normal execution
- Helper functions keep do_trap() clean and maintainable
- Debugging guide document is created