diff --git a/Documentation/hal-riscv-context-switch.md b/Documentation/hal-riscv-context-switch.md
Lines changed: 179 additions & 38 deletions
@@ -1,41 +1,182 @@
 # HAL: Context Switching for RISC-V
 
 ## Context Switching
-
-Context switching is essential to Linmo's preemptive multitasking kernel, facilitating smooth task transitions. In Linmo, this is managed through `setjmp` and `longjmp` functions, implemented in [arch/riscv/hal.c](../arch/riscv/hal.c) and declared in [arch/riscv/hal.h](../arch/riscv/hal.h). These functions enable the kernel to save and restore task states, supporting reliable multitasking. The process involves unique considerations due to the non-standard use of these functions, requiring careful handling to ensure system stability.
-
-### Repurposed setjmp and longjmp
-
-The `setjmp` and `longjmp` functions, typically used for exception handling, are repurposed in Linmo for context switching. They save additional states beyond standard registers, including CSRs like `mcause`, `mepc`, and `mstatus`, which may lead to unexpected behavior if not properly managed. During timer interrupts, `setjmp` must handle `mstatus.MIE` being cleared, relying on `MPIE` reconstruction, a departure from their usual role. Similarly, `longjmp` restores context without reinitializing local variables or the call stack, posing risks of resource leaks in tasks with dynamic memory. These deviations demand precise implementation to maintain task switching integrity.
-
-### Context Switch Process
-
-1. Save Current Task State: The `setjmp` function captures the current task's CPU state, storing it in a `jmp_buf` structure defined as `uint32_t jmp_buf[19]` in [arch/riscv/hal.h](../arch/riscv/hal.h). This includes callee-saved general-purpose registers (such as `s0` to `s11`), essential pointers (`gp`, `tp`, `sp`, `ra`), and CSRs (`mcause`, `mepc`, `mstatus`). The layout is defined by `CONTEXT_*` macros in [arch/riscv/hal.c](../arch/riscv/hal.c). Failing to reconstruct `mstatus` from `MPIE` during this step can cause incorrect interrupt settings, leading to missed timer interrupts or stalls, which is mitigated by ensuring proper `mstatus` reconstruction.
-
-2. Select Next Task: The scheduler, invoked via `dispatcher()` during a machine timer interrupt, selects the next task using a priority-based round-robin algorithm or a user-defined scheduler. This evaluates task priorities and readiness, ensuring system responsiveness. The process relies on accurate `mstatus` restoration to avoid timing issues.
-
-3. Restore Next Task State: The `longjmp` function restores the CPU state from the selected task’s `jmp_buf`, resuming execution where `setjmp` was called. For new tasks, `hal_context_init` initializes the `jmp_buf` with `sp`, `ra`, and `mstatus` set to `MSTATUS_MIE` and `MSTATUS_MPP_MACH`, while `hal_dispatch_init` launches the task. The `hal_interrupt_tick` function enables interrupts (`_ei()`) after the first task, ensuring a consistent environment. Premature restoration of `mstatus` can disrupt scheduling, addressed by prioritizing it before other registers.
-
-## Machine Status Management
-
-The machine status in RISC-V is managed through the `mstatus` CSR, which controls critical system states, such as the Machine Interrupt Enable (`MIE`) bit. Proper handling of `mstatus` during context switching is essential to maintain correct interrupt behavior and ensure tasks execute as expected.
-
-### Role of `mstatus`
-
-The `mstatus` register includes the `MIE` bit (bit 3), which enables or disables machine-mode interrupts globally, and the `MPIE` bit (bit 7), which preserves the previous interrupt enable state during a trap, allowing `MIE` to be reconstructed afterward. Other fields, such as those for privilege mode and memory protection, are less relevant in Linmo’s single-address-space model but must still be preserved. The `hal_interrupt_set` function in [arch/riscv/hal.h](../arch/riscv/hal.h) manipulates `MIE` using `read_csr(mstatus)` and `write_csr(mstatus, val)`, with convenience macros `_di()` (disable interrupts) and `_ei()` (enable interrupts).
-
-### Saving and Restoring `mstatus`
-
-1. Saving in `setjmp`: The `setjmp` function reads `mstatus` using the `csrr` instruction. Since `mstatus.MIE` is cleared during a trap, the previous interrupt state is reconstructed from `MPIE` by shifting bit 7 to bit 3, clearing the original `MIE`, and restoring it, then storing the result in `jmp_buf` at `CONTEXT_MSTATUS` (18). This ensures accurate interrupt context preservation, preventing issues from incorrect handling.
-
-2. Restoring in `longjmp`: The `longjmp` function loads `mstatus` from `jmp_buf` and writes it back with `csrw` before other registers, ensuring early interrupt state establishment. This prevents premature interrupt handling and maintains consistency.
-
-3. Timing and drift considerations: Incorrect `mstatus` restoration can disrupt scheduling by enabling interrupts at the wrong time, resolvable through testing with scheduler-stressing applications like message queues. Timer interrupt drift, if not handled carefully, is addressed by `do_trap` scheduling relative to the previous `mtimecmp` value, requiring `mstatus.MIE` to be enabled for effective reception.
-
-### Best Practices for Machine Status
-
-- Prioritize `mstatus` restoration: Restore `mstatus` before other registers in `longjmp` to establish the correct interrupt state early.
-
-- Use safe CSR access: Use `read_csr` and `write_csr` macros in [arch/riscv/hal.h](../arch/riscv/hal.h) for consistent CSR manipulation.
-
-- Initialize `mstatus` for new tasks: Set `MSTATUS_MIE` and `MSTATUS_MPP_MACH` in `hal_context_init` to ensure new tasks start with interrupts enabled.
+Context switching is essential to Linmo's preemptive multitasking kernel:
+it is how the CPU moves from one task to another.
+Linmo splits this work into two layers, combining the portability of standard C library functions with the performance requirements of real-time systems.
+It provides `setjmp` and `longjmp` with standard C library semantics for application use,
+and dedicated HAL routines (`hal_context_save` and `hal_context_restore`) for kernel scheduling.
+
+## Separation of Concerns Architecture
+The context switching implementation follows a clean layered approach that separates execution context management from processor state management:
+
+### Standard C Library Layer
+Portable, standards-compliant context switching for applications
+- `setjmp` - Saves execution context only (elements 0-15)
+- `longjmp` - Restores execution context only
+- Semantics: Pure C library behavior, no processor state management
+- Use Cases: Exception handling, coroutines, application-level control flow
+- Performance: Standard overhead, optimized for portability
+
+### HAL Context Switching Layer
+Context switching for kernel scheduling
+- `hal_context_save` - Saves execution context AND processor state
+- `hal_context_restore` - Restores complete task state
+- Semantics: System-level optimization with interrupt state management
+- Use Cases: Preemptive scheduling, cooperative task switching
+- Performance: Optimized for minimal overhead
+
+### Unified Context Buffer
+Both layers use the same `jmp_buf` structure but access different portions:
+
+```c
+typedef uint32_t jmp_buf[17];
+
+/* Layout:
+ * [0-11]:  s0-s11 (callee-saved registers) - both layers
+ * [12]:    gp (global pointer) - both layers
+ * [13]:    tp (thread pointer) - both layers
+ * [14]:    sp (stack pointer) - both layers
+ * [15]:    ra (return address) - both layers
+ * [16]:    mstatus (processor state) - HAL layer only
+ */
+```
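The layout above can also be written as index constants. This is a minimal sketch for illustration: `CONTEXT_SP`, `CONTEXT_RA`, and `CONTEXT_MSTATUS` are names that appear elsewhere in this document, while `demo_jmp_buf`, the helper functions, and the remaining constant names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for jmp_buf, so the sketch compiles on a host
 * without clashing with <setjmp.h>. */
typedef uint32_t demo_jmp_buf[17];

enum {
    CONTEXT_S0 = 0,          /* s0..s11 occupy indices 0..11 (hypothetical name) */
    CONTEXT_GP = 12,         /* hypothetical name */
    CONTEXT_TP = 13,         /* hypothetical name */
    CONTEXT_SP = 14,
    CONTEXT_RA = 15,
    CONTEXT_MSTATUS = 16,    /* HAL layer only */
    CONTEXT_EXEC_WORDS = 16, /* words touched by setjmp/longjmp (hypothetical name) */
    CONTEXT_WORDS = 17       /* total words in the buffer (hypothetical name) */
};

/* Number of words the standard C library layer reads or writes. */
static inline unsigned demo_exec_words(void)
{
    return CONTEXT_EXEC_WORDS;
}

/* Byte offset of a slot inside the buffer. */
static inline unsigned demo_slot_offset(unsigned index)
{
    return index * (unsigned) sizeof(uint32_t);
}
```

With this layout, the `mstatus` slot sits at byte offset 64, which is the `16*4` offset used by the HAL save and restore routines.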
+
+## Context Switch Process
+
+### 1. Save Current Task State
+The `hal_context_save` function captures complete task state including both execution context and processor state.
+The function saves all callee-saved registers as required by the RISC-V ABI,
+plus essential pointers (gp, tp, sp, ra).
+For processor state, it reconstructs the pre-trap interrupt enable bit (`MIE`) from `MPIE` before storing `mstatus`:
+
+```c
+/* mstatus reconstruction during timer interrupts */
+csrr t0, mstatus        // Read current mstatus (MIE=0 in trap)
+srli t1, t0, 4          // Shift MPIE (bit 7) to bit 3 position
+andi t1, t1, 8          // Isolate the reconstructed MIE bit
+li   t2, ~8             // Create mask to clear old MIE bit
+and  t0, t0, t2         // Clear the current MIE bit
+or   t0, t0, t1         // Set MIE to pre-trap value (from MPIE)
+sw   t0, 16*4(%0)       // Store in jmp_buf[16]
+```
+
+This ensures that tasks resume with correct interrupt state,
+maintaining system responsiveness and preventing interrupt state corruption.
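The same bit manipulation can be expressed in C, which makes it easy to check on a host. This is a sketch, not the HAL code: `demo_reconstruct_mstatus` is a hypothetical helper, and the macro `MSTATUS_MPIE` is an assumed name for bit 7.

```c
#include <assert.h>
#include <stdint.h>

#define MSTATUS_MIE  (1u << 3)  /* machine interrupt enable (bit 3) */
#define MSTATUS_MPIE (1u << 7)  /* MIE value from before the trap (bit 7), assumed name */

/* Hypothetical helper: returns mstatus with MIE replaced by the pre-trap
 * value held in MPIE. All other bits, including MPIE itself, are unchanged. */
static inline uint32_t demo_reconstruct_mstatus(uint32_t mstatus)
{
    uint32_t mie = (mstatus >> 4) & MSTATUS_MIE; /* srli + andi */
    return (mstatus & ~MSTATUS_MIE) | mie;       /* and + or */
}
```

For example, an in-trap value of `0x1880` (MPP = machine, MPIE = 1, MIE = 0) becomes `0x1888`, while `0x1800` (MPIE = 0) is returned unchanged.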
+
+### 2. Select Next Task
+The scheduler, invoked via `dispatcher()` during machine timer interrupts,
+uses a priority-based round-robin algorithm or user-defined scheduler to select the next ready task.
+The scheduling logic considers each task's priority and readiness state when making this choice.
+
+### 3. Restore Next Task State
+The `hal_context_restore` function restores the complete task state. It restores processor state first so that the correct execution environment is in place before the remaining registers are loaded:
+
+```c
+lw  t0, 16*4(%0)        // Load saved mstatus from jmp_buf[16]
+csrw mstatus, t0        // Restore processor state FIRST
+// ... then restore all execution context registers
+```
+
+This ordering ensures that interrupt state and privilege mode are correctly established before resuming task execution.
+
+## Processor State Management
+
+### Interrupt State Reconstruction
+The HAL context switching routines account for the way RISC-V hardware modifies `mstatus` on trap entry:
+
+During Timer Interrupts:
+- `mstatus.MIE` is automatically cleared by hardware when entering the trap
+- `mstatus.MPIE` preserves the previous interrupt enable state
+- HAL functions reconstruct the original interrupt state from `MPIE`
+- This ensures consistent interrupt behavior across context switches
+
+State Preservation:
+- Each task maintains its own interrupt enable state
+- Context switches preserve privilege mode (Machine mode for kernel tasks)
+- Interrupt state is reconstructed accurately for reliable task resumption
+
+### Task Initialization
+New tasks are initialized with proper processor state:
+
+```c
+void hal_context_init(jmp_buf *ctx, size_t sp, size_t ss, size_t ra)
+{
+    /* stack_top is the initial stack pointer, derived from sp and ss;
+     * its computation is elided here. */
+    /* Set execution context */
+    (*ctx)[CONTEXT_SP] = (uint32_t) stack_top;  // Stack pointer
+    (*ctx)[CONTEXT_RA] = (uint32_t) ra;         // Entry point
+    /* Set processor state */
+    (*ctx)[CONTEXT_MSTATUS] = MSTATUS_MIE | MSTATUS_MPP_MACH;
+}
+```
+
+This ensures new tasks start with interrupts enabled in machine mode.
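A host-side model of this initialization might look like the sketch below. It rests on several assumptions: `demo_context_init` and `demo_jmp_buf` are hypothetical, `base` is taken to be the lowest stack address and `size` its length in bytes, and the ISR frame reservation is left out because its size is not given here. The bit values for `MSTATUS_MIE` (bit 3) and `MSTATUS_MPP_MACH` (MPP field, bits 12:11, set to 3) follow the RISC-V privileged specification.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CONTEXT_SP      14
#define CONTEXT_RA      15
#define CONTEXT_MSTATUS 16

#define MSTATUS_MIE      (1u << 3)   /* machine interrupt enable */
#define MSTATUS_MPP_MACH (3u << 11)  /* previous privilege = machine mode */

typedef uint32_t demo_jmp_buf[17];   /* hypothetical stand-in for jmp_buf */

/* Hypothetical model of task context setup. */
static void demo_context_init(demo_jmp_buf *ctx,
                              uint32_t base,
                              uint32_t size,
                              uint32_t entry)
{
    /* The RISC-V calling convention keeps sp 16-byte aligned. */
    uint32_t stack_top = (base + size) & ~0xFu;

    memset(ctx, 0, sizeof(*ctx));    /* callee-saved registers start at zero */
    (*ctx)[CONTEXT_SP] = stack_top;  /* stack grows down from the top */
    (*ctx)[CONTEXT_RA] = entry;      /* first "return" lands in the task entry */
    (*ctx)[CONTEXT_MSTATUS] = MSTATUS_MIE | MSTATUS_MPP_MACH;
}
```

With these values the stored `mstatus` word is `0x1808`, so the first restore of the task enables machine interrupts and stays in machine mode.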
+
+## Implementation Details
+
+### Kernel Integration
+The kernel scheduler uses the HAL context switching routines as follows:
+
+```c
+/* Preemptive context switching */
+void dispatch(void)
+{
+    /* Save current task with processor state */
+    if (hal_context_save(current_task->context) != 0)
+        return;  /* Restored from context switch */
+
+    /* ... scheduling logic ... */
+
+    /* Restore next task with processor state */
+    hal_context_restore(next_task->context, 1);
+}
+```
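The `dispatch` snippet relies on the save call returning twice: zero when called directly, and non-zero when the context is later resumed. The standard `setjmp`/`longjmp` pair follows the same convention, so the control flow can be tried on a host. This is only an illustration: `demo_dispatch`, `demo_ctx`, and `demo_saves` are hypothetical, and the sketch saves and resumes the same context, whereas the kernel resumes a different task's context.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf demo_ctx;   /* hypothetical stand-in for a task's context */
static int demo_saves;     /* counts direct (zero) returns from the save call */

/* Hypothetical demo of the two-return pattern used by dispatch(). */
static int demo_dispatch(void)
{
    if (setjmp(demo_ctx) != 0)
        return 1;          /* resumed: the "Restored from context switch" path */

    demo_saves++;          /* first pass: the context was just saved */

    /* ... scheduling logic would run here ... */

    longjmp(demo_ctx, 1);  /* stands in for hal_context_restore(ctx, 1) */
    return 0;              /* not reached */
}
```

Calling `demo_dispatch()` once passes through the save point twice but increments `demo_saves` only once, which is why the early `return` in `dispatch` is needed to avoid rescheduling on resumption.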
+
+## Best Practices
+
+### Architecture Principles
+- Layer Separation: Keep application and system contexts separate
+- Standard Compliance: Use standard functions for portable code
+- Performance Optimization: Use HAL functions for time-critical system code
+- State Management: Let HAL functions handle processor state automatically
+
+### Development Guidelines
+- Application Code: Always use `setjmp` and `longjmp` for exception handling
+- System Code: Always use `hal_context_save` and `hal_context_restore` for scheduling
+- Mixed Use: Both can operate on the same `jmp_buf` without interference
+- Testing: Verify interrupt state preservation across context switches
+
+### Interrupt State Machinery
+RISC-V Interrupt Behavior During Traps:
+1. Hardware automatically clears `mstatus.MIE` on trap entry
+2. Previous interrupt state saved in `mstatus.MPIE`
+3. Privilege level preserved in `mstatus.MPP`
+4. HAL functions reconstruct original interrupt state for task resumption
+
+State Reconstruction Logic:
+```
+Original MIE state = Current MPIE bit
+Reconstructed mstatus = (current_mstatus & ~MIE) | ((current_mstatus & MPIE) >> 4)
+where MIE = 1 << 3 and MPIE = 1 << 7
+```
+
+This ensures tasks resume with their original interrupt enable state rather than the disabled state from trap entry.
+
+### Task State Lifecycle
+New Task Creation:
+1. `hal_context_init` sets up initial execution context
+2. Stack pointer positioned with ISR frame reservation
+3. Return address points to task entry function
+4. Processor state initialized with interrupts enabled
+
+First Task Launch:
+1. `hal_dispatch_init` transfers control from kernel to first task
+2. Global interrupts enabled just before task execution
+3. Timer interrupts activated for preemptive scheduling
+4. Task begins execution at its entry point
+
+Context Switch Cycle:
+1. Timer interrupt triggers scheduler entry
+2. `hal_context_save` preserves complete current task state
+3. Scheduler selects next ready task based on priority
+4. `hal_context_restore` resumes selected task execution
+5. Task continues from its previous suspension point