Support SMP scheduling and synchronization #6

visitorckw · 2025-06-29T16:35:22Z

Enable SMP support by adding multi-core QEMU simulation and adapting
RISC-V boot for per-hart stacks and timers. It replaces interrupt
masking with spinlocks in core kernel subsystems for safe concurrency,
updates the kernel control block to track per-hart current tasks, and
spawns idle tasks at boot to prevent panics. Spinlock-based
synchronization replaces hart parking during boot, and printf output is
protected against interleaving. A spinlock implementation using RV32A
atomics is also included. These changes enable stable multi-core
operation.

visitorckw · 2025-06-29T16:36:04Z

Marking this PR as a draft due to test failures in several applications, but I still appreciate any reviews or suggestions.

Mes0903 · 2025-08-06T22:04:07Z

I've run a quick test using the hello program:

mes@MesDesktop:~/linmo$ make run
Ready to launch Linmo kernel + application.
Linmo kernel is starting...
Heap initialized, 826110031 bytes available
task 1: entry=800002e4 stack=80002e88 size=6904
task 2: entry=800002a8 stack=80003f40 size=6904
task 3: entry=8000023c stack=80004fc4 size=6904
task0 has id 1
task1 has id 2
task2 has id 3
Scheduler mode: Preemptive
task 4: entry=800016fc stack=800060b4 size=6904
task 6: entry=800016fc stack=8000819c size=6904
task 7: entry=800016fc stack=800091a4 size=6904
task 5: entry=800016fc stack=80007128 size=6904
[task 1 000001]
[task 1 100001]
[task 3 000003 - sys uptime: 0.071s]
[task 3 100003 - sys uptime: 0.062s]
[task 1 200001]
[task 3 200003 - sys uptime: 0.064s]
[task 1 300001]
[task 2 000002]
[task 3 300003 - sys uptime: 0.075s]
[task 1 400001]
[task 3 400003 - sys uptime: 0.077s]
[task 1 500001]
[task 3 500003 - sys uptime: 0.079s]
[task 2 100002]
[task 1 600001]
[task 3 600003 - sys uptime: 0.611s]
[task 1 700001]
[task 1 700001]
[task 3 700003 - sys uptime: 0.631s]
[task 1 800001]
[task 2 200002]
[task 2 200002]
[task 3 800003 - sys uptime: 0.751s]
[task 1 900001]
[task 3 900003 - sys uptime: 0.671s]
[EXCEPTION] code=1 (Instruction access fault), epc=0000031e, cause=00000001

Here's another test run with the same program:

mes@MesDesktop:~/linmo-main$ make run
Ready to launch Linmo kernel + application.

Linmo kernel is starting...
Heap initialized, 826110031 bytes available
task 1: entry=800002e4 stack=80002e88 size=6904
task 2: entry=800002a8 stack=80003f40 size=6904
task 3: entry=8000023c stack=80004fc4 size=6904
task0 has id 1
task1 has id 2
task2 has id 3
Scheduler mode: Preemptive
task 4: entry=800016fc stack=8000618c size=6904
task 5: entry=800016fc stack=800081ac size=6904
task 6: entry=800016fc stack=800091b4 size=6904
task 7: entry=800016fc stack=80007194 size=6904
[task 1 000001]
[task 1 100001]
[task 3 000003 - sys uptime: 0.071s]
[task 0 005 - sys uptime: 0.000s]

After I changed the -smp parameter to 1, I haven’t encountered any more crashes:

mes@MesDesktop:~/linmo$ make run
Ready to launch Linmo kernel + application.
Linmo kernel is starting...
Heap initialized, 826110031 bytes available
task 1: entry=800002e4 stack=80002e88 size=6904
task 2: entry=800002a8 stack=80003f40 size=6904
task 3: entry=8000023c stack=80004fc4 size=6904
task0 has id 1
task1 has id 2
task2 has id 3
Scheduler mode: Preemptive
task 4: entry=800016fc stack=80006048 size=6904
[task 1 000001]
[task 3 000003 - sys uptime: 0.041s]
[task 1 100001]
[task 3 100003 - sys uptime: 0.044s]
[task 1 200001]
[task 3 200003 - sys uptime: 0.047s]
[task 1 300001]
[task 3 300003 - sys uptime: 0.401s]
[task 1 400001]
[task 3 400003 - sys uptime: 0.431s]

However, I still think the crash could happen. From the logs above, I noticed that the local variable values in hello.c were incorrect. For example, in [task 1 000001], after incrementing cnt, it became [task 1 100001]. I also saw an unexpected task 0.

Since the variable values don’t match the expected control flow, I suspect the entire memory layout is corrupted. Also, we specified the memory size as 128MB, yet the output shows a heap size of 826,110,031 bytes, which is clearly incorrect.

To verify this, I modified the task functions in hello.c as follows:

void task0(void)
{
    /* Add assembly markers to track where execution actually starts */
    asm volatile("nop; nop; nop; nop" ::: "memory"); /* Marker 1 */

    printf("DEBUG task0: Starting function execution\n");

    int32_t cnt = 100000;

    asm volatile("nop; nop; nop; nop" ::: "memory"); /* Marker 2 */

    printf("DEBUG task0: After variable init, cnt=%ld\n", cnt);

    /* Check if cnt got corrupted immediately */
    if (cnt != 100000) {
        printf(
            "ERROR task0: cnt corrupted immediately! Expected 100000, got "
            "%ld\n",
            cnt);
        /* Try to see what's at that memory location */
        printf("DEBUG task0: Memory at &cnt=%p contains: %08x\n", &cnt,
               *(uint32_t *) &cnt);
    }

    /* Debug: Print stack pointer and validate stack region */
    uint32_t current_sp;
    asm volatile("mv %0, sp" : "=r"(current_sp));
    printf("DEBUG task0: SP=%p, &cnt=%p, distance=%d\n", (void *) current_sp,
           &cnt, (int) ((char *) &cnt - (char *) current_sp));

    while (1) {
        printf("[task %d %ld]\n", mo_task_id(), cnt++);
        mo_task_wfi();
    }
}

void task1(void)
{
    asm volatile("nop; nop; nop; nop" ::: "memory");
    printf("DEBUG task1: Starting function execution\n");

    int32_t cnt = 200000;

    asm volatile("nop; nop; nop; nop" ::: "memory");
    printf("DEBUG task1: After variable init, cnt=%ld\n", cnt);

    if (cnt != 200000) {
        printf("ERROR task1: cnt corrupted! Expected 200000, got %ld\n", cnt);
        printf("DEBUG task1: Memory at &cnt=%p contains: %08x\n", &cnt,
               *(uint32_t *) &cnt);
    }

    while (1) {
        printf("[task %d %ld]\n", mo_task_id(), cnt++);
        mo_task_wfi();
    }
}

void task2(void)
{
    asm volatile("nop; nop; nop; nop" ::: "memory");
    printf("DEBUG task2: Starting function execution\n");

    int32_t cnt = 300000;
    uint32_t secs, msecs, time;

    asm volatile("nop; nop; nop; nop" ::: "memory");
    printf("DEBUG task2: After variable init, cnt=%ld\n", cnt);

    if (cnt != 300000) {
        printf("ERROR task2: cnt corrupted! Expected 300000, got %ld\n", cnt);
        printf("DEBUG task2: Memory at &cnt=%p contains: %08x\n", &cnt,
               *(uint32_t *) &cnt);
    }

    while (1) {
        time = mo_uptime();
        secs = time / 1000;
        msecs = time - secs * 1000;
        printf("[mhartid: %d, task %d %ld - sys uptime: %ld.%03lds]\n",
               (int) read_csr(mhartid), mo_task_id(), cnt++, secs, msecs);
        mo_task_wfi();
    }
}

And here's the output under -smp 1:

mes@MesDesktop:~/linmo$ make run
Ready to launch Linmo kernel + application.
Linmo kernel is starting...
Heap initialized, 026010031 bytes available
task 1: entry=8000023c stack=80003278 size=6904
task 2: entry=80000300 stack=80004330 size=6904
task 3: entry=800003a8 stack=800053b4 size=6904
task0 has id 1
task1 has id 2
task2 has id 3
Scheduler mode: Preemptive
task 4: entry=80001864 stack=80006438 size=6904
DEBUG task0: Starting function execution
DEBUG task0: After variable init, cnt=000001
DEBUG task0: SP=800041d0, &cnt=800041dc, distance=21
[task 1 000001]
DEBUG task2: Starting function execution
DEBUG task2: After variable init, cnt=000003
[mhartid: 0, task 3 000003 - sys uptime: 0.041s]
[task 1 100001]
[mhartid: 0, task 3 100003 - sys uptime: 0.034s]
[task 1 200001]
[mhartid: 0, task 3 200003 - sys uptime: 0.037s]
[task 1 300001]
[mhartid: 0, task 3 300003 - sys uptime: 0.301s]
[task 1 400001]
[mhartid: 0, task 3 400003 - sys uptime: 0.331s]
[task 1 500001]
DEBUG task1: Starting function execution
DEBUG task1: After variable init, cnt=000002
[task 2 000002]
[mhartid: 0, task 3 500003 - sys uptime: 0.371s]
[task 1 600001]
[mhartid: 0, task 3 600003 - sys uptime: 0.302s]
[task 1 700001]
[mhartid: 0, task 3 700003 - sys uptime: 0.332s]
[task 1 800001]
[mhartid: 0, task 3 800003 - sys uptime: 0.362s]

We can see that in the line DEBUG task0: After variable init, cnt=000001 that the initialized value of cnt is incorrect, which should be 100000.

Therefore, I suspect the problem lies in the memory layout, though I haven’t found the root cause yet. My guess is that something went wrong either during the context switch, or during the initial function stack setup. Or maybe the endianness...?

visitorckw · 2025-08-07T03:40:20Z

Yes, each smp processor needs its own stack to work correctly. But even with that fixed, the current code still seems problematic, so I'm still working on it...

Introduce a simple spinlock implementation based on test-and-set using RV32A atomic instructions. The spinlock API includes basic locking, IRQ-safe variants, and versions that save and restore interrupt state. To support atomic instructions, the Makefile is updated to enable the 'A' extension by changing the -march flag. This is the first step toward enabling multi-core task scheduling support on RISC-V SMP systems.

The original malloc/free implementation used CRITICAL_ENTER() and CRITICAL_LEAVE() to protect critical sections by simply disabling interrupts, which is sufficient on single-core systems. To support SMP, we replace these with a proper spinlock that uses RV32A atomic instructions. This ensures correctness when multiple harts access the allocator concurrently. This change allows future task scheduling across multiple harts without risking race conditions in the memory allocator.

The original message queue implementation used CRITICAL_ENTER() and CRITICAL_LEAVE() to protect critical sections by disabling interrupts. This was sufficient for single-core systems, where only one hart could execute tasks. To support SMP, we replace these macros with a proper spinlock using RV32A atomic instructions. This ensures safe access to the internal queue structures when multiple harts concurrently interact with message queues. This change eliminates potential race conditions in message queue operations as we move toward multi-hart scheduling.

…pport The original task management code used CRITICAL_ENTER() / CRITICAL_LEAVE() and NOSCHED_ENTER() / NOSCHED_LEAVE() to protect critical sections by disabling interrupts, which was sufficient for single-core systems. To support SMP, these macros are replaced with a spinlock based on RV32A atomic instructions. This ensures that multiple harts can safely access and modify shared task data such as ready queues, priority values, and task control blocks. This change is essential for enabling multi-hart task scheduling without introducing race conditions in the kernel task subsystem.

The original pipe implementation used CRITICAL_ENTER() and CRITICAL_LEAVE() to protect critical sections by disabling interrupts, which was acceptable for single-core systems. To support SMP, these macros are replaced with a proper spinlock based on RV32A atomic instructions. This ensures safe concurrent access to the circular buffer used by the pipe, even when multiple harts are performing read or write operations simultaneously. This change is necessary to avoid race conditions and ensure correct pipe behavior under multi-hart task scheduling.

The original semaphore implementation used NOSCHED_ENTER() and NOSCHED_LEAVE() to protect critical sections by disabling interrupts, which was sufficient in single-core environments. To support SMP, we replace these macros with a spinlock based on RV32A atomic instructions. This ensures safe access to shared semaphore state, including the count and wait queue, when multiple harts operate concurrently. This change is necessary to avoid race conditions during mo_sem_wait(), mo_sem_signal(), and other semaphore operations under multi-hart scheduling.

The timer subsystem originally used NOSCHED_ENTER() and NOSCHED_LEAVE() to disable interrupts when accessing shared timer state, which sufficed on single-core systems. To support SMP, we now replace these macros with a spinlock based on RV32A atomic instructions. This ensures safe concurrent access to global timer state such as timer_initialized, the timer list, and ID management. This change prepares the timer subsystem for correct operation when multiple harts simultaneously create, start, or cancel timers.

The mutex and condition variable implementation previously relied on NOSCHED_ENTER() and NOSCHED_LEAVE() to protect critical sections by disabling interrupts. This works in single-core environments but breaks down under SMP due to race conditions between harts. This patch replaces those macros with a spinlock built using RV32A atomic instructions. The spinlock protects access to shared state including mutex ownership, waiter lists, and condition wait queues. This change ensures correct mutual exclusion and atomicity when multiple harts concurrently lock/unlock mutexes or signal condition variables.

On SMP systems, concurrent calls to printf() from multiple harts can cause interleaved and unreadable output due to racing writes to the shared output buffer. Add a spinlock to serialize access to printf(), ensuring that only one hart writes at a time. This change improves the readability of debug messages and prevents garbled output when multiple harts are active.

All calls to NOSCHED_ENTER(), NOSCHED_LEAVE(), CRITICAL_ENTER(), and CRITICAL_LEAVE() have been replaced with spinlock-based synchronization primitives throughout the kernel. As a result, these macros are no longer used and have been removed from include/sys/task.h to clean up the codebase and avoid confusion.

To support SMP, allocate separate stack memory regions for each hart during boot. This patch modifies the assembly entry code in arch/riscv/boot.c to compute the initial stack pointer based on the hart ID, ensuring each hart uses a distinct stack area of fixed size (STACK_SIZE_PER_HART). This enables multiple harts to safely run concurrently without stack collisions during early boot stages.

Remove the old logic that parks all secondary harts in WFI, which caused them to hang indefinitely. Instead, all harts proceed with boot. To ensure proper initialization sequence, hart 0 performs hardware setup, heap initialization, and task creation. Other harts spin-wait on a spinlock-protected flag until hart 0 finishes initialization before starting task dispatch.

The task_lock spinlock was primarily used to protect access to the Kernel Control Block (kcb) and its internal data structures. Move the spinlock into the kcb_t struct as kcb_lock, consolidating related state and synchronization primitives together. All uses of the standalone task_lock spinlock are replaced by kcb->kcb_lock accesses, improving code clarity and encapsulation of the kernel's core control block.

To prevent kernel panic during startup when some harts may not have any runnable tasks assigned, add an idle task for each hart. The idle task runs an infinite loop calling mo_task_wfi(), ensuring the hart remains in a low-power wait state instead of causing a panic due to lack of tasks. This guarantees that every hart has at least one task to execute immediately after boot, improving system robustness and stability on SMP setups.

Previously, only a single global pointer tracked the current running task, which worked for single-core systems. To support SMP, change the Kernel Control Block (KCB) to maintain an array of current task pointers, one per hart. Added get_task_current() and set_task_current() helper functions to retrieve and update the current task for the executing hart. Modify kernel and HAL code to use these new functions instead of the single global current task pointer, ensuring correct task tracking on each hart.

Since kcb->ticks is shared and updated by all cores, add a spinlock to protect its increment operation in the dispatcher, ensuring atomicity and preventing race conditions in SMP environments.

Previously, mtimecmp was accessed at a fixed MMIO address assuming a single core. Each hart has its own mtimecmp register at distinct offsets, so update mtimecmp read and write functions to index based on the current hart ID, ensuring correct timer compare handling in SMP systems.

Enable running the kernel on 4 simulated cores by passing the -smp 4 parameter to qemu-system-riscv32, facilitating SMP testing and development.

jserv mentioned this pull request Jul 4, 2025

Enhance task scheduler #13

Open

34 tasks

visitorckw force-pushed the support-smp branch from 933c254 to b735dd9 Compare September 6, 2025 19:20

visitorckw added 19 commits September 8, 2025 03:28

Protect shared kcb->ticks with spinlock

be43db3

Since kcb->ticks is shared and updated by all cores, add a spinlock to protect its increment operation in the dispatcher, ensuring atomicity and preventing race conditions in SMP environments.

Add -smp 4 to QEMU run command for multi-core simulation

49f9119

Enable running the kernel on 4 simulated cores by passing the -smp 4 parameter to qemu-system-riscv32, facilitating SMP testing and development.

Add spinlock protection for kcb

aad0525

visitorckw force-pushed the support-smp branch from b735dd9 to aad0525 Compare September 7, 2025 19:29

vicLin8712 mentioned this pull request Oct 2, 2025

Single-Hart O(1) Enhancement #23

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support SMP scheduling and synchronization #6

Support SMP scheduling and synchronization #6

visitorckw commented Jun 29, 2025

Uh oh!

visitorckw commented Jun 29, 2025

Uh oh!

Mes0903 commented Aug 6, 2025 •

edited

Loading

Uh oh!

visitorckw commented Aug 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Support SMP scheduling and synchronization #6

Are you sure you want to change the base?

Support SMP scheduling and synchronization #6

Conversation

visitorckw commented Jun 29, 2025

Uh oh!

visitorckw commented Jun 29, 2025

Uh oh!

Mes0903 commented Aug 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

visitorckw commented Aug 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Mes0903 commented Aug 6, 2025 •

edited

Loading