Commit dcdcd8e

[docs] Add smp boot docs for aarch64
1 parent cbf3f8c commit dcdcd8e

3 files changed

+273
-0
lines changed

documentation/3.kernel/INDEX.md

Lines changed: 1 addition & 0 deletions
@@ -9,3 +9,4 @@
 - @subpage page_memory_management
 - @subpage page_interrupt_management
 - @subpage page_kernel_porting
+- @subpage page_kernel_smp_boot
Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
@page page_kernel_smp_boot QEMU virt64 AArch64 SMP Boot Flow

# QEMU virt64 AArch64 SMP Boot Flow

This guide walks through the multi-core boot path of RT-Thread on AArch64 using `bsp/qemu-virt64-aarch64` as the concrete reference. It is written to be beginner-friendly and mirrors the current BSP implementation: from `_start` assembly, early MMU bring-up, `rtthread_startup()`, PSCI wakeup of secondary cores, to all CPUs entering the scheduler. The original PlantUML diagram is replaced by Mermaid so it renders directly on GitHub.

- Target setup: QEMU `-machine virt`, `-cpu cortex-a57`, `-smp >=2`, `RT_USING_SMP` enabled, device tree contains `enable-method = "psci"`.
- Goal: Know who does what, where the code lives, and what to check when SMP does not come up.

## Big Picture First

```mermaid
flowchart TD
    ROM["BootROM/BL1<br/>QEMU firmware"] --> START["_start<br/>(entry_point.S)"]
    START --> MMU["init_mmu_early<br/>enable_mmu_early"]
    MMU --> CBOOT["rtthread_startup()"]
    CBOOT --> BOARD["rt_hw_board_init<br/>-> rt_hw_common_setup"]
    BOARD --> MAIN["main_thread_entry"]
    MAIN --> PSCI["rt_hw_secondary_cpu_up<br/>(PSCI CPU_ON)"]
    PSCI --> SECASM["_secondary_cpu_entry<br/>(ASM)"]
    SECASM --> SECC["rt_hw_secondary_cpu_bsp_start"]
    SECC --> SCHED["rt_system_scheduler_start"]
    SCHED --> RUN["SMP scheduling"]
```

## Boot CPU: from `_start` to MMU on

**Input registers**: QEMU firmware loads the image and jumps to `_start` in `libcpu/aarch64/cortex-a/entry_point.S`, passing the DTB physical address in `x0` (`x1` to `x3` are reserved).

**What `_start` does (short version)**

1. Clear thread pointers: zero `tpidr_el1` and `tpidrro_el0` to avoid stale per-cpu state.
2. Unify exception level: `init_cpu_el` drops to EL1h, enables timer access, and masks unwanted traps.
3. Clear BSS: `init_kernel_bss` fills `__bss` with zeros so globals start clean.
4. Prepare stack: `init_cpu_stack_early` switches to SP_EL1 and uses `.boot_cpu_stack_top` as the early stack.
5. Remember the FDT: `rt_hw_fdt_install_early(x0)` stores the DTB address and size before the MMU is enabled.
6. Early MMU mapping: `init_mmu_early`/`enable_mmu_early` build a 0 to 1 GiB identity map, set TTBR0/TTBR1 and SCTLR_EL1, flush the I/D caches and TLB, then branch to `rtthread_startup()` (address in `x8`).

> Tip: the early page table only covers minimal kernel space; the C phase will remap a fuller layout.

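To make the "0 to 1 GiB identity map" in step 6 concrete, here is a minimal host-side sketch. It assumes a 4 KiB granule and 512 level-2 block descriptors of 2 MiB each; the real `init_mmu_early` may use a different table layout, granule, or attributes. Every name in the block (`build_identity_map`, `lookup`, the `DESC_*` macros) is hypothetical and not an RT-Thread symbol; only the descriptor bit positions come from the ARMv8-A translation table format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; they are not RT-Thread symbols. */
#define BLOCK_SHIFT      21u                    /* one level-2 block covers 2 MiB */
#define NUM_BLOCKS       512u                   /* 512 * 2 MiB = 1 GiB */

#define DESC_VALID       (1ull << 0)            /* bit 0: entry is valid */
#define DESC_ATTRIDX(i)  ((uint64_t)(i) << 2)   /* bits [4:2]: index into MAIR_EL1 */
#define DESC_SH_INNER    (3ull << 8)            /* bits [9:8]: inner shareable */
#define DESC_AF          (1ull << 10)           /* bit 10: access flag */
#define DESC_ADDR_MASK   0x0000FFFFFFE00000ull  /* bits [47:21]: block output address */

/* Fill `table` so that block i maps virtual address i * 2 MiB to itself.
 * Bits [1:0] = 0b01 mark a block descriptor, so only DESC_VALID is set there. */
void build_identity_map(uint64_t *table, unsigned attr_index)
{
    assert(table != NULL);
    assert(attr_index < 8u);
    for (uint64_t i = 0; i < NUM_BLOCKS; i++) {
        uint64_t pa = i << BLOCK_SHIFT;
        table[i] = pa | DESC_AF | DESC_SH_INNER | DESC_ATTRIDX(attr_index) | DESC_VALID;
    }
}

/* Translate `va` the way a single 2 MiB block lookup would. */
uint64_t lookup(const uint64_t *table, uint64_t va)
{
    assert(table != NULL);
    assert(va < ((uint64_t)NUM_BLOCKS << BLOCK_SHIFT));
    uint64_t desc = table[va >> BLOCK_SHIFT];
    return (desc & DESC_ADDR_MASK) | (va & ((1ull << BLOCK_SHIFT) - 1));
}
```

With an identity map, `lookup(table, va)` returns `va` for every address below 1 GiB, which is why code can keep executing at the same addresses at the moment the MMU is switched on.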
## C-side startup backbone

`rtthread_startup()` (in `src/components.c`) is the spine of the sequence:

- **Interrupts off + spinlock ready**: `rt_hw_local_irq_disable()` followed by `_cpus_lock` init to keep early steps non-preemptible.
- **Board init**: `rt_hw_board_init()` directly calls the BSP hook `rt_hw_common_setup()` (`libcpu/aarch64/common/setup.c`) to:
  - set VBAR, build kernel address space, copy DTB to a safe region and pre-parse it;
  - configure MMU mappings; init memblock/page allocator/system heap;
  - parse DT for console, memory, initrd;
  - init GIC (and GICv3 Redistributor if enabled), UART, global GTIMER;
  - install SMP IPIs (`RT_SCHEDULE_IPI`, `RT_STOP_IPI`, `RT_SMP_CALL_IPI`) and unmask them;
  - set idle hook `rt_hw_idle_wfi` so idle CPUs enter low-power wait.
- **Kernel subsystems**: init system timer, scheduler, signals, and create main/timer/idle threads.
- **Start scheduling**: `rt_system_scheduler_start()` runs `main_thread_entry()` first.

## How secondary cores are brought up

`main_thread_entry()` calls `rt_hw_secondary_cpu_up()` before invoking user `main()`, so all CPUs join scheduling.

### What `rt_hw_secondary_cpu_up()` does

1. Convert `_secondary_cpu_entry` to a physical address via `rt_kmem_v2p()`; this is the real entry the firmware jumps to.
2. Walk CPU nodes recorded at boot (`cpu_info_init()` stored DTB info in `cpu_np[]` and `rt_cpu_mpidr_table[]`).
3. Read `enable-method`:
   - QEMU virt64: `"psci"` → use `cpu_psci_ops.cpu_boot()` to issue `CPU_ON(target, entry)` to firmware.
   - Legacy compatibility: `"spin-table"` → write `cpu-release-addr` and `sev` to wake.
4. Any failure prints a warning but does not halt the boot flow, making diagnosis easier.
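
The dispatch in steps 2 to 4 can be sketched as a host-runnable model. This is an illustration under stated assumptions, not the body of `rt_hw_secondary_cpu_up()`: `struct cpu_node`, `struct cpu_on_request`, `parse_enable_method`, and `plan_secondary_boot` are hypothetical names, and the sketch only plans PSCI calls (the spin-table path is recognized but not modeled). The `CPU_ON` function ID and the `x0` to `x3` argument order follow the PSCI specification for the SMC64 calling convention.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* PSCI CPU_ON function ID, SMC64 calling convention (PSCI v0.2 and later). */
#define PSCI_FN64_CPU_ON 0xC4000003ull

enum boot_method { BOOT_PSCI, BOOT_SPIN_TABLE, BOOT_UNKNOWN };

/* Hypothetical stand-in for one CPU node parsed from the device tree. */
struct cpu_node {
    const char *enable_method;
    uint64_t    mpidr;
};

/* Register values for one CPU_ON call. */
struct cpu_on_request {
    uint64_t function_id;   /* x0 */
    uint64_t target_cpu;    /* x1: MPIDR affinity of the core to start */
    uint64_t entry_point;   /* x2: physical address the core starts at */
    uint64_t context_id;    /* x3: value the started core receives in x0 */
};

enum boot_method parse_enable_method(const char *method)
{
    if (method == NULL)                    return BOOT_UNKNOWN;
    if (strcmp(method, "psci") == 0)       return BOOT_PSCI;
    if (strcmp(method, "spin-table") == 0) return BOOT_SPIN_TABLE;
    return BOOT_UNKNOWN;
}

/* Prepare CPU_ON requests for every secondary core (index 0 is the boot CPU).
 * A core that cannot be started is reported and skipped; the loop continues.
 * Returns the number of requests written to `out`. */
int plan_secondary_boot(const struct cpu_node *cpus, int ncpus,
                        uint64_t entry_pa, struct cpu_on_request *out)
{
    int planned = 0;
    assert(cpus != NULL && out != NULL);
    for (int i = 1; i < ncpus; i++) {
        if (entry_pa == 0 ||
            parse_enable_method(cpus[i].enable_method) != BOOT_PSCI) {
            fprintf(stderr, "warning: cpu %d not started\n", i);
            continue;
        }
        out[planned].function_id = PSCI_FN64_CPU_ON;
        out[planned].target_cpu  = cpus[i].mpidr;
        out[planned].entry_point = entry_pa;
        out[planned].context_id  = 0;
        planned++;
    }
    return planned;
}
```

The `continue` on failure mirrors the "warn but do not halt" behavior: one misconfigured CPU node leaves the other cores bootable, and the warning identifies which node to inspect.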

### What happens on a secondary core

- **Assembly entry `_secondary_cpu_entry`**:
  - Read `mpidr_el1`, compare with `rt_cpu_mpidr_table` to find the logical CPU id, store it back, and write it into `TPIDR` for per-cpu access.
  - Allocate its own stack by offsetting `ARCH_SECONDARY_CPU_STACK_SIZE` per core.
  - Re-run `init_cpu_el`/`init_cpu_stack_early`, reuse the same early MMU path, then branch to `rt_hw_secondary_cpu_bsp_start()`.

- **C-side handoff `rt_hw_secondary_cpu_bsp_start()`** (`libcpu/aarch64/common/setup.c`):
  - Reset VBAR and synchronize with the boot CPU via `_cpus_lock`.
  - Update this core's MPIDR entry and bind the shared `MMUTable`.
  - Init local vector table, GIC CPU interface (and GICv3 Redistributor if present), enable the local GTIMER.
  - Unmask the three SMP IPIs; re-calibrate `loops_per_tick` for microsecond delay if needed.
  - Call `rt_dm_secondary_cpu_init()` to register the CPU device, then enter the scheduler via `rt_system_scheduler_start()`.
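
The first two assembly steps (MPIDR lookup and per-core stack selection) are easier to follow in C. The sketch below is a host-side model, not a transcription of `_secondary_cpu_entry`: the function names are hypothetical, `SECONDARY_STACK_SIZE` is a made-up stand-in for `ARCH_SECONDARY_CPU_STACK_SIZE`, and the stack layout (core `k` gets the `k`-th slot of a contiguous region, with stacks growing downward) is an assumption. The affinity mask follows the architectural `MPIDR_EL1` layout (Aff3 in bits [39:32], Aff2 to Aff0 in bits [23:0]).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Aff3 [39:32], Aff2 [23:16], Aff1 [15:8], Aff0 [7:0] of MPIDR_EL1. */
#define MPIDR_AFFINITY_MASK  0x000000FF00FFFFFFull

/* Hypothetical stand-in for ARCH_SECONDARY_CPU_STACK_SIZE. */
#define SECONDARY_STACK_SIZE 0x4000u

/* Return the index whose recorded affinity matches `mpidr`, or -1 if none does.
 * Non-affinity bits (for example the RES1 bit 31) are ignored. */
int mpidr_to_logical_id(uint64_t mpidr, const uint64_t *table, int ncpus)
{
    assert(table != NULL);
    for (int i = 0; i < ncpus; i++) {
        if ((table[i] & MPIDR_AFFINITY_MASK) == (mpidr & MPIDR_AFFINITY_MASK)) {
            return i;
        }
    }
    return -1;
}

/* Assumed layout: secondary core k (k >= 1) owns slot k-1 of a contiguous
 * region, and its initial stack pointer is the top (highest address) of that slot. */
uintptr_t secondary_stack_top(uintptr_t region_base, int cpu_id)
{
    assert(cpu_id >= 1);
    return region_base + (uintptr_t)cpu_id * SECONDARY_STACK_SIZE;
}
```

A lookup that returns -1 corresponds to a core whose affinity was never recorded from the device tree, which is one reason a secondary core can appear to hang right after `CPU_ON` succeeds.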

### Timeline (Mermaid)

```mermaid
sequenceDiagram
    participant ROM as BootROM/BL1
    participant START as _start (ASM)
    participant CBOOT as rtthread_startup
    participant MAIN as main_thread_entry
    participant FW as PSCI firmware
    participant SECASM as _secondary_cpu_entry
    participant SECC as rt_hw_secondary_cpu_bsp_start
    participant SCHED as Scheduler (all CPUs)

    ROM->>START: x0=DTB, jump to _start
    START->>START: init_cpu_el / clear BSS / set stack
    START->>START: init_mmu_early + enable_mmu_early
    START-->>CBOOT: branch to rtthread_startup()
    CBOOT->>CBOOT: rt_hw_board_init -> rt_hw_common_setup
    CBOOT-->>SCHED: rt_system_scheduler_start()
    SCHED-->>MAIN: run main_thread_entry
    MAIN->>FW: rt_hw_secondary_cpu_up (CPU_ON)
    FW-->>SECASM: entry = _secondary_cpu_entry
    SECASM->>SECASM: stack/TPIDR/EL setup
    SECASM-->>SECC: enable_mmu_early -> rt_hw_secondary_cpu_bsp_start
    SECC->>SECC: local GIC/Timer/IPI init
    SECC-->>SCHED: rt_system_scheduler_start()
    SCHED-->>MAIN: continue main()
    SCHED-->>Others: SMP scheduling
```

## Source map (where to read the code)

| Stage | File | Role |
| --- | --- | --- |
| Boot assembly | `libcpu/aarch64/cortex-a/entry_point.S` | `_start`, `_secondary_cpu_entry`, early MMU enable |
| BSP hook | `bsp/qemu-virt64-aarch64/drivers/board.c` | Wires `rt_hw_board_init()` to `rt_hw_common_setup()` |
| Memory/GIC/IPI init | `libcpu/aarch64/common/setup.c` | `rt_hw_common_setup()`, `rt_hw_secondary_cpu_up()`, `rt_hw_secondary_cpu_bsp_start()` |
| C entry skeleton | `src/components.c` | `rtthread_startup()`, `main_thread_entry()` |

## Quick checks when SMP fails to come up

- Device tree: it must contain `enable-method = "psci"`, and QEMU must be started with `-machine virt` (PSCI firmware included).
- `_secondary_cpu_entry` physical address: `rt_kmem_v2p()` must not return 0, otherwise a check fails.
- Init order: the GIC and timer must be ready before `rt_hw_secondary_cpu_up()` is called; if you fork a custom BSP, initialize them first.
- UART logs: look for `Call cpu X on success/failed`; add extra prints in `_secondary_cpu_entry` if needed, and use QEMU `-d cpu_reset -smp N` to debug.

## AArch64 pocket notes (just enough)

- **Exception levels**: startup may be at EL3/EL2; `init_cpu_el` descends to EL1h where the kernel runs.
- **Two stack pointers**: `msr spsel, #1` selects `SP_EL1` so user mode cannot touch the kernel stack.
- **MMU bring-up order**: build page tables → configure TCR/TTBR → flush cache/TLB → set `SCTLR_EL1.M/C/I` → `isb`.
- **MPIDR**: unique core affinity; stored in `rt_cpu_mpidr_table[]` to map logical CPU ids and IPI targets.
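
Two of these notes reduce to simple bit arithmetic, shown here as a small host-side sketch. The helper names are hypothetical; the bit positions are architectural: `SCTLR_EL1.M` is bit 0, `C` is bit 2, `I` is bit 12, and the `CurrentEL` register reports the exception level in bits [3:2].

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_EL1_M  (1ull << 0)    /* MMU enable */
#define SCTLR_EL1_C  (1ull << 2)    /* data cache enable */
#define SCTLR_EL1_I  (1ull << 12)   /* instruction cache enable */

/* Value to write back to SCTLR_EL1 as the last step of MMU bring-up.
 * On hardware the write is followed by an `isb`. */
uint64_t sctlr_with_mmu_and_caches(uint64_t sctlr)
{
    return sctlr | SCTLR_EL1_M | SCTLR_EL1_C | SCTLR_EL1_I;
}

/* Decode the exception level (0..3) from a raw CurrentEL register value. */
unsigned current_el(uint64_t currentel_reg)
{
    return (unsigned)((currentel_reg >> 2) & 0x3u);
}
```

Decoding `CurrentEL` this way is what lets a routine such as `init_cpu_el` decide how many levels it still has to drop before reaching EL1.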

With these in place, the QEMU virt64 AArch64 BSP SMP path is clear: the boot CPU prepares memory and shared peripherals, `main_thread_entry()` issues PSCI wakeups, secondary cores land with the same MMU/EL setup, and all CPUs join the scheduler.
Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
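@page page_kernel_smp_boot_zh QEMU virt64 AArch64 Multi-Core Boot Flow (Chinese)

# QEMU virt64 AArch64 Multi-Core Boot Flow

Using `bsp/qemu-virt64-aarch64` as the example, this article gives a beginner-friendly breakdown of RT-Thread multi-core boot on AArch64. It covers the full chain from the `_start` assembly and MMU enable, through `rtthread_startup()`, to PSCI waking the secondary cores and every core entering the scheduler. It follows the actual implementation in the current BSP, fills in some easily overlooked details, and replaces the original PlantUML diagram with Mermaid that renders directly on GitHub.

- Applicable environment: QEMU `-machine virt`, `-cpu cortex-a57`, `-smp >=2`, `RT_USING_SMP` enabled, and a device tree containing `enable-method = "psci"`.
- After reading, you will be able to tell who performs each step, where the code lives, and what to check if the secondary cores do not come up.

## The Whole Picture at a Glance
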
```mermaid
flowchart TD
    ROM["BootROM/BL1<br/>QEMU firmware"] --> START["_start<br/>(entry_point.S)"]
    START --> MMU["init_mmu_early<br/>enable_mmu_early"]
    MMU --> CBOOT["rtthread_startup()"]
    CBOOT --> BOARD["rt_hw_board_init<br/>-> rt_hw_common_setup"]
    BOARD --> MAIN["main_thread_entry"]
    MAIN --> PSCI["rt_hw_secondary_cpu_up<br/>(PSCI CPU_ON)"]
    PSCI --> SECASM["_secondary_cpu_entry<br/>(ASM)"]
    SECASM --> SECC["rt_hw_secondary_cpu_bsp_start"]
    SECC --> SCHED["rt_system_scheduler_start"]
    SCHED --> RUN["Multi-core scheduling in steady state"]
```
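
## Boot CPU: from `_start` to MMU enabled

**Input parameters**: the QEMU firmware loads the image into memory and jumps to `_start` in `libcpu/aarch64/cortex-a/entry_point.S`, with `x0` carrying the DTB physical address and `x1` to `x3` reserved.

**What `_start` does (condensed)**

1. Clean up thread pointers: set `tpidr_el1` and `tpidrro_el0` to zero to avoid inheriting stale state.
2. Unify the exception level: `init_cpu_el` brings the CPU to EL1h, enables timer access, and turns off unnecessary traps.
3. Zero the BSS: `init_kernel_bss` writes zeros in a loop so that global variables start clean.
4. Prepare the stack: `init_cpu_stack_early` switches to SP_EL1 and uses `.boot_cpu_stack_top` from the linker script as the boot stack.
5. Save the FDT: `rt_hw_fdt_install_early(x0)` records the DTB start address and size before the MMU is enabled.
6. Early MMU mapping: `init_mmu_early`/`enable_mmu_early` build a 0 to 1 GiB identity map, set TTBR0/TTBR1 and SCTLR_EL1, clean the I/D caches and TLB, then jump to `rtthread_startup()` (register `x8`).

> Tip: the early page table only covers a minimal kernel layout; a fuller address space is remapped later in C.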
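
## Startup skeleton after entering C

`rtthread_startup()` (`src/components.c`) is the backbone of the whole chain. The key points are:

- **Interrupts disabled + spinlock**: first `rt_hw_local_irq_disable()`, then initialize `_cpus_lock`, so the startup phase cannot be preempted.
- **Board-level initialization**: `rt_hw_board_init()` directly calls the BSP's `rt_hw_common_setup()` (`libcpu/aarch64/common/setup.c`), which:
  - sets VBAR (the exception vectors), builds the kernel address space, copies the DTB to safe memory and pre-parses it;
  - configures MMU mappings and initializes memblock, the page allocator, and the system heap;
  - parses the device tree for console, memory, and initrd;
  - initializes the GIC (or the GICv3 Redistributor), UART, and global GTIMER;
  - installs the SMP IPIs `RT_SCHEDULE_IPI`, `RT_STOP_IPI`, and `RT_SMP_CALL_IPI` and unmasks them;
  - sets the idle hook `rt_hw_idle_wfi` so that an idle CPU enters low-power wait.
- **Kernel subsystems**: initialize the system timer, scheduler, and signal mechanism; create the main, timer, and idle threads.
- **Scheduler start**: `rt_system_scheduler_start()` runs, and `main_thread_entry()` is the first thread to execute.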
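
## How the secondary cores are brought up

Before calling the user's `main()`, `main_thread_entry()` runs `rt_hw_secondary_cpu_up()` to make sure every CPU enters the scheduler.

### What `rt_hw_secondary_cpu_up()` does

1. Convert `_secondary_cpu_entry` to a physical address (`rt_kmem_v2p()`); this is the real entry point the firmware jumps to.
2. Walk the CPU nodes recorded at boot (`cpu_info_init()` has already stored the DTB information in `cpu_np[]` and `rt_cpu_mpidr_table[]`).
3. Read `enable-method`:
   - QEMU virt64: `"psci"` → go through `cpu_psci_ops.cpu_boot()` and send `CPU_ON(target, entry)` to the firmware.
   - Older platforms, for compatibility: `"spin-table"` → write `cpu-release-addr`, then wake with `sev`.
4. A failure on any core prints a warning, but the boot core's flow is not interrupted, which makes later troubleshooting easier.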
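
### What happens on a secondary core

- **Assembly entry `_secondary_cpu_entry`**:
  - Read `mpidr_el1`, compare it with `rt_cpu_mpidr_table` to determine the logical core number and write it back to the table entry, then write the logical core number into `TPIDR` for per-cpu access.
  - Allocate a separate stack for each core in units of `ARCH_SECONDARY_CPU_STACK_SIZE`.
  - Repeat `init_cpu_el` and `init_cpu_stack_early`, share the same early MMU table-building logic, and finally jump to `rt_hw_secondary_cpu_bsp_start()`.

- **C-side wrap-up `rt_hw_secondary_cpu_bsp_start()`** (`libcpu/aarch64/common/setup.c`):
  - Set VBAR again and take `_cpus_lock` to synchronize with the boot core.
  - Update this core's MPIDR table entry and bind the global `MMUTable`.
  - Initialize the local vector table and the GIC CPU interface (and the GICv3 Redistributor, if enabled), and turn on the local GTIMER.
  - Unmask the three IPIs and, if necessary, re-calibrate `loops_per_tick` (microsecond delay).
  - Call `rt_dm_secondary_cpu_init()` to register the CPU device, and finally call `rt_system_scheduler_start()` so that this core enters scheduling.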

### Sequence diagram (Mermaid)

```mermaid
sequenceDiagram
    participant ROM as BootROM/BL1
    participant START as _start (ASM)
    participant CBOOT as rtthread_startup
    participant MAIN as main_thread_entry
    participant FW as PSCI firmware
    participant SECASM as _secondary_cpu_entry
    participant SECC as rt_hw_secondary_cpu_bsp_start
    participant SCHED as Scheduler (all CPUs)

    ROM->>START: x0=DTB, jump to _start
    START->>START: init_cpu_el / clear BSS / set stack
    START->>START: init_mmu_early + enable_mmu_early
    START-->>CBOOT: jump to rtthread_startup()
    CBOOT->>CBOOT: rt_hw_board_init -> rt_hw_common_setup
    CBOOT-->>SCHED: rt_system_scheduler_start()
    SCHED-->>MAIN: schedule main_thread_entry
    MAIN->>FW: rt_hw_secondary_cpu_up (CPU_ON)
    FW-->>SECASM: entry = _secondary_cpu_entry
    SECASM->>SECASM: stack/TPIDR/EL initialization
    SECASM-->>SECC: enable_mmu_early -> rt_hw_secondary_cpu_bsp_start
    SECC->>SECC: local GIC/Timer/IPI initialization
    SECC-->>SCHED: rt_system_scheduler_start()
    SCHED-->>MAIN: continue main()
    SCHED-->>Others: multi-core scheduling of other threads
```

## Key code locations

| Stage | Main file | Role |
| --- | --- | --- |
| Boot assembly | `libcpu/aarch64/cortex-a/entry_point.S` | `_start`, `_secondary_cpu_entry`, early MMU enable |
| BSP glue | `bsp/qemu-virt64-aarch64/drivers/board.c` | Connects `rt_hw_board_init()` to `rt_hw_common_setup()` |
| Memory/GIC/IPI initialization | `libcpu/aarch64/common/setup.c` | `rt_hw_common_setup()`, `rt_hw_secondary_cpu_up()`, `rt_hw_secondary_cpu_bsp_start()` |
| C entry skeleton | `src/components.c` | `rtthread_startup()`, `main_thread_entry()` |
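
## Common checks (when the secondary cores do not come up)

- Does the device tree contain `enable-method = "psci"`, and is QEMU started with `-machine virt` (which ships its own PSCI firmware)?
- Can `_secondary_cpu_entry` be converted to a physical address correctly? A return value of 0 from `rt_kmem_v2p()` triggers an assertion.
- Were the GIC and timer initialized on the boot core before the other cores are woken? For a custom BSP, be sure to finish interrupt and timer initialization before calling `rt_hw_secondary_cpu_up()`.
- Watch the serial log for `Call cpu X on success/failed`; if necessary, add extra prints in `_secondary_cpu_entry` and troubleshoot with `-d cpu_reset -smp N`.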
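
## AArch64 cheat sheet (just enough)

- **Exception levels**: startup may be at EL3/EL2; `init_cpu_el` steps down level by level to EL1h, where the kernel runs.
- **Two stack pointers**: `msr spsel, #1` selects `SP_EL1`, so the kernel stack is not accessed from EL0.
- **MMU enable order**: write page tables → configure TCR/TTBR → flush cache/TLB → set `SCTLR_EL1.M/C/I` → `isb` to make it take effect.
- **MPIDR**: the unique identifier of each core; `rt_cpu_mpidr_table[]` stores the affinity of the boot CPU and every secondary core, which makes it easy to match logical core numbers and IPI targets.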
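
At this point the main line of multi-core boot in the QEMU virt64 AArch64 BSP should be clear: the boot CPU prepares the kernel and the shared peripherals, `main_thread_entry()` issues the PSCI wakeups, the secondary cores land with the same MMU/EL setup, and all cores then enter the scheduler together.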
