Commit ba2976c

gem5: fix arm multicore with system.auto_reset_addr = True
baremetal: fix aarch64/no_bootloader/semihost_exit.S, which was wrong because it was using an unset sp for the register block. Tests needed urgently!!
1 parent 5b6a716 commit ba2976c

9 files changed: +180 -36 lines changed

README.adoc

Lines changed: 101 additions & 1 deletion
@@ -10560,9 +10560,14 @@ output:
 ....
 ./run --arch aarch64 --baremetal arch/aarch64/multicore --cpus 2
 ./run --arch aarch64 --baremetal arch/aarch64/multicore --cpus 2 --gem5
+./run --arch arm --baremetal arch/arm/multicore --cpus 2
+./run --arch arm --baremetal arch/arm/multicore --cpus 2 --gem5
 ....
 
-Source: link:baremetal/arch/aarch64/multicore.S[]
+Sources:
+
+* link:baremetal/arch/aarch64/multicore.S[]
+* link:baremetal/arch/arm/multicore.S[]
 
 CPU 0 of this program enters a spinlock loop: it repeatedly checks if a given memory address is `1`.
 
@@ -10576,6 +10581,26 @@ Don't believe me? Then try:
 
 and watch it hang forever.
 
+Note that if you try the same thing on gem5:
+
+....
+./run --arch aarch64 --baremetal arch/aarch64/multicore --cpus 1 --gem5
+....
+
+then gem5 actually exits, but with a different message:
+
+....
+Exiting @ tick 18446744073709551615 because simulate() limit reached
+....
+
+as opposed to the expected:
+
+....
+Exiting @ tick 36500 because m5_exit instruction encountered
+....
+
+since gem5 is able to detect when nothing will ever happen, and exits.
+
 When GDB step debugging, switch between cores with the usual `thread` commands, see also: <<gdb-step-debug-multicore-userland>>.
 
 Bibliography:
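
Reviewer note on the two gem5 exit messages above: the tick in the `simulate() limit reached` message is exactly 2^64 - 1, the largest unsigned 64-bit value. The sketch below only parses the two messages quoted in the README and flags that case. `parse_exit` is a hypothetical helper, not part of this repository or of gem5, and reading the maximum tick as "the simulation ran out of events" is an interpretation, not something the messages state.

```python
import re

# Largest value an unsigned 64-bit tick counter can hold.
UINT64_MAX = 2**64 - 1


def parse_exit(line):
    """Split a gem5 exit line into (tick, reason, hit_max_tick).

    Hypothetical helper: it only understands the message shape
    quoted in the README hunk above.
    """
    match = re.fullmatch(r"Exiting @ tick (\d+) because (.+)", line.strip())
    if match is None:
        raise ValueError("not a gem5 exit line: {!r}".format(line))
    tick = int(match.group(1))
    reason = match.group(2)
    return tick, reason, tick == UINT64_MAX
```

A test that wants to distinguish the hang from the normal exit could check the third element of the tuple rather than comparing the reason string.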
@@ -10594,6 +10619,81 @@ However, likely no implementation does (TODO confirm), since:
 
 and power consumption is key in ARM applications.
 
+In QEMU 3.0.0, `SEV` is a NOP, and `WFE` might be, but I'm not sure, see: https://github.com/qemu/qemu/blob/v3.0.0/target/arm/translate-a64.c#L1423
+
+....
+case 2: /* WFE */
+    if (!(tb_cflags(s->base.tb) & CF_PARALLEL)) {
+        s->base.is_jmp = DISAS_WFE;
+    }
+    return;
+case 4: /* SEV */
+case 5: /* SEVL */
+    /* we treat all as NOP at least for now */
+    return;
+....
+
+TODO: what does the WFE code do? How can it not be a NOP if SEV is a NOP? https://github.com/qemu/qemu/blob/v3.0.0/target/arm/translate.c#L4609 might explain why, but it is Chinese to me (I only understand 30% ;-)):
+
+....
+ * For WFI we will halt the vCPU until an IRQ. For WFE and YIELD we
+ * only call the helper when running single threaded TCG code to ensure
+ * the next round-robin scheduled vCPU gets a crack. In MTTCG mode we
+ * just skip this instruction. Currently the SEV/SEVL instructions
+ * which are *one* of many ways to wake the CPU from WFE are not
+ * implemented so we can't sleep like WFI does.
+ */
+....
+
+For gem5 however, if we comment out the `SEV` instruction, then it actually exits with `simulate() limit reached`, so the CPU truly never wakes up, which is a more realistic behaviour.
+
+The following Raspberry Pi bibliography helped us get this sample up and running:
+
+* https://github.com/bztsrc/raspi3-tutorial/tree/a3f069b794aeebef633dbe1af3610784d55a0efa/02_multicorec
+* https://github.com/dwelch67/raspberrypi/tree/a09771a1d5a0b53d8e7a461948dc226c5467aeec/multi00
+* https://github.com/LdB-ECM/Raspberry-Pi/blob/3b628a2c113b3997ffdb408db03093b2953e4961/Multicore/SmartStart64.S
+* https://github.com/LdB-ECM/Raspberry-Pi/blob/3b628a2c113b3997ffdb408db03093b2953e4961/Multicore/SmartStart32.S
+
+===== PSCI
+
+In QEMU, CPU 1 starts in a halted state. This can be observed from GDB, where:
+
+....
+info threads
+....
+
+shows something like:
+
+....
+* 1 Thread 1 (CPU#0 [running]) mystart
+  2 Thread 2 (CPU#1 [halted ]) mystart
+....
+
+To wake up CPU 1 on QEMU, we must use the Power State Coordination Interface (PSCI) which is documented at: link:https://developer.arm.com/docs/den0022/latest/arm-power-state-coordination-interface-platform-design-document[].
+
+This interface uses `HVC` calls, and the calling convention is documented at "SMC CALLING CONVENTION" link:https://developer.arm.com/docs/den0028/latest[].
+
+If we boot the Linux kernel on QEMU and <<get-device-tree-from-a-running-kernel,dump the auto-generated device tree>>, we observe that it contains the function identifier of the PSCI CPU_ON call:
+
+....
+psci {
+    method = "hvc";
+    compatible = "arm,psci-0.2", "arm,psci";
+    cpu_on = <0xc4000003>;
+    migrate = <0xc4000005>;
+    cpu_suspend = <0xc4000001>;
+    cpu_off = <0x84000002>;
+};
+....
+
+The Linux kernel wakes up the secondary cores in this exact same way at: https://github.com/torvalds/linux/blob/v4.19/drivers/firmware/psci.c#L122 We first actually got it working here by grepping the kernel and step debugging that call :-)
+
+In gem5, CPU 1 is already awake at startup, so PSCI is not needed. TODO: gem5 actually blows up if we try to do the `hvc` call, understand why.
+
+===== DMB
+
+TODO: create and study a minimal example in gem5 where the `DMB` instruction leads to fewer cycles: https://stackoverflow.com/questions/15491751/real-life-use-cases-of-barriers-dsb-dmb-isb-in-arm
+
 === How we got some baremetal stuff to work
 
 It is nice when things just work.
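
Reviewer note on the PSCI values in the README hunk above: the numbers in the `psci` device tree node are function identifiers laid out as described in the SMC Calling Convention document the README links to. The sketch below splits an identifier into its fields, which shows why `cpu_on` starts with `0xc4` while `cpu_off` starts with `0x84`. `decode_function_id` and the dictionary keys are hypothetical names chosen for illustration.

```python
def decode_function_id(fid):
    """Split a 32-bit SMCCC function identifier into its fields.

    Hypothetical helper for illustration; field positions follow the
    SMC Calling Convention: bit 31 = fast call, bit 30 = 64-bit
    calling convention, bits 29:24 = owning entity, bits 15:0 = function.
    """
    return {
        "fast_call": bool((fid >> 31) & 1),
        "smc64": bool((fid >> 30) & 1),
        "owner": (fid >> 24) & 0x3F,
        "function": fid & 0xFFFF,
    }


# cpu_on as used by the aarch64 example: 64-bit convention, function 3.
CPU_ON_64 = decode_function_id(0xC4000003)
# CPU_ON as used by the arm example: 32-bit convention, same function 3.
CPU_ON_32 = decode_function_id(0x84000003)
```

Under this reading, the only difference between the `0xc4000003` used in `baremetal/arch/aarch64/multicore.S` and the `0x84000003` used in `baremetal/arch/arm/multicore.S` is bit 30, the 32-bit versus 64-bit calling convention.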

baremetal/arch/aarch64/multicore.S

Lines changed: 6 additions & 5 deletions
@@ -7,10 +7,12 @@ main:
     ldr x1, =spinlock
     str x0, [x1]
 
-    /* Read cpu id into x1. */
+    /* Read cpu id into x1.
+     * TODO: cores beyond 4th?
+     */
     mrs x1, mpidr_el1
-    and x1, x1, 3
-    cbz x1, cpu0_only
+    ands x1, x1, 3
+    beq cpu0_only
 cpu1_only:
     /* Only CPU 1 reaches this point and sets the spinlock. */
     mov x0, 1
@@ -35,8 +37,7 @@ cpu0_only:
 
 #if !defined(GEM5)
     /* Wake up CPU 1 from initial sleep!
-     * In gem5, CPU 1 starts woken up from the start,
-     * so this is not needed.
+     * See: https://github.com/cirosantilli/linux-kernel-module-cheat#psci
      */
     /* Function identifier: PSCI CPU_ON. */
     ldr w0, =0xc4000003
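
Reviewer note on the `TODO: cores beyond 4th?` comment above: masking `mpidr_el1` with `3` keeps only the two lowest bits, so a core whose affinity level 0 value is 4 would look like CPU 0. The sketch below compares that mask with the full 8-bit Aff0 field. The sample register values are made up for illustration (bit 31 set, Aff0 in the low byte), and real systems may place extra cores in a different cluster (Aff1) instead, which this sketch does not model.

```python
def cpu_id_mask3(mpidr):
    """CPU id as computed by `ands x1, x1, 3` in the hunk above."""
    return mpidr & 0x3


def cpu_id_aff0(mpidr):
    """CPU id taken from the full Aff0 field, bits 7:0 of MPIDR_EL1."""
    return mpidr & 0xFF


# Illustrative MPIDR_EL1 values for cores 0..5 of a single cluster.
SAMPLE_MPIDRS = [0x80000000 | core for core in range(6)]
MASK3_IDS = [cpu_id_mask3(m) for m in SAMPLE_MPIDRS]
AFF0_IDS = [cpu_id_aff0(m) for m in SAMPLE_MPIDRS]
```

With these sample values, the 2-bit mask maps cores 4 and 5 back onto ids 0 and 1, while the Aff0 mask keeps them distinct.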

baremetal/arch/aarch64/no_bootloader/semihost_exit.S

Lines changed: 9 additions & 6 deletions
@@ -2,11 +2,14 @@
 
 .global mystart
 mystart:
-    mov x1, #0x26
-    movk x1, #2, lsl #16
-    str x1, [sp,#0]
+    mov x1, 0x26
+    movk x1, 2, lsl 16
+    ldr x2, =semihost_args
+    str x1, [x2, 0]
     mov x0, #0
-    str x0, [sp,#8]
-    mov x1, sp
-    mov w0, #0x18
+    str x0, [x2, 8]
+    mov x1, x2
+    mov w0, 0x18
     hlt 0xf000
+semihost_args:
+    .skip 16
Lines changed: 8 additions & 7 deletions
@@ -1,20 +1,21 @@
 .global main
 main:
     /* 0x20026 == ADP_Stopped_ApplicationExit */
-    mov x1, #0x26
-    movk x1, #2, lsl #16
-    str x1, [sp,#0]
+    mov x1, 0x26
+    movk x1, 2, lsl 16
+    str x1, [sp, 0]
 
     /* Exit status code. Host QEMU process exits with that status. */
-    mov x0, #0
-    str x0, [sp,#8]
+    mov x0, 0
+    str x0, [sp, 8]
 
     /* x1 contains the address of parameter block.
-     * Any memory address could be used. */
+     * Any memory address could be used.
+     */
     mov x1, sp
 
     /* SYS_EXIT */
-    mov w0, #0x18
+    mov w0, 0x18
 
     /* Do the semihosting call on A64. */
     hlt 0xf000
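
Reviewer note on the semihosting exit sequence above: the `mov`/`movk` pair builds the constant `0x20026`, and the two `str` instructions fill a 16-byte parameter block whose address is passed in `x1`. The sketch below reproduces both steps on the host to make the layout explicit. `build_exit_block` is a hypothetical helper, and the little-endian byte order is an assumption about how the targets in this repository are configured.

```python
import struct

ADP_STOPPED_APPLICATION_EXIT = 0x20026  # from the comment in the hunk above
SYS_EXIT = 0x18                         # operation number placed in w0


def mov_movk(low16, high16):
    """Value left in x1 by `mov x1, low16` then `movk x1, high16, lsl 16`."""
    return (low16 & 0xFFFF) | ((high16 & 0xFFFF) << 16)


def build_exit_block(status):
    """Pack the two 64-bit words stored at offsets 0 and 8 of the block.

    Hypothetical helper; assumes a little-endian target.
    """
    return struct.pack("<QQ", ADP_STOPPED_APPLICATION_EXIT, status)


EXIT_BLOCK = build_exit_block(0)
```

The block is 16 bytes long, which matches the `.skip 16` reserved for `semihost_args` in the `no_bootloader` variant.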

baremetal/arch/arm/multicore.S

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-multicore */
+
+.global main
+main:
+    mov r0, #0
+    ldr r1, =spinlock
+    str r0, [r1]
+    /* Get CPU ID. */
+    mrc p15, 0, r1, c0, c0, 5
+    ands r1, r1, #3
+    beq cpu0_only
+cpu1_only:
+    mov r0, #1
+    ldr r1, =spinlock
+    str r0, [r1]
+    dmb sy
+    sev
+cpu1_sleep_forever:
+    wfe
+    b cpu1_sleep_forever
+cpu0_only:
+#if !defined(GEM5)
+    /* PSCI CPU_ON. */
+    ldr r0, =0x84000003
+    mov r1, #1
+    ldr r2, =cpu1_only
+    mov r3, #0
+    hvc 0
+#endif
+spinlock_start:
+    ldr r0, spinlock
+    wfe
+    cmp r0, #0
+    beq spinlock_start
+    bx lr
+spinlock:
+    .skip 4
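
Reviewer note on the control flow of the new file above: CPU 0 loops until the `spinlock` word becomes non-zero, and CPU 1 sets it and then signals with `sev`. The sketch below is a loose host-side model of that handshake using threads. It is not a simulation of the hardware: `threading.Event` merely stands in for the event that `sev` raises and `wfe` waits for, the timeout exists only so the model can never hang, and all names are hypothetical.

```python
import threading


def run_handshake():
    """Model CPU 0 waiting on a flag that CPU 1 sets and then signals."""
    spinlock = [0]             # stands in for the shared `spinlock` word
    event = threading.Event()  # stands in for the SEV/WFE event

    def cpu1():
        spinlock[0] = 1        # str r0, [r1]
        event.set()            # sev

    def cpu0():
        while spinlock[0] == 0:          # cmp r0, #0 / beq spinlock_start
            event.wait(timeout=0.1)      # wfe, bounded so the model cannot hang
        return spinlock[0]

    waker = threading.Thread(target=cpu1)
    waker.start()
    value = cpu0()
    waker.join()
    return value
```

Removing the `waker.start()` call in this model gives the analogue of running with `--cpus 1`: CPU 0 never observes a non-zero value and keeps waiting.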

common.py

Lines changed: 4 additions & 4 deletions
@@ -931,7 +931,7 @@ def setup(parser):
         common.qcow2_file = common.buildroot_qcow2_file
 
     # Image.
-    if args.baremetal is None:
+    if common.baremetal is None:
         if common.emulator == 'gem5':
             common.image = common.vmlinux
             common.disk_image = common.rootfs_raw_file
@@ -940,11 +940,11 @@ def setup(parser):
             common.disk_image = common.qcow2_file
     else:
         common.disk_image = common.gem5_fake_iso
-        if args.baremetal == 'all':
-            path = args.baremetal
+        if common.baremetal == 'all':
+            path = common.baremetal
         else:
             path = common.resolve_executable(
-                args.baremetal,
+                common.baremetal,
                 common.baremetal_src_dir,
                 common.baremetal_build_dir,
                 common.baremetal_build_ext,

run

Lines changed: 11 additions & 9 deletions
@@ -128,7 +128,7 @@ def main(args, extra_args=None):
             raise Exception('Baremetal ELF file not found. Tried:\n' + '\n'.join(paths))
     cmd = debug_vm.copy()
     if common.emulator == 'gem5':
-        if args.baremetal is None:
+        if common.baremetal is None:
             if not os.path.exists(common.rootfs_raw_file):
                 if not os.path.exists(common.qcow2_file):
                     raise_rootfs_not_found()
@@ -139,7 +139,7 @@ def main(args, extra_args=None):
                 common.write_string_to_file(common.gem5_fake_iso, 'a' * 512)
         if not os.path.exists(common.image):
             # This is to run gem5 from a prebuilt download.
-            if (not args.baremetal is None) or (not os.path.exists(common.linux_image)):
+            if (not common.baremetal is None) or (not os.path.exists(common.linux_image)):
                 raise_image_not_found()
             common.run_cmd([os.path.join(common.extract_vmlinux, common.linux_image)])
         os.makedirs(os.path.dirname(common.gem5_readfile), exist_ok=True)
@@ -194,15 +194,17 @@ def main(args, extra_args=None):
                 '--dtb-filename', os.path.join(common.gem5_system_dir, 'arm', 'dt', 'armv{}_gem5_v1_{}cpu.dtb'.format(common.armv, args.cpus)), common.Newline,
                 '--machine-type', common.machine, common.Newline,
             ])
-            if args.baremetal is None:
+            if common.baremetal is None:
                 cmd.extend([
                     '--param', 'system.panic_on_panic = True', common.Newline])
             else:
-                cmd.extend(['--bare-metal', common.Newline])
+                cmd.extend([
+                    '--bare-metal', common.Newline,
+                    '--param', 'system.auto_reset_addr = True', common.Newline,
+                ])
                 if args.arch == 'aarch64':
                     # https://stackoverflow.com/questions/43682311/uart-communication-in-gem5-with-arm-bare-metal/50983650#50983650
                     cmd.extend(['--param', 'system.highest_el_is_64 = True', common.Newline])
-                    cmd.extend(['--param', 'system.auto_reset_addr = True', common.Newline])
         elif args.gem5_script == 'biglittle':
             if args.kvm:
                 cpu_type = 'kvm'
@@ -319,7 +321,7 @@ def main(args, extra_args=None):
                 root = 'root=/dev/vda'
                 rrid = ''
                 snapshot = ',snapshot'
-            if args.baremetal is None:
+            if common.baremetal is None:
                 if not os.path.exists(common.qcow2_file):
                     if not os.path.exists(common.rootfs_raw_file):
                         raise_rootfs_not_found()
@@ -364,7 +366,7 @@ def main(args, extra_args=None):
                 ] +
                 virtio_gpu_pci
             )
-        if args.baremetal is None:
+        if common.baremetal is None:
             cmd.extend(append)
     if args.tmux is not None:
         tmux_args = '--run-id {}'.format(args.run_id)
@@ -381,8 +383,8 @@ def main(args, extra_args=None):
                 args.linux_build_id,
                 args.run_id,
             )
-        if args.baremetal:
-            tmux_args += " --baremetal '{}'".format(args.baremetal)
+        if common.baremetal:
+            tmux_args += " --baremetal '{}'".format(common.baremetal)
         if args.userland:
             tmux_args += " --userland '{}'".format(args.userland)
         tmux_args += ' {}'.format(args.tmux)
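
Reviewer note on the gem5 hunk of `run` above (the one starting at line 194): the behavioural change is that `system.auto_reset_addr = True` moves out of the `aarch64` branch and is now passed for every baremetal run, which is what the commit title refers to. The sketch below restates the new branching as a standalone function so the effect per architecture is easy to check. `gem5_baremetal_params` is a hypothetical name, and the `common.Newline` separators used by the real script are left out.

```python
def gem5_baremetal_params(baremetal, arch):
    """Return the gem5 flags chosen by the patched branch.

    Hypothetical standalone restatement of the hunk above; the real
    script appends to `cmd` and interleaves `common.Newline`.
    """
    if baremetal is None:
        # Linux full system run: only panic_on_panic is set here.
        return ['--param', 'system.panic_on_panic = True']
    params = [
        '--bare-metal',
        '--param', 'system.auto_reset_addr = True',
    ]
    if arch == 'aarch64':
        params += ['--param', 'system.highest_el_is_64 = True']
    return params


ARM_PARAMS = gem5_baremetal_params('arch/arm/multicore', 'arm')
AARCH64_PARAMS = gem5_baremetal_params('arch/aarch64/multicore', 'aarch64')
```

Before the patch, the `arm` case would have produced only `--bare-metal`, which matches the commit message about arm multicore needing `system.auto_reset_addr = True`.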

run-gdb

Lines changed: 3 additions & 3 deletions
@@ -120,15 +120,15 @@ def main(args, extra_args=None):
         break_at = ['-ex', 'break {}'.format(args.break_at), common.Newline]
     else:
         break_at = []
-    linux_full_system = (args.baremetal is None and args.userland is None)
+    linux_full_system = (common.baremetal is None and args.userland is None)
     if args.userland:
         image = common.resolve_userland(args.userland)
-    elif args.baremetal:
+    elif common.baremetal:
         image = common.image
         test_script_path = os.path.splitext(common.source_path)[0] + '.py'
     else:
         image = common.vmlinux
-    if args.baremetal:
+    if common.baremetal:
         allowed_toolchains = ['crosstool-ng', 'buildroot', 'host']
     else:
         allowed_toolchains = ['buildroot', 'crosstool-ng', 'host']

run-toolchain

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ parser.add_argument(
     nargs='*'
 )
 args = common.setup(parser)
-if args.baremetal is None:
+if common.baremetal is None:
     image = common.vmlinux
 else:
     image = common.image
