feat(la64): use DMW kernel VA and place _start after image header #51

Open
eternalcomet wants to merge 1 commit into dev from pr/la-dmw

@eternalcomet

This PR aims to pass the unstripped kernel ELF to QEMU directly.
Switch the LoongArch QEMU-virt default kernel virtual layout to a DMW-friendly 0x9000_... range, and separate the Linux image header label from the real _start entry in the boot code. This avoids the direct -kernel ELF boot hangs seen with the previous high-half 0xffff_8000_... layout on the current QEMU LoongArch boot path.

@eternalcomet
Author

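la64 already ran into boot problems caused by the kernel address during last year's OS competition. The kernel address we boot at is 0xffff8..., which becomes 0x00008... after QEMU masks it. Because the mask clears one bit too few, the result is still not a physical address, so the kernel cannot boot. At the time we patched QEMU to mask off one additional bit, but that change never made it into QEMU mainline. To avoid the problem altogether and let the la64 QEMU load an unstripped ELF file directly, we moved the kernel address to the classic DMW-compatible address 0x90000.... Testing shows it runs normally.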

@eternalcomet
Author

eternalcomet commented Mar 28, 2026

LoongArch64 ELF boot: QEMU history, root cause, and ArceOS-side workaround

Background and historical discussion

Prior team discussion identified an older LoongArch QEMU issue with virtual-to-physical address truncation in the direct-boot path:

  • Many LoongArch kernels use DMW-style kernel addresses (commonly 0x9...) instead of relying on full paging from the first instruction.
  • For high-half kernel addresses like 0xffff8..., using a simple low-48-bit mask can leave bit 47 set unexpectedly, producing invalid physical addresses in some direct-boot conversions.
  • Earlier local QEMU patches changed conversion behavior to 47-bit masking in selected places:
    • hw/loongarch/boot.c: cpu_loongarch_virt_to_phys()
    • target/loongarch/cpu_helper.c: DA/PG direct address path and valid-extension checks
  • ArceOS also had matching address-layout/boot-page-table adjustments at that time.
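
The masking behavior described above can be reproduced with plain integer arithmetic. This is a minimal sketch, not QEMU code: `mask_low_bits` and both constants are hypothetical names chosen for illustration, and the two addresses are example values in the 0xffff8... and 0x9... ranges.

```rust
/// Keep only the low `bits` bits of `addr`.
/// Hypothetical helper modelling a fixed-width virt-to-phys mask;
/// `bits` must be in 1..=63.
pub const fn mask_low_bits(addr: u64, bits: u32) -> u64 {
    addr & ((1u64 << bits) - 1)
}

/// Example high-half kernel address (illustrative value).
pub const HIGH_HALF_VADDR: u64 = 0xffff_8000_0020_0000;
/// Example DMW-style kernel address (illustrative value).
pub const DMW_VADDR: u64 = 0x9000_0000_0020_0000;
```

With these definitions, `mask_low_bits(HIGH_HALF_VADDR, 48)` yields `0x0000_8000_0020_0000` (bit 47 survives), while `mask_low_bits(HIGH_HALF_VADDR, 47)` and `mask_low_bits(DMW_VADDR, 48)` both yield `0x20_0000`.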

What was in the old patched QEMU tree

In the previously patched local tree:

  • hw/loongarch/boot.c:191 uses:
return addr & MAKE_64BIT_MASK(0, TARGET_PHYS_ADDR_SPACE_BITS - 1);
  • target/loongarch/cpu_helper.c:276-277 uses 47-bit masking in DA mode.
  • target/loongarch/cpu_helper.c:303 uses:
addr_high = sextract64(address, TARGET_VIRT_ADDR_SPACE_BITS - 1, 17);

These match the historical workaround direction from prior discussion.
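
To see what changing the sign-extension field from 16 bits at bit 48 to 17 bits at bit 47 does, here is a small sketch. `sextract64` below is a Rust re-implementation written in the spirit of QEMU's helper of the same name, and `is_sign_extended` is a hypothetical wrapper; treating the result as valid only when it is 0 or -1 is an assumption about how the check is used.

```rust
/// Extract `length` bits starting at bit `start` and sign-extend them.
/// Requires length >= 1 and start + length <= 64.
pub const fn sextract64(value: u64, start: u32, length: u32) -> i64 {
    // Move the field to the top of the word, then shift it back down
    // arithmetically so the field's top bit is replicated.
    ((value << (64 - length - start)) as i64) >> (64 - length)
}

/// Hypothetical validity check: the extracted high bits must be
/// all zeros or all ones (assumption, see the lead-in).
pub const fn is_sign_extended(addr: u64, start: u32, length: u32) -> bool {
    let high = sextract64(addr, start, length);
    high == 0 || high == -1
}
```

Under this sketch, 0xffff_8000_0020_0000 passes with either field, whereas 0x0000_8000_0000_0000 passes with (48, 16) but fails with (47, 17), because bit 47 is then part of the checked field.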

What is in current upstream QEMU (qemu-repo)

In a fresh upstream QEMU build, version 10.2.91:

  • hw/loongarch/boot.c:218-222 now does:
return addr & *phys_addr_mask;
  • phys_addr_mask comes from loongarch_palen_mask() (target/loongarch/cpu_helper.c:370), i.e. CPU PALEN-derived mask.
  • target/loongarch/cpu_helper.c:348-350 uses sign-extension validity check via shift from TARGET_VIRT_ADDR_SPACE_BITS - 1.

Conclusion: the exact old local workaround (TARGET_PHYS_ADDR_SPACE_BITS - 1 hard mask in those locations) is not present in this upstream tree as-is.

Current reproducible problem

With ArceOS high-half layout (0xffff_8000...) and -kernel <ELF> on LoongArch virt:

  • QEMU direct boot can hang or jump to an invalid entry for non-Linux-image ELF workflows.
  • -kernel <bin> works more reliably because it bypasses ELF entry-translation semantics.

Observed behavior in this investigation:

  • -kernel ELF with old 0xffff_8000... layout hangs.
  • -kernel bin runs.

ArceOS-side workaround goal

Avoid QEMU-side changes and make ELF direct boot succeed by selecting a DMW-friendly kernel virtual layout.

ArceOS changes applied

External crate was overridden locally (as requested):

Effective config changes:

  • phys-virt-offset: 0x9000_0000_0000_0000
  • kernel-base-vaddr: 0x9000_0000_0020_0000
  • kernel-aspace-base: 0x9000_0000_0000_0000
  • kernel-aspace-size: 0x0000_ffff_ffff_f000
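
As a sanity check on these values, the sketch below restates them as constants and derives the physical load address. The constant names mirror the config keys, but the helper functions are hypothetical and not taken from the ArceOS sources.

```rust
pub const PHYS_VIRT_OFFSET: u64 = 0x9000_0000_0000_0000;
pub const KERNEL_BASE_VADDR: u64 = 0x9000_0000_0020_0000;
pub const KERNEL_ASPACE_BASE: u64 = 0x9000_0000_0000_0000;
pub const KERNEL_ASPACE_SIZE: u64 = 0x0000_ffff_ffff_f000;

/// Linear-map translation: virtual to physical (hypothetical helper).
pub const fn virt_to_phys(vaddr: u64) -> u64 {
    vaddr.wrapping_sub(PHYS_VIRT_OFFSET)
}

/// Linear-map translation: physical to virtual (hypothetical helper).
pub const fn phys_to_virt(paddr: u64) -> u64 {
    paddr.wrapping_add(PHYS_VIRT_OFFSET)
}

/// True if `vaddr` lies inside the configured kernel address space.
pub const fn in_kernel_aspace(vaddr: u64) -> bool {
    vaddr >= KERNEL_ASPACE_BASE && vaddr - KERNEL_ASPACE_BASE < KERNEL_ASPACE_SIZE
}

/// Top nibble VA[63:60], the segment a DMW window is matched against.
pub const fn dmw_segment(vaddr: u64) -> u64 {
    vaddr >> 60
}
```

Here `virt_to_phys(KERNEL_BASE_VADDR)` is `0x20_0000`, `dmw_segment(KERNEL_BASE_VADDR)` is `0x9`, and the old high-half address 0xffff_8000_0020_0000 falls outside the new kernel address space.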

Boot assembly/layout change:

  • In local/axplat-loongarch64-qemu-virt/src/boot.rs, split Linux-image header label and actual _start label:
    • Keep Linux header at beginning (_linux_image_header)
    • Set real _start after header, so ELF entry points to executable code
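
The resulting entry offset can be checked with a layout sketch. It assumes the 64-byte image header described in the Linux LoongArch boot documentation; the struct and its field names are illustrative and are not copied from boot.rs, where the header is emitted from assembly.

```rust
use std::mem::size_of;

/// Illustrative model of the Linux LoongArch image header
/// (assumed layout, 64 bytes in total).
#[repr(C)]
pub struct LinuxImageHeader {
    pub mz_magic: u32,         // "MZ" MS-DOS signature
    pub res0: u32,             // reserved
    pub kernel_entry: u64,     // kernel entry point
    pub image_size: u64,       // effective image size
    pub load_offset: u64,      // load offset from start of RAM
    pub res1: [u64; 3],        // reserved
    pub linux_pe_magic: u32,   // magic number
    pub pe_header_offset: u32, // offset to the PE header
}

pub const KERNEL_BASE_VADDR: u64 = 0x9000_0000_0020_0000;

/// Address of a `_start` label placed immediately after the header.
pub const fn start_after_header(base: u64) -> u64 {
    base + size_of::<LinuxImageHeader>() as u64
}
```

Under these assumptions the header occupies 0x40 bytes, so `start_after_header(KERNEL_BASE_VADDR)` is `0x9000_0000_0020_0040`.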

Verification result

After the local ArceOS override:

  • The built ELF entry point becomes DMW-style, e.g. 0x9000000000200040.
  • Running the upstream-built QEMU with a direct ELF boot succeeds.

This confirms that ELF direct boot can work without modifying QEMU, by using DMW-friendly ArceOS virtual addresses and placing _start after the image header.

Interpretation

  • This issue is largely an interaction between:
    • QEMU's LoongArch direct-boot ELF handling assumptions, and
    • ArceOS's chosen kernel virtual layout.
  • Using 0xffff8... is architecturally possible with a full paging design, but it is less friendly to QEMU's current direct-boot behavior in this path.
  • Using 0x9000... DMW-style addresses is a practical compatibility strategy and aligns with common LoongArch early-boot practice.
