Commit ecd7f5e

Memory management overhaul (#50)
* Standardize the capitalization of Devicetree.
* Move zalloc to the header, so the optimizer can see it.
* Adds logging to generate_bootstub.
* Adds documentation for the new memory-management code.
* Update generate_bootstub for memory management changes.
* Enables stack checking.
* Move the stdckdint.h functions into compiler.h.
* Use uaddr as the bits type of paddr.
* Adds bounds checking to paddr_offset.
* Reimplement physical memory access with the new physical memory APIs.
* Adds a bit of entropy early in boot.
  Right now, just the cycle count and timestamp at boot. These are pretty attacker-guessable, but better than nothing. This will be used to generate XOR cookies to initialize the heap with. We usually get better entropy from U-Boot once we process the Devicetree, so the RNG will reseed at that point.
* Adds a revamped Devicetree parser.
  This heap-allocates the Devicetree, so we don't need to keep the FDT structure reserved. It also includes the memory reservations block in the Devicetree proper, under the /reserved-memory node.
  This still needs to be fully integrated; e.g. by:
  - Adding RAM to the allocators.
  - Adding /chosen/rng-seed to the entropy pool.
  - Creating devices from nodes.
  This has been tested and works on the Milk-V Duo S.
* Adds entropy from the Devicetree to the entropy pool.
  It gets removed from the Devicetree to reduce the chances that an attacker could discover it later. This is maybe kinda moot though, because of how resistant the entropy pool is to attacks based on that.
* Implements the new, drastically simpler, physical allocator.
* Marks the kernel as reserved in the Devicetree.
* Adds utility methods for dealing with the reg prop.
* Finishes initialization of the new physical memory allocator. Tested in QEMU and the Milk-V Duo S.
* Implements mm_free_physical.
* Adds random functions for generating u32 and u64.
  These are for use in a treap implementation; arguably there should be a "fast but lower-quality" RNG for this kind of use case, but we really don't want people e.g. using it when it would introduce a DoS attack.
* Data structure definitions for the new virtual allocator.
* Adds vma_alloc_by_addr, uses it to reserve the initial mappings.
* Moves out treap rotation to a helper.
* Expose vma_bounds. The page fault handler, among other things, will need it.
* Fixes a bug in treap rotation, finishes VMA allocation.
* More debug printing in main.
* Implements vma_free.
* Removes the boothart_heap_segment that wasn't allocated in the bootstub.
* Fix some allocator problems.
  - More consistently XORs linked lists, though this still needs to be done to the remote list.
  - Properly maintains pages_direct.
* Fix more silly oversights.
* Makes the kernel virtual allocator a global.
* Wire up small object allocation.
* Adds is_aligned.
* Uses is_aligned() where applicable.
* Adds vma_alloc_aligned.
* Adds bzero_physical.
* Adds support for mapping pages. This currently only supports 4KiB pages with Sv39.
* Reverses the order mm_alloc_physical hands out pages in. This results in more readable output from the `info mem` command in QEMU, since allocations are more often contiguous.
* Adds the ARRAY_SIZE macro. This is like Linux's, but uses static_assert to give a better compile error when the given field is not an array.
* Adds support for pointer types to is_aligned.
* Fixes a crash on OOM.
* Fixes vma_alloc sometimes not splitting a VMA it should.
* Splits up the allocator, improving consistency and fixing bugs. At this point, allocation works but freeing is a no-op.
* Adds mm_paging_unmap.
* Makes vma_find public.
* Implements freeing.
* Implements realloc.
* Makes the doc-serve Makefile target serve on all IPs.
* Adds a simple note on the implementation of each allocator.
* Fixes allocator heap corruption bug and assertion failure.
  - page_collect shouldn't assume that the page has an empty free list, but neither should it do O(n) work when it does not.
  - After page_free, alloc_generic should not refer to the page anymore.
* Add bootstub_generated.ld to gitignore and defcleanable.
* Adds comments about zero-size allocs.
* Adds log2_size_of_size_class helper to get rid of a magic number.
* Adds a constant for the sentinel size class value used by huge pages.
* Use decrement as a statement, not an expression.
* Get rid of a magic number.
* Clarify that the VMA allocator stores the VMAs in multiple different data structures.
* Use assignment as a statement, not an expression.
* Adds some comments to explain clearing needs_delayed_free.
* Improves the comment about the treap priority being non-zero.
* Adds mailmap.
* Gets rid of some magic numbers.

---------

Signed-off-by: Amy Ringo <me@remexre.com>
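The commit message says early-boot entropy is used "to generate XOR cookies to initialize the heap with" and that the allocator "more consistently XORs linked lists". The following C code is a minimal sketch of that idea, under stated assumptions: each free block stores its `next` pointer XORed with a per-page cookie. `struct page`, `struct free_block`, and the functions are hypothetical names, not ukoOS's API. Plain XOR is also an assumption; the real encoding may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-list node: a free block stores the *encoded* address of the next free block. */
struct free_block {
    uintptr_t next_encoded;
};

/* Hypothetical per-page state: the free-list head plus a per-page XOR cookie. */
struct page {
    struct free_block *free;
    uintptr_t cookie; /* would come from the entropy pool; here the caller picks it */
};

/* Encodes a next pointer with the page's cookie (XOR is its own inverse). */
static uintptr_t xor_encode(const struct page *page, const struct free_block *next) {
    return (uintptr_t)next ^ page->cookie;
}

/* Pushes a freed block onto the page's free list. */
static void page_push_free(struct page *page, void *ptr) {
    assert(ptr != NULL);
    struct free_block *block = ptr;
    block->next_encoded = xor_encode(page, page->free);
    page->free = block;
}

/* Pops a block off the page's free list, or returns NULL if it is empty. */
static void *page_pop_free(struct page *page) {
    struct free_block *block = page->free;
    if (block == NULL)
        return NULL;
    /* Decoding NULL's encoding (the cookie itself) yields NULL again. */
    page->free = (struct free_block *)(block->next_encoded ^ page->cookie);
    return block;
}
```

Each page has its own cookie, so an overwritten `next_encoded` only decodes to a useful address if the attacker knows that page's cookie. This is why the cookies benefit from real entropy.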
1 parent f7239a7 commit ecd7f5e

48 files changed (+3764, -1785 lines)

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ result-*
 
 /build/
 /compile_commands.json
+/src/kernel/arch/riscv64/bootstub_generated.ld
 /src/kernel/arch/riscv64/bootstub_generated.S
 /src/kernel/arch/riscv64/kernel-unstripped.elf
 /src/kernel/arch/riscv64/kernel.elf

.mailmap

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Amy Ringo <me@remexre.com>

REUSE.toml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ SPDX-FileCopyrightText = "2025 ukoOS Contributors"
 SPDX-License-Identifier = "CC-BY-SA-4.0 OR GFDL-1.3-or-later"
 
 [[annotations]]
-path = ["flake.lock"]
+path = [".mailmap", "flake.lock"]
 precedence = "aggregate"
 SPDX-FileCopyrightText = "2025 ukoOS Contributors"
 SPDX-License-Identifier = "CC0-1.0"

doc/SUMMARY.md

Lines changed: 4 additions & 0 deletions
@@ -10,6 +10,10 @@
 - [Kernel]()
 - [The `print()` and `format()` functions](./kernel/print.md)
 - [Threads and Harts](./kernel/threads-and-harts.md)
+- [Memory management]()
+- [Overview](./kernel/mm/overview.md)
+- [Memory map](./kernel/mm/memory-map.md)
+- [Booting](./kernel/mm/booting.md)
 - [Targets](./targets.md)
 - [Tutorials](./tutorials/tutorials.md)
 - [First Day](./tutorials/first-day.md)

doc/include.mak

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ all:: doc/book
 clean::
 @if [[ -d doc/book ]]; then echo "CLEAN doc/book"; rm -r doc/book; fi
 doc-serve:
-$(Q)mdbook serve --dest-dir $(abspath doc/book) $(srcdir)/doc
+$(Q)mdbook serve --dest-dir "$$(realpath doc/book)" --hostname :: $(srcdir)/doc
 .PHONY: doc-serve
 
 doc/book: $(srcdir)/doc/book.toml $(srcdir)/doc/SUMMARY.md

doc/kernel/mm/booting.md

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+# Booting
+
+Before ordinary driver code can run, the three main allocators in the mm subsystem (heap, physical, virtual) need to be initialized.
+This document describes that initialization.
+
+## Build-time
+
+At build time, we can request pages (as `.bss` or `.data`) that are handed to us before the kernel starts, saving us from having to implement allocation that early in boot.
+We use this to allocate the initial page tables and some initial heap and stack memory.
+
+See `src/kernel/arch/riscv64/generate_bootstub.py` for the code that does this.
+
+The initial page tables are generated ahead of time and compiled into the kernel binary, so that virtual memory works immediately, before any allocators are up.
+These page tables have entries for:
+
+- The physical memory mapping.
+  - The whole region is mapped using 1GiB pages.
+- The initial heap segment.
+  - This is 4MiB, and gets mapped into the RAM area.
+- The boothart's stack.
+  - This is 2MiB, and gets mapped into the RAM area.
+  - Guard pages are added once the physical memory allocator is set up.
+- The kernel.
+
+## Heap allocator
+
+There is one instance of the heap allocator per hart, to avoid the need to acquire locks or use atomics in the fast path of allocation.
+The boothart's heap allocator gets initialized with the initial heap segment allocated at build time.
+This lets the boothart allocate up to 4MiB of objects in total, as long as each object is smaller than 512KiB.
+
+One annoying thing: the heap allocator depends on a source of entropy, which it uses to generate the XOR cookies for its free lists.
+This early in boot, the entropy pool cannot be fully seeded, so we have to harvest what little entropy we can.
+Right now, our only sources of entropy at this point are the cycle and time counters.
+We take a trap while moving to the higher half, so the timing there adds some unpredictability; when booting on real hardware, the time taken to load the kernel from storage should add some as well.
+
+The Devicetree gets parsed into memory owned by the heap allocator, which lets us add memory reservations (e.g. for the kernel itself).
+
+## Physical allocator
+
+Once the heap allocator is initialized on the boothart, we can discover the rest of the RAM.
+We do this by parsing the Devicetree that was passed to us by the bootloader.
+
+Once it's parsed, we can easily extract the parts of it we need:
+
+- The `/chosen/rng-seed` property, as entropy to further initialize the entropy pool.
+  This is usually enough to fully initialize the pool.
+- The `/memory` nodes, which describe the memory installed on the device.
+- The children of the `/reserved-memory` node, whose regions we avoid adding to the physical allocator.
+
+Subtracting the reserved regions from the memory regions gives all the free regions of RAM.
+We use a simple free list to track them.
+
+## Virtual allocator
+
+The virtual allocator for higher-half memory can now be initialized as well.
+This allocator covers the entire 256GiB (38-bit) higher half, but only the RAM area is ever marked as free.
+
+Once this is done, the heap allocator can allocate more heap segments from the physical allocator, so heap allocation is no longer limited to 4MiB.
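The physical allocator section describes removing the reserved regions from the `/memory` ranges and putting what's left on a free list. The following C code is a hedged illustration of that subtraction step. It assumes the `reg` values have already been decoded into half-open `[start, end)` pairs, with the reservations sorted by start address. `struct range`, `emit_pages`, and `subtract_reservations` are hypothetical names, not functions from this commit. Because the physical allocator hands out single pages, the sketch also shrinks each free piece inward to whole 4KiB pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* A half-open physical address range [start, end). */
struct range {
    uint64_t start, end;
};

/*
 * Appends [start, end), shrunk inward to whole pages, to out.
 * Returns 0 if out is full, 1 otherwise (including when nothing page-sized remains).
 */
static int emit_pages(struct range *out, size_t out_cap, size_t *n_out, uint64_t start, uint64_t end) {
    start = (start + PAGE_SIZE - 1) & ~(uint64_t)(PAGE_SIZE - 1); /* assumes no wraparound near UINT64_MAX */
    end &= ~(uint64_t)(PAGE_SIZE - 1);
    if (start >= end)
        return 1;
    if (*n_out == out_cap)
        return 0;
    out[(*n_out)++] = (struct range){start, end};
    return 1;
}

/*
 * Subtracts the reserved ranges from one memory range, writing the free,
 * page-aligned pieces to out. Assumes reserved is sorted by start address
 * (overlapping reservations are fine). Returns the number of ranges
 * written, or SIZE_MAX if out was too small.
 */
static size_t subtract_reservations(struct range mem, const struct range *reserved, size_t n_reserved,
                                    struct range *out, size_t out_cap) {
    size_t n_out = 0;
    uint64_t cursor = mem.start; /* everything below cursor has been handled */
    for (size_t i = 0; i < n_reserved && cursor < mem.end; i++) {
        struct range r = reserved[i];
        assert(r.start <= r.end);
        if (r.end <= cursor || r.start >= mem.end)
            continue; /* this reservation doesn't touch what's left */
        if (r.start > cursor && !emit_pages(out, out_cap, &n_out, cursor, r.start))
            return SIZE_MAX;
        cursor = r.end; /* r.end > cursor here, so cursor only moves forward */
    }
    if (cursor < mem.end && !emit_pages(out, out_cap, &n_out, cursor, mem.end))
        return SIZE_MAX;
    return n_out;
}
```

Each resulting range could then be split into 4KiB pages and pushed onto the physical allocator's free list.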

doc/kernel/mm/memory-map.md

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+# Memory map
+
+ukoOS is a higher-half kernel (i.e., all of the kernel's data is mapped at addresses whose most significant bit is 1).
+The kernel's memory map differs somewhat depending on how many bits of virtual address space the hardware supports.
+
+## Sv39 memory map
+
+In Sv39, virtual addresses are 39 bits wide and are sign-extended to 64 bits, so bits 63 through 39 must all equal bit 38.
+
+| Start Address | End Address | Size | Description |
+|:---------------------|:---------------------|:---------------|:--------------------------|
+| `0x0000000000000000` | `0x0000003fffffffff` | 256GiB | Userspace virtual memory |
+| `0x0000004000000000` | `0xffffffbfffffffff` | 16EiB - 512GiB | Illegal addresses in Sv39 |
+| `0xffffffc000000000` | `0xffffffdfffffffff` | 128GiB | Physical memory |
+| `0xffffffe000000000` | `0xffffffffbfffffff` | 127GiB | RAM |
+| `0xffffffffc0000000` | `0xffffffffffffffff` | 1GiB | Kernel |
+
+- The userspace memory map can be controlled from userspace, and does not have a fixed structure.
+
+- A large range of 64-bit addresses is illegal in Sv39, because those addresses are not the sign extension of any 39-bit address.
+
+- 128GiB of physical memory is directly mapped.
+  This should be enough to access any memory-mapped devices; systems whose physical address space extends beyond this tend to support Sv48 or Sv57.
+
+- Up to 127GiB of RAM can be mapped.
+  Past this point, the kernel cannot use any more memory.
+  Machines with anywhere near this much memory support Sv48 or Sv57, so this isn't a limitation in practice.
+
+  Memory gets mapped here by the allocator as needed.
+
+- The kernel itself is mapped into a large contiguous region.
+  This region contains many smaller regions, but they're outside the scope of this page.
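The Sv39 table can be checked mechanically. The following C code is a small sketch that classifies a 64-bit address into the regions above and translates a physical address into the direct physical-memory mapping. The boundary constants come from the table. The function names are hypothetical and are not ukoOS functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Region boundaries taken from the Sv39 memory map table. */
#define USER_END      0x0000004000000000ULL
#define PHYS_MAP_BASE 0xffffffc000000000ULL
#define RAM_BASE      0xffffffe000000000ULL
#define KERNEL_BASE   0xffffffffc0000000ULL
#define PHYS_MAP_SIZE (RAM_BASE - PHYS_MAP_BASE) /* 128GiB */

enum sv39_region { REGION_USER, REGION_ILLEGAL, REGION_PHYS_MAP, REGION_RAM, REGION_KERNEL };

/* A 64-bit address is valid in Sv39 iff bits 63..38 are all zeros or all ones. */
static bool sv39_is_canonical(uint64_t va) {
    uint64_t top = va >> 38; /* the 26 bits 63..38 */
    return top == 0 || top == (UINT64_MAX >> 38);
}

/* Classifies a virtual address according to the table. */
static enum sv39_region sv39_classify(uint64_t va) {
    if (!sv39_is_canonical(va))
        return REGION_ILLEGAL;
    if (va < USER_END)
        return REGION_USER;
    if (va < RAM_BASE)
        return REGION_PHYS_MAP; /* canonical high addresses start at PHYS_MAP_BASE */
    if (va < KERNEL_BASE)
        return REGION_RAM;
    return REGION_KERNEL;
}

/* Hypothetical helper: the alias of a physical address in the direct physical-memory mapping. */
static uint64_t phys_to_direct_map(uint64_t pa) {
    assert(pa < PHYS_MAP_SIZE); /* only the first 128GiB of physical memory are mapped */
    return PHYS_MAP_BASE + pa;
}
```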

doc/kernel/mm/overview.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+# Overview
+
+ukoOS uses several memory-management strategies, each managing memory at a different level.
+
+- The kernel has a standard memory allocator, accessed with the functions `alloc` and `free`.
+  These functions act similarly to `malloc` and `free` in ordinary C.
+
+  This is based on the design from [Mimalloc: Free List Sharding in Action](https://www.microsoft.com/en-us/research/wp-content/uploads/2019/06/mimalloc-tr-v1.pdf); read that if you want to understand the design.
+
+  The allocator that handles these requests is called **the heap memory allocator**.
+
+- The kernel keeps track of all of RAM, and hands out pages to be mapped into userspace processes and to be used by the heap memory allocator.
+
+  This allocator is a simple free list.
+
+  It can currently only allocate one page at a time (not a run of contiguous pages), but could be extended to support this in the future.
+  This allocator is called **the physical memory allocator**.
+
+- The kernel manages its own virtual memory, in the RAM region of the memory map.
+
+  This allocator is a pair of treaps over VMAs (virtual memory areas): one holds all VMAs sorted by address, and the other holds only the free VMAs sorted by size.
+  If you're not already familiar with treaps, there's a Julia Evans piece about them: [Data structure: the treap!](https://jvns.ca/blog/2017/09/09/data-structure--the-treap-/)
+
+  The allocator that handles these requests is called **the virtual memory allocator**.
+
+Each hart has its own root page table, since each hart can be running a different userspace process.
+However, the kernel's part of the memory map is kept in sync across all harts.
+
+Higher-half memory is only rarely mapped and unmapped, so relatively inefficient mechanisms (a full TLB shootdown) can be used to ensure all harts have the same view of it.
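The overview describes the virtual memory allocator as treaps of VMAs. The following C code is a hedged sketch of only the address-ordered treap: a binary search tree on `start`, kept as a max-heap on a random priority by rotations. The size-ordered treap of free VMAs would use the same rotations with a different key. `struct vma` and all functions are hypothetical. The commit adds u32/u64 random functions for treap priorities, but this sketch substitutes a fixed-seed xorshift32 so it is self-contained and deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VMA node covering [start, end), keyed by start address. */
struct vma {
    uintptr_t start, end;
    uint32_t priority; /* random; the tree is a max-heap on this */
    struct vma *left, *right;
};

/* Stand-in priority source (xorshift32), NOT the kernel's entropy-pool RNG. */
static uint32_t prng_state = 0x2545f491u;

static uint32_t next_priority(void) {
    prng_state ^= prng_state << 13;
    prng_state ^= prng_state >> 17;
    prng_state ^= prng_state << 5;
    return prng_state;
}

static struct vma *rotate_right(struct vma *node) {
    struct vma *child = node->left;
    node->left = child->right;
    child->right = node;
    return child;
}

static struct vma *rotate_left(struct vma *node) {
    struct vma *child = node->right;
    node->right = child->left;
    child->left = node;
    return child;
}

/*
 * Inserts node as a leaf in BST order by start address, then rotates it
 * upward while its priority beats its parent's. Returns the new subtree root.
 */
static struct vma *treap_insert(struct vma *root, struct vma *node) {
    if (root == NULL) {
        node->left = node->right = NULL;
        node->priority = next_priority();
        return node;
    }
    if (node->start < root->start) {
        root->left = treap_insert(root->left, node);
        if (root->left->priority > root->priority)
            root = rotate_right(root);
    } else {
        root->right = treap_insert(root->right, node);
        if (root->right->priority > root->priority)
            root = rotate_left(root);
    }
    return root;
}

/* Returns the VMA containing addr, or NULL. Assumes VMAs don't overlap. */
static struct vma *treap_find(struct vma *root, uintptr_t addr) {
    while (root != NULL) {
        if (addr < root->start)
            root = root->left;
        else if (addr >= root->end)
            root = root->right;
        else
            return root;
    }
    return NULL;
}

/* Debug helper: asserts BST order (starts within [lo, hi)) and heap order; returns the node count. */
static size_t treap_check(const struct vma *node, uintptr_t lo, uintptr_t hi) {
    if (node == NULL)
        return 0;
    assert(node->start >= lo && node->start < hi);
    assert(node->left == NULL || node->left->priority <= node->priority);
    assert(node->right == NULL || node->right->priority <= node->priority);
    return 1 + treap_check(node->left, lo, node->start) + treap_check(node->right, node->start, hi);
}
```

Random priorities keep the expected depth logarithmic regardless of insertion order. A predictable priority source could let an attacker force a degenerate tree, which is why the commit message warns against using a low-quality RNG here.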

src/kernel/arch/riscv64/bootstub.S

Lines changed: 2 additions & 7 deletions
@@ -11,14 +11,13 @@
 # Expects the following arguments:
 #
 # a0: The hart ID.
-# a1: The DeviceTree pointer.
+# a1: The Devicetree pointer.
 .p2align 2
 .global _start
 .type _start, @function
 _start:
 # Set up the initial stack.
-la sp, 0xffffffffffffd000 - 16
-la t0, _stack_end_phys - 16
+ld sp, (initial_stack_va)
 
 # Get the (physical) address of the root page table.
 la t1, boot_pgtbl_2_0000000000000000
@@ -38,10 +37,6 @@ _start:
 ld a5, (symtab_len)
 ld a6, (strtab_va)
 ld a7, (strtab_len)
-ld t3, (free_va_start)
-ld t4, (free_va_end)
-sd t3, (t0)
-sd t4, 8(t0)
 
 # Get the (virtual) address of the higher-half entrypoint.
 ld t0, (entrypoint)
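The bootstub takes the physical address of `boot_pgtbl_2_0000000000000000`. According to the booting notes, this root page table is generated at build time with 1GiB pages for the physical memory mapping. The generator (`generate_bootstub.py`) is not shown in this diff. The following C code therefore only sketches what such Sv39 entries look like under the RISC-V privileged specification. The function names are hypothetical, and the flag choice (read/write, global, with A and D preset) is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Sv39 PTE flag bits (RISC-V privileged spec). */
#define PTE_V (1ULL << 0)
#define PTE_R (1ULL << 1)
#define PTE_W (1ULL << 2)
#define PTE_X (1ULL << 3) /* unused below; a kernel text mapping would need it */
#define PTE_G (1ULL << 5)
#define PTE_A (1ULL << 6)
#define PTE_D (1ULL << 7)

#define GIGAPAGE_SIZE (1ULL << 30)
#define SATP_MODE_SV39 (8ULL << 60)

/* Index into the root (level-2) table: VPN[2] is bits 38..30 of the virtual address. */
static unsigned sv39_vpn2(uint64_t va) {
    return (unsigned)((va >> 30) & 0x1ff);
}

/* Builds a leaf PTE mapping a 1GiB gigapage at physical address pa. */
static uint64_t sv39_gigapage_pte(uint64_t pa, uint64_t flags) {
    assert(pa % GIGAPAGE_SIZE == 0); /* gigapages must be 1GiB-aligned */
    return ((pa >> 12) << 10) | flags | PTE_V;
}

/* Maps [base, base + size) in the root table to physical [0, size) using gigapages. */
static void map_physical_memory(uint64_t root[512], uint64_t base, uint64_t size) {
    assert(size % GIGAPAGE_SIZE == 0);
    for (uint64_t off = 0; off < size; off += GIGAPAGE_SIZE)
        root[sv39_vpn2(base + off)] = sv39_gigapage_pte(off, PTE_R | PTE_W | PTE_G | PTE_A | PTE_D);
}

/* satp value selecting Sv39 with ASID 0 and the given root table. */
static uint64_t sv39_satp(uint64_t root_pa) {
    assert(root_pa % 4096 == 0);
    return SATP_MODE_SV39 | (root_pa >> 12);
}
```

With the memory map above, the 128GiB physical-memory window occupies root entries 256 through 383, and the kernel's 1GiB sits in entry 511. Presetting A and D avoids page faults on implementations that do not update those bits in hardware.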

src/kernel/arch/riscv64/bootstub.ld

Lines changed: 21 additions & 12 deletions
@@ -36,19 +36,12 @@ SECTIONS
 */
 _kernel_start_phys = .;
 
-/**
-* The page tables get appended as well.
-*/
-.pgtbls ALIGN(0x1000) :
-{
-*(.pgtbls)
-}
-
 /**
 * The kernel's string and symbol table sections get appended.
 */
 .kernel_tables ALIGN(0x1000) :
 {
+PROVIDE(__start_kernel_tables = .);
 *(.kernel_tables)
 }
 
@@ -57,6 +50,7 @@ SECTIONS
 */
 .kernel ALIGN(0x1000) :
 {
+PROVIDE(__start_kernel = .);
 *(.kernel)
 }
 .kernel_bss :
@@ -65,13 +59,28 @@ SECTIONS
 }
 
 /**
-* An initial stack page, to use in early boot (i.e., before the allocator is up).
+* The initial heap segment and boothart stack.
 */
-.stack ALIGN(0x1000) :
+.initial_heap_segment ALIGN(0x1000) :
+{
+PROVIDE(__start_initial_heap_segment = .);
+. += 0x400000;
+}
+.boothart_stack ALIGN(0x1000) :
 {
-. += 0x1000;
-_stack_end_phys = .;
+PROVIDE(__start_boothart_stack = .);
+. += 0x200000;
+}
+
+/**
+* The page tables get appended as well.
+*/
+.pgtbls ALIGN(0x1000) :
+{
+*(.pgtbls)
 }
 
 _kernel_end_phys = .;
+
+INCLUDE "src/kernel/arch/riscv64/bootstub_generated.ld"
 }
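Every section the linker script adds starts at `ALIGN(0x1000)`. The commit message also mentions an `is_aligned` helper that was later extended to accept pointer types. Its definition isn't in the visible part of the diff, so the following C code is a hypothetical sketch of such a helper. It also includes an `align_up` counterpart that computes what `ALIGN()` does at link time. `is_power_of_two`, `is_aligned_uptr`, and `align_up` are made-up names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if align is a non-zero power of two. */
static inline bool is_power_of_two(uintptr_t align) {
    return align != 0 && (align & (align - 1)) == 0;
}

static inline bool is_aligned_uptr(uintptr_t x, uintptr_t align) {
    assert(is_power_of_two(align));
    return (x & (align - 1)) == 0;
}

/* Rounds x up to a multiple of align, like the linker script's ALIGN(). */
static inline uintptr_t align_up(uintptr_t x, uintptr_t align) {
    assert(is_power_of_two(align));
    return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical is_aligned: accepts an integer or a pointer, since both
 * convert to uintptr_t. (Assumes a 64-bit target, where physical
 * addresses also fit in uintptr_t.)
 */
#define is_aligned(x, align) is_aligned_uptr((uintptr_t)(x), (uintptr_t)(align))
```

For example, `is_aligned(0x400000, 0x1000)` holds for the 4MiB heap segment size, and `align_up(0x400800, 0x1000)` gives `0x401000`.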
