Commit 47cb475

feat: support weak map (#13)
1 parent 2ea7efc commit 47cb475

13 files changed

+623
-121
lines changed

notes/2026-02-26.md

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
# WeakMap Support

Note author: shruti2522

This document summarizes the changes made to add weak map support to the mark sweep collector.

The PR adds `WeakMap<K, V>` to the mark sweep collector, backed by ephemerons.
Each entry's key is held weakly; when the key is collected, the entry is pruned from the map automatically.

## changes made in PR #13

### `WeakGcBox` refactor

Previously, `WeakGcBox<T>` allocated its own `GcBox<T>` and used an `IS_WEAK`
bit in the header flags to mark it as weak. This approach made sense as a first pass, but a weak reference to an existing `Gc<K>` should point at the same allocation, not create a second one.
If `WeakGcBox` held its own copy, the GC would never see the key as dead.

So I refactored `WeakGcBox` to hold an `ErasedArenaPointer` into an existing
allocation, and removed the `IS_WEAK` flag, the `weak_white`/`weak_black` header constructors,
and the related `GcBox` code from `gc_header.rs` and `gc_box.rs`.

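The difference between a weak handle that shares the key's allocation and one that owns a private copy can be sketched with the standard library's `Rc`/`Weak`. This is only an analogy: the crate's `WeakGcBox` and `ErasedArenaPointer` are not used here, and both function names are hypothetical.

```rust
use std::rc::{Rc, Weak};

/// Weak handle that points at the *same* allocation as the strong handle.
fn shared_weak_sees_death() -> bool {
    let key = Rc::new(String::from("key"));
    let weak: Weak<String> = Rc::downgrade(&key);
    drop(key); // the last strong reference goes away
    weak.upgrade().is_none() // the weak handle observes that the key is dead
}

/// "Weak" handle backed by a private copy of the key: the copy stays alive
/// on its own, so the handle cannot observe the original key dying.
fn private_copy_sees_death() -> bool {
    let key = Rc::new(String::from("key"));
    let copy = Rc::new((*key).clone()); // second, independent allocation
    let weak = Rc::downgrade(&copy);
    drop(key); // the original key is gone...
    let dead = weak.upgrade().is_none(); // ...but the copy is still alive
    drop(copy);
    dead
}
```

Only the first variant reports the key as dead, which is the behavior a weak map needs.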
### `Ephemeron::new_in` now uses `WeakGcBox<K>`

`Ephemeron::new_in` used to accept a value `K` and allocate a new `WeakGcBox`
for it internally. Now it takes a `&Gc<K>` and builds the `WeakGcBox<K>` from that handle's existing pointer (see the `ephemeron.rs` diff in this commit), so the weak reference always comes from an existing `Gc<K>`. This keeps things organized by
separating how the weak link is created from how the ephemeron uses it.

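A minimal sketch of a constructor that derives the weak key from an existing strong handle, in the spirit of the `new_in(key: &Gc<K>, ...)` signature shown in this commit's `ephemeron.rs` diff. It assumes `Rc`/`Weak` as stand-ins for `Gc`/`WeakGcBox`; the `Ephemeron` type below is a hypothetical simplification, not the crate's type.

```rust
use std::rc::{Rc, Weak};

/// Hypothetical stand-in for `Ephemeron<K, V>`: weak key, owned value.
struct Ephemeron<K, V> {
    key: Weak<K>,
    value: V,
}

impl<K, V> Ephemeron<K, V> {
    /// Builds the weak key from an existing strong handle.
    /// No second allocation is created for the key.
    fn new(key: &Rc<K>, value: V) -> Self {
        Self {
            key: Rc::downgrade(key),
            value,
        }
    }

    /// The value is only observable while the key is still alive.
    fn value(&self) -> Option<&V> {
        if self.key.strong_count() > 0 {
            Some(&self.value)
        } else {
            None
        }
    }
}
```

Because the constructor only downgrades the handle it is given, creating an ephemeron does not keep the key alive.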
### new cleanup functions added to `EphemeronVTable`

Added `is_reachable_fn` and `finalize_fn` to `EphemeronVTable`. This
lets the collector's sweep phase check whether a key is still alive and run any
finalizers without needing to know the concrete key and value types.

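The idea of dispatching through type-erased function pointers can be sketched as follows. This is a simplified illustration, not the crate's `EphemeronVTable`: the `VTable`, `Entry`, `Erased`, and `sweep` names are hypothetical, the vtable is returned by value rather than as a `&'static` constant, and the `alive` flag stands in for the real reachability check.

```rust
use std::cell::Cell;

/// Hypothetical, simplified vtable: one function pointer per operation.
struct VTable {
    is_reachable_fn: unsafe fn(*const ()) -> bool,
    finalize_fn: unsafe fn(*const ()),
}

/// Toy entry: `alive` stands in for the key's mark state,
/// `finalized` records that the finalizer ran.
struct Entry<V> {
    alive: bool,
    finalized: Cell<bool>,
    _value: V,
}

unsafe fn is_reachable<V>(this: *const ()) -> bool {
    // SAFETY: the caller guarantees `this` points at a live `Entry<V>`.
    let entry = unsafe { &*this.cast::<Entry<V>>() };
    entry.alive
}

unsafe fn finalize<V>(this: *const ()) {
    // SAFETY: same contract as `is_reachable`.
    let entry = unsafe { &*this.cast::<Entry<V>>() };
    entry.finalized.set(true);
}

/// Builds the vtable for a concrete `V`; the type is baked into the pointers.
fn vtable_of<V>() -> VTable {
    VTable {
        is_reachable_fn: is_reachable::<V>,
        finalize_fn: finalize::<V>,
    }
}

/// Type-erased pair that can live in one homogeneous queue.
struct Erased {
    ptr: *const (),
    vtable: VTable,
}

/// Finalizes every entry whose key is no longer reachable and
/// returns how many entries were finalized.
fn sweep(queue: &[Erased]) -> usize {
    let mut finalized = 0;
    for item in queue {
        // SAFETY: each `ptr` was erased from the `Entry<V>` its vtable was built for.
        unsafe {
            if !(item.vtable.is_reachable_fn)(item.ptr) {
                (item.vtable.finalize_fn)(item.ptr);
                finalized += 1;
            }
        }
    }
    finalized
}
```

The `sweep` loop never names `V`, which is the property the note describes: the concrete types are recovered only inside the functions the vtable points at.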
### `WeakMap<K, V>`

It's a `HashMap<usize, ArenaPointer<Ephemeron<K, V>>>`,
keyed by the raw pointer address of `Gc<K>`. This gives O(1) average time
for insert, lookup, and remove.

- `insert` removes any existing entry for the key before allocating a new ephemeron.
  This prevents the old one from leaking into the collector queue when a value is
  updated.

- `remove` takes the entry out of the map, but the backing ephemeron stays in the
  collector queue; it gets swept when the key is collected.

  `prune_dead_entries` only visits entries still in `self.entries`, so there is no
  risk of reading freed memory.

- `Trace` is handled: the ephemerons are in the collector's own queue, so `WeakMap` itself doesn't need to do anything extra.

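The address-keyed layout described above can be sketched with standard-library types. This is an illustration under stated assumptions, not the crate's implementation: `Rc`/`Weak` replace `Gc` and the arena-allocated ephemerons, liveness is decided by the strong count instead of the collector's mark state, and `AddrWeakMap` is a hypothetical name.

```rust
use std::collections::HashMap;
use std::rc::{Rc, Weak};

/// Hypothetical stand-in for `WeakMap<K, V>`: entries are keyed by the
/// address of the key's allocation.
struct AddrWeakMap<K, V> {
    entries: HashMap<usize, (Weak<K>, V)>,
}

impl<K, V> AddrWeakMap<K, V> {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// Address of the allocation behind `key`, used as the hash key.
    fn addr(key: &Rc<K>) -> usize {
        Rc::as_ptr(key) as usize
    }

    /// Replaces any existing entry for the same key and returns the old value.
    fn insert(&mut self, key: &Rc<K>, value: V) -> Option<V> {
        self.entries
            .insert(Self::addr(key), (Rc::downgrade(key), value))
            .map(|(_, old)| old)
    }

    fn get(&self, key: &Rc<K>) -> Option<&V> {
        self.entries.get(&Self::addr(key)).map(|(_, v)| v)
    }

    fn remove(&mut self, key: &Rc<K>) -> Option<V> {
        self.entries.remove(&Self::addr(key)).map(|(_, v)| v)
    }

    /// Drops every entry whose key has no strong references left.
    fn prune_dead_entries(&mut self) {
        self.entries.retain(|_, (weak, _)| weak.strong_count() > 0);
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

In this sketch the stored `Weak<K>` keeps the allocation's address from being reused while the entry exists, so two live keys never collide on the same `usize`.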
## how the collector owns the map

`WeakMap::new(collector)` boxes the `WeakMapInner<K, V>`, grabs a raw pointer
from the box before erasing it to `Box<dyn ErasedWeakMap>`, then pushes that
erased box into `collector.weak_maps`. The `WeakMap` handle just holds that raw
pointer and is valid for as long as the collector lives.

`ErasedWeakMap` is an internal helper with one method: `prune_dead_entries`.
During `collect()`, the collector calls it on every map in `weak_maps` after the
sweep phase but before the dead arenas are freed. This ensures we can still read
the dropped flag on ephemerons before their memory is gone.

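The ordering inside `collect()` (sweep, then prune, then free) can be sketched as below. Everything here is a hypothetical simplification: `Slot`, `IndexMap`, and this `Collector` are toy types, the heap is a `Vec<Option<Slot>>` instead of arenas, and the `heap` parameter and `usize` return value on `prune_dead_entries` are additions for the sketch rather than the crate's signature.

```rust
/// Hypothetical heap slot. `dropped` is set by the sweep; the slot itself is
/// only emptied later, when dead memory is freed.
struct Slot {
    marked: bool,
    dropped: bool,
}

/// One-method trait playing the role the note gives to `ErasedWeakMap`.
trait ErasedWeakMap {
    fn prune_dead_entries(&mut self, heap: &[Option<Slot>]) -> usize;
}

/// Toy map that stores the indices of the slots backing its entries.
struct IndexMap {
    entries: Vec<usize>,
}

impl ErasedWeakMap for IndexMap {
    fn prune_dead_entries(&mut self, heap: &[Option<Slot>]) -> usize {
        let before = self.entries.len();
        self.entries.retain(|&i| {
            // The flag can only be read while the slot still exists.
            let slot = heap[i].as_ref().expect("slot freed before pruning");
            !slot.dropped
        });
        before - self.entries.len()
    }
}

struct Collector {
    heap: Vec<Option<Slot>>,
    weak_maps: Vec<Box<dyn ErasedWeakMap>>,
}

impl Collector {
    /// The collector takes ownership of the erased map.
    fn register(&mut self, map: Box<dyn ErasedWeakMap>) {
        self.weak_maps.push(map);
    }

    /// Returns how many map entries were pruned in this cycle.
    fn collect(&mut self) -> usize {
        // 1. Sweep: flag unmarked slots as dropped but keep their memory.
        for slot in self.heap.iter_mut().flatten() {
            if !slot.marked {
                slot.dropped = true;
            }
        }
        // 2. Prune every owned map while the flags are still readable.
        let mut pruned = 0;
        for map in self.weak_maps.iter_mut() {
            pruned += map.prune_dead_entries(&self.heap);
        }
        // 3. Only now release the dead slots.
        for slot in self.heap.iter_mut() {
            if slot.as_ref().is_some_and(|s| s.dropped) {
                *slot = None;
            }
        }
        pruned
    }
}
```

If step 3 ran before step 2, the `expect` in `prune_dead_entries` would fire, which is the toy equivalent of reading a dropped flag from freed memory.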
In my first attempt, I made users manually register their maps with the collector. This
was awkward because it required `unsafe` code. It was also hard for the collector to safely
keep track of maps that lived outside its own memory. Worse, if a user forgot to
unregister a map before dropping it, the collector was left holding a dangling pointer,
which could crash the whole program.

By allocating `WeakMapInner` on the collector's heap and giving the collector ownership,
we eliminated the manual registration step completely. The user gets a handle that
is valid for the collector's lifetime, and the aliasing
concerns are resolved because the collector owns the memory it prunes.

## potential improvements

**`weak_map.rs`**

- explore using `HashTable` instead of `HashMap` to save memory.
- consider whether `insert` should take an `Ephemeron` directly instead of a key/value pair.

## conclusion

This approach gives us a map that cleans itself up automatically. It lives exactly
as long as the collector does, users don't have to fiddle with manual registration
or write `unsafe` code, and it plugs right into the collector's existing trace
and sweep phases.

In the future, the most valuable improvement would be switching from `HashMap` to `HashTable`
to save memory. Until then, this first version works well and gives us the weak map behavior we need.

oscars/src/alloc/arena2/alloc.rs

Lines changed: 13 additions & 0 deletions
@@ -353,6 +353,19 @@ impl<'arena> Arena<'arena> {
         }
         true
     }
+
+    // checks dropped items in this arena
+    #[cfg(test)]
+    pub fn item_drop_states(&self) -> rust_alloc::vec::Vec<bool> {
+        let mut result = rust_alloc::vec::Vec::new();
+        let mut unchecked_ptr = self.last_allocation.get();
+        while let Some(node) = NonNull::new(unchecked_ptr) {
+            let item = unsafe { node.as_ref() };
+            result.push(item.is_dropped());
+            unchecked_ptr = item.next.as_ptr() as *mut ErasedHeapItem
+        }
+        result
+    }
 }
 
 impl<'arena> Drop for Arena<'arena> {

oscars/src/alloc/arena2/mod.rs

Lines changed: 10 additions & 1 deletion
@@ -6,7 +6,7 @@ use rust_alloc::collections::LinkedList;
 mod alloc;
 
 use alloc::Arena;
-pub use alloc::{ArenaAllocationData, ArenaHeapItem, ArenaPointer, ErasedArenaPointer};
+pub use alloc::{ArenaAllocationData, ArenaHeapItem, ArenaPointer, ErasedArenaPointer, ErasedHeapItem};
 
 #[cfg(test)]
 mod tests;
@@ -142,4 +142,13 @@ impl<'alloc> ArenaAllocator<'alloc> {
             drop(dead_arenas)
         }
     }
+
+    // checks dropped items across all arenas
+    #[cfg(test)]
+    pub fn arena_drop_states(&self) -> rust_alloc::vec::Vec<rust_alloc::vec::Vec<bool>> {
+        self.arenas
+            .iter()
+            .map(|arena| arena.item_drop_states())
+            .collect()
+    }
 }

oscars/src/collectors/mark_sweep/README.md

Lines changed: 2 additions & 2 deletions
@@ -4,8 +4,8 @@ This is a basic mark-sweep collector using an underlying arena allocator.
 
 ## TODO list
 
-- [ ] Support weak maps
-- [ ] Add Tests
+- [x] Support weak maps
+- [x] Add Tests
 
 
 ## Areas of improvement

oscars/src/collectors/mark_sweep/internals/ephemeron.rs

Lines changed: 50 additions & 12 deletions
@@ -7,33 +7,29 @@ use crate::{
     collectors::mark_sweep::{
         CollectionState, ErasedEphemeron, MarkSweepGarbageCollector, TraceColor,
         internals::{GcBox, WeakGcBox, gc_header::HeaderColor},
+        pointers::Gc,
         trace::Trace,
     },
 };
 
 use crate::collectors::mark_sweep::Finalize;
 
-// TODO: key's GcBox should be notably a weak box
 pub struct Ephemeron<K: Trace + ?Sized + 'static, V: Trace + 'static> {
     pub(crate) value: GcBox<V>,
     vtable: &'static EphemeronVTable,
     pub(crate) key: WeakGcBox<K>,
 }
 
-// NOTE: There is going to be an issue here in that we initialize the GC
-// box to the wrong state.
-//
-// So we either need the color to be global that is provided to the allocation
-// or we need state access
 impl<K: Trace, V: Trace> Ephemeron<K, V> {
-    pub fn new_in(key: K, value: V, collector: &mut MarkSweepGarbageCollector) -> Self
-    where
-        K: Sized,
+    // Creates a new [`Ephemeron`] with given key and value
+    //
+    // The [`WeakGcBox`] for the key is created internally from the provided [`Gc`] pointer
+    pub fn new_in(key: &Gc<K>, value: V, collector: &mut MarkSweepGarbageCollector) -> Self
     {
-        let key = WeakGcBox::new_in(key, &collector.state);
-        let value = GcBox::new_in(value, &collector.state);
+        let weak_key = WeakGcBox::new(key.inner_ptr);
+        let value = GcBox::new(value, &collector.state);
         let vtable = vtable_of::<K, V>();
-        Self { key, value, vtable }
+        Self { key: weak_key, value, vtable }
     }
 
     pub fn key(&self) -> &K {
@@ -61,6 +57,14 @@ impl<K: Trace, V: Trace> Ephemeron<K, V> {
     pub(crate) fn drop_fn(&self) -> EphemeronDropFn {
         self.vtable.drop_fn
     }
+
+    pub(crate) fn is_reachable_fn(&self) -> EphemeronIsReachableFn {
+        self.vtable.is_reachable_fn
+    }
+
+    pub(crate) fn finalize_fn(&self) -> EphemeronFinalizeFn {
+        self.vtable.finalize_fn
+    }
 }
 
 impl<K: Trace, V: Trace> Finalize for Ephemeron<K, V> {}
@@ -120,12 +124,42 @@ pub(crate) const fn vtable_of<K: Trace + 'static, V: Trace + 'static>() -> &'sta
         // SAFETY: The caller must ensure the erased pointer is not dropped or deallocated.
         unsafe { this.as_mut().mark_dropped() };
     }
+
+    // SAFETY: Cast back to concrete types to check reachability
+    unsafe fn is_reachable_fn<K: Trace + 'static, V: Trace + 'static>(
+        this: ErasedEphemeron,
+        color: TraceColor,
+    ) -> bool {
+        // SAFETY: The caller must ensure that the passed erased pointer is
+        // `ArenaHeapItem<Ephemeron<K, V>>`
+        let ephemeron = unsafe {
+            this.cast::<ArenaHeapItem<Ephemeron<K, V>>>()
+                .as_ref()
+                .value()
+        };
+        ephemeron.is_reachable(color)
+    }
+
+    // SAFETY: Cast back to concrete types to run finalizers
+    unsafe fn finalize_fn<K: Trace + 'static, V: Trace + 'static>(this: ErasedEphemeron) {
+        // SAFETY: The caller must ensure that the passed erased pointer is
+        // `ArenaHeapItem<Ephemeron<K, V>>`
+        let ephemeron = unsafe {
+            this.cast::<ArenaHeapItem<Ephemeron<K, V>>>()
+                .as_ref()
+                .value()
+        };
+        Finalize::finalize(ephemeron.key());
+        Finalize::finalize(ephemeron.value());
+    }
 }
 
 impl<K: Trace + 'static, V: Trace + 'static> HasVTable for EphemeronMarker<K, V> {
     const VTABLE: &'static EphemeronVTable = &EphemeronVTable {
         trace_fn: EphemeronMarker::<K, V>::trace_fn::<K, V>,
         drop_fn: EphemeronMarker::<K, V>::drop_fn::<K, V>,
+        is_reachable_fn: EphemeronMarker::<K, V>::is_reachable_fn::<K, V>,
+        finalize_fn: EphemeronMarker::<K, V>::finalize_fn::<K, V>,
         _key_type_id: TypeId::of::<K>(),
         _key_size: size_of::<WeakGcBox<K>>(),
         _value_type_id: TypeId::of::<V>(),
@@ -138,10 +172,14 @@ pub(crate) const fn vtable_of<K: Trace + 'static, V: Trace + 'static>() -> &'sta
 
 type EphemeronTraceFn = unsafe fn(this: ErasedEphemeron, color: TraceColor);
 type EphemeronDropFn = unsafe fn(this: ErasedEphemeron);
+type EphemeronIsReachableFn = unsafe fn(this: ErasedEphemeron, color: TraceColor) -> bool;
+type EphemeronFinalizeFn = unsafe fn(this: ErasedEphemeron);
 
 pub struct EphemeronVTable {
     trace_fn: EphemeronTraceFn,
     drop_fn: EphemeronDropFn,
+    is_reachable_fn: EphemeronIsReachableFn,
+    finalize_fn: EphemeronFinalizeFn,
     _key_type_id: TypeId,
     _key_size: usize,
     _value_type_id: TypeId,
