Merged
Changes from 2 commits
4 changes: 3 additions & 1 deletion notes/2026-02-26.md
@@ -1,5 +1,7 @@
# WeakMap Support

Revision: [2026-03-15](2026-03-15.md)

Note author: shruti2522

This document summarizes all the changes made to add weak map support to the mark sweep collector.
@@ -88,4 +90,4 @@ or write `unsafe` code, and it plugs right into the collector's existing trace
and sweep phases.

In the future, the best improvement would be switching from `HashMap` to `HashTable`
to save memory. Until then, this first version works well and gives us the weak map behavior we need
to save memory. Until then, this first version works well and gives us the weak map behavior we need.
68 changes: 68 additions & 0 deletions notes/2026-03-15.md
@@ -0,0 +1,68 @@
# WeakMap Support Follow-up

Note author: shruti2522

Revision of: [2026-02-26](2026-02-26.md)

This document is a follow-up to the original weak map support note.

## what changed since the original note

### `WeakMap<K, V>` representation: `HashTable` over `HashMap`

The original note identified using `HashTable` instead of `HashMap` as a
potential improvement. This change implements it by moving from
`HashMap<usize, ArenaPointer<Ephemeron<K, V>>>` to a
`HashTable<(usize, EphemeronPtr)>` model. The key address and the ephemeron
pointer are now stored together as one tuple per table entry, and the table is
hashed explicitly through `hash_addr`, so it carries no hasher state of its own.
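A minimal, std-only sketch of the layout described above, assuming a toy linear-probing table rather than the real `HashTable`: each slot holds the `(address, pointer)` tuple inline, and the caller supplies the hash through `hash_addr`. The `hash_addr` body, the `AddrTable` type, and the `u32` stand-in for `EphemeronPtr` are hypothetical; they only illustrate the shape of the data.

```rust
/// Hypothetical address hash; the real `hash_addr` body is not shown in this PR.
/// Multiplies by an odd constant, then folds the high half into the low half
/// so the low bits used for slot indexing are well mixed.
fn hash_addr(addr: usize) -> u64 {
    let h = (addr as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15);
    h ^ (h >> 32)
}

/// Stand-in for `EphemeronPtr` (assumption: any small `Copy` handle).
type EphemeronPtr = u32;

/// Toy open-addressing table: each slot stores the `(addr, ptr)` tuple inline,
/// and the hash is computed by the caller via `hash_addr`.
struct AddrTable {
    slots: Vec<Option<(usize, EphemeronPtr)>>,
}

impl AddrTable {
    fn with_capacity(cap: usize) -> Self {
        // Power-of-two size so a bit mask can replace the modulo.
        let cap = cap.max(1).next_power_of_two();
        AddrTable { slots: vec![None; cap] }
    }

    fn start(&self, addr: usize) -> usize {
        (hash_addr(addr) as usize) & (self.slots.len() - 1)
    }

    fn find(&self, addr: usize) -> Option<EphemeronPtr> {
        let mask = self.slots.len() - 1;
        let mut i = self.start(addr);
        for _ in 0..self.slots.len() {
            match self.slots[i] {
                Some((a, p)) if a == addr => return Some(p),
                Some(_) => i = (i + 1) & mask, // occupied by another key: probe on
                None => return None,           // empty slot ends the probe chain
            }
        }
        None
    }

    /// Like `insert_unique`: the caller guarantees `addr` is not already present.
    fn insert_unique(&mut self, addr: usize, ptr: EphemeronPtr) {
        let mask = self.slots.len() - 1;
        let mut i = self.start(addr);
        for _ in 0..self.slots.len() {
            if self.slots[i].is_none() {
                self.slots[i] = Some((addr, ptr));
                return;
            }
            i = (i + 1) & mask;
        }
        panic!("toy table is full; it never grows");
    }
}
```

Unlike this toy, a production table grows and handles deletion; the sketch only shows that the entry is one inline tuple and that hashing is the caller's job.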

### ephemeron vs key/value pair: why we kept the key/value approach

One research question was whether `insert` should take an `Ephemeron` directly
instead of a `(key, value)` pair. The answer is no. Here's why:

**`Ephemeron` is a GC internal.** The collector owns the lifecycle
of ephemerons: they are allocated, traced during the mark phase, checked for
reachability during the sweep, and finalized when their keys die.

If `WeakMap::insert` exposed `Ephemeron` in its public API, we would leak GC
internals into the user-facing weak map interface. Instead, `insert` takes a
`(key, value)` pair, which is the right boundary. The ephemeron allocation and
queue registration happen internally via `collector.alloc_ephemeron_node`.

This keeps the weak map simple for users while hiding the complex GC details.
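To illustrate the boundary, here is a hypothetical, self-contained model in which the collector owns every ephemeron and the weak map only ever holds opaque handles. `ToyCollector`, `EphemeronId`, `GcKey`, and this simplified `alloc_ephemeron_node` are illustrative stand-ins, not the oscars types; the point is that `insert` accepts a key and a value and never takes or returns an ephemeron.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;

/// Hypothetical handle to an ephemeron slot owned by the collector.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct EphemeronId(usize);

/// Collector-internal record; it never appears in the weak map's public API.
struct EphemeronNode<V> {
    #[allow(dead_code)] // would be read by a sweep phase, omitted here
    key_addr: usize,
    value: V,
    valid: bool,
}

/// Stand-in for `Gc<K>`: the map only needs the key's address.
struct GcKey {
    addr: usize,
}

/// Toy collector that owns every ephemeron in a `Vec` arena (assumption).
struct ToyCollector<V> {
    ephemerons: Vec<EphemeronNode<V>>,
}

impl<V> ToyCollector<V> {
    fn new() -> Self {
        ToyCollector { ephemerons: Vec::new() }
    }

    /// Plays the role of `alloc_ephemeron_node`: allocate and register a node.
    fn alloc_ephemeron_node(&mut self, key: &GcKey, value: V) -> EphemeronId {
        self.ephemerons.push(EphemeronNode { key_addr: key.addr, value, valid: true });
        EphemeronId(self.ephemerons.len() - 1)
    }

    fn invalidate(&mut self, id: EphemeronId) {
        self.ephemerons[id.0].valid = false;
    }

    fn value(&self, id: EphemeronId) -> Option<&V> {
        let node = &self.ephemerons[id.0];
        if node.valid { Some(&node.value) } else { None }
    }
}

/// User-facing handle: maps key addresses to collector-owned ephemeron handles.
struct WeakMap<V> {
    entries: HashMap<usize, EphemeronId>,
    _value: PhantomData<V>,
}

impl<V> WeakMap<V> {
    fn new() -> Self {
        WeakMap { entries: HashMap::new(), _value: PhantomData }
    }

    /// Public boundary: a key and a value go in; no ephemeron crosses it.
    fn insert(&mut self, key: &GcKey, value: V, collector: &mut ToyCollector<V>) {
        let id = collector.alloc_ephemeron_node(key, value);
        if let Some(old) = self.entries.insert(key.addr, id) {
            collector.invalidate(old); // the replaced ephemeron must not stay valid
        }
    }

    fn get<'c>(&self, key: &GcKey, collector: &'c ToyCollector<V>) -> Option<&'c V> {
        self.entries.get(&key.addr).and_then(|&id| collector.value(id))
    }
}
```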

### `replace_or_insert`: one lookup instead of two

Previously, `insert` did a two-step update: remove any old ephemeron, then
insert the new one. This meant two lookups in the map and two queue operations.

Now, `replace_or_insert` does both operations in a single `HashTable::find_entry`
lookup:

- If an entry exists for the address, swap the new ephemeron in and invalidate the old one
- If no entry exists, insert the new entry

This saves a second hash-table lookup, and the old ephemeron is invalidated in
the same step that swaps the new one into its slot.
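The single-lookup update can be sketched with the std `HashMap` entry API, which plays the same role here as `HashTable::find_entry`: it hashes and searches for the key once, then either swaps the value in place or inserts into the vacant entry. This is a sketch under assumptions: an `Ephemeron` with a `Cell<bool>` validity bit and `Rc` handles stand in for the collector's pointer types, and returning the replaced pointer as an `Option` follows the reviewer's suggestion rather than the exact code in this diff.

```rust
use std::cell::Cell;
use std::collections::hash_map::{Entry, HashMap};
use std::rc::Rc;

/// Stand-in for the collector-owned ephemeron (assumption: a shared
/// handle with an interior-mutable "valid" bit).
struct Ephemeron {
    valid: Cell<bool>,
}

impl Ephemeron {
    fn invalidate(&self) {
        self.valid.set(false);
    }
}

type EphemeronPtr = Rc<Ephemeron>;

/// One key lookup: swap in place if the address is present, otherwise insert.
/// Returns the replaced pointer, mirroring the `Option` from `HashMap::insert`.
fn replace_or_insert(
    entries: &mut HashMap<usize, EphemeronPtr>,
    key_addr: usize,
    new_ptr: EphemeronPtr,
) -> Option<EphemeronPtr> {
    match entries.entry(key_addr) {
        Entry::Occupied(mut slot) => {
            // Swap without looking the key up again, then retire the old ephemeron.
            let old = std::mem::replace(slot.get_mut(), new_ptr);
            old.invalidate();
            Some(old)
        }
        Entry::Vacant(slot) => {
            slot.insert(new_ptr);
            None
        }
    }
}
```

Returning the replaced handle matches the convention of `HashMap::insert` and `BTreeMap::insert`, which is the standard map API the review asked this method to follow.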

### how the collector manages weak maps

The collector owns all weak maps internally. `WeakMap` is just a handle pointing
to memory the collector owns. During cleanup, the collector prunes dead entries
from weak maps after marking has determined which objects are dead, but before
freeing their memory. This order matters because pruning reads status bits on the
ephemerons to decide which entries to keep. If the memory were freed first, those bits would be gone.
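The ordering constraint can be illustrated with a small model. This is a hypothetical sketch, not the oscars collector: objects are plain indices with a mark bit, each ephemeron records its key's status in a `key_alive` bit during marking, and the weak map is just a list of ephemeron slots. What it shows is that `prune_weak_maps` reads those bits, so it runs between `mark` and `sweep`.

```rust
/// Toy ephemeron: records its key's index and a status bit set during marking.
struct Ephemeron {
    key: usize,
    value: &'static str,
    key_alive: bool,
}

/// Hypothetical collector state (assumption: Vec-backed arenas).
struct Collector {
    marked: Vec<bool>,                  // mark bit per object
    ephemerons: Vec<Option<Ephemeron>>, // None = freed slot, reusable
    weak_map: Vec<usize>,               // ephemeron slot indices
}

impl Collector {
    fn mark(&mut self, roots: &[usize]) {
        self.marked.iter_mut().for_each(|m| *m = false);
        // This toy has no object graph, so only the roots get marked.
        for &r in roots {
            self.marked[r] = true;
        }
        // Copy each key's status onto its ephemeron.
        for e in self.ephemerons.iter_mut().flatten() {
            e.key_alive = self.marked[e.key];
        }
    }

    /// Drop weak-map entries whose ephemeron reports a dead key.
    /// Reads the status bits, so the ephemerons must still exist here.
    fn prune_weak_maps(&mut self) {
        let ephemerons = &self.ephemerons;
        self.weak_map
            .retain(|&i| ephemerons[i].as_ref().map_or(false, |e| e.key_alive));
    }

    /// Free ephemerons whose key died; their slots become reusable.
    fn sweep(&mut self) {
        for slot in self.ephemerons.iter_mut() {
            if slot.as_ref().map_or(false, |e| !e.key_alive) {
                *slot = None;
            }
        }
    }

    fn collect(&mut self, roots: &[usize]) {
        self.mark(roots);
        self.prune_weak_maps(); // status bits are still readable here
        self.sweep(); // only now is ephemeron memory released
    }
}
```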

## conclusion

`WeakMap::insert` should look simple to users (just a key and a value), while
all ephemeron management stays hidden inside the collector.

Changes made since the original note:

1. Switch to `HashTable` to store key and pointer together, saving memory
2. Use `replace_or_insert` for faster updates (one lookup instead of two)
3. Confirm that ephemerons should never appear in the user-facing `WeakMap` API
49 changes: 23 additions & 26 deletions oscars/src/collectors/mark_sweep/pointers/weak_map.rs
@@ -40,26 +40,24 @@ impl<K: Trace, V: Trace> WeakMapInner<K, V> {
}
}

fn remove_and_invalidate(&mut self, key_addr: usize) {
if let Ok(entry) = self
.entries
.find_entry(hash_addr(key_addr), |e| e.0 == key_addr)
{
let ((_, old_ephemeron), _) = entry.remove();
old_ephemeron.as_inner_ref().invalidate();
}
}

fn insert_ptr(
// replace an existing entry in one lookup, invalidating the old ephemeron
fn replace_or_insert(
Member

suggestion: let's make this insert, and return an Option.

This API is pretty standard for various other Rust map types, and we should adhere to it where possible.

Contributor Author

Done

&mut self,
key_addr: usize,
ephemeron_ptr: PoolPointer<'static, Ephemeron<K, V>>,
new_ptr: PoolPointer<'static, Ephemeron<K, V>>,
) {
// caller guarantees no duplicate exists since remove_and_invalidate was called first
self.entries
.insert_unique(hash_addr(key_addr), (key_addr, ephemeron_ptr), |e| {
hash_addr(e.0)
});
let hash = hash_addr(key_addr);
match self.entries.find_entry(hash, |e| e.0 == key_addr) {
Ok(mut entry) => {
// swap without probing again
let old = core::mem::replace(entry.get_mut(), (key_addr, new_ptr));
old.1.as_inner_ref().invalidate();
}
Err(_absent) => {
self.entries
.insert_unique(hash, (key_addr, new_ptr), |e| hash_addr(e.0));
}
}
}

fn get(&self, key: &Gc<K>) -> Option<&V> {
@@ -132,24 +130,23 @@ impl<K: Trace, V: Trace> WeakMap<K, V> {
Self { inner }
}

// insert a value for `key`, replacing and invalidating any old ephemeron
pub fn insert<C: Collector>(&mut self, key: &Gc<K>, value: V, collector: &C) {
let key_addr = key.inner_ptr.as_non_null().as_ptr() as usize;

// remove and invalidate any existing ephemeron for this key
// SAFETY: we have unique access to `self`
unsafe { self.inner.as_mut().remove_and_invalidate(key_addr) };

//allocate the new ephemeron node
let ephemeron_ptr = collector
.alloc_ephemeron_node(key, value)
.expect("Failed to allocate ephemeron");

// SAFETY: safe because the gc tracks this
// SAFETY: the collector keeps the pool alive for the map lifetime
let ephemeron_ptr = unsafe { ephemeron_ptr.extend_lifetime() };

//insert the new node using another short lived mutable borrow
// SAFETY: we have unique access to `self`
unsafe { self.inner.as_mut().insert_ptr(key_addr, ephemeron_ptr) };
// SAFETY: `&mut self` gives exclusive access to `inner`
unsafe {
self.inner
.as_mut()
.replace_or_insert(key_addr, ephemeron_ptr)
};
}

pub fn get(&self, key: &Gc<K>) -> Option<&V> {
49 changes: 23 additions & 26 deletions oscars/src/collectors/mark_sweep_arena2/pointers/weak_map.rs
@@ -39,26 +39,24 @@ impl<K: Trace, V: Trace> WeakMapInner<K, V> {
}
}

fn remove_and_invalidate(&mut self, key_addr: usize) {
if let Ok(entry) = self
.entries
.find_entry(hash_addr(key_addr), |e| e.0 == key_addr)
{
let ((_, old_ephemeron), _) = entry.remove();
old_ephemeron.as_inner_ref().invalidate();
}
}

fn insert_ptr(
// replace an existing entry in one lookup, invalidating the old ephemeron
fn replace_or_insert(
&mut self,
key_addr: usize,
ephemeron_ptr: ArenaPointer<'static, Ephemeron<K, V>>,
new_ptr: ArenaPointer<'static, Ephemeron<K, V>>,
) {
// caller guarantees no duplicate exists since remove_and_invalidate was called first
self.entries
.insert_unique(hash_addr(key_addr), (key_addr, ephemeron_ptr), |e| {
hash_addr(e.0)
});
let hash = hash_addr(key_addr);
match self.entries.find_entry(hash, |e| e.0 == key_addr) {
Ok(mut entry) => {
// swap without probing again
let old = core::mem::replace(entry.get_mut(), (key_addr, new_ptr));
old.1.as_inner_ref().invalidate();
}
Err(_absent) => {
self.entries
.insert_unique(hash, (key_addr, new_ptr), |e| hash_addr(e.0));
}
}
}

fn get(&self, key: &Gc<K>) -> Option<&V> {
@@ -133,6 +131,7 @@ impl<K: Trace, V: Trace> WeakMap<K, V> {
Self { inner }
}

// insert a value for `key`, replacing and invalidating any old ephemeron
pub fn insert(
&mut self,
key: &Gc<K>,
@@ -141,22 +140,20 @@ impl<K: Trace, V: Trace> WeakMap<K, V> {
) {
let key_addr = key.inner_ptr.as_non_null().as_ptr() as usize;

// remove and invalidate any existing ephemeron for this key
// SAFETY: we have unique access to `self`
unsafe { self.inner.as_mut().remove_and_invalidate(key_addr) };

//allocate the new ephemeron node
let ephemeron_ptr = collector
.alloc_ephemeron_node(key, value)
.expect("Failed to allocate ephemeron");

// SAFETY: safe because the gc tracks this
// SAFETY: the collector keeps the pool alive for the map lifetime
let ephemeron_ptr: ArenaPointer<'static, Ephemeron<K, V>> =
unsafe { ephemeron_ptr.extend_lifetime() };

//insert the new node using another short lived mutable borrow
// SAFETY: we have unique access to `self`
unsafe { self.inner.as_mut().insert_ptr(key_addr, ephemeron_ptr) };
// SAFETY: `&mut self` gives exclusive access to `inner`
unsafe {
self.inner
.as_mut()
.replace_or_insert(key_addr, ephemeron_ptr)
};
}

pub fn get(&self, key: &Gc<K>) -> Option<&V> {