Merged
42 changes: 42 additions & 0 deletions notes/arena2_vs_boa_gc.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
# arena2 vs boa_gc benchmark results

Note author: shruti2522
date: 2026-03-06

This benchmark measures how the `arena2` allocator, which uses a simple bump allocator with `TaggedPtr` headers for liveness tracking, compares against the standard `boa_gc` implementation.

Ran the `arena2_vs_boa_gc` bench suite. It compares oscars' `arena2` against `boa_gc` across node allocation, collection pauses, mixed workloads, and memory pressure.

## Results

### gc_node_allocation

arena2 heavily outperforms boa_gc across all sizes.
- **10 nodes:** arena2 takes ~320 ns vs ~750 ns for boa_gc
- **100 nodes:** arena2 takes ~3.2 µs vs ~6.4 µs for boa_gc
- **1000 nodes:** arena2 takes ~27.3 µs vs ~56.2 µs for boa_gc

This suggests that bump allocation into an arena page is consistently more than 2x faster than boa_gc's per-object allocation path.
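For intuition, here is a minimal bump-allocation sketch. This is illustrative only: the `Arena` type and its methods are hypothetical and not the actual arena2 API, which additionally tracks liveness via `TaggedPtr` headers.

```rust
/// Minimal bump allocator over one fixed-size page (illustrative sketch).
struct Arena {
    buf: Vec<u8>,
    offset: usize,
}

impl Arena {
    fn new(size: usize) -> Self {
        Arena { buf: vec![0u8; size], offset: 0 }
    }

    /// Bump-allocate `size` bytes with the given alignment (a power of two).
    /// Returns the offset into the page, or `None` when the page is full.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        // Round the current offset up to the required alignment.
        let start = (self.offset + align - 1) & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.buf.len() {
            return None; // page exhausted; a real allocator would chain a new page
        }
        self.offset = end; // allocation is just a pointer bump
        Some(start)
    }
}

fn main() {
    let mut arena = Arena::new(64);
    let a = arena.alloc(8, 8).unwrap();
    let b = arena.alloc(8, 8).unwrap();
    // Consecutive allocations land adjacent in the same page.
    assert_eq!(a, 0);
    assert_eq!(b, 8);
}
```

The fast path is a single add-and-compare, which is why it beats a general-purpose per-object allocation path.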

### gc_collection_pause

Similar to allocations, the sweep phase in arena2 is extremely fast compared to boa_gc.
- **100 objects:** arena2 sweeps in ~3.5 µs vs ~7.3 µs for boa_gc
- **500 objects:** arena2 sweeps in ~15.2 µs vs ~32.5 µs for boa_gc
- **1000 objects:** arena2 sweeps in ~29.5 µs vs ~74.9 µs for boa_gc

The linear scan over the contiguous blocks in arena2 during garbage collection cuts the pause times by more than half.
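As a sketch of why the pause stays short: the sweep is a linear pass over contiguous headers. The `Header` type below is hypothetical, not arena2's real `TaggedPtr` layout.

```rust
/// One liveness header per slot (illustrative; arena2 packs this into TaggedPtr bits).
#[derive(Clone, Copy)]
struct Header {
    marked: bool, // set by the mark phase for reachable objects
    live: bool,   // slot currently holds a live object
}

/// Linear sweep over contiguous headers; returns how many slots were freed.
fn sweep(headers: &mut [Header]) -> usize {
    let mut freed = 0;
    for h in headers.iter_mut() {
        if h.live && !h.marked {
            h.live = false; // reclaiming is just a flag flip, no per-object free()
            freed += 1;
        }
        h.marked = false; // reset mark bit for the next cycle
    }
    freed
}

fn main() {
    let mut heap = vec![Header { marked: false, live: true }; 10];
    for (i, h) in heap.iter_mut().enumerate() {
        if i % 2 == 0 {
            h.marked = true; // pretend half the objects are reachable
        }
    }
    let freed = sweep(&mut heap);
    assert_eq!(freed, 5);
}
```

A cache-friendly sequential scan like this is the likely reason the measured pauses are less than half of boa_gc's.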

### mixed_workload

This test interleaves repeated allocations with `collect()` pauses.
Both allocators performed essentially identically here: arena2 took ~17.8 µs and boa_gc took ~17.8 µs. arena2's allocation-speed advantage appears to even out once allocations and collections are interleaved.

### memory_pressure

This tests creating and deleting many objects quickly (make 50, keep 5, collect, repeat 10 times).
Both allocators are equally fast here: arena2 took ~46.0 µs and boa_gc took ~46.6 µs. The cost of discarding whole arena pages versus freeing individual objects appears to balance out.

## Conclusion

`arena2` is roughly twice as fast for raw allocation and for collection sweeps. In the mixed and memory-pressure tests, the two collectors perform about the same.
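Separately, the pointer-handling changes in this PR lean on the `&raw mut` operator (stabilized in Rust 1.82) to take field addresses without materializing a `&mut` reference. A standalone illustration with hypothetical types:

```rust
/// Illustrative stand-in for a heap item with a header and payload
/// (hypothetical; not the real ArenaHeapItem layout).
struct Item {
    #[allow(dead_code)]
    header: u8,
    value: u32,
}

fn main() {
    let mut item = Item { header: 0, value: 7 };
    let p: *mut Item = &mut item;

    // `&raw mut (*p).value` computes the field address directly, without
    // creating an intermediate `&mut Item`. Under Stacked Borrows, going
    // through `&mut (*p).value as *mut u32` instead would invalidate other
    // raw pointers aliasing the item, which is what the PR is avoiding.
    let value_ptr: *mut u32 = unsafe { &raw mut (*p).value };

    // SAFETY: `value_ptr` points into the live `item` above.
    unsafe { *value_ptr = 42 };
    assert_eq!(item.value, 42);
}
```

This is the same pattern used by `as_value_ptr` and the `as_sized_inner_ptr`/`erased_inner_ptr` rewrites below, which is why the test suite now passes under Miri.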
4 changes: 4 additions & 0 deletions oscars/Cargo.toml
@@ -23,6 +23,10 @@ required-features = ["gc_allocator"]
name = "arena2_vs_mempool3"
harness = false

[[bench]]
name = "arena2_vs_boa_gc"
harness = false

[features]
default = ["mark_sweep"]
std = []
192 changes: 192 additions & 0 deletions oscars/benches/arena2_vs_boa_gc.rs
@@ -0,0 +1,192 @@
use criterion::{BenchmarkId, Criterion, black_box, criterion_group, criterion_main};
use oscars::collectors::mark_sweep_arena2::{
    Finalize, Gc as OscarsGc, MarkSweepGarbageCollector, Trace, TraceColor,
    cell::GcRefCell as OscarsGcRefCell,
};

use boa_gc::{Gc as BoaGc, GcRefCell as BoaGcRefCell, force_collect as boa_force_collect};

fn bench_alloc(c: &mut Criterion) {
    let mut group = c.benchmark_group("gc_node_allocation");

    for size in [10, 100, 1000].iter() {
        group.bench_with_input(BenchmarkId::new("arena2", size), size, |b, &size| {
            let collector = MarkSweepGarbageCollector::default()
                .with_arena_size(65536)
                .with_heap_threshold(262144);

            b.iter(|| {
                let mut roots = Vec::new();
                for i in 0..size {
                    let root = OscarsGc::new_in(OscarsGcRefCell::new(i), &collector);
                    roots.push(root);
                }
                black_box(roots.len())
            });
        });

        group.bench_with_input(BenchmarkId::new("boa_gc", size), size, |b, &size| {
            b.iter_batched(
                || {
                    boa_force_collect();
                },
                |()| {
                    let mut gcs = Vec::new();
                    for i in 0..size {
                        let gc = BoaGc::new(BoaGcRefCell::new(i));
                        gcs.push(gc);
                    }
                    black_box(gcs.len())
                },
                criterion::BatchSize::SmallInput,
            );
        });
    }

    group.finish();
}

fn bench_collection(c: &mut Criterion) {
    let mut group = c.benchmark_group("gc_collection_pause");

    for size in [100, 500, 1000].iter() {
        group.bench_with_input(BenchmarkId::new("arena2", size), size, |b, &size| {
            let collector = MarkSweepGarbageCollector::default()
                .with_arena_size(65536)
                .with_heap_threshold(262144);

            b.iter(|| {
                let mut roots = Vec::new();
                for i in 0..size {
                    let root = OscarsGc::new_in(OscarsGcRefCell::new(i), &collector);
                    roots.push(root);
                }
                // let half be garbage
                roots.truncate(size / 2);
                collector.collect();
                black_box(roots.len())
            });
        });

        group.bench_with_input(BenchmarkId::new("boa_gc", size), size, |b, &size| {
            b.iter(|| {
                let mut gcs = Vec::new();
                for i in 0..size {
                    let gc = BoaGc::new(BoaGcRefCell::new(i));
                    gcs.push(gc);
                }
                gcs.truncate(size / 2);
                boa_force_collect();
                black_box(gcs.len())
            });
        });
    }

    group.finish();
}

fn bench_mixed(c: &mut Criterion) {
    let mut group = c.benchmark_group("mixed_workload");

    group.bench_function("arena2", |b| {
        let collector = MarkSweepGarbageCollector::default()
            .with_arena_size(65536)
            .with_heap_threshold(131072);

        b.iter(|| {
            let mut roots = Vec::new();

            for i in 0..100 {
                let root = OscarsGc::new_in(OscarsGcRefCell::new(i), &collector);
                roots.push(root);
            }
            collector.collect();

            for i in 100..200 {
                let root = OscarsGc::new_in(OscarsGcRefCell::new(i), &collector);
                roots.push(root);
            }
            collector.collect();

            black_box(roots.len())
        });
    });

    group.bench_function("boa_gc", |b| {
        b.iter(|| {
            let mut gcs = Vec::new();

            for i in 0..100 {
                let gc = BoaGc::new(BoaGcRefCell::new(i));
                gcs.push(gc);
            }
            boa_force_collect();

            for i in 100..200 {
                let gc = BoaGc::new(BoaGcRefCell::new(i));
                gcs.push(gc);
            }
            boa_force_collect();

            black_box(gcs.len())
        });
    });

    group.finish();
}

fn bench_pressure(c: &mut Criterion) {
    let mut group = c.benchmark_group("memory_pressure");

    group.bench_function("arena2", |b| {
        let collector = MarkSweepGarbageCollector::default()
            .with_arena_size(32768)
            .with_heap_threshold(65536);

        b.iter(|| {
            let mut live = Vec::new();

            for round in 0..10 {
                for i in 0..50 {
                    let obj = OscarsGc::new_in(OscarsGcRefCell::new(round * 100 + i), &collector);
                    if i % 10 == 0 {
                        live.push(obj);
                    }
                }
                collector.collect();
            }

            black_box(live.len())
        });
    });

    group.bench_function("boa_gc", |b| {
        b.iter(|| {
            let mut live = Vec::new();

            for round in 0..10 {
                for i in 0..50 {
                    let obj = BoaGc::new(BoaGcRefCell::new(round * 100 + i));
                    if i % 10 == 0 {
                        live.push(obj);
                    }
                }
                boa_force_collect();
            }

            black_box(live.len())
        });
    });

    group.finish();
}

criterion_group!(
    benches,
    bench_alloc,
    bench_collection,
    bench_mixed,
    bench_pressure,
);

criterion_main!(benches);
27 changes: 27 additions & 0 deletions oscars/src/alloc/arena2/alloc.rs
@@ -44,6 +44,15 @@ impl<T: ?Sized> ArenaHeapItem<T> {
        &mut self.value as *mut T
    }

    /// Returns a raw mutable pointer to the value
    ///
    /// This avoids creating a `&mut self` reference, which can lead to Stacked
    /// Borrows violations if shared references to the heap item exist
    pub(crate) fn as_value_ptr(ptr: NonNull<Self>) -> *mut T {
        // SAFETY: `&raw mut` computes the field address without creating a reference
        unsafe { &raw mut (*ptr.as_ptr()).value }
    }

    fn value_mut(&mut self) -> &mut T {
        &mut self.value
    }
@@ -133,6 +142,15 @@ impl<'arena> ErasedArenaPointer<'arena> {
        self.0.as_ptr()
    }

    /// Extends the lifetime of this erased arena pointer to `'static`
    ///
    /// # Safety
    ///
    /// The caller must ensure the arena outlives the returned pointer. This
    /// holds here because the GC collector owns the arena and keeps it alive.
    pub(crate) unsafe fn extend_lifetime(self) -> ErasedArenaPointer<'static> {
        ErasedArenaPointer(self.0, PhantomData)
    }

    /// Returns an [`ArenaPointer`] for the current [`ErasedArenaPointer`]
    ///
    /// # Safety
@@ -178,6 +196,15 @@ impl<'arena, T> ArenaPointer<'arena, T> {
    pub fn to_erased(self) -> ErasedArenaPointer<'arena> {
        self.0
    }

    /// Extends the lifetime of this arena pointer to `'static`
    ///
    /// # Safety
    ///
    /// The caller must ensure the arena outlives the returned pointer. This
    /// holds here because the GC collector owns the arena and keeps it alive.
    pub(crate) unsafe fn extend_lifetime(self) -> ArenaPointer<'static, T> {
        ArenaPointer(self.0.extend_lifetime(), PhantomData)
    }
}

const FULL_MASK: u8 = 0b0100_0000;
2 changes: 1 addition & 1 deletion oscars/src/alloc/arena2/tests.rs
@@ -77,7 +77,7 @@ fn arc_drop() {
     let heap_item_mut = heap_item.as_mut();
     // Manually drop the heap item
     heap_item_mut.mark_dropped();
-    drop_in_place(heap_item_mut.as_ptr());
+    drop_in_place(ArenaHeapItem::as_value_ptr(heap_item));
 };

 assert!(dropped.load(Ordering::SeqCst));
8 changes: 8 additions & 0 deletions oscars/src/collectors/mark_sweep/cell.rs
@@ -171,6 +171,14 @@ impl<T: ?Sized> GcRefCell<T> {
        }
    }

    /// Returns a raw pointer to the inner value, or `None` if the value is
    /// currently mutably borrowed
    pub(crate) fn get_raw(&self) -> Option<*mut T> {
        match self.borrow.get().borrowed() {
            BorrowState::Writing => None,
            _ => Some(self.cell.get()),
        }
    }

    /// Mutably borrows the wrapped value, returning an error if the value is currently borrowed.
    ///
    /// The borrow lasts until the returned `GcCellRefMut` exits scope.
6 changes: 3 additions & 3 deletions oscars/src/collectors/mark_sweep/internals/gc_box.rs
@@ -46,9 +46,9 @@ impl<T: Trace + Finalize + ?Sized> WeakGcBox<T> {
     pub(crate) fn erased_inner_ptr(&self) -> NonNull<GcBox<NonTraceable>> {
         // SAFETY: `as_heap_ptr` returns a valid pointer to
         // `PoolItem` whose lifetime is tied to the pool
-        let heap_item = unsafe { self.as_heap_ptr().as_mut() };
-        // SAFETY: We just removed this value from a NonNull
-        unsafe { NonNull::new_unchecked(heap_item.as_ptr()) }
+        let heap_item: *mut PoolItem<GcBox<NonTraceable>> = self.as_heap_ptr().as_ptr();
+        // SAFETY: `PoolItem` is `#[repr(transparent)]`, so pointing to and returning field 0 is valid.
+        unsafe { NonNull::new_unchecked(&raw mut (*heap_item).0) }
     }

     pub(crate) fn as_heap_ptr(&self) -> NonNull<PoolItem<GcBox<NonTraceable>>> {
1 change: 1 addition & 0 deletions oscars/src/collectors/mark_sweep/internals/mod.rs
@@ -4,6 +4,7 @@ mod gc_header;
mod vtable;

pub(crate) use ephemeron::Ephemeron;
pub(crate) use gc_header::{GcHeader, HeaderColor};
pub(crate) use vtable::{DropFn, TraceFn, VTable, vtable_of};

pub use self::gc_box::{GcBox, NonTraceable, WeakGcBox};
4 changes: 4 additions & 0 deletions oscars/src/collectors/mark_sweep/mod.rs
@@ -171,6 +171,10 @@ impl MarkSweepGarbageCollector {

    // Force drops all elements in the internal tracking queues and clears
    // them without regard for reachability.
    //
    // NOTE: This intentionally differs from arena2's `sweep_all_queues`:
    // arena3 uses `free_slot` calls to reclaim memory, while
    // arena2 uses a bitmap (`mark_dropped`) and reclaims automatically.
    fn sweep_all_queues(&self) {
        let ephemerons = core::mem::take(&mut *self.ephemeron_queue.borrow_mut());
        for ephemeron in ephemerons {
8 changes: 6 additions & 2 deletions oscars/src/collectors/mark_sweep/pointers/gc.rs
@@ -44,8 +44,12 @@ impl<T: Trace> Gc<T> {

 impl<T: Trace + ?Sized> Gc<T> {
     pub(crate) fn as_sized_inner_ptr(&self) -> NonNull<GcBox<NonTraceable>> {
-        let heap_item = unsafe { self.as_heap_ptr().as_mut() };
-        unsafe { NonNull::new_unchecked(heap_item.as_ptr()) }
+        // SAFETY: use `&raw mut` to get a raw pointer without creating
+        // a `&mut` reference, avoiding Stacked Borrows UB during GC tracing
+        let raw: *mut PoolItem<GcBox<NonTraceable>> = self.as_heap_ptr().as_ptr();
+        // SAFETY: `raw` is non-null because it comes from `as_heap_ptr()`, and
+        // `PoolItem` is `#[repr(transparent)]`, so it shares the same address as field 0
+        unsafe { NonNull::new_unchecked(&raw mut (*raw).0) }
     }

     pub(crate) fn as_heap_ptr(&self) -> NonNull<PoolItem<GcBox<NonTraceable>>> {
8 changes: 8 additions & 0 deletions oscars/src/collectors/mark_sweep/tests.rs
@@ -452,6 +452,10 @@ mod gc_edge_cases {
            next: Option<Gc<Node>>,
        }

        #[cfg(miri)]
        const DEPTH: usize = 20;

        #[cfg(not(miri))]
        const DEPTH: usize = 1_000;

        let mut head = Gc::new_in(Node { _id: 0, next: None }, collector);
@@ -620,6 +624,10 @@ mod gc_edge_cases {
            next: Option<Gc<Chain>>,
        }

        #[cfg(miri)]
        const LEN: usize = 20;

        #[cfg(not(miri))]
        const LEN: usize = 500;

        let mut head = Gc::new_in(Chain { next: None }, collector);
3 changes: 3 additions & 0 deletions oscars/src/collectors/mark_sweep_arena2/README.md
@@ -0,0 +1,3 @@
# Mark sweep collector

This is a basic mark-sweep collector using an underlying arena2 allocator.
5 changes: 5 additions & 0 deletions oscars/src/collectors/mark_sweep_arena2/cell.rs
Member:

I don't think this needs to be copied again. Is there a reason the other cell intrinsics can't be shared between the two approaches here?

Contributor Author:

Removed the duplication: cell.rs, trace.rs, and gc_header.rs now just re-export the mark_sweep ones directly, since they share the exact same types. Also added a comment about the trace macros so it's clear in the future.

@@ -0,0 +1,5 @@
//! A garbage collected cell implementation

pub use crate::collectors::mark_sweep::cell::{
BorrowError, BorrowMutError, GcRef, GcRefCell, GcRefMut,
};