diff --git a/cranelift/docs/stack-maps.md b/cranelift/docs/stack-maps.md
new file mode 100644
index 000000000000..b3c96e1f8343
--- /dev/null
+++ b/cranelift/docs/stack-maps.md
@@ -0,0 +1,240 @@
+# Stack maps
+
+While Cranelift is primarily used to compile WebAssembly, much of its machinery
+is just as useful for other use cases, such as compiling a custom programming
+language via JIT or AOT. In the same vein, many language implementations want a
+garbage collector, which simplifies memory management for their users
+considerably. Cranelift does not ship a pre-built garbage collector, but it does
+provide the facilities needed to implement a competent tracing garbage collector
+using *stack maps*.
+
+This document assumes you already know a little bit about garbage collection and
+safepoints. If not, you can read [New Stack Maps for Wasmtime and Cranelift].
+
+[New Stack Maps for Wasmtime and Cranelift]: https://bytecodealliance.org/articles/new-stack-maps-for-wasmtime#background-garbage-collection-safepoints-and-stack-maps
+
+The [`collector` example] is a complete example project which can be used as a
+reference when building a similar implementation.
+
+[`collector` example]: https://github.com/bytecodealliance/wasmtime/tree/main/cranelift/jit/examples/collector
+
+## Declaring objects
+
+Before stack maps can be used, they need to be populated. Stack maps are
+populated by declaring which values or variables should be tracked in them. The
+user must manually declare every reference that needs to appear in a stack map;
+otherwise the stack maps will be empty.
+
+Values and variables are declared with the [`declare_value_needs_stack_map`]
+and [`declare_var_needs_stack_map`] methods, respectively.
+
+[`declare_value_needs_stack_map`]: https://docs.rs/cranelift-frontend/latest/cranelift_frontend/struct.FunctionBuilder.html#method.declare_value_needs_stack_map
+[`declare_var_needs_stack_map`]: https://docs.rs/cranelift-frontend/latest/cranelift_frontend/struct.FunctionBuilder.html#method.declare_var_needs_stack_map
+
+Here is a simple, if contrived, function that creates a square with the given dimensions:
+
+```rs
+fn create_square(width: u32, height: u32) -> Square {
+    let square = Square { width, height };
+
+    square.ensure_valid();
+
+    square
+}
+```
+
+Here is the same function compiled into Cranelift IR (with 64-bit pointers):
+```
+function %create_square(i32, i32) -> i64 {
+block0(v0: i32, v1: i32):
+    v2 = iconst.i64 8
+    v3 = call allocate(v2)  ; v2 = 8
+
+    store v0, v3+0
+    store v1, v3+4
+
+    call Square::ensure_valid(v3)
+
+    return v3
+}
+```
+
+In this example, we want to keep `square` alive at least until the call to
+`Square::ensure_valid` returns. To achieve this, you would declare the value
+holding `square` as needing a stack map right after it is allocated, so that it
+is included in the stack map at the call to `Square::ensure_valid`. This causes
+`square` to be spilled from a register into a dedicated stack slot, whose
+location is recorded in the stack map.
+
+After declaring `square` as needing a stack map entry, the same function
+compiles to the following CLIF:
+```
+function %create_square(i32, i32) -> i64 {
+    ss0 = explicit_slot 8, align = 8
+
+block0(v0: i32, v1: i32):
+    v2 = iconst.i64 8
+    v3 = call allocate(v2)  ; v2 = 8
+
+    store v0, v3+0
+    store v1, v3+4
+
+    ;; Store the allocated object to the stack right
+    ;; before the call to `ensure_valid`.
+    v5 = stack_addr.i64 ss0
+    store notrap v3, v5
+
+    ;; Annotate the call with the created stack map
+    call Square::ensure_valid(v3), stack_map=[i64 @ ss0+0]
+
+    ;; Load the value from the stack again
+    v6 = load.i64 notrap v5
+
+    ;; Return the loaded stack value instead of
+    ;; the reference in `v3`.
+    return v6
+}
+```
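+
+For reference, the frontend code that builds `create_square` might declare the
+allocation along these lines. This is a minimal sketch using
+`cranelift-frontend`: `builder` is assumed to be a `FunctionBuilder` positioned
+in the entry block, `width` and `height` are the function's block parameters,
+and `allocate_ref` and `ensure_valid_ref` are `FuncRef`s that have already been
+declared in the function.
+
+```rs
+// Allocate 8 bytes for the two `u32` fields of `Square`.
+let size = builder.ins().iconst(types::I64, 8);
+let call = builder.ins().call(allocate_ref, &[size]);
+let square = builder.inst_results(call)[0];
+
+// Declare the returned pointer as a GC reference, so that it is spilled to a
+// stack slot and recorded in the stack map of every later safepoint.
+builder.declare_value_needs_stack_map(square);
+
+// Initialize the fields and call `ensure_valid`; that call becomes a
+// safepoint with `square` recorded in its stack map.
+builder.ins().store(MemFlags::new(), width, square, 0);
+builder.ins().store(MemFlags::new(), height, square, 4);
+builder.ins().call(ensure_valid_ref, &[square]);
+builder.ins().return_(&[square]);
+```
+
+Note that `square` can keep being used as a normal value after the call; the
+builder takes care of spilling it before the safepoint and reloading it
+afterwards, which is exactly the rewrite shown in the CLIF above.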
+
+## Using stack maps
+
+Stack maps can be inspected and used as soon as a function has been compiled by
+Cranelift. This is backend-agnostic: it makes no difference whether the function
+is compiled for JIT execution, object emission or any other backend.
+
+To get the stack maps generated for a given function, you can introspect the
+compiled code in the `Context` the function was compiled with:
+
+```rs
+module.define_function(func_id, &mut ctx)?;
+
+let compiled_code = ctx.compiled_code().expect("expected context to be compiled");
+
+for (offset, length, map) in compiled_code.buffer.user_stack_maps() {
+    let items = map.entries().map(|(_, offset)| offset as usize).collect::<Vec<_>>();
+
+    println!("Stack map:");
+    println!("  Offset: {}", *offset);
+    println!("  Size: {}", *length);
+    println!("  Entries: {items:?}");
+}
+```
+
+From the example above, you can expect output similar to the following:
+```
+Stack map:
+  Offset: 96
+  Size: 64
+  Entries: [0]
+```
+
+Stack maps are emitted once per safepoint, and every `call` instruction with a
+stack map annotation is a safepoint. Since the example function only performs a
+single call with a stack map annotation, there is only a single safepoint in the
+function.
+
+Each stack map contains:
+- **Offset**: the offset of the program counter from which the stack map is
+  applicable. The offset is relative to the address of the first instruction in
+  the owning function.
+- **Size**: the length of the interval in which the stack map is applicable, in bytes.
+- **Entries**: covered in more detail below.
+
+As you may notice, the `offset` and `size` fields together form an address
+interval in which the stack map is valid. Say a function is compiled at address
+`0xBFC00000` and has two stack maps:
+- **Stack map 1**:
+  - Offset: 24
+  - Size: 64
+- **Stack map 2**:
+  - Offset: 96
+  - Size: 32
+
+Stack map 1 is valid in the interval `0xBFC00018-0xBFC00058` and stack map 2 is
+valid in the interval `0xBFC00060-0xBFC00080`. Whenever a call is made at a
+safepoint, the return address falls within one of these intervals, which in turn
+identifies which objects are live at that point.
+
+### Entries in the stack map
+
+Remember how the generated CLIF stored live objects into the current stack
+frame? The entries within the stack map are offsets pointing to these live
+objects. The offsets are relative to the stack pointer at the time the objects
+were spilled. Adding an entry to the stack pointer gives you an address inside
+the stack frame which holds a pointer to the object.
+
+Because the address you get back points to the slot *inside the stack frame*
+rather than to the object itself, generational and compacting collectors can
+relocate the object during collection: they only need to overwrite the pointer
+in the stack frame with the object's new location, and the compiled code will
+reload it from there after the call.
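+
+To make the interval and entry arithmetic concrete, here is a small sketch of
+how a runtime might resolve the live slots for a safepoint. The `SavedStackMap`
+type and `live_slots` helper are hypothetical, not part of Cranelift's API; they
+assume you have stored the `(offset, size, entries)` triples printed above,
+along with each function's start address.
+
+```rs
+/// One saved safepoint, as printed in the output above.
+struct SavedStackMap {
+    offset: usize,
+    size: usize,
+    entries: Vec<usize>,
+}
+
+/// Given the return address of a call and the stack pointer of its frame,
+/// yield the addresses of the stack slots holding live GC references.
+fn live_slots<'a>(
+    maps: &'a [SavedStackMap],
+    func_start: *const u8,
+    return_addr: *const u8,
+    stack_pointer: *const u8,
+) -> impl Iterator<Item = *const *const u8> + 'a {
+    // Offset of the return address from the function's first instruction.
+    let pc_offset = return_addr as usize - func_start as usize;
+
+    maps.iter()
+        // Keep only the stack map whose interval contains the return address.
+        .filter(move |m| pc_offset >= m.offset && pc_offset <= m.offset + m.size)
+        .flat_map(move |m| {
+            // Each entry is an offset from the stack pointer to a slot that
+            // holds a pointer to a live object.
+            m.entries
+                .iter()
+                .map(move |&entry| unsafe { stack_pointer.add(entry) } as *const *const u8)
+        })
+}
+```
+
+The `frame.rs` module in the collector example plays this role: it walks frame
+pointers to recover the program counter and stack pointer for each frame, then
+looks up the matching stack map much as sketched here.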
+
+## Invoking a collection
+
+That's all well and good, but how do we actually trigger a collection in the
+garbage collector, let alone obtain the appropriate program counter and stack
+pointer?
+
+Well, since safepoints are emitted on call instructions, we can insert a call
+that triggers the collection just before other function calls, effectively
+"stealing" the stack map at that particular point. In practice, the CLIF might
+look similar to this:
+```
+function %create_square(i32, i32) -> i64 {
+    ss0 = explicit_slot 8, align = 8
+
+block0(v0: i32, v1: i32):
+    v2 = iconst.i64 8
+    v3 = call allocate(v2)  ; v2 = 8
+
+    store v0, v3+0
+    store v1, v3+4
+
+    ;; Store the allocated object to the stack right
+    ;; before the next call instruction.
+    v5 = stack_addr.i64 ss0
+    store notrap v3, v5
+
+    ;; Trigger the collector inside the safepoint, so all
+    ;; objects exist on the stack.
+    call GC::trigger(), stack_map=[i64 @ ss0+0]
+
+    ;; Load the value from the stack again, since the object
+    ;; may have been relocated.
+    v6 = load.i64 notrap v5
+
+    ;; Pass the loaded stack value instead of the reference in `v3`.
+    ;; Notice how this call no longer has any stack map annotation.
+    call Square::ensure_valid(v6)
+
+    return v6
+}
+```
+
+Inside the `GC::trigger` function is where you handle the actual garbage
+collection itself. This function can be an external symbol, a Rust function,
+etc., and does not need to be compiled using Cranelift.
+
+Inside this function, you need to find the stack pointer and program counter
+from just before the call, so you know which stack map to use. To do this, you
+typically employ **stack walking** (also called **frame walking**). While that
+is outside the scope of this document, you can see how Wasmtime implements it
+for different architectures in [the unwinder crate], or see the
+[collector example project].
+
+[the unwinder crate]: https://github.com/bytecodealliance/wasmtime/blob/main/crates/unwinder/src/stackwalk.rs
+[collector example project]: https://github.com/bytecodealliance/wasmtime/tree/main/cranelift/jit/examples/collector
+
+Once you have found all live objects at a given point, it is only a matter of
+filtering them out of the set of all allocations made, leaving the set of dead
+objects which can be deallocated.
+
+## Which values should be added to the stack map?
+
+Depending on how objects and allocations are used in your implementation, it may
+not be obvious *what* should be included in stack maps. Below are some general
+guidelines which will work for most scenarios, but maybe not all.
+
+In general, you should declare variables and/or values if they:
+- are managed objects themselves or point to an object inside the managed heap,
+- refer to some offset of a managed object (i.e. object field references),
+- or are somehow derived from a managed object (e.g. an element of an array).
+
+On the other hand, you should not declare variables and/or values if they:
+- represent an immediate value, such as an integer, float, boolean, etc.,
+- have been allocated outside the scope of the garbage collector (e.g. static data),
+- or point to an address which isn't a managed object.
+
+It should also be noted that whenever a new block parameter is created which
+accepts a reference to a managed object, that parameter may also need to be
+declared as needing a stack map. Following the example from earlier, an
+implementation of `ensure_valid` would need to declare its parameter as needing
+a stack map, since the passed `square` value is a managed object.
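+
+As a sketch, this is roughly how an implementation of `ensure_valid` built with
+`cranelift-frontend` might declare its own parameter. As before, `builder` is
+assumed to be a `FunctionBuilder` whose signature (a single `i64` pointer
+parameter) has already been set up.
+
+```rs
+// The entry block receives the function's parameters.
+let entry = builder.create_block();
+builder.append_block_params_for_function_params(entry);
+builder.switch_to_block(entry);
+
+// The incoming `square` pointer is a managed object, so the block parameter
+// must be declared as needing a stack map as well.
+let square = builder.block_params(entry)[0];
+builder.declare_value_needs_stack_map(square);
+
+// ... validation logic; any calls made from here on will record `square`
+// in their stack maps ...
+```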
diff --git a/cranelift/jit/examples/collector/README.md b/cranelift/jit/examples/collector/README.md new file mode 100644 index 000000000000..271a7117daef --- /dev/null +++ b/cranelift/jit/examples/collector/README.md @@ -0,0 +1,14 @@ +# Example of garbage collector + +This example shows off how to implement a tracing garbage collector using stack +maps in [Cranelift](https://crates.io/crates/cranelift). The garbage collector +is a very simple implementation using Rust's built-in +[`std::alloc`](https://doc.rust-lang.org/std/alloc/index.html) allocator, uses global +state and does not support multi-threaded usage. + +For a more detailed explanation of stack maps, see [Stack maps] and [New Stack Maps for Wasmtime and Cranelift]. + +[Stack maps]: /cranelift/docs/stack-maps.md +[New Stack Maps for Wasmtime and Cranelift]: https://bytecodealliance.org/articles/new-stack-maps-for-wasmtime#background-garbage-collection-safepoints-and-stack-maps + +This sample current supports `x86`, `x86_64` and `aarch64`. diff --git a/cranelift/jit/examples/collector/arch.rs b/cranelift/jit/examples/collector/arch.rs new file mode 100644 index 000000000000..3ccee7942734 --- /dev/null +++ b/cranelift/jit/examples/collector/arch.rs @@ -0,0 +1,47 @@ +//! Architecture-specific handling of frame pointers, stack registers, etc. +//! +//! Most of this file has been copied from the [`unwinder`] crate in Wasmtime. + +#[cfg(target_arch = "x86_64")] +mod x86_64 { + /// Stack pointer of the caller, relative to the current frame pointer. + pub const PARENT_SP_FROM_FP_OFFSET: usize = 16; + + /// Gets the frame pointer which is the parent of the given + /// frame, pointed to by `fp`. + #[inline] + pub(crate) unsafe fn parent_frame_pointer(fp: *const u8) -> *const u8 { + (unsafe { *(fp as *mut usize) }) as *const u8 + } + + /// Gets the return address of the frame, pointed to by `fp`. + #[inline] + pub(crate) unsafe fn return_addr_of_frame(fp: *const u8) -> *const u8 { + (unsafe { *(fp as *mut usize).offset(1) }) as *const u8 + } +} + +#[cfg(target_arch = "aarch64")] +mod aarch64 { + /// Stack pointer of the caller, relative to the current frame pointer. + pub const PARENT_SP_FROM_FP_OFFSET: usize = 16; + + /// Gets the frame pointer which is the parent of the given + /// frame, pointed to by `fp`. + #[inline] + pub(crate) unsafe fn parent_frame_pointer(fp: *const u8) -> *const u8 { + (unsafe { *(fp as *mut usize) }) as *const u8 + } + + /// Gets the return address of the frame, pointed to by `fp`. + #[inline] + pub(crate) unsafe fn return_addr_of_frame(fp: *const u8) -> *const u8 { + (unsafe { *(fp as *mut usize).offset(1) }) as *const u8 + } +} + +#[cfg(target_arch = "x86_64")] +pub(crate) use x86_64::*; + +#[cfg(target_arch = "aarch64")] +pub(crate) use aarch64::*; diff --git a/cranelift/jit/examples/collector/frame.rs b/cranelift/jit/examples/collector/frame.rs new file mode 100644 index 000000000000..c20ecc85e2b2 --- /dev/null +++ b/cranelift/jit/examples/collector/frame.rs @@ -0,0 +1,315 @@ +//! This file is focused on iterating through the frame stack, +//! and finding all the live object references. + +use std::cmp::Ordering; +use std::collections::LinkedList; +use std::fmt::Display; +use std::ops::ControlFlow; +use std::sync::{LazyLock, OnceLock, RwLock}; + +/// Stack-map for a given function. +/// +/// The vector defines a list of tuples containing the offset +/// of the stack map relative to the start of the function, as well +/// as all spilled GC references at that specific address. 
+/// +/// The spilled GC references are defined as a list of offsets, +/// relative to the stack pointer which contain a reference to a living +/// GC reference. +pub type FunctionStackMap = Vec<(usize, usize, Vec)>; + +/// Metadata entry for a single compiled function. +#[derive(Debug)] +pub struct CompiledFunctionMetadata { + /// Defines the address of the first instruction in the function. + pub start: *const u8, + + /// Defines the address of the last instruction in the function. + pub end: *const u8, + + /// Defines a list of all stack maps found within the function, + /// keyed by offset from [`CompiledFunctionMetadata::start`]. + pub stack_locations: FunctionStackMap, +} + +impl CompiledFunctionMetadata { + /// Gets the [`Ordering`] of the given address, in reference to the interval of + /// the current metadata entry. This method is used for iterating over a list of + /// metadata entries using a binary search. + /// + /// The truth table for the method is as such[^note]: + /// + /// | Input | Output ([`Ordering`]) | + /// |----------------------------|-----------------------| + /// | `start` > `addr` | [`Ordering::Greater`] | + /// | `end` < `addr` | [`Ordering::Less`] | + /// | `start` <= `addr` <= `end` | [`Ordering::Equal`] | + /// + /// [^note]: `start` and `end` denotes the `start` and `end` field in + /// [`CompiledFunctionMetadata`], respectively. + #[inline] + pub fn ordering_of(&self, addr: *const u8) -> Ordering { + if self.start > addr { + Ordering::Greater + } else if addr > self.end { + Ordering::Less + } else { + Ordering::Equal + } + } +} + +unsafe impl Send for CompiledFunctionMetadata {} +unsafe impl Sync for CompiledFunctionMetadata {} + +static FUNC_STACK_MAPS: OnceLock> = OnceLock::new(); + +/// Declares the stack maps for all generated functions in the runtime. +/// +/// # Panics +/// +/// This function **will** panic if the stack maps are declared more than once. +pub fn declare_stack_maps(mut stack_maps: Vec) { + stack_maps.sort_by_key(|func| func.start.addr()); + + FUNC_STACK_MAPS + .set(stack_maps) + .expect("function stack maps should only be assigned once"); +} + +/// Represents a single stack map, corresponding to a specific +/// safepoint location within a compiled Lume function. +#[derive(Debug)] +pub(crate) struct FrameStackMap { + pub map: &'static CompiledFunctionMetadata, + pub frame_pointer: *const u8, + pub program_counter: *const u8, +} + +impl FrameStackMap { + /// Gets the offset of the stack frame from the first + /// instruction in the associated function. + #[inline] + pub(crate) fn offset(&self) -> usize { + self.program_counter.addr() - self.map.start.addr() + } + + /// Gets the stack pointer which is associated with the frame. + #[inline] + pub(crate) fn stack_pointer(&self) -> *const u8 { + unsafe { + crate::arch::parent_frame_pointer(self.frame_pointer) + .byte_add(crate::arch::PARENT_SP_FROM_FP_OFFSET) + } + } + + /// Gets all the stack location offsets of the current frame stack map. + /// + /// The returned slice will be a list of offsets relative to the stack pointer + /// of the frame, which will contain a pointer to a GC reference. + /// + /// For more information, see [`stack_locations`] which will get the absolute + /// addresses of the GC references. 
+ #[inline] + pub(crate) fn stack_offsets(&self) -> &[usize] { + let offset = self.offset(); + + self.map + .stack_locations + .iter() + .find_map(|loc| { + if offset >= loc.0 && loc.0 + loc.1 >= offset { + Some(loc.2.as_slice()) + } else { + None + } + }) + .unwrap_or_else(|| &[]) + } + + /// Attempts to find all GC references found inside of the stack map for the current + /// program counter. The returned iterator will iterate over a list of pointers, + /// which point to an item inside the current stack frame. + /// + /// To get the address of the underlying allocation, simply read the pointer. This + /// is to facilitate the GC moving the underlying allocation to a different address, + /// whereafter it can write the new address to the pointer in the stack frame. + #[inline] + pub(crate) fn stack_locations(&self) -> impl Iterator { + self.stack_offsets() + .iter() + .map(|offset| unsafe { self.stack_pointer().byte_add(*offset) } as *const *const u8) + } + + /// Attempts to find all GC references found inside of the stack map for the current + /// program counter. + /// + /// The returned iterator will iterate over a list of tuples. The first element in the + /// tuple is an entry in the current stack frame containing the GC reference and the + /// second element is a pointer to the GC reference itself. + #[inline] + pub(crate) fn stack_value_locations( + &self, + ) -> impl Iterator { + self.stack_locations().map(|ptr| { + let gc_ref = unsafe { ptr.read() }; + + (ptr, gc_ref) + }) + } +} + +impl Display for FrameStackMap { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + f.write_fmt(format_args!( + "Frame: PC={:p}, FP={:p}, SP={:p}", + self.program_counter, + self.frame_pointer, + self.stack_pointer() + )) + } +} + +/// Represents an entry in the managed call-stack. +#[derive(Clone, Copy)] +pub(crate) struct FrameEntry { + pub frame_pointer: *const u8, + pub program_counter: *const u8, +} + +impl FrameEntry { + /// Attempts to find the stack map for the function, which matches the + /// current frame entry. If no function is found for the given entry or if + /// no stack map is attached to the found function, returns [`None`]. + /// + /// # Panics + /// + /// This function will panic if the stack maps have not yet been declared. To declare + /// them, use [`declare_stack_maps`]. + fn find_stack_map(self) -> Option { + let stack_maps = FUNC_STACK_MAPS + .get() + .expect("expected function stack map to be set"); + + if let Ok(idx) = + stack_maps.binary_search_by(|probe| probe.ordering_of(self.program_counter)) + { + let stack_map = stack_maps + .get(idx) + .expect("expected index to exist after search"); + + return Some(FrameStackMap { + map: stack_map, + frame_pointer: self.frame_pointer, + program_counter: self.program_counter, + }); + } + + None + } +} + +impl Display for FrameEntry { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + f.write_fmt(format_args!( + "Frame entry: PC={:p}, FP={:p}", + self.program_counter, self.frame_pointer + )) + } +} + +unsafe impl Send for FrameEntry {} +unsafe impl Sync for FrameEntry {} + +/// The globally-available managed call-stack. +/// +/// The managed call-stack is a linked-list of all entry +/// frames (calls from host-to-JIT) and exit frames (calls from JIT-to-host). +/// +/// This stack allows for performant stack walking without having to inspect +/// hardware registers or rely on compliant usage of frame pointers. 
+static FRAME_STACK: LazyLock>> = + LazyLock::new(|| RwLock::new(LinkedList::new())); + +/// Push a new frame entry onto the managed call stack. +/// +/// Frame entries get pushed whenever a call from host-to-JIT or JIT-to-host +/// is made, so we can walk the frame stack. +pub(crate) fn push_frame_entry(frame_pointer: *const u8, program_counter: *const u8) { + FRAME_STACK.try_write().unwrap().push_front(FrameEntry { + frame_pointer, + program_counter, + }); +} + +/// Pop the top-level frame entry off the managed call stack. +pub(crate) fn pop_frame_entry() { + FRAME_STACK + .try_write() + .unwrap() + .pop_front() + .expect("attempted to exit frame without corresponding entry"); +} + +/// Walk the current frame stack, calling `f` with a matching +/// pair of entry- and exit-frames, as we walk. +pub(crate) fn visit_chunked_frames( + mut f: impl FnMut(FrameEntry, FrameEntry) -> ControlFlow, +) -> Option { + let frames = FRAME_STACK.try_read().unwrap(); + let mut frame_iter = frames.iter(); + + loop { + let exit = frame_iter.next()?; + let entry = frame_iter.next()?; + + if let ControlFlow::Break(val) = f(*entry, *exit) { + return Some(val); + } + } +} + +/// Walk the current frame stack, calling `f` for each frame we walk. +pub(crate) fn visit_frames(mut f: impl FnMut(FrameEntry) -> ControlFlow) -> Option { + visit_chunked_frames(|entry, exit| { + let mut fp = exit.frame_pointer; + + while fp != entry.frame_pointer { + // The exit frame pointer should always be a sub-frame of + // the entry frame. + debug_assert!(fp <= entry.frame_pointer); + + let pc = unsafe { crate::arch::return_addr_of_frame(fp) }; + + let entry = FrameEntry { + frame_pointer: fp, + program_counter: pc, + }; + + if let ControlFlow::Break(value) = f(entry) { + return ControlFlow::Break(value); + } + + fp = unsafe { crate::arch::parent_frame_pointer(fp) }; + } + + ControlFlow::Continue(()) + }) +} + +/// Attempts to find a frame stack map which corresponds to the current frame pointer. +/// +/// If no frame stack map can be found for the current frame pointer, the function +/// iterates through all parent frames, until a frame stack map is found. +/// +/// If no frame stack maps are found in any parent frames, the functions returns [`None`]. +#[inline] +pub(crate) fn find_current_stack_map() -> Option { + visit_frames(|frame| { + if let Some(stack_map) = frame.find_stack_map() { + return ControlFlow::Break(stack_map); + } + + ControlFlow::Continue(()) + }) +} diff --git a/cranelift/jit/examples/collector/gc.rs b/cranelift/jit/examples/collector/gc.rs new file mode 100644 index 000000000000..da84cfae1354 --- /dev/null +++ b/cranelift/jit/examples/collector/gc.rs @@ -0,0 +1,93 @@ +//! Garbage collector implementation. +//! +//! This implementation is not fast and it does not scale. It is meant to +//! show a functional, yet simple, example implementation which can be used +//! as a first version. + +use std::alloc::{Layout, alloc, dealloc}; +use std::collections::HashMap; +use std::sync::{LazyLock, RwLock}; + +/// Immutable, thread-transportable pointer type. 
+#[derive(Hash, Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)] +pub struct FunctionPtr(*const u8); + +impl FunctionPtr { + #[inline] + pub fn new(ptr: *const u8) -> Self { + FunctionPtr(ptr) + } + + #[inline] + pub fn ptr(self) -> *const u8 { + self.0 + } +} + +unsafe impl Send for FunctionPtr {} +unsafe impl Sync for FunctionPtr {} + +const POINTER_ALIGNMENT: usize = std::mem::align_of::<*const ()>(); + +/// List of all managed allocations, which we need before we can deallocate +/// any allocations again. +/// +/// While this isn't directly necessary, we need to know the layout of each +/// allocation, so that we can pass it to [`dealloc`]. This does limit +/// the time complexity of the garbage collector to at least `O(n)`. +pub static ALLOCATIONS: LazyLock>> = + LazyLock::new(|| RwLock::new(HashMap::new())); + +/// Allocates a new object with the given size, in bytes. +/// +/// The memory block created from the function is managed by the +/// runtime, allowing the garbage collector to deallocate it if it +/// determines that is is no longer in use. +pub(crate) fn allocate_object(size: u64) -> *mut u8 { + let layout = Layout::from_size_align(size as usize, POINTER_ALIGNMENT).unwrap(); + let ptr = unsafe { alloc(layout) }; + + ALLOCATIONS + .try_write() + .unwrap() + .insert(FunctionPtr::new(ptr), layout); + + ptr +} + +/// Triggers a garbage collection at the first applicable frame. If +/// no viable frame is found, returns without collecting. +/// +/// This will inspect the current stack maps to find live objects +/// and deallocate any allocations which don't exist in the stack maps. +pub(crate) fn trigger_collection() { + let Some(frame) = crate::frame::find_current_stack_map() else { + return; + }; + + let allocations = ALLOCATIONS.try_read().unwrap(); + let live_objects = frame.stack_value_locations().collect::>(); + + // Find all the allocations which don't exist in the stack maps - i.e. + // all the objects which are unreferenced / dead. + let dead_objects = allocations + .iter() + .filter(|(alloc_ptr, _)| { + !live_objects + .iter() + .any(|(_, live_ptr)| alloc_ptr.ptr() == *live_ptr) + }) + .collect::>(); + + for (_stack_ptr, _obj_ptr) in live_objects { + // If you want to implement a compacting- or generational garbage collector, + // you can move the allocation, then write the new pointer to the `stack_ptr` + // pointer. + } + + for (obj_ptr, layout) in dead_objects { + unsafe { + dealloc(obj_ptr.ptr().cast_mut(), *layout); + } + } +} diff --git a/cranelift/jit/examples/collector/main.rs b/cranelift/jit/examples/collector/main.rs new file mode 100644 index 000000000000..8ec060d484ec --- /dev/null +++ b/cranelift/jit/examples/collector/main.rs @@ -0,0 +1,345 @@ +pub(crate) mod arch; +pub(crate) mod frame; +pub(crate) mod gc; + +use std::collections::HashMap; +use std::mem; + +use cranelift::prelude::*; +use cranelift_codegen::{Context, ir::BlockArg}; +use cranelift_jit::{JITBuilder, JITModule}; +use cranelift_module::{FuncId, Linkage, Module}; + +use crate::frame::*; + +/// Intermediate metadata entry for a single function. 
+#[derive(Debug, Clone)] +struct FunctionMetadata { + pub total_size: usize, + pub stack_locations: FunctionStackMap, +} + +fn main() { + let mut settings = settings::builder(); + settings.set("preserve_frame_pointers", "true").unwrap(); + + let flags = settings::Flags::new(settings); + let isa = cranelift_native::builder() + .unwrap() + .finish(flags.clone()) + .unwrap(); + + let mut builder = JITBuilder::with_isa(isa, cranelift_module::default_libcall_names()); + builder.symbol("gc_alloc", gc::allocate_object as *const u8); + builder.symbol("gc_collect", gc::trigger_collection as *const u8); + + builder.symbol("trampoline_enter", frame::push_frame_entry as *const u8); + builder.symbol("trampoline_exit", frame::pop_frame_entry as *const u8); + + let mut module = JITModule::new(builder); + let mut ctx = module.make_context(); + let mut func_ctx = FunctionBuilderContext::new(); + + let trampoline_enter = { + let mut sig = module.make_signature(); + sig.params.push(AbiParam::new(types::I64)); + sig.params.push(AbiParam::new(types::I64)); + + module + .declare_function("trampoline_enter", Linkage::Import, &sig) + .unwrap() + }; + + let trampoline_exit = { + let sig = module.make_signature(); + module + .declare_function("trampoline_exit", Linkage::Import, &sig) + .unwrap() + }; + + // `gc_alloc` is meant to be used whenever a runtime-managed allocation + // is needed. For unmanaged allocations, used `malloc` or similar function. + let allocation_func = { + let mut sig = module.make_signature(); + sig.params.push(AbiParam::new(types::I64)); + sig.returns.push(AbiParam::new(types::I64)); + + let func = module + .declare_function("gc_alloc", Linkage::Import, &sig) + .unwrap(); + + create_trampoline_for( + &mut module, + &mut ctx, + &mut func_ctx, + trampoline_enter, + trampoline_exit, + func, + "gc_alloc", + &sig, + ) + }; + + // `gc_collect` is used to manually collect dead objects to reclaim + // memory. You'd likely want to insert this before all call expressions, + // with some condition before collection actually happens. + // + // For example, only run collection every 500ms or once a certain + // amount of memory is in use. + let collection_func = { + let sig = module.make_signature(); + + let func = module + .declare_function("gc_collect", Linkage::Import, &sig) + .unwrap(); + + create_trampoline_for( + &mut module, + &mut ctx, + &mut func_ctx, + trampoline_enter, + trampoline_exit, + func, + "gc_collect", + &sig, + ) + }; + + let mut function_metadata = HashMap::new(); + + // The main function is not meant to have any practical application, + // expect show an example implementation of a tracing garbage collector. + // + // The function is something akin to the following Rust code: + // ```rs + // struct Object { + // pub value: i32, + // } + // + // fn main() -> i32 { + // let a = Object { value: 8 }; + // + // let mut counter = 10; + // loop { + // let b = Object { value: 0 }; + // + // counter -= 1; + // + // if counter == 0 { + // break; + // } + // } + // + // gc_collect(); + // + // a.value + // } + // ``` + // + // After the loop has finished, `gc_collect()` will cause all the objects + // allocated within the loop to be deallocated, while the single allocation + // outside the loop will remain allocated. 
+ let main_func = { + let mut sig = module.make_signature(); + sig.returns.push(AbiParam::new(types::I32)); + + let func = module + .declare_function("main", Linkage::Export, &sig) + .unwrap(); + + ctx.func.signature = sig.clone(); + + let mut bcx = FunctionBuilder::new(&mut ctx.func, &mut func_ctx); + let entry_block = bcx.create_block(); + bcx.append_block_params_for_function_params(entry_block); + + let loop_body = bcx.create_block(); + bcx.append_block_param(loop_body, types::I32); + bcx.append_block_param(loop_body, types::I64); + + let loop_exit = bcx.create_block(); + bcx.append_block_param(loop_exit, types::I64); + + bcx.switch_to_block(entry_block); + { + // Allocate 8 bytes for the object, in which we store an integer. + let alloc_size = bcx.ins().iconst(types::I64, 4); + let allocation_func_ref = module.declare_func_in_func(allocation_func, bcx.func); + + let call_inst = bcx.ins().call(allocation_func_ref, &[alloc_size]); + let alloc_ptr = bcx.inst_results(call_inst)[0]; + bcx.declare_value_needs_stack_map(alloc_ptr); + + let field_value = bcx.ins().iconst(types::I32, 8); + bcx.ins().store(MemFlags::new(), field_value, alloc_ptr, 0); + + let counter_value = bcx.ins().iconst(types::I32, 10); + + bcx.ins().jump( + loop_body, + vec![&BlockArg::Value(counter_value), &BlockArg::Value(alloc_ptr)], + ); + } + + bcx.switch_to_block(loop_body); + { + let parent_obj = bcx.block_params(loop_body)[1]; + bcx.declare_value_needs_stack_map(parent_obj); // required since this is a new block. + + let alloc_size = bcx.ins().iconst(types::I64, 4); + let allocation_func_ref = module.declare_func_in_func(allocation_func, bcx.func); + + let call_inst = bcx.ins().call(allocation_func_ref, &[alloc_size]); + let alloc_ptr = bcx.inst_results(call_inst)[0]; + bcx.declare_value_needs_stack_map(alloc_ptr); + + let current_count = bcx.block_params(loop_body)[0]; + let next_count = bcx.ins().iadd_imm(current_count, -1); + + let cmp_val = bcx + .ins() + .icmp_imm(IntCC::SignedGreaterThan, current_count, 0); + + bcx.ins().brif( + cmp_val, + loop_body, + vec![&BlockArg::Value(next_count), &BlockArg::Value(parent_obj)], + loop_exit, + vec![&BlockArg::Value(parent_obj)], + ); + } + + bcx.switch_to_block(loop_exit); + { + let parent_obj = bcx.block_params(loop_exit)[0]; + bcx.declare_value_needs_stack_map(parent_obj); // required since this is a new block. + + let collection_func_ref = module.declare_func_in_func(collection_func, bcx.func); + bcx.ins().call(collection_func_ref, &[]); + + let field_value = bcx.ins().load(types::I32, MemFlags::new(), parent_obj, 0); + bcx.ins().return_(&[field_value]); + } + + bcx.seal_all_blocks(); + bcx.finalize(); + + module.define_function(func, &mut ctx).unwrap(); + + let compiled_code = ctx.compiled_code().unwrap(); + let code_len = compiled_code.buffer.total_size() as usize; + + // We change the format of the stack maps, since we don't actually + // need the type of each entry in the stack map. + let mut stack_locations = Vec::new(); + for (offset, length, map) in compiled_code.buffer.user_stack_maps() { + let refs = map + .entries() + .map(|(_, offset)| offset as usize) + .collect::>(); + + stack_locations.push((*offset as usize, *length as usize, refs)); + } + + // This is an intermediate map for mapping functions to their matching stack locations, + // since we can't get them after clearing the context. 
+ function_metadata.insert( + "main", + FunctionMetadata { + total_size: code_len, + stack_locations, + }, + ); + + module.clear_context(&mut ctx); + + create_trampoline_for( + &mut module, + &mut ctx, + &mut func_ctx, + trampoline_enter, + trampoline_exit, + func, + "main", + &sig, + ) + }; + + module.finalize_definitions().unwrap(); + + let mut func_stack_maps = Vec::new(); + + // In an implementation with dynamic codegen, this would need to be executed + // once per compiled function. Since we only have a single function, we just + // act like it's a loop. + { + let metadata = function_metadata.remove("main").unwrap(); + let start = module.get_finalized_function(main_func); + let end = unsafe { start.byte_add(metadata.total_size) }; + + func_stack_maps.push(CompiledFunctionMetadata { + start, + end, + stack_locations: metadata.stack_locations, + }); + } + + // Declare the stack maps globally, so we can use them when iterating + // through the stack frames. + declare_stack_maps(func_stack_maps); + + let main_addr = module.get_finalized_function(main_func); + let main_ptr = unsafe { mem::transmute::<_, extern "C" fn() -> i32>(main_addr) }; + let ret_code = main_ptr(); + + std::process::exit(ret_code); +} + +fn create_trampoline_for( + module: &mut JITModule, + ctx: &mut Context, + func_ctx: &mut FunctionBuilderContext, + trampoline_enter: FuncId, + trampoline_exit: FuncId, + func: FuncId, + name: &'static str, + sig: &Signature, +) -> FuncId { + let trampoline = module + .declare_function(&format!("__tp_{name}"), Linkage::Export, sig) + .unwrap(); + + ctx.func.signature = sig.clone(); + + let mut bcx = FunctionBuilder::new(&mut ctx.func, func_ctx); + let entry_block = bcx.create_block(); + bcx.append_block_params_for_function_params(entry_block); + + bcx.switch_to_block(entry_block); + + let ret_ptr = bcx.ins().get_return_address(types::I64); + + let current_fp = bcx.ins().get_frame_pointer(types::I64); + let prev_fp = bcx.ins().load(types::I64, MemFlags::new(), current_fp, 0); + + let trampoline_enter_ref = module.declare_func_in_func(trampoline_enter, &mut bcx.func); + bcx.ins().call(trampoline_enter_ref, &[prev_fp, ret_ptr]); + + let callee_ref = module.declare_func_in_func(func, &mut bcx.func); + let callee_params = bcx.block_params(entry_block).to_vec(); + let callee_call = bcx.ins().call(callee_ref, &callee_params); + let callee_return = bcx.inst_results(callee_call).to_vec(); + + let trampoline_exit_ref = module.declare_func_in_func(trampoline_exit, &mut bcx.func); + bcx.ins().call(trampoline_exit_ref, &[]); + + bcx.ins().return_(&callee_return); + + bcx.seal_all_blocks(); + bcx.finalize(); + + module.define_function(trampoline, ctx).unwrap(); + module.clear_context(ctx); + + trampoline +}