
Commit 363cd2d

Expose memory-related options in Config (#1513)
* Expose memory-related options in `Config`

This commit was initially motivated by looking more into #1501, but it ended up ballooning a bit after finding a few issues. The high-level items in this commit are:

* New configuration options are exposed via `wasmtime::Config` to configure the tunable limits of how memories are allocated and such.
* The `MemoryCreator` trait has been updated to accurately reflect the required allocation characteristics that JIT code expects.
* A bug has been fixed in the cranelift wasm code generation where, if no guard page was present, bounds checks weren't accurately performed.

The new `Config` methods allow tuning the memory allocation characteristics of wasmtime. Currently 64-bit platforms will reserve 6GB chunks of memory for each linear memory, but by tweaking various config options you can change how this is allocated, perhaps at the cost of slower JIT code since it needs more bounds checks. The methods are intended to be pretty thoroughly documented as to the effect they have on the JIT code and what values you may wish to select. These new methods have been added to the spectest fuzzer to ensure that various configuration values for these methods don't affect correctness.

The `MemoryCreator` trait previously only allocated memories with a `MemoryType`, but this didn't actually reflect the guarantees that JIT code expected. JIT code is generated with an assumption about the minimum size of the guard region, as well as whether memory is static or dynamic (whether the base pointer can be relocated). These properties must be upheld by custom allocation engines for JIT code to perform correctly, so extra parameters have been added to `MemoryCreator::new_memory` to reflect this.

Finally, the fuzzing with `Config` turned up an issue where, if no guard pages were present, the wasm code wouldn't correctly bounds-check memory accesses. The issue here was that with a guard page we only need to bounds-check the first byte of an access, but without a guard page we need to bounds-check the last byte of the access. This meant that the code generation needed to account for the size of the memory operation (load/store) and use this as the offset-to-check in the no-guard-page scenario.

I've attempted to make the various comments in cranelift a bit more exhaustive too, to hopefully make it a bit clearer for future readers!

Closes #1501

* Review comments

* Update a comment
1 parent bc4b470 commit 363cd2d
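
A minimal sketch of how the new knobs combine (illustrative only, not part of this commit's diff; it reuses the `Config` methods exercised by the `respect_tunables` test below and the `Engine`/`Store` constructors of this era):

    use wasmtime::{Config, Engine, Store};

    fn main() {
        let mut config = Config::new();
        // Disallow "static" memories entirely: no large up-front virtual
        // reservation is made, and memories may move as they grow.
        config.static_memory_maximum_size(0);
        // Also drop the dynamic guard region, forcing JIT code to emit an
        // explicit bounds check for every memory access.
        config.dynamic_memory_guard_size(0);
        let engine = Engine::new(&config);
        let _store = Store::new(&engine);
    }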

File tree

11 files changed: +430, -52 lines


cranelift/codegen/src/legalizer/heap.rs

Lines changed: 17 additions & 5 deletions
@@ -114,7 +114,9 @@ fn static_addr(
     let mut pos = FuncCursor::new(func).at_inst(inst);
     pos.use_srcloc(inst);
 
-    // Start with the bounds check. Trap if `offset + access_size > bound`.
+    // The goal here is to trap if `offset + access_size > bound`.
+    //
+    // This first case is a trivial case where we can easily trap.
     if access_size > bound {
         // This will simply always trap since `offset >= 0`.
         pos.ins().trap(ir::TrapCode::HeapOutOfBounds);
@@ -129,11 +131,21 @@ fn static_addr(
         return;
     }
 
-    // Check `offset > limit` which is now known non-negative.
+    // After the trivial case is done we're now mostly interested in trapping
+    // if `offset > bound - access_size`. We know `bound - access_size` here is
+    // non-negative from the above comparison.
+    //
+    // If we can know `bound - access_size >= 4GB` then with a 32-bit offset
+    // we're guaranteed:
+    //
+    //      bound - access_size >= 4GB > offset
+    //
+    // or, in other words, `offset < bound - access_size`, meaning we can't trap
+    // for any value of `offset`.
+    //
+    // With that we have an optimization here where with 32-bit offsets and
+    // `bound - access_size >= 4GB` we can omit a bounds check.
     let limit = bound - access_size;
-
-    // We may be able to omit the check entirely for 32-bit offsets if the heap bound is 4 GB or
-    // more.
     if offset_ty != ir::types::I32 || limit < 0xffff_ffff {
         let oob = if limit & 1 == 1 {
             // Prefer testing `offset >= limit - 1` when limit is odd because an even number is
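
The decision this hunk encodes can be restated as a small standalone function (a hypothetical sketch for illustration; `needs_explicit_check` is not a cranelift API):

    /// Returns whether `static_addr` must emit an explicit bounds check for
    /// an `access_size`-byte access into a heap of `bound` bytes, given the
    /// offset type. Mirrors the conditions in the hunk above.
    fn needs_explicit_check(offset_is_i32: bool, bound: u64, access_size: u64) -> bool {
        if access_size > bound {
            // Since `offset >= 0`, every access traps unconditionally.
            return true;
        }
        let limit = bound - access_size;
        // A 32-bit offset never exceeds 0xffff_ffff, so once `limit` reaches
        // that value the check `offset > limit` can never fire.
        !offset_is_i32 || limit < 0xffff_ffff
    }

    fn main() {
        // 1-byte access, heap bound of 4GB + 1: the check is omitted.
        assert!(!needs_explicit_check(true, (1u64 << 32) + 1, 1));
        // The same access against a 64KB heap still needs the check.
        assert!(needs_explicit_check(true, 1u64 << 16, 1));
    }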

cranelift/wasm/src/code_translator.rs

Lines changed: 101 additions & 23 deletions
@@ -39,6 +39,8 @@ use cranelift_codegen::ir::{
 };
 use cranelift_codegen::packed_option::ReservedValue;
 use cranelift_frontend::{FunctionBuilder, Variable};
+use std::cmp;
+use std::convert::TryFrom;
 use std::vec::Vec;
 use wasmparser::{MemoryImmediate, Operator};

@@ -655,42 +657,42 @@ pub fn translate_operator<FE: FuncEnvironment + ?Sized>(
         Operator::I16x8Load8x8S {
             memarg: MemoryImmediate { flags: _, offset },
         } => {
-            let (flags, base, offset) = prepare_load(*offset, builder, state, environ)?;
+            let (flags, base, offset) = prepare_load(*offset, 8, builder, state, environ)?;
             let loaded = builder.ins().sload8x8(flags, base, offset);
             state.push1(loaded);
         }
         Operator::I16x8Load8x8U {
             memarg: MemoryImmediate { flags: _, offset },
         } => {
-            let (flags, base, offset) = prepare_load(*offset, builder, state, environ)?;
+            let (flags, base, offset) = prepare_load(*offset, 8, builder, state, environ)?;
             let loaded = builder.ins().uload8x8(flags, base, offset);
             state.push1(loaded);
         }
         Operator::I32x4Load16x4S {
             memarg: MemoryImmediate { flags: _, offset },
         } => {
-            let (flags, base, offset) = prepare_load(*offset, builder, state, environ)?;
+            let (flags, base, offset) = prepare_load(*offset, 8, builder, state, environ)?;
             let loaded = builder.ins().sload16x4(flags, base, offset);
             state.push1(loaded);
         }
         Operator::I32x4Load16x4U {
             memarg: MemoryImmediate { flags: _, offset },
         } => {
-            let (flags, base, offset) = prepare_load(*offset, builder, state, environ)?;
+            let (flags, base, offset) = prepare_load(*offset, 8, builder, state, environ)?;
             let loaded = builder.ins().uload16x4(flags, base, offset);
             state.push1(loaded);
         }
         Operator::I64x2Load32x2S {
             memarg: MemoryImmediate { flags: _, offset },
         } => {
-            let (flags, base, offset) = prepare_load(*offset, builder, state, environ)?;
+            let (flags, base, offset) = prepare_load(*offset, 8, builder, state, environ)?;
             let loaded = builder.ins().sload32x2(flags, base, offset);
             state.push1(loaded);
         }
         Operator::I64x2Load32x2U {
             memarg: MemoryImmediate { flags: _, offset },
         } => {
-            let (flags, base, offset) = prepare_load(*offset, builder, state, environ)?;
+            let (flags, base, offset) = prepare_load(*offset, 8, builder, state, environ)?;
             let loaded = builder.ins().uload32x2(flags, base, offset);
             state.push1(loaded);
         }
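
(All six of these extending SIMD loads read exactly 64 bits from memory — 8x8, 4x16, or 2x32 bits — which is why a constant `8` is passed as the number of loaded bytes rather than deriving it from the result type via `mem_op_size`.)
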
@@ -1701,25 +1703,70 @@ fn get_heap_addr(
     heap: ir::Heap,
     addr32: ir::Value,
     offset: u32,
+    width: u32,
     addr_ty: Type,
     builder: &mut FunctionBuilder,
 ) -> (ir::Value, i32) {
-    use core::cmp::min;
-
-    let mut adjusted_offset = u64::from(offset);
     let offset_guard_size: u64 = builder.func.heaps[heap].offset_guard_size.into();
 
-    // Generate `heap_addr` instructions that are friendly to CSE by checking offsets that are
-    // multiples of the offset-guard size. Add one to make sure that we check the pointer itself
-    // is in bounds.
-    if offset_guard_size != 0 {
-        adjusted_offset = adjusted_offset / offset_guard_size * offset_guard_size;
-    }
-
-    // For accesses on the outer skirts of the offset-guard pages, we expect that we get a trap
-    // even if the access goes beyond the offset-guard pages. This is because the first byte
-    // pointed to is inside the offset-guard pages.
-    let check_size = min(u64::from(u32::MAX), 1 + adjusted_offset) as u32;
+    // How exactly the bounds check is performed here and what it's performed
+    // on is a bit tricky. Generally we want to rely on access violations (e.g.
+    // segfaults) to generate traps since that means we don't have to bounds
+    // check anything explicitly.
+    //
+    // If we don't have a guard page of unmapped memory, though, then we can't
+    // rely on this trapping behavior through segfaults. Instead we need to
+    // bounds-check the entire memory access here which is everything from
+    // `addr32 + offset` to `addr32 + offset + width` (not inclusive). In this
+    // scenario our adjusted offset that we're checking is `offset + width`.
+    //
+    // If we have a guard page, however, then we can perform a further
+    // optimization of the generated code by only checking multiples of the
+    // offset-guard size to be more CSE-friendly. Knowing that we have at least
+    // 1 page of a guard page we're then able to disregard the `width` since we
+    // know it's always less than one page. Our bounds check will be for the
+    // first byte which will either succeed and be guaranteed to fault if it's
+    // actually out of bounds, or the bounds check itself will fail. In any case
+    // we assert that the width is reasonably small for now so this assumption
+    // can be adjusted in the future if we get larger widths.
+    //
+    // Put another way we can say, where `y < offset_guard_size`:
+    //
+    //      n * offset_guard_size + y = offset
+    //
+    // We'll then pass `n * offset_guard_size` as the bounds check value. If
+    // this traps then our `offset` would have trapped anyway. If this check
+    // passes we know
+    //
+    //      addr32 + n * offset_guard_size < bound
+    //
+    // which means
+    //
+    //      addr32 + n * offset_guard_size + y < bound + offset_guard_size
+    //
+    // because `y < offset_guard_size`, which then means:
+    //
+    //      addr32 + offset < bound + offset_guard_size
+    //
+    // Since we know that the guard size bytes are all unmapped we're
+    // guaranteed that `offset` and the `width` bytes after it are either
+    // in-bounds or will hit the guard page, meaning we'll get the desired
+    // semantics we want.
+    //
+    // As one final comment on the bits with the guard size here, another goal
+    // of this is to hit an optimization in `heap_addr` where if the heap size
+    // minus the offset is >= 4GB then bounds checks are 100% eliminated. This
+    // means that with huge guard regions (e.g. our 2GB default) most adjusted
+    // offsets we're checking here are zero. This means that we'll hit the fast
+    // path and emit zero conditional traps for bounds checks.
+    let adjusted_offset = if offset_guard_size == 0 {
+        u64::from(offset) + u64::from(width)
+    } else {
+        assert!(width < 1024);
+        cmp::max(u64::from(offset) / offset_guard_size * offset_guard_size, 1)
+    };
+    debug_assert!(adjusted_offset > 0); // want to bounds check at least 1 byte
+    let check_size = u32::try_from(adjusted_offset).unwrap_or(u32::MAX);
     let base = builder.ins().heap_addr(addr_ty, heap, addr32, check_size);
 
     // Native load/store instructions take a signed `Offset32` immediate, so adjust the base
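
To make the two branches concrete, here is the `check_size` computation above pulled out into a free function with sample values (an illustrative sketch mirroring the diff, not an exported cranelift API):

    use std::cmp;
    use std::convert::TryFrom;

    fn check_size(offset: u32, width: u32, offset_guard_size: u64) -> u32 {
        let adjusted_offset = if offset_guard_size == 0 {
            // No guard pages: the whole access, through its last byte,
            // must be bounds-checked explicitly.
            u64::from(offset) + u64::from(width)
        } else {
            // Guard pages: round the offset down to a multiple of the
            // guard size (CSE-friendly), checking at least one byte.
            assert!(width < 1024);
            cmp::max(u64::from(offset) / offset_guard_size * offset_guard_size, 1)
        };
        u32::try_from(adjusted_offset).unwrap_or(u32::MAX)
    }

    fn main() {
        // Without a guard page, an 8-byte load at offset 16 checks 24 bytes.
        assert_eq!(check_size(16, 8, 0), 24);
        // With a 64KB guard, every offset below 64KB checks just 1 byte, so
        // the `heap_addr` bounds checks of nearby accesses can be CSE'd.
        assert_eq!(check_size(16, 8, 0x10000), 1);
        assert_eq!(check_size(0x12345, 8, 0x10000), 0x10000);
    }
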
@@ -1736,6 +1783,7 @@
 /// Prepare for a load; factors out common functionality between load and load_extend operations.
 fn prepare_load<FE: FuncEnvironment + ?Sized>(
     offset: u32,
+    loaded_bytes: u32,
     builder: &mut FunctionBuilder,
     state: &mut FuncTranslationState,
     environ: &mut FE,
@@ -1744,7 +1792,14 @@ fn prepare_load<FE: FuncEnvironment + ?Sized>(
 
     // We don't yet support multiple linear memories.
     let heap = state.get_heap(builder.func, 0, environ)?;
-    let (base, offset) = get_heap_addr(heap, addr32, offset, environ.pointer_type(), builder);
+    let (base, offset) = get_heap_addr(
+        heap,
+        addr32,
+        offset,
+        loaded_bytes,
+        environ.pointer_type(),
+        builder,
+    );
 
     // Note that we don't set `is_aligned` here, even if the load instruction's
     // alignment immediate says it's aligned, because WebAssembly's immediate
@@ -1763,7 +1818,13 @@ fn translate_load<FE: FuncEnvironment + ?Sized>(
     state: &mut FuncTranslationState,
     environ: &mut FE,
 ) -> WasmResult<()> {
-    let (flags, base, offset) = prepare_load(offset, builder, state, environ)?;
+    let (flags, base, offset) = prepare_load(
+        offset,
+        mem_op_size(opcode, result_ty),
+        builder,
+        state,
+        environ,
+    )?;
     let (load, dfg) = builder.ins().Load(opcode, result_ty, flags, offset, base);
     state.push1(dfg.first_result(load));
     Ok(())
@@ -1782,7 +1843,14 @@ fn translate_store<FE: FuncEnvironment + ?Sized>(
 
     // We don't yet support multiple linear memories.
     let heap = state.get_heap(builder.func, 0, environ)?;
-    let (base, offset) = get_heap_addr(heap, addr32, offset, environ.pointer_type(), builder);
+    let (base, offset) = get_heap_addr(
+        heap,
+        addr32,
+        offset,
+        mem_op_size(opcode, val_ty),
+        environ.pointer_type(),
+        builder,
+    );
     // See the comments in `translate_load` about the flags.
     let flags = MemFlags::new();
     builder
@@ -1791,6 +1859,16 @@ fn translate_store<FE: FuncEnvironment + ?Sized>(
     Ok(())
 }
 
+fn mem_op_size(opcode: ir::Opcode, ty: Type) -> u32 {
+    match opcode {
+        ir::Opcode::Istore8 | ir::Opcode::Sload8 | ir::Opcode::Uload8 => 1,
+        ir::Opcode::Istore16 | ir::Opcode::Sload16 | ir::Opcode::Uload16 => 2,
+        ir::Opcode::Istore32 | ir::Opcode::Sload32 | ir::Opcode::Uload32 => 4,
+        ir::Opcode::Store | ir::Opcode::Load => ty.bytes(),
+        _ => panic!("unknown size of mem op for {:?}", opcode),
+    }
+}
+
 fn translate_icmp(cc: IntCC, builder: &mut FunctionBuilder, state: &mut FuncTranslationState) {
     let (arg0, arg1) = state.pop2();
     let val = builder.ins().icmp(cc, arg0, arg1);

crates/api/src/externals.rs

Lines changed: 53 additions & 2 deletions
@@ -870,8 +870,59 @@ pub unsafe trait LinearMemory {
 /// Note that this is a relatively new and experimental feature and it is recommended
 /// to be familiar with wasmtime runtime code to use it.
 pub unsafe trait MemoryCreator: Send + Sync {
-    /// Create new LinearMemory
-    fn new_memory(&self, ty: MemoryType) -> Result<Box<dyn LinearMemory>, String>;
+    /// Create a new `LinearMemory` object from the specified parameters.
+    ///
+    /// The type of memory being created is specified by `ty` which indicates
+    /// both the minimum and maximum size, in wasm pages.
+    ///
+    /// The `reserved_size` value indicates the expected size of the
+    /// reservation that is to be made for this memory. If this value is `None`
+    /// then the implementation is free to allocate memory as it sees fit. If
+    /// the value is `Some`, however, then the implementation is expected to
+    /// reserve that many bytes for the memory's allocation, plus the guard
+    /// size at the end. Note that this reservation need only be a virtual
+    /// memory reservation, physical memory does not need to be allocated
+    /// immediately. In this case `grow` should never move the base pointer and
+    /// the maximum size of `ty` is guaranteed to fit within `reserved_size`.
+    ///
+    /// The `guard_size` parameter indicates how many bytes of space, after the
+    /// memory allocation, is expected to be unmapped. JIT code will elide
+    /// bounds checks based on the `guard_size` provided, so for JIT code to
+    /// work correctly the memory returned will need to be properly guarded with
+    /// `guard_size` bytes left unmapped after the base allocation.
+    ///
+    /// Note that the `reserved_size` and `guard_size` options are tuned from
+    /// the various [`Config`](crate::Config) methods about memory
+    /// sizes/guards. Additionally these two values are guaranteed to be
+    /// multiples of the system page size.
+    fn new_memory(
+        &self,
+        ty: MemoryType,
+        reserved_size: Option<u64>,
+        guard_size: u64,
+    ) -> Result<Box<dyn LinearMemory>, String>;
+}
+
+#[cfg(test)]
+mod tests {
+    use crate::*;
+
+    // Assert that creating a memory via `Memory::new` respects the limits/tunables
+    // in `Config`.
+    #[test]
+    fn respect_tunables() {
+        let mut cfg = Config::new();
+        cfg.static_memory_maximum_size(0)
+            .dynamic_memory_guard_size(0);
+        let store = Store::new(&Engine::new(&cfg));
+        let ty = MemoryType::new(Limits::new(1, None));
+        let mem = Memory::new(&store, ty);
+        assert_eq!(mem.wasmtime_export.memory.offset_guard_size, 0);
+        match mem.wasmtime_export.memory.style {
+            wasmtime_environ::MemoryStyle::Dynamic => {}
+            other => panic!("unexpected style {:?}", other),
+        }
+    }
 }
 
 // Exports
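
To get a feel for what the new parameters demand of an implementation, here is a hypothetical helper computing the up-front virtual reservation a custom `MemoryCreator` might make (names and the dynamic-case strategy are assumptions for illustration, not part of this diff):

    /// Size of a WebAssembly page in bytes.
    const WASM_PAGE_SIZE: u64 = 64 * 1024;

    /// Bytes of virtual address space to reserve when a memory is created.
    fn initial_reservation(minimum_pages: u64, reserved_size: Option<u64>, guard_size: u64) -> u64 {
        match reserved_size {
            // Static-style memory: reserve the full region plus the trailing
            // guard up front; `grow` must never move the base pointer.
            Some(reserved) => reserved + guard_size,
            // Dynamic-style memory: the implementation may allocate as it
            // sees fit; one strategy is the current size plus the guard,
            // reallocating (and possibly moving the base) on `grow`.
            None => minimum_pages * WASM_PAGE_SIZE + guard_size,
        }
    }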
