
Commit 7592742

Add comprehensive documentation for Mesh memory allocator
- CLAUDE.md: Detailed documentation covering the Mesh paper (PLDI'19) and source code architecture
- AGENTS.md, GEMINI.md: Symlinks to CLAUDE.md for alternative access
- Documents compaction without relocation, meshing algorithm, shuffle vectors, and performance characteristics
1 parent d45d6de commit 7592742

3 files changed

+369
-0
lines changed


AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
CLAUDE.md

CLAUDE.md

Lines changed: 367 additions & 0 deletions
@@ -0,0 +1,367 @@
1+
# Mesh: Compacting Memory Management for C/C++ Applications
2+
3+
## Overview
4+
5+
Mesh is a memory allocator that performs **compaction without relocation** for unmodified C/C++ applications. Published at PLDI 2019 by Bobby Powers, David Tench, Emery D. Berger, and Andrew McGregor of the University of Massachusetts Amherst, it addresses the long-standing problem of memory fragmentation in unmanaged languages.
6+
7+
### Key Innovation: Compaction Without Moving Objects
8+
9+
Unlike garbage-collected languages where objects can be relocated and pointers updated, C/C++ applications expose raw memory addresses that can be manipulated arbitrarily (hidden in integers, stored to disk, used in pointer arithmetic). This makes traditional compaction impossible. Mesh's breakthrough is achieving compaction **without changing object addresses** through a technique called "meshing."
10+
11+
## Core Concepts
12+
13+
### 1. Memory Fragmentation Problem
14+
15+
#### The Challenge
16+
- **Catastrophic fragmentation**: Robson showed that, in the worst case, any traditional (non-compacting) allocator can consume memory proportional to log(max_size/min_size) times what the application actually requires
17+
- Example: An application allocating 16-byte and 128KB objects could consume 13× more memory than needed
18+
- Real-world impact:
19+
- 99% of Chrome crashes on low-end Android devices are due to out-of-memory conditions
20+
- Firefox underwent a 5-year effort to reduce memory footprint
21+
- Redis implements custom "active defragmentation" to combat fragmentation
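
The 13× figure in the example above can be checked directly from the logarithmic bound. This is a worked calculation only, assuming a base-2 logarithm and 128KB = 2^17 bytes:

```latex
\log_2\!\left(\frac{128\,\mathrm{KB}}{16\,\mathrm{B}}\right)
  = \log_2\!\left(\frac{2^{17}}{2^{4}}\right)
  = \log_2 2^{13}
  = 13
```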
22+
23+
#### Why Traditional Solutions Don't Work for C/C++
24+
- Objects cannot be safely relocated because:
25+
- Programs can stash addresses in integers
26+
- Store flags in low bits of aligned addresses
27+
- Perform arithmetic on addresses
28+
- Store addresses to disk and reload them later
29+
- There's no way to find and update all references
30+
31+
### 2. The Meshing Technique
32+
33+
Meshing consolidates the contents of multiple partially-filled pages onto a single physical page while keeping multiple virtual pages pointing to it. This is the core innovation that enables compaction without relocation.
34+
35+
#### How Meshing Works
36+
1. **Find meshable pages**: Two pages are meshable if no offset is occupied by a live object in both of them
37+
2. **Copy and remap**: Copy contents from one page to another, then update virtual-to-physical mappings so both virtual pages point to the same physical page
38+
3. **Release memory**: Return the now-unused physical page to the OS
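
The steps above can be illustrated with a toy model. This is a minimal sketch, not Mesh's implementation: `ToySpan`, `kSlots`, `meshable`, and `meshInto` are hypothetical names, spans hold only eight one-word slots, and the remapping and release steps are described in comments rather than performed.

```cpp
#include <array>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical toy model: a span holds kSlots fixed-size objects, and a bitmap
// records which slots are allocated. None of these names come from Mesh itself.
constexpr std::size_t kSlots = 8;

struct ToySpan {
  std::bitset<kSlots> allocated;             // bit i set => slot i holds a live object
  std::array<std::uint32_t, kSlots> data{};  // one word of payload per slot
};

// Step 1: two spans mesh if no slot offset is live in both of them.
inline bool meshable(const ToySpan& a, const ToySpan& b) {
  return (a.allocated & b.allocated).none();
}

// Step 2 (copy half): copy every live object from src into the same offset in dst.
// A real allocator would then remap src's virtual pages onto dst's physical
// pages and release src's physical pages (step 3); this sketch stops at the copy.
inline bool meshInto(ToySpan& dst, const ToySpan& src) {
  if (!meshable(dst, src)) return false;
  for (std::size_t i = 0; i < kSlots; ++i) {
    if (src.allocated[i]) dst.data[i] = src.data[i];
  }
  dst.allocated |= src.allocated;
  return true;
}
```

Because objects are copied to the same offset they already occupied, an address computed as "span base plus offset" stays valid once the source's virtual pages point at the destination's physical pages.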
39+
40+
#### Key Properties
41+
- **Virtual addresses unchanged**: Objects remain at the same virtual addresses
42+
- **Physical memory consolidated**: Multiple sparse virtual pages share dense physical pages
43+
- **Transparent to applications**: No code changes or recompilation required
44+
45+
### 3. Randomized Allocation Strategy
46+
47+
To ensure pages are likely to be meshable, Mesh uses randomized allocation instead of traditional sequential allocation.
48+
49+
#### The Problem with Sequential Allocation
50+
If objects are allocated sequentially (bump-pointer style), pages are likely to have objects at the same offsets, preventing meshing. In the worst case, every page could have exactly one object at the same offset, making meshing impossible.
51+
52+
#### Mesh's Solution: Shuffle Vectors
53+
- **Random placement**: Objects are allocated uniformly at random across available offsets in a span
54+
- **Mathematical guarantee**: In the worst case of one live object per span, the probability that all n objects occupy the same offset is (1/b)^(n-1), where b = objects per span and n = number of spans
55+
- **Example**: With 64 spans of 16-byte objects in 4K pages (256 objects/page), the probability that no spans can mesh is about 10^-152
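
The order of magnitude in the example can be reproduced by substituting b = 256 and n = 64 into the formula above (a worked calculation, using log10(2) ≈ 0.30103):

```latex
\left(\frac{1}{b}\right)^{n-1}
  = \left(\frac{1}{256}\right)^{63}
  = 2^{-8 \cdot 63}
  = 2^{-504}
  \approx 10^{-504 \times 0.30103}
  \approx 10^{-151.7}
```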
56+
57+
### 4. Efficient Meshing Search Algorithm: SplitMesher
58+
59+
Finding meshable pages efficiently is critical for runtime performance.
60+
61+
#### The Algorithm
62+
```
SplitMesher(S, t):
  1. Split span list S into two halves: Sl and Sr
  2. For each span in Sl:
     - Probe up to t spans from Sr for meshing opportunities
     - If meshable pair found, mesh them and remove from lists
  3. Parameter t controls time/quality tradeoff (default: 64)
```
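
The outline above can be sketched in C++ as follows. This is a simplified illustration under stated assumptions, not the code in the Mesh repository: spans are reduced to 64-bit allocation bitmaps, `splitMesher` and `meshable` are hypothetical names, and the function only reports which pairs could be meshed instead of meshing them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Allocation bitmap for one span; 64 slots are assumed purely for illustration.
using Bitmap = std::uint64_t;

inline bool meshable(Bitmap a, Bitmap b) { return (a & b) == 0; }

// Simplified SplitMesher: returns index pairs (into `spans`) that can be meshed.
// Each left-half span probes at most `t` still-unmatched right-half spans.
inline std::vector<std::pair<std::size_t, std::size_t>>
splitMesher(const std::vector<Bitmap>& spans, std::size_t t) {
  const std::size_t half = spans.size() / 2;
  std::vector<bool> used(spans.size(), false);
  std::vector<std::pair<std::size_t, std::size_t>> pairs;

  for (std::size_t l = 0; l < half; ++l) {
    std::size_t probes = 0;
    for (std::size_t r = half; r < spans.size() && probes < t; ++r) {
      if (used[r]) continue;  // already meshed with an earlier left span
      ++probes;
      if (meshable(spans[l], spans[r])) {
        used[r] = true;  // remove the right span from further consideration
        pairs.emplace_back(l, r);
        break;           // the left span is now matched as well
      }
    }
  }
  return pairs;
}
```

Bounding the probes per left span by `t` is what keeps the total work proportional to n × t rather than n².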
70+
71+
#### Key Properties
72+
- **Probabilistic guarantees**: Finds a matching within a factor of 1/2 of optimal with high probability
73+
- **Efficient runtime**: O(n/q) where n = number of spans, q = probability two spans mesh
74+
- **Practical effectiveness**: t=64 balances runtime overhead and meshing quality
75+
76+
## Architecture and Implementation
77+
78+
### System Architecture
79+
80+
Mesh is implemented as a drop-in replacement for malloc/free, requiring no source code changes. It can be used via:
81+
- **Linking**: Build the application with `-lmesh`
82+
- **Dynamic loading**: Set `LD_PRELOAD=libmesh.so` (Linux) or `DYLD_INSERT_LIBRARIES` (macOS)
83+
84+
### Core Components
85+
86+
#### 1. MiniHeaps (`mini_heap.h`)
87+
- **Purpose**: Manage physical spans of memory with metadata
88+
- **Key features**:
89+
- Allocation bitmap tracking occupied/free slots
90+
- Support for meshing multiple virtual spans to one physical span
91+
- Atomic operations for thread-safe non-local frees
92+
- Size classes from segregated-fit allocation
93+
- **States**:
94+
- **Attached**: Owned by a thread-local heap for allocation
95+
- **Detached**: Owned by global heap, available for meshing
96+
97+
#### 2. Shuffle Vectors (`shuffle_vector.h`)
98+
- **Purpose**: Enable fast randomized allocation with low overhead
99+
- **Design**: Fixed-size array of available offsets in random order
100+
- **Operations**:
101+
- **Allocation**: Pop next offset from vector (O(1))
102+
- **Deallocation**: Push freed offset and perform one Fisher-Yates shuffle iteration
103+
- **Space efficiency**:
104+
- Only 1 byte per offset (max 256 objects per span)
105+
- One shuffle vector per attached MiniHeap per thread
106+
- Total overhead: ~2.8KB per thread for 24 size classes
107+
- **Thread-local**: No synchronization needed, unlike bitmaps
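
The pop and push operations described above can be sketched as a small class. This is an illustrative sketch only: `ToyShuffleVector` is a hypothetical name, it uses `std::vector` and `std::mt19937` for brevity where the text describes a fixed-size array and a thread-local PRNG, and callers are assumed to check `empty()` before calling `pop()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical, simplified shuffle vector; not Mesh's actual class.
class ToyShuffleVector {
 public:
  // Fill with offsets 0..count-1 and shuffle them once up front.
  ToyShuffleVector(std::size_t count, std::uint32_t seed) : rng_(seed) {
    assert(count <= 256);  // offsets must fit in one byte
    for (std::size_t i = 0; i < count; ++i) {
      offsets_.push_back(static_cast<std::uint8_t>(i));
    }
    for (std::size_t i = offsets_.size(); i > 1; --i) {
      std::swap(offsets_[i - 1], offsets_[randomIndex(i)]);
    }
  }

  bool empty() const { return offsets_.empty(); }
  std::size_t size() const { return offsets_.size(); }

  // Allocation: pop the next random offset in O(1).
  std::uint8_t pop() {
    const std::uint8_t off = offsets_.back();
    offsets_.pop_back();
    return off;
  }

  // Deallocation: push the freed offset, then swap it with a uniformly chosen
  // element (one Fisher-Yates step) so the vector stays randomly ordered.
  void push(std::uint8_t off) {
    offsets_.push_back(off);
    std::swap(offsets_.back(), offsets_[randomIndex(offsets_.size())]);
  }

 private:
  // Uniform index in [0, n).
  std::size_t randomIndex(std::size_t n) {
    return std::uniform_int_distribution<std::size_t>(0, n - 1)(rng_);
  }

  std::mt19937 rng_;
  std::vector<std::uint8_t> offsets_;
};
```

Shuffling once at construction and doing a single swap per free keeps both operations constant-time while leaving the remaining offsets in random order.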
108+
109+
#### 3. Thread Local Heaps (`thread_local_heap.h`)
110+
- **Purpose**: Fast, lock-free allocation for small objects
111+
- **Components**:
112+
- Shuffle vectors for each size class
113+
- Reference to global heap for refills
114+
- Thread-local PRNG for randomization
115+
- **Allocation fast path**:
116+
1. Pop from shuffle vector if available
117+
2. Refill from attached MiniHeaps if vector empty
118+
3. Request new MiniHeap from global heap if needed
119+
120+
#### 4. Global Heap (`global_heap.h`)
121+
- **Purpose**: Coordinate meshing, manage MiniHeaps, handle large allocations
122+
- **Responsibilities**:
123+
- Allocate/deallocate MiniHeaps for thread-local heaps
124+
- Perform meshing operations across all size classes
125+
- Handle non-local frees from other threads
126+
- Manage large (>16KB) allocations directly
127+
- **Meshing coordination**:
128+
- Rate-limited (default: max once per 100ms)
129+
- Skipped if last mesh freed <1MB
130+
- Concurrent with normal allocation
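
The two gating rules listed above can be written down as a small predicate. This is a sketch under assumptions: `ToyMeshPolicy` and `shouldMesh` are hypothetical names, and the sketch encodes only the conditions as stated here, not how the allocator re-arms meshing after an ineffective pass.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical policy helper mirroring the two rules above; not Mesh's API.
struct ToyMeshPolicy {
  std::chrono::milliseconds period{100};    // at most one mesh pass per period
  std::size_t minBytesFreed = 1024 * 1024;  // 1MB effectiveness threshold

  bool shouldMesh(std::chrono::milliseconds sinceLastMesh,
                  std::size_t bytesFreedByLastMesh) const {
    if (sinceLastMesh < period) return false;                // rate limit
    if (bytesFreedByLastMesh < minBytesFreed) return false;  // last pass was ineffective
    return true;
  }
};
```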
131+
132+
#### 5. Meshable Arena (`meshable_arena.h`)
133+
- **Purpose**: Manage virtual and physical memory for meshing
134+
- **Key innovation**: Uses file-backed shared mappings instead of anonymous memory
135+
- Created via `memfd_create()` (memory-only file)
136+
- Allows multiple virtual addresses to map same physical offset
137+
- Enables atomic page table updates via `mmap()`
138+
- **Page management**:
139+
- Tracks ownership (MiniHeapID) for each page
140+
- Maintains bins of free/used pages by size
141+
- Implements `scavenge()` to return pages to OS
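
The property that a file-backed shared mapping lets two virtual addresses reach the same physical page can be demonstrated in isolation. The following is a Linux-only sketch using the real `memfd_create`, `ftruncate`, and `mmap` calls; the function name `twoViewsShareOnePage` and the `"toy-arena"` label are made up for this example, and it does not reproduce how the arena itself lays out or remaps spans.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>
#include <unistd.h>

// Linux-only sketch: map one page of a memory-only file at two different
// virtual addresses and check that both views share the same data.
inline bool twoViewsShareOnePage() {
  const long pageSize = sysconf(_SC_PAGESIZE);
  if (pageSize <= 0) return false;
  const std::size_t len = static_cast<std::size_t>(pageSize);

  const int fd = memfd_create("toy-arena", 0);
  if (fd < 0) return false;
  if (ftruncate(fd, pageSize) != 0) {
    close(fd);
    return false;
  }

  // Two independent shared mappings of file offset 0.
  void* a = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  void* b = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  bool ok = false;
  if (a != MAP_FAILED && b != MAP_FAILED) {
    std::strcpy(static_cast<char*>(a), "meshed");
    // A write through one virtual address is visible through the other.
    ok = (a != b) && std::strcmp(static_cast<char*>(b), "meshed") == 0;
  }
  if (a != MAP_FAILED) munmap(a, len);
  if (b != MAP_FAILED) munmap(b, len);
  close(fd);
  return ok;
}
```

Anonymous private memory cannot be aliased this way, which is why the arena described above is backed by a memory-only file.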
142+
143+
### Meshing Implementation Details
144+
145+
#### Concurrent Meshing
146+
Meshing runs concurrently with application threads, maintaining two invariants:
147+
1. **Read correctness**: Objects being relocated are always readable
148+
2. **Write safety**: Objects are never written during relocation
149+
150+
#### Write Barrier Implementation
151+
- **Page protection**: Before copying, source pages marked read-only via `mprotect()`
152+
- **Trap handler**: Segfault handler waits for meshing to complete
153+
- **Atomic remapping**: After copy, source pages remapped read/write
154+
- **Zero stop-the-world**: No global synchronization required
155+
156+
#### Meshing Algorithm (`meshing.h`, `global_heap.cc`)
157+
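```cpp
// Simplified sketch of the core meshing check (the signature is an assumption):
// two spans can mesh only if their allocation bitmaps share no set bit.
#include <cstddef>
#include <cstdint>

bool bitmapsMeshable(const std::uint64_t *bitmap1, const std::uint64_t *bitmap2,
                     std::size_t words) {
  for (std::size_t i = 0; i < words; i++) {
    if ((bitmap1[i] & bitmap2[i]) != 0) {
      return false;  // some offset is allocated in both spans
    }
  }
  return true;
}

// Main orchestration, as a pseudocode outline of meshAllSizeClassesLocked():
//   1. Scavenge freed pages
//   2. For each size class:
//        - Flush thread-local free memory
//        - Run shiftedSplitting algorithm
//        - Consolidate meshable pairs
//   3. Update statistics and page tables
```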
171+
172+
### Memory Layout and Size Classes
173+
174+
#### Size Classes
175+
- **Small objects**:
176+
- Same size classes as jemalloc for ≤1024 bytes
177+
- Power-of-two classes for 1024-16384 bytes
178+
- Reduces internal fragmentation from rounding
179+
- **Large objects** (>16KB):
180+
- Page-aligned, individually managed
181+
- Not considered for meshing
182+
- Directly freed to OS when deallocated
183+
184+
#### Span Management
185+
- **Span**: Contiguous run of pages containing same-sized objects
186+
- **Occupancy tracking**: Bins organized by fullness (75-99%, 50-74%, etc.)
187+
- **Random selection**: Global heap randomly selects from bins for reuse
188+
- **Meshing candidates**: Only spans with occupancy below threshold (configurable)
189+
190+
## Theoretical Analysis and Guarantees
191+
192+
### Formal Problem Definition
193+
Given n binary strings of length b (representing allocation bitmaps), find a meshing that releases the maximum number of strings. This reduces to:
194+
- **Graph representation**: Nodes = strings, edges = meshable pairs
195+
- **Optimization goal**: Minimum clique cover (partition into fewest cliques)
196+
- **Complexity**: NP-hard in general, but polynomial for constant string length
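
The graph representation above can be made concrete with a short sketch. It assumes bitmaps of at most 64 slots stored in a `std::uint64_t`, and `meshingGraphEdges` is a hypothetical helper name; it builds the edge list by brute force and does not attempt the clique cover itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Build the meshing graph: node i is bitmap i, and an edge (i, j) with i < j
// is added for every pair of bitmaps that have no set bit in common.
inline std::vector<std::pair<std::size_t, std::size_t>>
meshingGraphEdges(const std::vector<std::uint64_t>& bitmaps) {
  std::vector<std::pair<std::size_t, std::size_t>> edges;
  for (std::size_t i = 0; i < bitmaps.size(); ++i) {
    for (std::size_t j = i + 1; j < bitmaps.size(); ++j) {
      if ((bitmaps[i] & bitmaps[j]) == 0) edges.emplace_back(i, j);
    }
  }
  return edges;
}
```

A set of strings can share one physical span only if every pair in the set is joined by an edge, which is why the optimization is phrased in terms of cliques.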
197+
198+
### Probabilistic Guarantees
199+
200+
#### Mesh Breaks Robson Bounds
201+
- **Traditional allocators**: Worst-case fragmentation of O(log(max_size/min_size))
202+
- **Mesh**: With high probability, avoids catastrophic fragmentation
203+
- **Key insight**: Meshing can redistribute memory between size classes
204+
205+
#### SplitMesher Analysis
206+
Given:
207+
- n spans to mesh
208+
- q = global probability two spans mesh
209+
- t = probe limit parameter (default: 64)
210+
211+
Results:
212+
- **Quality**: Finds ≥n(1-e^(-2k))/4 meshes with high probability where k=t×q
213+
- **Runtime**: O(n×k/q) probes in worst case
214+
- **Practical impact**: For reasonable q values, finds near-optimal meshing quickly
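
The quality and runtime results above connect to the "factor of 1/2" guarantee as follows. This is a worked reading of the stated bounds, using only the symbols defined in this section (n, q, t, and k = t×q):

```latex
% Quality: the bound approaches n/4 as k grows.
\frac{n\,(1 - e^{-2k})}{4} \;\xrightarrow{\;k \to \infty\;}\; \frac{n}{4},
\qquad \text{e.g. } k = 1:\ \frac{n\,(1 - e^{-2})}{4} \approx 0.216\,n

% n spans can form at most n/2 disjoint pairs, so n/4 is within a factor 1/2 of optimal.
\frac{n/4}{n/2} = \frac{1}{2}

% Runtime: each of the n/2 left spans makes at most t = k/q probes.
\frac{n}{2} \cdot t = \frac{n\,k}{2q} = O\!\left(\frac{n\,k}{q}\right)
```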
215+
216+
### Randomization Analysis
217+
218+
#### Why Randomization is Essential
219+
Without randomization, regular allocation patterns can prevent meshing entirely. Experiments show:
220+
- **No randomization**: Only 3% heap reduction, 4% overhead
221+
- **With randomization**: 19% heap reduction, at 10.7% overhead
222+
223+
#### Mathematical Properties
224+
- **Dependence**: Edges in the meshing graph are not fully independent (they are not 3-wise independent)
225+
- **Triangle rarity**: P(3 strings all mesh) << P(pairwise meshing)^3
226+
- **Implication**: Can focus on finding pairs (matching) vs. larger cliques
227+
228+
## Performance Characteristics
229+
230+
### Memory Savings (from paper evaluation)
231+
232+
#### Firefox (Speedometer 2.0 benchmark)
233+
- **16% RSS reduction** vs. bundled jemalloc
234+
- 530MB with Mesh vs. 632MB with mozjemalloc
235+
- <1% performance impact on benchmark score
236+
- Consistent lower memory throughout execution
237+
238+
#### Redis (with heavy fragmentation workload)
239+
- **39% RSS reduction** automatically
240+
- Matches Redis's custom "active defragmentation" savings
241+
- 5.5× faster than Redis's built-in defragmentation
242+
- No application modifications required
243+
244+
#### SPEC CPU2006
245+
- Modest 2.4% average reduction (not allocation-intensive)
246+
- 15% reduction for perlbench (allocation-intensive)
247+
- 0.7% geometric mean runtime overhead
248+
249+
### Runtime Overhead
250+
251+
#### Meshing Costs
252+
- **Frequency**: Rate-limited to once per 100ms by default
253+
- **Duration**: Average 0.2ms, max 7.5ms (Firefox)
254+
- **Concurrent execution**: No stop-the-world pauses
255+
- **Adaptive**: Skips meshing if ineffective (<1MB freed)
256+
257+
#### Allocation Performance
258+
- **Fast path**: Thread-local, lock-free via shuffle vectors
259+
- **Random allocation**: ~2-3 additional instructions vs. bump pointer
260+
- **Cache behavior**: Comparable to modern allocators
261+
- **Scalability**: Thread-local heaps minimize contention
262+
263+
### Space Overhead
264+
265+
#### Per-Thread
266+
- **Shuffle vectors**: ~2.8KB total (24 size classes × ~120 bytes)
267+
- **Thread-local metadata**: <1KB
268+
269+
#### Global
270+
- **MiniHeap metadata**: 64 bytes per MiniHeap
271+
- **Bitmap overhead**: 1 bit per object slot
272+
- **Page tables**: Standard virtual memory overhead
273+
274+
## Configuration and Tuning
275+
276+
### Key Parameters
277+
278+
#### Meshing Parameters
279+
- **Mesh rate**: `mesh.check_period` (default: 100ms)
280+
- **Effectiveness threshold**: Minimum bytes to free (default: 1MB)
281+
- **Probe limit (t)**: Balances quality vs. runtime (default: 64)
282+
- **Max meshes per iteration**: Prevents excessive meshing time
283+
284+
#### Allocation Parameters
285+
- **Shuffle on allocation**: Always enabled for randomization
286+
- **Shuffle on free**: Optional additional randomization
287+
- **Occupancy cutoff**: Max fullness for meshing candidates (configurable)
288+
289+
### Usage Modes
290+
291+
#### Default Mode
292+
Full meshing with randomization enabled. Best for:
293+
- Long-running applications
294+
- Memory-constrained environments
295+
- Applications with fragmentation issues
296+
297+
#### No-Meshing Mode
298+
Randomized allocation only. Useful for:
299+
- Debugging allocation patterns
300+
- Performance comparison
301+
- Applications with regular allocation patterns
302+
303+
#### Compatibility Notes
304+
- **Transparent huge pages**: Should be disabled (conflicts with 4KB page granularity)
305+
- **Direct huge page allocation**: Still supported via mmap interfaces
306+
- **Security features**: Compatible with ASLR, DEP, etc.
307+
308+
## Limitations and Considerations
309+
310+
### When Mesh is Most Effective
311+
- **High fragmentation**: Many partially-filled pages
312+
- **Long-running applications**: Time for fragmentation to develop
313+
- **Mixed allocation sizes**: Benefits from cross-size-class redistribution
314+
- **Memory-constrained environments**: Where savings matter most
315+
316+
### When Mesh May Be Less Effective
317+
- **Sequential allocation patterns**: Without enough entropy for meshing
318+
- **Very large allocations**: >16KB objects not meshed
319+
- **Short-lived programs**: Insufficient time for meshing benefits
320+
- **Full pages**: Nothing to compact when pages are dense
321+
322+
### Overhead Considerations
323+
- **Virtual address space**: 2× consumption in worst case (unmeshed pages)
324+
- **TLB pressure**: Multiple virtual pages per physical page
325+
- **Page faults**: Initial access to meshed pages
326+
- **Memory bandwidth**: Copying during mesh operations
327+
328+
## Implementation Status
329+
330+
### Platform Support
331+
- **Linux**: Full support (primary platform)
332+
- **macOS**: Full support
333+
- **64-bit only**: Current implementation requires 64-bit address space
334+
335+
### Integration
336+
- **Drop-in replacement**: No code changes required
337+
- **Standard API**: Full malloc/free/realloc/memalign support
338+
- **Statistics**: mallctl API for runtime introspection
339+
- **Open source**: Apache 2.0 license
340+
341+
## Key Insights and Innovations
342+
343+
### Revolutionary Concepts
344+
1. **Compaction without relocation**: Previously thought impossible for C/C++
345+
2. **Virtual memory as abstraction layer**: Leverages MMU for pointer stability
346+
3. **Randomization for meshability**: Probabilistic approach to fragmentation
347+
348+
### Theoretical Contributions
349+
1. **Breaks Robson bounds**: First allocator to provably avoid worst-case fragmentation
350+
2. **Cross-size-class redistribution**: Unique ability to move memory between classes
351+
3. **Formal analysis**: Rigorous probabilistic guarantees on effectiveness
352+
353+
### Practical Impact
354+
1. **Automatic memory savings**: No application changes required
355+
2. **Compatible with existing code**: True drop-in replacement
356+
3. **Solves real problems**: Demonstrated savings in Firefox and Redis
357+
358+
## Summary
359+
360+
Mesh represents a breakthrough in memory management for unmanaged languages, achieving what was previously thought impossible: automatic compaction for C/C++ applications. Through the novel combination of meshing (virtual page remapping), randomized allocation, and efficient mesh search algorithms, Mesh provides:
361+
362+
- **Significant memory savings**: 16% (Firefox) to 39% (Redis) RSS reduction in the evaluated applications
363+
- **Theoretical guarantees**: Breaks classical fragmentation bounds
364+
- **Practical deployment**: Drop-in replacement requiring no code changes
365+
- **Low overhead**: <1% performance impact in the Firefox and SPEC CPU2006 evaluations
366+
367+
The key innovation—compaction without relocation—opens new possibilities for memory management in systems programming, potentially influencing future language runtimes and operating system designs.

GEMINI.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
CLAUDE.md
