Arena-based allocation in the current BPlusTreeMap implementation makes iteration 1.68x slower than Rust's standard BTreeMap. This analysis examines the fundamental challenges of eliminating arena allocation while maintaining Rust's memory safety guarantees, and evaluates alternative approaches including Box-based allocation, Rc/RefCell, unsafe pointers, and generational indices.
- Iteration overhead: 35.61 ns per item vs BTreeMap
- Memory overhead: 112 bytes struct size vs 24 bytes for BTreeMap
- Cache behavior: 7.08x slower for small ranges due to indirection
- Lookup performance: Actually 5% faster than BTreeMap for random access
pub struct BPlusTreeMap<K, V> {
capacity: usize,
root: NodeRef<K, V>,
leaf_arena: Arena<LeafNode<K, V>>, // Separate arena for leaves
branch_arena: Arena<BranchNode<K, V>>, // Separate arena for branches
}
pub enum NodeRef<K, V> {
Leaf(NodeId, PhantomData<(K, V)>), // NodeId = u32 index
Branch(NodeId, PhantomData<(K, V)>),
}
Every node access requires:
- Convert `NodeId(u32)` to `usize`
- Index into `Vec<Option<T>>`
- Unwrap `Option` to access actual node
- Potential cache miss from non-contiguous storage
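The access steps listed above can be sketched as a single lookup function. This is a simplified stand-in, not the crate's actual `Arena`: the `slots` field and the `allocate`, `get`, and `deallocate` method names are assumptions made for illustration.

```rust
/// Index into the arena; the document describes `NodeId` as a `u32`.
pub type NodeId = u32;

/// Simplified arena (assumed layout): a growable vector of optional slots.
pub struct Arena<T> {
    slots: Vec<Option<T>>,
}

impl<T> Arena<T> {
    pub fn new() -> Self {
        Arena { slots: Vec::new() }
    }

    /// Stores `value` in a fresh slot at the end and returns its id.
    pub fn allocate(&mut self, value: T) -> NodeId {
        let id = self.slots.len() as NodeId;
        self.slots.push(Some(value));
        id
    }

    /// Performs the per-access steps in order.
    pub fn get(&self, id: NodeId) -> Option<&T> {
        let index = id as usize; // 1. widen the u32 id to usize
        let slot = self.slots.get(index)?; // 2. bounds-checked Vec index
        slot.as_ref() // 3. look through the Option
    }

    /// Empties a slot; the `None` stays behind in the vector.
    pub fn deallocate(&mut self, id: NodeId) -> Option<T> {
        self.slots.get_mut(id as usize)?.take()
    }
}
```

Each step is cheap on its own; the cost the list describes comes from repeating all of them on every node access.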
pub struct ItemIterator<'a, K, V> {
tree: &'a BPlusTreeMap<K, V>,
current_leaf_id: Option<NodeId>, // Requires arena lookup
current_leaf_index: usize,
// ... additional state
}
Each next() call involves arena access + linked list traversal vs BTreeMap's direct pointer chasing.
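A minimal sketch of such an iterator, assuming the leaf arena can be viewed as a slice of `Option<LeafNode>` and that leaves are chained through `next` ids. The `leaves` field and the constructor are illustrative; the real `ItemIterator` holds a reference to the whole tree.

```rust
pub type NodeId = u32;

pub struct LeafNode<K, V> {
    pub keys: Vec<K>,
    pub values: Vec<V>,
    pub next: Option<NodeId>, // id of the next leaf in key order
}

pub struct ItemIterator<'a, K, V> {
    leaves: &'a [Option<LeafNode<K, V>>], // assumed view of the leaf arena
    current_leaf_id: Option<NodeId>,
    current_leaf_index: usize,
}

impl<'a, K, V> ItemIterator<'a, K, V> {
    pub fn new(leaves: &'a [Option<LeafNode<K, V>>], first: Option<NodeId>) -> Self {
        ItemIterator { leaves, current_leaf_id: first, current_leaf_index: 0 }
    }
}

impl<'a, K, V> Iterator for ItemIterator<'a, K, V> {
    type Item = (&'a K, &'a V);

    fn next(&mut self) -> Option<Self::Item> {
        let leaves: &'a [Option<LeafNode<K, V>>] = self.leaves;
        loop {
            let id = self.current_leaf_id?;
            // The arena lookup is repeated on every call, even when the
            // iterator is still inside the same leaf.
            let leaf = leaves.get(id as usize)?.as_ref()?;
            if self.current_leaf_index < leaf.keys.len() {
                let i = self.current_leaf_index;
                self.current_leaf_index += 1;
                return Some((&leaf.keys[i], &leaf.values[i]));
            }
            // Leaf exhausted: follow the linked list to the next leaf.
            self.current_leaf_id = leaf.next;
            self.current_leaf_index = 0;
        }
    }
}
```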
- Arena slots can become fragmented after deletions
- `Vec<Option<T>>` wastes memory on `None` values
- Cannot shrink arena without invalidating existing NodeIds
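To make the fragmentation cost concrete, the following sketch measures how much of a `Vec<Option<T>>` is held by free slots and shows why only trailing holes can be reclaimed. The helper names `wasted_slots`, `wasted_bytes`, and `trim_tail` are hypothetical and not part of the implementation.

```rust
use std::mem::size_of;

/// Counts free (`None`) slots that still occupy space in the vector.
pub fn wasted_slots<T>(slots: &[Option<T>]) -> usize {
    slots.iter().filter(|s| s.is_none()).count()
}

/// Bytes held by free slots: a `None` is as large as a full `Option<T>`.
pub fn wasted_bytes<T>(slots: &[Option<T>]) -> usize {
    wasted_slots(slots) * size_of::<Option<T>>()
}

/// Pops trailing free slots. Interior holes must stay, because removing
/// one would shift every later index and invalidate outstanding ids.
pub fn trim_tail<T>(slots: &mut Vec<Option<T>>) {
    while matches!(slots.last(), Some(None)) {
        slots.pop();
    }
}
```

After deletions scattered through the middle of the arena, `trim_tail` typically recovers little, which is the shrinking problem described above.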
pub enum Node<K, V> {
Leaf(Box<LeafNode<K, V>>),
Branch(Box<BranchNode<K, V>>),
}
pub struct LeafNode<K, V> {
keys: Vec<K>,
values: Vec<V>,
next: Option<Box<LeafNode<K, V>>>, // Direct pointer instead of NodeId
}
- Zero indirection: Direct heap pointers
- Optimal cache behavior: Each node is contiguous in memory
- Automatic memory management: Drop trait handles cleanup
- Smaller memory footprint: No arena overhead
- Borrowing conflicts: Cannot hold mutable reference to parent while accessing child
- Self-referential structures: Rust's ownership prevents cycles
- Split operations: Difficult to return new nodes while maintaining tree structure
- Iterator invalidation: Mutable operations can invalidate iterators
// This fails to compile:
fn split_leaf(&mut self, leaf: &mut LeafNode<K, V>) -> Box<LeafNode<K, V>> {
let new_leaf = leaf.split(); // Needs &mut self for allocation
self.update_parent_pointers(); // Borrowing conflict!
new_leaf
}
Impractical - Rust's borrowing rules make tree mutations extremely difficult without unsafe code.
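One way to see where the conflict bites is to separate the split itself from the bookkeeping around it. In the sketch below the split compiles because the new sibling is handed back by value and the parent, which the caller already borrows mutably, adopts it. What ownership still rules out is also storing that sibling in the left leaf's `next` field as a `Box`, since a `Box` has exactly one owner. The two-level shape and the `split` and `split_child` names are assumptions for illustration, not the crate's API.

```rust
pub struct LeafNode<K, V> {
    pub keys: Vec<K>,
    pub values: Vec<V>,
}

impl<K: Clone, V> LeafNode<K, V> {
    /// Moves the upper half into a new boxed sibling and returns it with
    /// its first key as the separator. Assumes the leaf is non-empty.
    pub fn split(&mut self) -> (K, Box<LeafNode<K, V>>) {
        let mid = self.keys.len() / 2;
        let right = Box::new(LeafNode {
            keys: self.keys.split_off(mid),
            values: self.values.split_off(mid),
        });
        (right.keys[0].clone(), right)
    }
}

pub struct BranchNode<K, V> {
    pub keys: Vec<K>,
    pub children: Vec<Box<LeafNode<K, V>>>,
}

impl<K: Clone, V> BranchNode<K, V> {
    /// Splits child `i` and adopts the new sibling as child `i + 1`.
    pub fn split_child(&mut self, i: usize) {
        let (separator, right) = self.children[i].split();
        self.keys.insert(i, separator);
        // `right` is moved into `children` here. Also linking it from the
        // left leaf through `next: Option<Box<..>>` would need a second
        // owner, which `Box` does not allow.
        self.children.insert(i + 1, right);
    }
}
```

The sketch leaves the leaf chain unresolved, which is the part iteration depends on.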
type NodePtr<K, V> = Rc<RefCell<Node<K, V>>>;
pub struct BPlusTreeMap<K, V> {
root: NodePtr<K, V>,
}
pub enum Node<K, V> {
Leaf {
keys: Vec<K>,
values: Vec<V>,
next: Option<NodePtr<K, V>>,
},
Branch {
keys: Vec<K>,
children: Vec<NodePtr<K, V>>,
},
}
- Shared ownership: Multiple references to same node
- Interior mutability: Can mutate through shared references
- Reference cycles: Supports parent-child relationships
- Familiar patterns: Similar to other languages' approaches
- Runtime borrow checking: `RefCell` panics on borrow violations
- Performance overhead: Reference counting + runtime checks
- Memory leaks: Reference cycles are never freed unless back edges use `Weak`
- Complex error handling: Runtime panics vs compile-time safety
// Each node access requires:
let node = node_ptr.borrow(); // Runtime borrow check
match &*node { // Deref + pattern match
Node::Leaf { keys, .. } => { /* access */ }
Node::Branch { .. } => { /* descend into a child */ }
}
// The borrow guard is dropped automatically at end of scope
Estimated overhead: 20-40% slower than arena due to:
- Reference counting operations
- Runtime borrow checking
- Additional indirection through RefCell
Possible but suboptimal - Trades compile-time safety for runtime overhead and complexity.
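A minimal sketch of where those costs appear, using a leaf-only chain with `i32` keys to keep it short. The `Leaf` type and the `leaf`, `collect_keys`, and `conflicting_borrow_is_rejected` helpers are hypothetical names for this example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

pub type NodePtr = Rc<RefCell<Leaf>>;

pub struct Leaf {
    pub keys: Vec<i32>,
    pub next: Option<NodePtr>,
}

pub fn leaf(keys: Vec<i32>, next: Option<NodePtr>) -> NodePtr {
    Rc::new(RefCell::new(Leaf { keys, next }))
}

/// Walks the leaf chain and collects every key.
pub fn collect_keys(first: &NodePtr) -> Vec<i32> {
    let mut out = Vec::new();
    let mut current = Some(Rc::clone(first)); // reference count increment
    while let Some(ptr) = current {
        let node = ptr.borrow(); // runtime borrow check
        out.extend_from_slice(&node.keys);
        current = node.next.clone(); // another reference count increment
        // The guard `node` is dropped here, then `ptr` (count decrement).
    }
    out
}

/// Returns true when a mutable borrow is refused while a shared borrow
/// is alive. Calling `borrow_mut()` at that point would panic instead.
pub fn conflicting_borrow_is_rejected(ptr: &NodePtr) -> bool {
    let _reader = ptr.borrow();
    ptr.try_borrow_mut().is_err()
}
```

Every step along the chain pays for one count increment, one borrow check, and one count decrement, none of which the arena version needs.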
pub struct BPlusTreeMap<K, V> {
root: *mut Node<K, V>,
_phantom: PhantomData<(K, V)>,
}
pub enum Node<K, V> {
Leaf {
keys: Vec<K>,
values: Vec<V>,
next: *mut Node<K, V>, // Raw pointer
},
Branch {
keys: Vec<K>,
children: Vec<*mut Node<K, V>>,
},
}
- Maximum performance: Direct pointer access, no overhead
- Full control: Can implement any tree operation
- Memory efficiency: Minimal memory overhead
- Flexibility: Can optimize for specific use cases
- Memory safety: Manual memory management required
- Use-after-free: Dangling pointers after node deletion
- Double-free: Potential double deletion bugs
- Iterator safety: Iterators can become invalid
- Maintenance burden: Complex unsafe code is hard to verify
unsafe impl<K, V> Send for BPlusTreeMap<K, V>
where K: Send, V: Send {}
unsafe impl<K, V> Sync for BPlusTreeMap<K, V>
where K: Sync, V: Sync {}
impl<K, V> Drop for BPlusTreeMap<K, V> {
fn drop(&mut self) {
unsafe {
// Must manually traverse and free all nodes
self.free_subtree(self.root);
}
}
}
High-performance but risky - Requires extensive unsafe code and careful verification. Only suitable for performance-critical applications with expert developers.
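The manual cleanup that `Drop` has to perform can be sketched as follows. This is a reduced, hypothetical `RawTree` with a single payload per node rather than the B+ tree layout above; `alloc`, `free_subtree`, and `count` are assumed helper names. Each `SAFETY` comment states an invariant that nothing checks at compile time.

```rust
use std::ptr;

struct Node<T> {
    value: T,
    children: Vec<*mut Node<T>>,
}

/// Hypothetical tree that owns its nodes through raw pointers.
pub struct RawTree<T> {
    root: *mut Node<T>,
}

/// Moves a node to the heap and gives up `Box` ownership of it.
fn alloc<T>(value: T) -> *mut Node<T> {
    Box::into_raw(Box::new(Node { value, children: Vec::new() }))
}

/// Frees `node` and its whole subtree.
/// SAFETY: `node` must be null or come from `alloc`, and must not have
/// been freed already.
unsafe fn free_subtree<T>(node: *mut Node<T>) {
    if node.is_null() {
        return;
    }
    // Take ownership back; the Box frees this node when it leaves scope.
    let boxed = unsafe { Box::from_raw(node) };
    for &child in &boxed.children {
        unsafe { free_subtree(child) };
    }
}

/// SAFETY: `node` must point to a live node.
unsafe fn count<T>(node: *const Node<T>) -> usize {
    let node = unsafe { &*node };
    1 + node.children.iter().map(|&c| unsafe { count(c) }).sum::<usize>()
}

impl<T> RawTree<T> {
    pub fn new(root_value: T) -> Self {
        RawTree { root: alloc(root_value) }
    }

    pub fn push_child(&mut self, value: T) {
        // SAFETY: `root` is live and `&mut self` rules out other references.
        let root = unsafe { &mut *self.root };
        root.children.push(alloc(value));
    }

    pub fn node_count(&self) -> usize {
        // SAFETY: every pointer reachable from `root` is live until `drop`.
        unsafe { count(self.root) }
    }
}

impl<T> Drop for RawTree<T> {
    fn drop(&mut self) {
        // SAFETY: the tree is the only owner, and `drop` runs once.
        unsafe { free_subtree(self.root) };
        self.root = ptr::null_mut();
    }
}
```

Freeing a node twice or reading it after `free_subtree` would compile without complaint, which is the verification burden noted above.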
use slotmap::{SlotMap, DefaultKey};
pub struct BPlusTreeMap<K, V> {
nodes: SlotMap<DefaultKey, Node<K, V>>,
root: DefaultKey,
}
pub enum Node<K, V> {
Leaf {
keys: Vec<K>,
values: Vec<V>,
next: Option<DefaultKey>, // Generational index
},
Branch {
keys: Vec<K>,
children: Vec<DefaultKey>,
},
}
- Memory safety: Automatic detection of stale references
- ABA problem solved: Generational versioning prevents reuse issues
- Stable references: Keys remain valid across operations
- Efficient storage: Packed storage with O(1) access
- Mature implementation: Well-tested SlotMap crate
- Similar overhead to arena: Still requires indirection
- External dependency: Adds crate dependency
- Key size: 64-bit keys vs 32-bit NodeIds
- Limited improvement: May not solve core performance issues
// Arena access:
let node = self.leaf_arena.get(node_id)?; // Vec index + Option unwrap
// SlotMap access:
let node = self.nodes.get(key)?; // Similar Vec index + generation check
Expected performance: Similar to current arena implementation, possibly 5-10% slower due to generation checking.
Incremental improvement - Provides better safety guarantees but doesn't address fundamental iteration performance issues.
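To show what the generation check adds, here is a hand-rolled generational arena using only the standard library. It illustrates the general technique and is not how the SlotMap crate is implemented; `GenArena`, `Key`, and `Slot` are names invented for this sketch.

```rust
/// Handle made of a slot index plus the generation it was issued for.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Key {
    index: u32,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

pub struct GenArena<T> {
    slots: Vec<Slot<T>>,
    free: Vec<u32>, // indices of empty slots available for reuse
}

impl<T> GenArena<T> {
    pub fn new() -> Self {
        GenArena { slots: Vec::new(), free: Vec::new() }
    }

    pub fn insert(&mut self, value: T) -> Key {
        if let Some(index) = self.free.pop() {
            // Reuse a freed slot; its generation was bumped on removal.
            let slot = &mut self.slots[index as usize];
            slot.value = Some(value);
            Key { index, generation: slot.generation }
        } else {
            let index = self.slots.len() as u32;
            self.slots.push(Slot { generation: 0, value: Some(value) });
            Key { index, generation: 0 }
        }
    }

    pub fn get(&self, key: Key) -> Option<&T> {
        let slot = self.slots.get(key.index as usize)?; // same Vec index as before
        if slot.generation != key.generation {
            return None; // stale key: the slot has been reused since
        }
        slot.value.as_ref()
    }

    pub fn remove(&mut self, key: Key) -> Option<T> {
        let slot = self.slots.get_mut(key.index as usize)?;
        if slot.generation != key.generation {
            return None;
        }
        let value = slot.value.take()?;
        slot.generation = slot.generation.wrapping_add(1);
        self.free.push(key.index);
        Some(value)
    }
}
```

The lookup path is the arena lookup plus one integer comparison, which matches the expectation that this buys safety rather than speed.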
pub struct BPlusTreeMap<K, V> {
root: Box<Node<K, V>>,
// Keep arena for temporary storage during splits
temp_arena: Arena<Node<K, V>>,
}
Use Box for normal tree structure, arena only during complex operations.
pub struct BPlusTreeMap<K, V> {
inner: UnsafeTree<K, V>, // Raw pointers internally
}
impl<K, V> BPlusTreeMap<K, V> {
pub fn get(&self, key: &K) -> Option<&V> {
// Safe wrapper around unsafe implementation
unsafe { self.inner.get(key) }
}
}
Encapsulate unsafe implementation behind safe API.
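A minimal sketch of that encapsulation, reduced to a chain of leaves linked by `NonNull` pointers. `LeafChain` and its methods are hypothetical; the point is that every public function is safe, and the returned reference is tied to `&self`, so callers cannot keep it past a mutation.

```rust
use std::ptr::NonNull;

struct Leaf<K, V> {
    keys: Vec<K>, // expected to be sorted within each leaf
    values: Vec<V>,
    next: Option<NonNull<Leaf<K, V>>>,
}

/// Hypothetical leaf chain: raw pointers inside, safe methods outside.
pub struct LeafChain<K, V> {
    head: Option<NonNull<Leaf<K, V>>>,
    tail: Option<NonNull<Leaf<K, V>>>,
}

impl<K: Ord, V> LeafChain<K, V> {
    pub fn new() -> Self {
        LeafChain { head: None, tail: None }
    }

    /// Appends a leaf at the end of the chain.
    pub fn push_leaf(&mut self, keys: Vec<K>, values: Vec<V>) {
        assert_eq!(keys.len(), values.len());
        let leaf = Box::new(Leaf { keys, values, next: None });
        let ptr = NonNull::from(Box::leak(leaf));
        match self.tail {
            // SAFETY: `tail` points to a live leaf owned by this chain,
            // and `&mut self` guarantees exclusive access.
            Some(mut tail) => unsafe { tail.as_mut().next = Some(ptr) },
            None => self.head = Some(ptr),
        }
        self.tail = Some(ptr);
    }

    /// Safe lookup; the result borrows from `&self`.
    pub fn get(&self, key: &K) -> Option<&V> {
        let mut current = self.head;
        while let Some(ptr) = current {
            // SAFETY: leaves are freed only in `drop`, so `ptr` is live
            // for as long as `&self` is.
            let leaf: &Leaf<K, V> = unsafe { ptr.as_ref() };
            if let Ok(i) = leaf.keys.binary_search(key) {
                return Some(&leaf.values[i]);
            }
            current = leaf.next;
        }
        None
    }
}

impl<K, V> Drop for LeafChain<K, V> {
    fn drop(&mut self) {
        let mut current = self.head.take();
        while let Some(ptr) = current {
            // SAFETY: each leaf was leaked from a Box in `push_leaf` and
            // is reclaimed exactly once here.
            let leaf = unsafe { Box::from_raw(ptr.as_ptr()) };
            current = leaf.next;
        }
    }
}
```

Unsorted keys passed to `push_leaf` can make `get` miss an entry, but they cannot cause undefined behavior; keeping misuse at that level is what the safe wrapper is for.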
pub enum Node<K, V> {
Owned(Box<NodeData<K, V>>),
Borrowed(&'static NodeData<K, V>), // For read-heavy workloads
}
Optimize for read-heavy scenarios with immutable sharing.
Based on analysis and benchmarking:
| Approach | Iteration Speed | Memory Usage | Safety | Complexity |
|---|---|---|---|---|
| Current Arena | 1.68x slower | High | Safe | Medium |
| Box-based | ~1.0x (ideal) | Low | Compile issues | High |
| Rc/RefCell | 1.3-1.5x slower | Medium | Runtime panics | Medium |
| Unsafe pointers | 0.8-1.0x | Minimal | Manual | Very High |
| SlotMap | 1.6-1.8x slower | Medium | Safe | Low |
- Arena optimization:
  - Use `Vec<T>` instead of `Vec<Option<T>>` with separate free list
  - Implement arena compaction to improve cache locality
  - Pre-allocate arena capacity based on expected tree size
- Iterator optimization:
  - Cache leaf node references to reduce arena lookups
  - Implement iterator pooling to reduce allocation overhead
  - Add fast-path for sequential iteration
- Hybrid approach: Use Box for leaf nodes (better iteration), arena for branch nodes (easier mutations)
- Specialized iterators: Different iterator implementations for different use cases
- Memory layout optimization: Pack related nodes together in memory
- Unsafe core with safe wrapper: Maximum performance with safety guarantees
- Pluggable allocation strategies: Allow users to choose allocation method
- SIMD optimization: Vectorized operations for large-scale iteration
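Of the recommendations above, caching the leaf reference is the simplest to sketch. The iterator below resolves a leaf id once and keeps the resulting reference until the leaf is exhausted, so the arena is consulted once per leaf rather than once per item. The slice view of the arena, the `CachedLeafIter` name, and the `lookups` counter (instrumentation for this example only) are assumptions.

```rust
pub type NodeId = u32;

pub struct LeafNode<K, V> {
    pub keys: Vec<K>,
    pub values: Vec<V>,
    pub next: Option<NodeId>,
}

pub struct CachedLeafIter<'a, K, V> {
    leaves: &'a [Option<LeafNode<K, V>>], // assumed view of the leaf arena
    current: Option<&'a LeafNode<K, V>>,  // resolved once per leaf
    index: usize,
    pub lookups: usize, // counts arena lookups, for illustration
}

impl<'a, K, V> CachedLeafIter<'a, K, V> {
    pub fn new(leaves: &'a [Option<LeafNode<K, V>>], first: Option<NodeId>) -> Self {
        let mut iter = CachedLeafIter { leaves, current: None, index: 0, lookups: 0 };
        iter.current = iter.resolve(first);
        iter
    }

    /// The only place that touches the arena.
    fn resolve(&mut self, id: Option<NodeId>) -> Option<&'a LeafNode<K, V>> {
        let id = id?;
        self.lookups += 1;
        let leaves: &'a [Option<LeafNode<K, V>>] = self.leaves;
        leaves.get(id as usize)?.as_ref()
    }
}

impl<'a, K, V> Iterator for CachedLeafIter<'a, K, V> {
    type Item = (&'a K, &'a V);

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            let leaf = self.current?; // fast path: no arena access
            if self.index < leaf.keys.len() {
                let i = self.index;
                self.index += 1;
                return Some((&leaf.keys[i], &leaf.values[i]));
            }
            // Crossing a leaf boundary: one lookup, then back to the fast path.
            self.current = self.resolve(leaf.next);
            self.index = 0;
        }
    }
}
```

Holding a shared reference into the arena is sound here because the iterator borrows the arena immutably for its whole lifetime, so no node can move or be freed underneath it.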
Eliminating arena-based allocation in Rust B+ trees faces fundamental challenges due to Rust's ownership system. While alternatives exist, each involves significant trade-offs:
- Box-based allocation is theoretically optimal but impractical in safe Rust because of borrowing conflicts
- Rc/RefCell provides flexibility but adds runtime overhead and complexity
- Unsafe pointers offer maximum performance but require extensive verification
- Generational indices improve safety but don't address core performance issues
The most practical approach is incremental optimization of the existing arena system combined with specialized optimizations for iteration-heavy workloads. For applications requiring maximum performance, a carefully designed unsafe core with safe wrappers may be justified, but this requires significant development and verification effort.
The current arena-based approach, while not optimal for iteration, provides a good balance of safety, performance, and maintainability for most use cases. The 1.68x iteration overhead is acceptable given the benefits in insertion/deletion performance and memory safety guarantees.