|
| 1 | +# eBPF Tutorial by Example: BPF Arena for Zero-Copy Shared Memory |
| 2 | + |
| 3 | +Ever tried building a linked list in eBPF and got stuck using awkward integer indices instead of real pointers? Or needed to share large amounts of data between your kernel BPF program and userspace without expensive syscalls? Traditional BPF maps force you to work around pointer limitations and require system calls for every access. What if you could just use normal C pointers and have direct memory access from both kernel and userspace? |
| 4 | + |
| 5 | +This is what **BPF Arena** solves. Created by Alexei Starovoitov in 2024, arena provides a sparse shared memory region where BPF programs can use real pointers to build complex data structures like linked lists, trees, and graphs, while userspace gets zero-copy direct access to the same memory. In this tutorial, we'll build a linked list in arena memory and show you how both kernel and userspace can manipulate it using standard pointer operations. |
| 6 | + |
| 7 | +## Introduction to BPF Arena: Breaking Free from Map Limitations |
| 8 | + |
| 9 | +### The Problem: When BPF Maps Aren't Enough |
| 10 | + |
| 11 | +Traditional BPF maps are fantastic for simple key-value storage, but they have fundamental limitations when you need complex data structures or large-scale data sharing. Let's look at what developers faced before arena existed. |
| 12 | + |
| 13 | +**Ring buffers** only work in one direction - BPF can send data to userspace, but userspace can't write back. They're streaming-only, with no random access. **Hash and array maps** generally require a syscall such as `bpf_map_lookup_elem()` for every access from userspace (array maps created with `BPF_F_MMAPABLE` are the main exception). Array maps allocate all their memory upfront, wasting space if you only use a fraction of the entries. Most critically, **you can't use real pointers** - you're forced to use integer indices to link data structures together. |
| 14 | + |
| 15 | +Building a linked list the old way looked like this mess: |
| 16 | + |
| 17 | +```c |
| 18 | +struct node { |
| 19 | + int next_idx; // Can't use pointers, must use index! |
| 20 | + int data; |
| 21 | +}; |
| 22 | + |
| 23 | +struct { |
| 24 | + __uint(type, BPF_MAP_TYPE_ARRAY); |
| 25 | + __uint(max_entries, 10000); |
| 26 | + __type(value, struct node); |
| 27 | +} nodes_map SEC(".maps"); |
| 28 | + |
| 29 | +// Traverse requires repeated map lookups |
| 30 | +int idx = head_idx; |
| 31 | +while (idx != -1) { |
| 32 | + struct node *n = bpf_map_lookup_elem(&nodes_map, &idx); |
| 33 | + if (!n) break; |
| 34 | + process(n->data); |
| 35 | + idx = n->next_idx; // No pointer following! |
| 36 | +} |
| 37 | +``` |
| 38 | +
|
| 39 | +Every node access requires a map lookup. You can't just follow pointers like normal C code. The verifier won't let you use pointers across different map entries. This makes implementing trees, graphs, or any pointer-based structure incredibly awkward and slow. |
| 40 | +
|
| 41 | +### The Solution: Sparse Shared Memory with Real Pointers |
| 42 | +
|
| 43 | +In 2024, Alexei Starovoitov from the Linux kernel team introduced BPF arena to solve these limitations. Arena provides a **sparse shared memory region** between BPF programs and userspace, supporting up to 4GB of address space. Memory pages are allocated on-demand as you use them, so you don't waste space. Both kernel BPF code and userspace programs can map the same arena and access it directly. |
| 44 | +
|
| 45 | +The game-changer: you can use **real C pointers** in BPF programs targeting arena memory. The `__arena` annotation tells the verifier that these pointers reference arena space, and special address space casts (`cast_kern()`, `cast_user()`) let you safely convert between kernel and userspace views of the same memory. Userspace gets zero-copy access through `mmap()` - no syscalls needed to read or write arena data. |
| 46 | +
|
| 47 | +Here's what the same linked list looks like with arena: |
| 48 | +
|
| 49 | +```c |
| 50 | +struct node {
| 51 | + struct node __arena *next; // Real pointer! |
| 52 | + int data; |
| 53 | +}; |
| 54 | +
|
| 55 | +struct node __arena *head; |
| 56 | +
|
| 57 | +// Traverse with normal pointer following |
| 58 | +struct node __arena *n = head; |
| 59 | +while (n) { |
| 60 | + process(n->data); |
| 61 | + n = n->next; // Just follow the pointer! |
| 62 | +} |
| 63 | +``` |
| 64 | + |
| 65 | +Clean, simple, exactly how you'd write it in normal C. The verifier understands arena pointers and lets you dereference them safely. |
| 66 | + |
| 67 | +### Why This Matters |
| 68 | + |
| 69 | +Arena was inspired by research showing the potential for complex data structures in BPF. Before arena, developers were building hash tables, queues, and trees using giant BPF array maps with integer indices instead of pointers. It worked, but the code was ugly and slow. Arena unlocks several powerful use cases. |
| 70 | + |
| 71 | +**In-kernel data structures** become practical. You can implement custom hash tables with collision chaining, AVL or red-black trees for sorted data, graphs for network topology mapping, all using normal pointer operations. **Key-value store accelerators** can run in the kernel for maximum performance, with userspace getting direct access to the data structure without syscall overhead. **Bidirectional communication** works naturally - both kernel and userspace can modify shared data structures using lock-free algorithms. **Large data aggregation** scales up to 4GB instead of being limited by typical map size constraints. |
| 72 | + |
| 73 | +## Implementation: Building a Linked List in Arena Memory |
| 74 | + |
| 75 | +Let's build a complete example that demonstrates arena's power. We'll create a linked list where BPF programs add and delete elements using real pointers, while userspace directly accesses the list to compute sums without any syscalls. |
| 76 | + |
| 77 | +### Complete BPF Program: arena_list.bpf.c |
| 78 | + |
| 79 | +```c |
| 80 | +// SPDX-License-Identifier: GPL-2.0 |
| 81 | +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ |
| 82 | +#define BPF_NO_KFUNC_PROTOTYPES |
| 83 | +#include <vmlinux.h> |
| 84 | +#include <bpf/bpf_helpers.h> |
| 85 | +#include <bpf/bpf_tracing.h> |
| 86 | +#include <bpf/bpf_core_read.h> |
| 87 | +#include "bpf_experimental.h" |
| 88 | + |
| 89 | +struct { |
| 90 | + __uint(type, BPF_MAP_TYPE_ARENA); |
| 91 | + __uint(map_flags, BPF_F_MMAPABLE); |
| 92 | + __uint(max_entries, 100); /* number of pages */ |
| 93 | +#ifdef __TARGET_ARCH_arm64 |
| 94 | + __ulong(map_extra, 0x1ull << 32); /* start of mmap() region */ |
| 95 | +#else |
| 96 | + __ulong(map_extra, 0x1ull << 44); /* start of mmap() region */ |
| 97 | +#endif |
| 98 | +} arena SEC(".maps"); |
| 99 | + |
| 100 | +#include "bpf_arena_alloc.h" |
| 101 | +#include "bpf_arena_list.h" |
| 102 | + |
| 103 | +struct elem { |
| 104 | + struct arena_list_node node; |
| 105 | + __u64 value; |
| 106 | +}; |
| 107 | + |
| 108 | +struct arena_list_head __arena *list_head; |
| 109 | +int list_sum; |
| 110 | +int cnt; |
| 111 | +bool skip = false; |
| 112 | + |
| 113 | +#ifdef __BPF_FEATURE_ADDR_SPACE_CAST |
| 114 | +long __arena arena_sum; |
| 115 | +int __arena test_val = 1; |
| 116 | +struct arena_list_head __arena global_head; |
| 117 | +#else |
| 118 | +long arena_sum SEC(".addr_space.1"); |
| 119 | +int test_val SEC(".addr_space.1"); |
| 120 | +#endif |
| 121 | + |
| 122 | +int zero; |
| 123 | + |
| 124 | +SEC("syscall") |
| 125 | +int arena_list_add(void *ctx) |
| 126 | +{ |
| 127 | +#ifdef __BPF_FEATURE_ADDR_SPACE_CAST |
| 128 | + __u64 i; |
| 129 | + |
| 130 | + list_head = &global_head; |
| 131 | + |
| 132 | + for (i = zero; i < cnt && can_loop; i++) { |
| 133 | + struct elem __arena *n = bpf_alloc(sizeof(*n)); |
| 134 | + |
| 135 | + test_val++; |
| 136 | + n->value = i; |
| 137 | + arena_sum += i; |
| 138 | + list_add_head(&n->node, list_head); |
| 139 | + } |
| 140 | +#else |
| 141 | + skip = true; |
| 142 | +#endif |
| 143 | + return 0; |
| 144 | +} |
| 145 | + |
| 146 | +SEC("syscall") |
| 147 | +int arena_list_del(void *ctx) |
| 148 | +{ |
| 149 | +#ifdef __BPF_FEATURE_ADDR_SPACE_CAST |
| 150 | + struct elem __arena *n; |
| 151 | + int sum = 0; |
| 152 | + |
| 153 | + arena_sum = 0; |
| 154 | + list_for_each_entry(n, list_head, node) { |
| 155 | + sum += n->value; |
| 156 | + arena_sum += n->value; |
| 157 | + list_del(&n->node); |
| 158 | + bpf_free(n); |
| 159 | + } |
| 160 | + list_sum = sum; |
| 161 | +#else |
| 162 | + skip = true; |
| 163 | +#endif |
| 164 | + return 0; |
| 165 | +} |
| 166 | + |
| 167 | +char _license[] SEC("license") = "GPL"; |
| 168 | +``` |
| 169 | +
|
| 170 | +### Understanding the BPF Code |
| 171 | +
|
| 172 | +The program starts by defining the arena map itself. `BPF_MAP_TYPE_ARENA` tells the kernel this is arena memory, and `BPF_F_MMAPABLE` makes it accessible via `mmap()` from userspace. The `max_entries` field specifies how many pages (typically 4KB each) the arena can hold - here we allow up to 100 pages, or about 400KB. The `map_extra` field sets where in the virtual address space the arena gets mapped, using different addresses for ARM64 vs x86-64 to avoid conflicts with existing mappings. |
| 173 | +
|
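As a quick sanity check on the numbers above, the sizes can be worked out in plain C. This is a minimal userspace sketch and not part of the tutorial sources: `arena_bytes()` is a hypothetical helper, and the 4096-byte page size used in the note below is an assumption (a real program would query `sysconf(_SC_PAGESIZE)`).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: capacity in bytes of an arena holding `pages` pages. */
static uint64_t arena_bytes(uint64_t pages, uint64_t page_size)
{
    /* Guard against overflow of the 64-bit product. */
    assert(page_size == 0 || pages <= UINT64_MAX / page_size);
    return pages * page_size;
}

/* The two map_extra values from the map definition, as plain addresses. */
static const uint64_t MAP_EXTRA_ARM64 = 0x1ull << 32; /* 4 GiB  */
static const uint64_t MAP_EXTRA_OTHER = 0x1ull << 44; /* 16 TiB */
```

With 4096-byte pages, `arena_bytes(100, 4096)` is 409600 bytes, which is the "about 400KB" quoted above. The `map_extra` constants are only start addresses for the userspace mapping; they do not change how much memory the arena can hold.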
| 174 | +After defining the map, we include arena helpers. The `bpf_arena_alloc.h` file provides `bpf_alloc()` and `bpf_free()` functions - a simple memory allocator that works with arena pages, similar to `malloc()` and `free()` but specifically for arena memory. The `bpf_arena_list.h` file implements doubly-linked list operations using arena pointers, including `list_add_head()` to prepend nodes and `list_for_each_entry()` to iterate safely. |
| 175 | +
|
| 176 | +Our `elem` structure contains the actual data. The `arena_list_node` member provides the `next` and `pprev` pointers for linking nodes together - these are arena pointers marked with `__arena`. The `value` field holds our payload data. Notice the `__arena` annotation on `list_head` - this tells the verifier this pointer references arena memory, not normal kernel memory. |
| 177 | +
|
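To make the `next`/`pprev` linkage concrete, here is a minimal plain-C sketch of the same idea that runs in an ordinary userspace process. It is an illustration under stated assumptions, not the contents of `bpf_arena_list.h`: the `sketch_*` names and the `list_node`/`list_head` types are hypothetical, there is no `__arena` annotation, and no address space casts are involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain-C stand-ins for the arena list types (no __arena here). */
struct list_node {
    struct list_node *next;   /* following node, or NULL at the tail */
    struct list_node **pprev; /* address of the pointer that points at us */
};

struct list_head {
    struct list_node *first;
};

struct elem {
    struct list_node node; /* kept first so a node pointer is also an elem pointer */
    uint64_t value;
};

/* Prepend n to the list. */
static void sketch_add_head(struct list_node *n, struct list_head *h)
{
    struct list_node *first = h->first;

    assert(n && h);
    n->next = first;
    if (first)
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Unlink n; pprev makes this possible without knowing the list head. */
static void sketch_del(struct list_node *n)
{
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    n->next = NULL;
    n->pprev = NULL;
}

/* Walk the list and add up the payloads. */
static uint64_t sketch_sum(const struct list_head *h)
{
    uint64_t sum = 0;

    for (const struct list_node *n = h->first; n; n = n->next)
        sum += ((const struct elem *)n)->value;
    return sum;
}
```

Because `pprev` stores the address of whichever pointer currently points at the node (either the head's `first` or the previous node's `next`), deletion is a constant-time operation with no special case for the first element.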
| 178 | +The `arena_list_add()` function creates list elements. It's marked `SEC("syscall")` because userspace triggers it with `bpf_prog_test_run_opts()`. The loop allocates new elements using `bpf_alloc(sizeof(*n))`, which returns an arena pointer. We can then dereference `n->value` directly - the verifier allows this because `n` is an arena pointer. The `list_add_head()` call prepends the new node to the list using normal pointer manipulation, all happening in arena memory. The `can_loop` check satisfies the verifier's bounded loop requirement. |
| 179 | +
|
| 180 | +The `arena_list_del()` function demonstrates iteration and cleanup. The `list_for_each_entry()` macro walks the list following arena pointers. Inside the loop, we sum values and delete nodes. The `bpf_free(n)` call returns memory to the arena allocator, decreasing the reference count and potentially freeing pages when the count hits zero. |
| 181 | +
|
| 182 | +The address space cast feature is crucial. Compilers that support address space casts for BPF define `__BPF_FEATURE_ADDR_SPACE_CAST`, which lets the `__arena` annotation work as a compiler address space. Without this support, the globals fall back to explicit section annotations like `SEC(".addr_space.1")`, and both programs set `skip = true` instead of touching the list. Userspace checks `skip` after the first run and stops early, preventing runtime errors. |
| 183 | +
|
| 184 | +### Complete User-Space Program: arena_list.c |
| 185 | +
|
| 186 | +```c |
| 187 | +// SPDX-License-Identifier: GPL-2.0 |
| 188 | +/* Copyright (c) 2024 Meta Platforms, Inc. and affiliates. */ |
| 189 | +#include <stdio.h> |
| 190 | +#include <stdlib.h> |
| 191 | +#include <unistd.h> |
| 192 | +#include <sys/mman.h> |
| 193 | +#include <stdint.h> |
| 194 | +#include <bpf/libbpf.h> |
| 195 | +#include <bpf/bpf.h> |
| 196 | +
|
| 197 | +#include "bpf_arena_list.h" |
| 198 | +#include "arena_list.skel.h" |
| 199 | +
|
| 200 | +struct elem { |
| 201 | + struct arena_list_node node; |
| 202 | + uint64_t value; |
| 203 | +}; |
| 204 | +
|
| 205 | +static int list_sum(struct arena_list_head *head) |
| 206 | +{ |
| 207 | + struct elem __arena *n; |
| 208 | + int sum = 0; |
| 209 | +
|
| 210 | + list_for_each_entry(n, head, node) |
| 211 | + sum += n->value; |
| 212 | + return sum; |
| 213 | +} |
| 214 | +
|
| 215 | +static void test_arena_list_add_del(int cnt) |
| 216 | +{ |
| 217 | + LIBBPF_OPTS(bpf_test_run_opts, opts); |
| 218 | + struct arena_list_bpf *skel; |
| 219 | + int expected_sum = (u_int64_t)cnt * (cnt - 1) / 2; |
| 220 | + int ret, sum; |
| 221 | +
|
| 222 | + skel = arena_list_bpf__open_and_load(); |
| 223 | + if (!skel) { |
| 224 | + fprintf(stderr, "Failed to open and load BPF skeleton\n"); |
| 225 | + return; |
| 226 | + } |
| 227 | +
|
| 228 | + skel->bss->cnt = cnt; |
| 229 | + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.arena_list_add), &opts); |
| 230 | + if (ret != 0) { |
| 231 | + fprintf(stderr, "Failed to run arena_list_add: %d\n", ret); |
| 232 | + goto out; |
| 233 | + } |
| 234 | + if (opts.retval != 0) { |
| 235 | + fprintf(stderr, "arena_list_add returned %d\n", opts.retval); |
| 236 | + goto out; |
| 237 | + } |
| 238 | + if (skel->bss->skip) { |
| 239 | + printf("SKIP: compiler doesn't support arena_cast\n"); |
| 240 | + goto out; |
| 241 | + } |
| 242 | + sum = list_sum(skel->bss->list_head); |
| 243 | + printf("Sum of elements: %d (expected: %d)\n", sum, expected_sum); |
| 244 | + printf("Arena sum: %ld (expected: %d)\n", skel->bss->arena_sum, expected_sum); |
| 245 | + printf("Number of elements: %d (expected: %d)\n", skel->data->test_val, cnt + 1); |
| 246 | +
|
| 247 | + ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.arena_list_del), &opts); |
| 248 | + if (ret != 0) { |
| 249 | + fprintf(stderr, "Failed to run arena_list_del: %d\n", ret); |
| 250 | + goto out; |
| 251 | + } |
| 252 | + sum = list_sum(skel->bss->list_head); |
| 253 | + printf("Sum after deletion: %d (expected: 0)\n", sum); |
| 254 | + printf("Sum computed by BPF: %d (expected: %d)\n", skel->bss->list_sum, expected_sum); |
| 255 | + printf("Arena sum after deletion: %ld (expected: %d)\n", skel->bss->arena_sum, expected_sum); |
| 256 | +
|
| 257 | + printf("\nTest passed!\n"); |
| 258 | +out: |
| 259 | + arena_list_bpf__destroy(skel); |
| 260 | +} |
| 261 | +
|
| 262 | +int main(int argc, char **argv) |
| 263 | +{ |
| 264 | + int cnt = 10; |
| 265 | +
|
| 266 | + if (argc > 1) { |
| 267 | + cnt = atoi(argv[1]); |
| 268 | + if (cnt <= 0) { |
| 269 | + fprintf(stderr, "Invalid count: %s\n", argv[1]); |
| 270 | + return 1; |
| 271 | + } |
| 272 | + } |
| 273 | +
|
| 274 | + printf("Testing arena list with %d elements\n", cnt); |
| 275 | + test_arena_list_add_del(cnt); |
| 276 | +
|
| 277 | + return 0; |
| 278 | +} |
| 279 | +``` |
| 280 | + |
| 281 | +### Understanding the User-Space Code |
| 282 | + |
| 283 | +The userspace program demonstrates zero-copy access to arena memory. When we load the BPF skeleton using `arena_list_bpf__open_and_load()`, libbpf automatically `mmap()`s the arena into userspace. The pointer `skel->bss->list_head` points directly into this mapped arena memory. |
| 284 | + |
| 285 | +The `list_sum()` function walks the linked list from userspace. Notice we're using the same `list_for_each_entry()` macro as the BPF code. The list is in arena memory, shared between kernel and userspace. Userspace can directly dereference arena pointers to access node values and follow `next` pointers - no syscalls needed. This is the zero-copy benefit: userspace reads memory directly from the mapped region. |
| 286 | + |
| 287 | +The test flow orchestrates the demonstration. First, we set `skel->bss->cnt` to specify how many list elements to create. Then `bpf_prog_test_run_opts()` executes the `arena_list_add` BPF program, which builds the list in arena memory. Once that returns, userspace immediately calls `list_sum()` to verify the list by walking it directly from userspace - no syscalls, just direct memory access. The expected sum is calculated as 0+1+2+...+(cnt-1), which equals cnt*(cnt-1)/2. |
| 288 | + |
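The expected-sum arithmetic can be checked independently of BPF. The following is a small userspace sketch with hypothetical helper names (`expected_sum()`, `looped_sum()`, `check_sum()`), comparing the closed form against a brute-force loop using 64-bit arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Closed form for 0 + 1 + ... + (cnt - 1), computed in 64 bits. */
static uint64_t expected_sum(uint32_t cnt)
{
    if (cnt == 0)
        return 0;
    return (uint64_t)cnt * (cnt - 1) / 2;
}

/* Reference: the same sum by brute force. */
static uint64_t looped_sum(uint32_t cnt)
{
    uint64_t sum = 0;

    for (uint32_t i = 0; i < cnt; i++)
        sum += i;
    return sum;
}

/* Cross-check the closed form against the loop. */
static void check_sum(uint32_t cnt)
{
    assert(expected_sum(cnt) == looped_sum(cnt));
}
```

For `cnt` = 10 the result is 45 and for `cnt` = 100 it is 4950. The tutorial program stores the value in an `int`, which is fine for small counts; for `cnt` = 100000 the sum is 4999950000 and no longer fits in a 32-bit `int`.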
| 289 | +After verifying the list, we run `arena_list_del` to remove all elements. This BPF program walks the list, computes its own sum, and calls `bpf_free()` on each node. Userspace then verifies the list is empty by calling `list_sum()` again, which should return 0. We also check that `skel->bss->list_sum` matches our expected value, confirming the BPF program computed the correct sum before deleting nodes. |
| 290 | + |
| 291 | +## Understanding Arena Memory Allocation |
| 292 | + |
| 293 | +The arena allocator deserves a closer look because it shows how BPF programs can implement sophisticated memory management in arena space. The allocator in `bpf_arena_alloc.h` uses a per-CPU page fragment approach to avoid locking. |
| 294 | + |
| 295 | +Each CPU maintains its own current page and offset. When you call `bpf_alloc(size)`, it first rounds the size up to 8-byte alignment. If the current page has enough space below the current offset, it allocates from there by just decrementing the offset and returning a pointer. If not enough space remains, it allocates a fresh page using `bpf_arena_alloc_pages()`, a kernel function (kfunc) that adds a page to the arena. Each page maintains a reference count in its last 8 bytes, tracking how many allocated objects live in that page. |
| 296 | + |
| 297 | +The `bpf_free(addr)` function implements reference-counted deallocation. It rounds the address down to the page boundary, finds the reference count, and decrements it. When the count reaches zero - meaning all objects allocated from that page have been freed - it returns the entire page to the kernel using `bpf_arena_free_pages()`. This page-level reference counting means individual `bpf_free()` calls are fast, and memory is returned to the system only when appropriate. |
| 298 | + |
| 299 | +This allocator design avoids locks by using per-CPU state. The sample allocator relies on a BPF program staying on one CPU while it runs, so it accesses the current CPU's page fragment without synchronization. This makes `bpf_alloc()` extremely fast - typically just a few instructions to allocate from the current page. |
| 300 | + |
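The allocation scheme described above can be simulated in an ordinary process. This is a minimal sketch under stated assumptions, not the code from `bpf_arena_alloc.h`: the `sketch_*` names are hypothetical, a single global page replaces the per-CPU arrays, `aligned_alloc()` and `free()` stand in for `bpf_arena_alloc_pages()` and `bpf_arena_free_pages()`, and the page size is assumed to be 4096 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096u

static void *cur_page;      /* page currently being carved up */
static unsigned cur_offset; /* the next object ends at this offset */
static int pages_live;      /* pages obtained and not yet returned */

/* The per-page object count lives in the last 8 bytes of the page. */
static uint64_t *obj_cnt_of(void *page)
{
    return (uint64_t *)((char *)page + SKETCH_PAGE_SIZE - 8);
}

static void *sketch_alloc(unsigned size)
{
    size = (size + 7u) & ~7u; /* round up to 8-byte alignment */
    if (size == 0 || size >= SKETCH_PAGE_SIZE - 8)
        return NULL; /* sketch choice: reject empty and oversized requests */
    if (!cur_page || cur_offset < size) {
        /* Stand-in for bpf_arena_alloc_pages(): one page-aligned page. */
        void *page = aligned_alloc(SKETCH_PAGE_SIZE, SKETCH_PAGE_SIZE);

        if (!page)
            return NULL;
        cur_page = page;
        cur_offset = SKETCH_PAGE_SIZE - 8; /* keep the counter bytes free */
        *obj_cnt_of(page) = 0;
        pages_live++;
    }
    cur_offset -= size; /* objects are handed out from the top down */
    (*obj_cnt_of(cur_page))++;
    return (char *)cur_page + cur_offset;
}

static void sketch_free(void *addr)
{
    /* Round down to the page boundary to find the owning page. */
    void *page = (void *)((uintptr_t)addr & ~(uintptr_t)(SKETCH_PAGE_SIZE - 1));

    assert(*obj_cnt_of(page) > 0);
    if (--(*obj_cnt_of(page)) == 0) {
        if (page == cur_page)
            cur_page = NULL; /* sketch choice: never carve a released page */
        free(page);          /* stand-in for bpf_arena_free_pages() */
        pages_live--;
    }
}
```

Allocating 24 bytes and then 10 bytes places the two objects 16 bytes apart in the same page (the second request is rounded up to 16), and the page is released only after both have been freed. Freeing is cheap because it only touches the counter; the cost is that a single long-lived object keeps its whole page alive.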
| 301 | +## Compilation and Execution |
| 302 | + |
| 303 | +Navigate to the bpf_arena directory and build the example: |
| 304 | + |
| 305 | +```bash |
| 306 | +cd bpf-developer-tutorial/src/features/bpf_arena |
| 307 | +make |
| 308 | +``` |
| 309 | + |
| 310 | +The Makefile compiles the BPF program with `-D__BPF_FEATURE_ADDR_SPACE_CAST` to enable arena pointer support. It uses `bpftool gen object` to process the compiled BPF object, and a skeleton header is then generated from it (the job of `bpftool gen skeleton`) for userspace to include. |
| 311 | + |
| 312 | +Run the arena list test with 10 elements: |
| 313 | + |
| 314 | +```bash |
| 315 | +sudo ./arena_list 10 |
| 316 | +``` |
| 317 | + |
| 318 | +Expected output: |
| 319 | + |
| 320 | +``` |
| 321 | +Testing arena list with 10 elements |
| 322 | +Sum of elements: 45 (expected: 45) |
| 323 | +Arena sum: 45 (expected: 45) |
| 324 | +Number of elements: 11 (expected: 11) |
| 325 | +Sum after deletion: 0 (expected: 0) |
| 326 | +Sum computed by BPF: 45 (expected: 45) |
| 327 | +Arena sum after deletion: 45 (expected: 45) |
| 328 | +
|
| 329 | +Test passed! |
| 330 | +``` |
| 331 | + |
| 332 | +Try it with more elements to see arena scaling: |
| 333 | + |
| 334 | +```bash |
| 335 | +sudo ./arena_list 100 |
| 336 | +``` |
| 337 | + |
| 338 | +The sum should be 4950 (100*99/2). Notice that userspace can verify the list by directly accessing arena memory without any syscalls. This zero-copy access is what makes arena powerful for large data structures. |
| 339 | + |
| 340 | +## When to Use Arena vs Other BPF Maps |
| 341 | + |
| 342 | +Choosing the right BPF map type depends on your access patterns and data structure needs. **Use regular BPF maps** (hash, array, etc.) when you need simple key-value storage, small data structures that fit well in maps, standard map operations like atomic updates, or per-CPU statistics without complex linking. Maps excel at straightforward use cases with kernel-provided operations. |
| 343 | + |
| 344 | +**Use BPF Arena** when you need complex linked structures like lists, trees, or graphs, large shared memory exceeding typical map sizes, zero-copy userspace access to avoid syscall overhead, or custom memory management beyond what maps provide. Arena shines for sophisticated data structures where pointer operations are natural. |
| 345 | + |
| 346 | +**Use Ring Buffers** when you need one-way streaming from BPF to userspace, event logs or trace data, or sequentially processed data without random access. Ring buffers are optimized for high-throughput event streams but don't support bidirectional access or complex data structures. |
| 347 | + |
| 348 | +The arena vs maps trade-off fundamentally comes down to pointers and access patterns. If you find yourself encoding indices to simulate pointers in BPF maps, arena is probably the better choice. If you need large-scale data structures accessible from both kernel and userspace, arena's zero-copy shared memory model is hard to beat. |
| 349 | + |
| 350 | +## Summary and Next Steps |
| 351 | + |
| 352 | +BPF Arena solves a fundamental limitation of traditional BPF maps by providing sparse shared memory where you can use real C pointers to build complex data structures. Created by Alexei Starovoitov in 2024, arena enables linked lists, trees, graphs, and custom allocators using normal pointer operations instead of awkward integer indices. Both kernel BPF programs and userspace can map the same arena for zero-copy bidirectional access, eliminating syscall overhead. |
| 353 | + |
| 354 | +Our linked list example demonstrates the core arena concepts: defining an arena map, using `__arena` annotations for pointer types, allocating memory with `bpf_alloc()`, and accessing the same data structure from both kernel and userspace. The per-CPU page fragment allocator shows how BPF programs can implement sophisticated memory management in arena space. Arena unlocks new possibilities for in-kernel data structures, key-value store accelerators, and large-scale data aggregation up to 4GB. |
| 355 | + |
| 356 | +> If you'd like to dive deeper into eBPF, check out our tutorial repository at <https://github.com/eunomia-bpf/bpf-developer-tutorial> or visit our website at <https://eunomia.dev/tutorials/>. |
| 357 | +
|
| 358 | +## References |
| 359 | + |
| 360 | +- **Original Arena Patches:** <https://lwn.net/Articles/961594/> |
| 361 | +- **Meta's Arena Examples:** Linux kernel tree `tools/testing/selftests/bpf/progs/arena_*.c` |
| 362 | +- **Tutorial Repository:** <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/bpf_arena> |
| 363 | +- **Linux Kernel Source:** `kernel/bpf/arena.c` - Arena implementation |
| 364 | +- **LLVM Address Spaces:** Documentation on `__arena` compiler support |
| 365 | + |
| 366 | +This example is adapted from Meta's arena_list.c in the Linux kernel BPF selftests, with educational enhancements. Requires Linux kernel 6.10+ with `CONFIG_BPF_ARENA=y` enabled. Complete source code is available in the tutorial repository. |