### 6. Clean up:
```bash
# First, stop the BPF program gracefully (Ctrl-C if it is running in the foreground)
# This ensures the BPF link is properly destroyed

# Then unload the kernel module
sudo rmmod hello

# If you get "Module hello is in use", there may still be a BPF struct_ops attached
# This can happen if the userspace process was killed (-9) instead of stopped gracefully
# Solutions:
# 1. Wait ~30 seconds for the kernel to garbage-collect the BPF link
# 2. Force unload: sudo rmmod -f hello (requires CONFIG_MODULE_FORCE_UNLOAD; may be unstable)
# 3. Reboot the system

# Clean build artifacts
make clean
```

**Note on Module Unloading:**
The kernel holds a reference on the module while BPF struct_ops programs are attached. When you stop the userspace loader gracefully (Ctrl-C), it calls `bpf_link__destroy()`, which detaches the struct_ops and drops that module reference. If the process is killed abruptly (kill -9), the kernel should eventually garbage-collect the BPF link, but this may take some time.
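
The graceful-stop path described above can be sketched as follows. This is a minimal sketch, not the project's loader: it assumes the loader blocks in a loop until SIGINT, and it replaces the real `bpf_link__destroy(link)` call with a hypothetical stub, `destroy_link_stub()`, so it builds without libbpf. `install_sigint_handler()` and `wait_then_cleanup()` are likewise illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Set only by the signal handler; writing a volatile sig_atomic_t is async-signal-safe. */
static volatile sig_atomic_t exiting = 0;

/* Hypothetical flag that records whether cleanup ran (illustration only). */
static int link_destroyed = 0;

static void handle_sigint(int sig)
{
    (void)sig;
    exiting = 1; /* only request shutdown; do the real work outside the handler */
}

/* Install the Ctrl-C (SIGINT) handler. Returns 0 on success, -1 on error. */
static int install_sigint_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Hypothetical stand-in for bpf_link__destroy(link), which detaches the
 * struct_ops and lets the kernel drop its reference on the module. */
static void destroy_link_stub(void)
{
    link_destroyed = 1;
}

/* Wait until Ctrl-C has been received, then clean up exactly once. */
static void wait_then_cleanup(void)
{
    while (!exiting)
        sleep(1); /* returns early if a signal interrupts it */
    destroy_link_stub();
}
```

In a real loader, `install_sigint_handler()` would run right after the struct_ops is attached and `wait_then_cleanup()` at the end of `main()`. SIGKILL (`kill -9`) cannot be caught or handled, so this cleanup never runs in that case, and releasing the link is left entirely to the kernel.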

## How It Works

1. The kernel module registers a custom struct_ops type `bpf_testmod_ops`

## Troubleshooting

### Common Issues

- If you get "Failed to attach struct_ops", make sure the kernel module is loaded
- Check `dmesg` for any error messages from the kernel module or BPF verifier
- Ensure your kernel has `CONFIG_BPF_SYSCALL=y` and supports struct_ops

## Detailed Troubleshooting Guide

This section documents the complete process of resolving BTF and struct_ops issues encountered during development.

### Issue 1: Missing BTF in Kernel Module

**Problem:**
```
libbpf: failed to find BTF info for struct_ops/bpf_testmod_ops
```

**Root Cause:**
The kernel module was not compiled with BTF (BPF Type Format) information, which is required for struct_ops to work. BTF provides type information that BPF programs need to interact with kernel structures.
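
As a programmatic check for this root cause (equivalent in spirit to the `readelf -S module/hello.ko | grep BTF` step used later in this guide), here is a minimal sketch that scans an ELF64 file's section headers for a `.BTF` section. It assumes a 64-bit ELF whose byte order matches the host (the common case when checking a `.ko` built on the same machine); `has_elf_section()` is a hypothetical helper, not part of any BPF tooling.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if the ELF64 file at `path` contains a section called `name`,
 * 0 if it does not, and -1 on error (unreadable file or not ELF64).
 * Assumes the file's byte order matches the host's. */
static int has_elf_section(const char *path, const char *name)
{
    FILE *f = fopen(path, "rb");
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    const Elf64_Shdr *names;
    char *strtab = NULL;
    int result = -1;

    if (!f)
        return -1;

    /* Validate the ELF header: magic, 64-bit class, sane section table. */
    if (fread(&eh, sizeof(eh), 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Read the whole section header table. */
    sh = calloc(eh.e_shnum, sizeof(*sh));
    if (!sh || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof(*sh), eh.e_shnum, f) != eh.e_shnum)
        goto out;

    /* Load the section-name string table (.shstrtab). */
    names = &sh[eh.e_shstrndx];
    strtab = malloc(names->sh_size + 1);
    if (!strtab || fseek(f, (long)names->sh_offset, SEEK_SET) != 0 ||
        fread(strtab, 1, names->sh_size, f) != names->sh_size)
        goto out;
    strtab[names->sh_size] = '\0'; /* guarantee NUL termination */

    /* Compare every section name against the one we are looking for. */
    result = 0;
    for (int i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < names->sh_size &&
            strcmp(strtab + sh[i].sh_name, name) == 0) {
            result = 1;
            break;
        }
    }

out:
    free(strtab);
    free(sh);
    fclose(f);
    return result;
}
```

For example, `has_elf_section("module/hello.ko", ".BTF")` should return 1 once the module is built with BTF and 0 for a module built without it.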

**Solution:**

#### Step 1: Extract vmlinux with BTF
The kernel build system needs the `vmlinux` ELF binary (not just headers) to generate BTF for modules.

The kernel's `register_bpf_struct_ops()` function expects these callbacks to be present, and the kernel dereferences them without first checking for NULL, so a missing callback causes a kernel panic. These callbacks are essential for:
- **verifier_ops**: Validates BPF program access to struct_ops members
- **init**: Initializes BTF type information for the struct_ops
- **init_member**: Handles special initialization for data members
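
Why a missing callback is fatal, and what a defensive check looks like, can be modelled in user space. The sketch below is hypothetical throughout: `struct toy_struct_ops`, `toy_register()`, and the toy callbacks only mimic the three members listed above and are not the kernel's `struct bpf_struct_ops` API. Unlike the kernel code described in this README, `toy_register()` rejects an incomplete descriptor with `-EINVAL` instead of calling through a NULL pointer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins that only mimic the shape of the kernel types. */
struct toy_btf { int id; };

struct toy_verifier_ops {
    int (*is_valid_access)(int off, int size);
};

struct toy_struct_ops {
    const char *name;
    const struct toy_verifier_ops *verifier_ops;
    int (*init)(struct toy_btf *btf);
    int (*init_member)(int member_idx);
};

/* Check every required callback before dereferencing any of them. */
static int toy_register(const struct toy_struct_ops *st_ops, struct toy_btf *btf)
{
    if (!st_ops->verifier_ops || !st_ops->init || !st_ops->init_member) {
        fprintf(stderr, "struct_ops %s: missing required callback\n",
                st_ops->name);
        return -EINVAL;
    }
    if (st_ops->init(btf)) /* safe: init is known to be non-NULL */
        return -EINVAL;
    return 0;
}

/* Trivial callbacks for a complete descriptor. */
static int toy_init(struct toy_btf *btf) { (void)btf; return 0; }
static int toy_init_member(int member_idx) { (void)member_idx; return 0; }
static int toy_is_valid_access(int off, int size) { return off >= 0 && size > 0; }

static const struct toy_verifier_ops toy_vops = {
    .is_valid_access = toy_is_valid_access,
};
```

In the real module, the corresponding fix is to set `.verifier_ops`, `.init`, and `.init_member` in the module's `struct bpf_struct_ops`, so that nothing the kernel calls or dereferences is NULL.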

After adding these callbacks, rebuild and reload:
```bash
cd module
make clean
make
sudo insmod hello.ko
dmesg | tail
# Should see: "bpf_testmod loaded with struct_ops support"
```

### Issue 3: BPF Program Load Failure - Invalid Helper

**Problem:**
```
libbpf: prog 'bpf_testmod_test_1': BPF program load failed: Invalid argument
program of this type cannot use helper bpf_trace_printk#6
```

**Root Cause:**
struct_ops BPF programs have restricted helper access: the verifier only accepts helpers for which the struct_ops type's `verifier_ops->get_func_proto` callback returns a prototype. Here no prototype was provided for `bpf_trace_printk` (the helper behind `bpf_printk()`), so the verifier rejected the program.
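
The lookup that produces this error can be modelled in a few lines. In the kernel, the verifier resolves each helper call through the program type's `get_func_proto` callback (for struct_ops programs, the one in the struct_ops type's `verifier_ops`) and fails with the message above when no prototype comes back. The sketch below is a simplified user-space model under that assumption; all `toy_*` names are hypothetical, and only the helper IDs and the message text come from the kernel.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Helper IDs: the values match the kernel's BPF_FUNC_map_lookup_elem (1)
 * and BPF_FUNC_trace_printk (6, the "#6" in the error above). */
enum toy_func_id { TOY_FUNC_map_lookup_elem = 1, TOY_FUNC_trace_printk = 6 };

struct toy_func_proto { const char *name; };

static const struct toy_func_proto toy_map_lookup_proto = { "bpf_map_lookup_elem" };

/* Hypothetical get_func_proto for a struct_ops type that exposes only
 * map lookups; every other helper, including trace_printk, gets NULL. */
static const struct toy_func_proto *toy_get_func_proto(enum toy_func_id id)
{
    switch (id) {
    case TOY_FUNC_map_lookup_elem:
        return &toy_map_lookup_proto;
    default:
        return NULL;
    }
}

/* Mimics the verifier's rule: no prototype means the call is rejected. */
static int toy_check_helper_call(enum toy_func_id id, const char *helper_name)
{
    if (!toy_get_func_proto(id)) {
        printf("program of this type cannot use helper %s#%d\n",
               helper_name, (int)id);
        return -EINVAL;
    }
    return 0;
}
```

The model also shows why the outcome depends on the struct_ops type: a type whose `get_func_proto` does return a prototype for `bpf_trace_printk` would accept the same call.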

**Solution:**
Remove all `bpf_printk()` calls from struct_ops BPF programs:

```c
// BEFORE (fails to load):
SEC("struct_ops/test_1")
int BPF_PROG(bpf_testmod_test_1)
{
    bpf_printk("BPF test_1 called!\n"); // NOT ALLOWED
    return 42;
}

// AFTER (works):
SEC("struct_ops/test_1")
int BPF_PROG(bpf_testmod_test_1)
{
    /* Return a special value to indicate BPF implementation */
    return 42;
}
```

**Alternative Debugging Approaches:**
1. Use BPF maps to export counters/statistics to userspace
2. Use the kernel module's `printk()` to log struct_ops invocations
3. Use `bpftool prog tracelog` to see what programs are being called

### Verification Checklist

After resolving all issues, verify everything works:

```bash
# 1. Check module BTF
readelf -S module/hello.ko | grep BTF

# 2. Load module successfully
sudo insmod module/hello.ko
dmesg | tail -5
# Should see: "bpf_testmod loaded with struct_ops support"

# 3. Verify proc file created
ls -l /proc/bpf_testmod_trigger
# Should exist with write permissions

# 4. Build and load BPF program
make
sudo ./struct_ops
# Should see: "Successfully loaded and attached BPF struct_ops!"

# 5. Verify callbacks are being invoked
sudo dmesg | tail -20
# Should see periodic output:
# Calling struct_ops callbacks:
# test_1() returned: 42
# test_2(10, 20) returned: 30
# test_3() called with buffer

# 6. Clean up
sudo rmmod hello
```

### Key Takeaways

1. **BTF is mandatory** for struct_ops - ensure `vmlinux` is available and `pahole` is recent enough
2. **All required callbacks must be present** in the `bpf_struct_ops` structure (verifier_ops, init, init_member)
3. **Helper restrictions apply** - struct_ops programs can only call the helpers their struct_ops type exposes; for `bpf_testmod_ops` that excluded `bpf_printk`
4. **Test incrementally** - load module first, then BPF program, to isolate issues

## Kernel Source Code Analysis

### Root Cause of Kernel Panic (Confirmed from Kernel 6.18-rc4 Source)

The kernel panic was caused by **missing NULL pointer checks** in the kernel's struct_ops registration code. Analysis of the Linux kernel source code (version 6.18-rc4) reveals three critical locations where callback pointers are dereferenced without validation:

#### 1. Missing NULL check for `st_ops->init` callback
**Location**: `kernel/bpf/bpf_struct_ops.c:381`

```c
if (st_ops->init(btf)) { // ← NULL pointer dereference if init is NULL
    pr_warn("Error in init bpf_struct_ops %s\n",
            st_ops->name);
    err = -EINVAL;
    goto errout;
}
```

The code calls `st_ops->init(btf)` directly in the `bpf_struct_ops_desc_init()` function without checking if the callback exists. If a module registers struct_ops with `init = NULL`, this causes an immediate kernel panic.

#### 2. Missing NULL check for `st_ops->init_member` callback

The BPF verifier assigns `verifier_ops` directly and later dereferences it through `env->ops->*` calls. If `verifier_ops` is NULL, subsequent verifier operations will cause a kernel panic.

### Why These Callbacks Are Mandatory

The kernel code **assumes** these callbacks exist and does not provide fallback behavior:

1. **`init`**: Called during struct_ops registration to initialize BTF type information. No default implementation exists.
2. **`init_member`**: Called for each struct member during map updates to handle special initialization. A return value of 0 means "not handled", >0 means "handled", and <0 is an error.
3. **`verifier_ops`**: Provides verification operations (e.g., `is_valid_access`) that control BPF program access to struct_ops context.
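
The `init_member` return convention listed above can be illustrated with a small user-space model of a caller walking the members of a struct_ops value. All names here (`struct toy_member`, `toy_init_member()`, `toy_update_members()`) are hypothetical, and the loop is a simplified sketch of the convention, not the kernel's map-update code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical member descriptor: a name and whether it is a data field. */
struct toy_member {
    const char *name;
    int is_data;
};

/* Hypothetical init_member: initializes data members itself (>0),
 * leaves other members to the caller (0), and rejects malformed
 * members (<0). */
static int toy_init_member(const struct toy_member *m)
{
    if (!m->name)
        return -EINVAL;
    if (m->is_data)
        return 1;
    return 0;
}

/* Walk all members and honor the 0 / >0 / <0 convention. `defaulted`
 * counts the members left to the caller's generic handling. */
static int toy_update_members(const struct toy_member *members, int n,
                              int *defaulted)
{
    *defaulted = 0;
    for (int i = 0; i < n; i++) {
        int err = toy_init_member(&members[i]);

        if (err < 0)
            return err;  /* error: abort the whole update */
        if (err > 0)
            continue;    /* handled: callback already initialized it */
        (*defaulted)++;  /* not handled: generic handling applies */
    }
    return 0;
}
```

Treating 0 as "not handled" rather than "success" lets a module override only the members it cares about while everything else falls through to the generic path.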

### Is This Fixed in Current Kernel?

**No.** As of Linux kernel 6.18-rc4 (checked 2025-11-10), these NULL pointer dereferences still exist. The kernel code has not added defensive NULL checks for these callbacks.

This means:
- ✅ **Our fix is correct** - providing all three callbacks prevents the kernel panic
- ❌ **Kernel could be more defensive** - ideally it should validate callbacks before dereferencing
- ⚠️ **All struct_ops modules MUST provide these callbacks** - this is an undocumented requirement

### Recommendation for Kernel Upstream

The kernel should add validation before dereferencing these pointers:

```c
// Suggested fix for kernel/bpf/bpf_struct_ops.c:381
if (st_ops->init && st_ops->init(btf)) {
    pr_warn("Error in init bpf_struct_ops %s\n", st_ops->name);
    err = -EINVAL;
    goto errout;
}

// Suggested fix for kernel/bpf/bpf_struct_ops.c:753