Skip to content

Commit 540ce67

Browse files
fix: Remove bpf_printk calls from struct_ops BPF programs to comply with restrictions
1 parent a1430cf commit 540ce67

File tree

1 file changed

+364
-1
lines changed

1 file changed

+364
-1
lines changed

src/features/struct_ops/README.md

Lines changed: 364 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -55,10 +55,26 @@ test_3() called with buffer
5555

5656
### 6. Clean up:
5757
```bash
58+
# First, stop the BPF program gracefully (Ctrl-C if running in foreground)
59+
# This ensures the BPF link is properly destroyed
60+
61+
# Then unload the kernel module
5862
sudo rmmod hello
63+
64+
# If you get "Module hello is in use", there may still be a BPF struct_ops attached
65+
# This can happen if the userspace process was killed (-9) instead of stopped gracefully
66+
# Solutions:
67+
# 1. Wait ~30 seconds for kernel to garbage collect the BPF link
68+
# 2. Force unload: sudo rmmod -f hello (may be unstable)
69+
# 3. Reboot the system
70+
71+
# Clean build artifacts
5972
make clean
6073
```
6174

75+
**Note on Module Unloading:**
76+
The kernel module maintains a reference count while BPF struct_ops programs are attached. When you stop the userspace loader program gracefully (Ctrl-C), it calls `bpf_link__destroy()` which properly detaches the struct_ops and decrements the module reference count. If the process is killed abruptly (kill -9), the kernel should eventually garbage collect the BPF link, but this may take some time.
77+
6278
## How It Works
6379

6480
1. The kernel module registers a custom struct_ops type `bpf_testmod_ops`
@@ -69,6 +85,353 @@ make clean
6985

7086
## Troubleshooting
7187

88+
### Common Issues
89+
7290
- If you get "Failed to attach struct_ops", make sure the kernel module is loaded
7391
- Check `dmesg` for any error messages from the kernel module or BPF verifier
74-
- Ensure your kernel has CONFIG_BPF_SYSCALL=y and supports struct_ops
92+
- Ensure your kernel has CONFIG_BPF_SYSCALL=y and supports struct_ops
93+
94+
## Detailed Troubleshooting Guide
95+
96+
This section documents the complete process of resolving BTF and struct_ops issues encountered during development.
97+
98+
### Issue 1: Missing BTF in Kernel Module
99+
100+
**Problem:**
101+
```
102+
libbpf: failed to find BTF info for struct_ops/bpf_testmod_ops
103+
```
104+
105+
**Root Cause:**
106+
The kernel module was not compiled with BTF (BPF Type Format) information, which is required for struct_ops to work. BTF provides type information that BPF programs need to interact with kernel structures.
107+
108+
**Solution:**
109+
110+
#### Step 1: Extract vmlinux with BTF
111+
The kernel build system needs the `vmlinux` ELF binary (not just headers) to generate BTF for modules.
112+
113+
```bash
114+
# Extract vmlinux from compressed kernel image
115+
sudo /usr/src/linux-headers-$(uname -r)/scripts/extract-vmlinux \
116+
/boot/vmlinuz-$(uname -r) > /tmp/vmlinux
117+
118+
# Copy to kernel build directory
119+
sudo cp /tmp/vmlinux /usr/src/linux-headers-$(uname -r)/vmlinux
120+
121+
# Verify it's an ELF binary
122+
file /tmp/vmlinux
123+
# Output: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked
124+
```
125+
126+
#### Step 2: Upgrade pahole (if needed)
127+
The BTF generation requires `pahole` (from dwarves package) version 1.16+. Older versions don't support the `--btf_features` flag.
128+
129+
Check your version:
130+
```bash
131+
pahole --version
132+
```
133+
134+
If version is < 1.25, compile from source:
135+
136+
```bash
137+
# Install dependencies
138+
sudo apt-get install -y libelf-dev cmake zlib1g-dev
139+
140+
# Downgrade elfutils packages to matching versions
141+
sudo apt-get install -y --allow-downgrades \
142+
libelf1t64=0.190-1.1ubuntu0.1 \
143+
libdw1t64=0.190-1.1ubuntu0.1 \
144+
libdw-dev=0.190-1.1ubuntu0.1 \
145+
libelf-dev=0.190-1.1ubuntu0.1
146+
147+
# Clone and build pahole
148+
git clone https://git.kernel.org/pub/scm/devel/pahole/pahole.git /tmp/pahole
149+
cd /tmp/pahole
150+
mkdir build && cd build
151+
cmake -DCMAKE_INSTALL_PREFIX=/usr ..
152+
make -j$(nproc)
153+
sudo make install
154+
155+
# Verify new version
156+
pahole --version # Should show v1.30 or higher
157+
```
158+
159+
#### Step 3: Rebuild the module with BTF
160+
The module Makefile already has BTF enabled with `-g -O2` flags. Simply rebuild:
161+
162+
```bash
163+
cd module
164+
make clean
165+
make
166+
```
167+
168+
Verify BTF was generated:
169+
```bash
170+
readelf -S hello.ko | grep BTF
171+
# Should show:
172+
# [60] .BTF PROGBITS ...
173+
# [61] .BTF.base PROGBITS ...
174+
```
175+
176+
### Issue 2: Kernel Panic on Module Load
177+
178+
**Problem:**
179+
Loading the module causes a kernel panic or NULL pointer dereference.
180+
181+
**Root Cause:**
182+
The `bpf_struct_ops` structure was missing required callback functions that the kernel tries to access during registration:
183+
- `.verifier_ops` - BPF verifier operations (NULL pointer dereference)
184+
- `.init` - BTF initialization callback
185+
- `.init_member` - Member initialization callback
186+
187+
**Error Pattern in dmesg:**
188+
```
189+
BUG: kernel NULL pointer dereference
190+
Call Trace:
191+
register_bpf_struct_ops
192+
...
193+
```
194+
195+
**Solution:**
196+
Add the required callbacks to the module (`module/hello.c`):
197+
198+
```c
199+
/* BTF initialization callback */
200+
static int bpf_testmod_ops_init(struct btf *btf)
201+
{
202+
/* Initialize BTF if needed */
203+
return 0;
204+
}
205+
206+
/* Verifier access control */
207+
static bool bpf_testmod_ops_is_valid_access(int off, int size,
208+
enum bpf_access_type type,
209+
const struct bpf_prog *prog,
210+
struct bpf_insn_access_aux *info)
211+
{
212+
/* Allow all accesses for this example */
213+
return true;
214+
}
215+
216+
/* Verifier operations structure */
217+
static const struct bpf_verifier_ops bpf_testmod_verifier_ops = {
218+
.is_valid_access = bpf_testmod_ops_is_valid_access,
219+
};
220+
221+
/* Member initialization callback */
222+
static int bpf_testmod_ops_init_member(const struct btf_type *t,
223+
const struct btf_member *member,
224+
void *kdata, const void *udata)
225+
{
226+
/* No special member initialization needed */
227+
return 0;
228+
}
229+
230+
/* Updated struct_ops definition with ALL required callbacks */
231+
static struct bpf_struct_ops bpf_testmod_ops_struct_ops = {
232+
.verifier_ops = &bpf_testmod_verifier_ops, // REQUIRED
233+
.init = bpf_testmod_ops_init, // REQUIRED
234+
.init_member = bpf_testmod_ops_init_member, // REQUIRED
235+
.reg = bpf_testmod_ops_reg,
236+
.unreg = bpf_testmod_ops_unreg,
237+
.cfi_stubs = &__bpf_ops_bpf_testmod_ops,
238+
.name = "bpf_testmod_ops",
239+
.owner = THIS_MODULE,
240+
};
241+
```
242+
243+
**Why This Matters:**
244+
The kernel's `register_bpf_struct_ops()` function expects these callbacks to be present. When it tries to call them and finds NULL pointers, it causes a kernel panic. These callbacks are essential for:
245+
- **verifier_ops**: Validates BPF program access to struct_ops members
246+
- **init**: Initializes BTF type information for the struct_ops
247+
- **init_member**: Handles special initialization for data members
248+
249+
After adding these callbacks, rebuild and reload:
250+
```bash
251+
cd module
252+
make clean
253+
make
254+
sudo insmod hello.ko
255+
dmesg | tail
256+
# Should see: "bpf_testmod loaded with struct_ops support"
257+
```
258+
259+
### Issue 3: BPF Program Load Failure - Invalid Helper
260+
261+
**Problem:**
262+
```
263+
libbpf: prog 'bpf_testmod_test_1': BPF program load failed: Invalid argument
264+
program of this type cannot use helper bpf_trace_printk#6
265+
```
266+
267+
**Root Cause:**
268+
struct_ops BPF programs have restricted helper function access. `bpf_trace_printk` (bpf_printk) is not allowed in struct_ops context because these programs run in a different context than tracing programs.
269+
270+
**Solution:**
271+
Remove all `bpf_printk()` calls from struct_ops BPF programs:
272+
273+
```c
274+
// BEFORE (fails to load):
275+
SEC("struct_ops/test_1")
276+
int BPF_PROG(bpf_testmod_test_1)
277+
{
278+
bpf_printk("BPF test_1 called!\n"); // NOT ALLOWED
279+
return 42;
280+
}
281+
282+
// AFTER (works):
283+
SEC("struct_ops/test_1")
284+
int BPF_PROG(bpf_testmod_test_1)
285+
{
286+
/* Return a special value to indicate BPF implementation */
287+
return 42;
288+
}
289+
```
290+
291+
**Alternative Debugging Approaches:**
292+
1. Use BPF maps to export counters/statistics to userspace
293+
2. Use the kernel module's `printk()` to log struct_ops invocations
294+
3. Use `bpftool prog tracelog` to see what programs are being called
295+
296+
### Verification Checklist
297+
298+
After resolving all issues, verify everything works:
299+
300+
```bash
301+
# 1. Check module BTF
302+
readelf -S module/hello.ko | grep BTF
303+
304+
# 2. Load module successfully
305+
sudo insmod module/hello.ko
306+
dmesg | tail -5
307+
# Should see: "bpf_testmod loaded with struct_ops support"
308+
309+
# 3. Verify proc file created
310+
ls -l /proc/bpf_testmod_trigger
311+
# Should exist with write permissions
312+
313+
# 4. Build and load BPF program
314+
make
315+
sudo ./struct_ops
316+
# Should see: "Successfully loaded and attached BPF struct_ops!"
317+
318+
# 5. Verify callbacks are being invoked
319+
sudo dmesg | tail -20
320+
# Should see periodic output:
321+
# Calling struct_ops callbacks:
322+
# test_1() returned: 42
323+
# test_2(10, 20) returned: 30
324+
# test_3() called with buffer
325+
326+
# 6. Clean up
327+
sudo rmmod hello
328+
```
329+
330+
### Key Takeaways
331+
332+
1. **BTF is mandatory** for struct_ops - ensure `vmlinux` is available and `pahole` is recent enough
333+
2. **All required callbacks must be present** in the `bpf_struct_ops` structure (verifier_ops, init, init_member)
334+
3. **Helper restrictions apply** - struct_ops programs cannot use tracing helpers like `bpf_printk`
335+
4. **Test incrementally** - load module first, then BPF program, to isolate issues
336+
337+
## Kernel Source Code Analysis
338+
339+
### Root Cause of Kernel Panic (Confirmed from Kernel 6.18-rc4 Source)
340+
341+
The kernel panic was caused by **missing NULL pointer checks** in the kernel's struct_ops registration code. Analysis of the Linux kernel source code (version 6.18-rc4) reveals three critical locations where callback pointers are dereferenced without validation:
342+
343+
#### 1. Missing NULL check for `st_ops->init` callback
344+
**Location**: `kernel/bpf/bpf_struct_ops.c:381`
345+
346+
```c
347+
if (st_ops->init(btf)) { // ← NULL pointer dereference if init is NULL
348+
pr_warn("Error in init bpf_struct_ops %s\n",
349+
st_ops->name);
350+
err = -EINVAL;
351+
goto errout;
352+
}
353+
```
354+
355+
The code calls `st_ops->init(btf)` directly in the `bpf_struct_ops_desc_init()` function without checking if the callback exists. If a module registers struct_ops with `init = NULL`, this causes an immediate kernel panic.
356+
357+
#### 2. Missing NULL check for `st_ops->init_member` callback
358+
**Location**: `kernel/bpf/bpf_struct_ops.c:753`
359+
360+
```c
361+
err = st_ops->init_member(t, member, kdata, udata); // ← NULL pointer dereference
362+
if (err < 0)
363+
goto reset_unlock;
364+
365+
/* The ->init_member() has handled this member */
366+
if (err > 0)
367+
continue;
368+
```
369+
370+
During map update operations, the kernel calls `st_ops->init_member()` for each struct member without verifying the callback pointer is non-NULL.
371+
372+
#### 3. Missing NULL check for `st_ops->verifier_ops`
373+
**Location**: `kernel/bpf/verifier.c:23486`
374+
375+
```c
376+
env->ops = st_ops->verifier_ops; // ← Assigns potentially NULL pointer
377+
```
378+
379+
The BPF verifier assigns `verifier_ops` directly and later dereferences it through `env->ops->*` calls. If `verifier_ops` is NULL, subsequent verifier operations will cause a kernel panic.
380+
381+
### Why These Callbacks Are Mandatory
382+
383+
The kernel code **assumes** these callbacks exist and does not provide fallback behavior:
384+
385+
1. **`init`**: Called during struct_ops registration to initialize BTF type information. No default implementation exists.
386+
2. **`init_member`**: Called for each struct member during map updates to handle special initialization. Return value of 0 means "not handled", >0 means "handled", <0 is error.
387+
3. **`verifier_ops`**: Provides verification operations (e.g., `is_valid_access`) that control BPF program access to struct_ops context.
388+
389+
### Is This Fixed in Current Kernel?
390+
391+
**No.** As of Linux kernel 6.18-rc4 (checked 2025-11-10), these NULL pointer dereferences still exist. The kernel code has not added defensive NULL checks for these callbacks.
392+
393+
This means:
394+
-**Our fix is correct** - providing all three callbacks prevents the kernel panic
395+
-**Kernel could be more defensive** - ideally it should validate callbacks before dereferencing
396+
- ⚠️ **All struct_ops modules MUST provide these callbacks** - this is an undocumented requirement
397+
398+
### Recommendation for Kernel Upstream
399+
400+
The kernel should add validation before dereferencing these pointers:
401+
402+
```c
403+
// Suggested fix for kernel/bpf/bpf_struct_ops.c:381
404+
if (st_ops->init && st_ops->init(btf)) {
405+
pr_warn("Error in init bpf_struct_ops %s\n", st_ops->name);
406+
err = -EINVAL;
407+
goto errout;
408+
}
409+
410+
// Suggested fix for kernel/bpf/bpf_struct_ops.c:753
411+
if (st_ops->init_member) {
412+
err = st_ops->init_member(t, member, kdata, udata);
413+
if (err < 0)
414+
goto reset_unlock;
415+
if (err > 0)
416+
continue;
417+
}
418+
419+
// Suggested fix for registration
420+
if (!st_ops->verifier_ops) {
421+
pr_warn("struct_ops %s missing verifier_ops\n", st_ops->name);
422+
return -EINVAL;
423+
}
424+
```
425+
426+
However, until such changes are merged, **all struct_ops implementations must provide these callbacks** to avoid kernel panics.
427+
428+
---
429+
430+
## Additional Resources
431+
432+
- **Kernel Test Module**: `/home/yunwei37/linux/tools/testing/selftests/bpf/test_kmods/bpf_testmod.c` - Official kernel reference implementation
433+
- **BPF Documentation**: https://www.kernel.org/doc/html/latest/bpf/
434+
435+
## Contributing
436+
437+
If you encounter similar issues or have improvements, please document them and contribute back to the tutorial.

0 commit comments

Comments
 (0)