Commit 2ca0e40

feat: Add Python stack profiler using eBPF for enhanced performance analysis
1 parent b8cc834 commit 2ca0e40

7 files changed: +966 -2 lines changed

src/46-xdp-test/README.md

Lines changed: 2 additions & 2 deletions
@@ -18,11 +18,11 @@ Traditional BPF_PROG_RUN operates in "dry run" mode - packets are processed but

 ### Live Frames Mode: Real Packet Injection

-In Linux 5.18+, the kernel introduced **live frames mode** via the `BPF_F_TEST_XDP_LIVE_FRAMES` flag. This fundamentally changes BPF_PROG_RUN behavior. When enabled, XDP_TX actions don't just return - they actually transmit packets on the wire through the specified network interface. This turns BPF_PROG_RUN into a powerful packet generator.
+In Linux 5.18+, the kernel introduced live frames mode via the `BPF_F_TEST_XDP_LIVE_FRAMES` flag. This fundamentally changes BPF_PROG_RUN behavior. When enabled, XDP_TX actions don't just return; they actually transmit packets on the wire through the specified network interface. This turns BPF_PROG_RUN into a powerful packet generator.

 Here's how it works: Your userspace program constructs a packet (Ethernet frame with IP header, UDP payload, etc.) and passes it to `bpf_prog_test_run()` with live frames enabled. The XDP program receives this packet in its `xdp_md` context. If the program returns `XDP_TX`, the kernel transmits the packet through the network driver as if it arrived on the interface and was reflected back. The packet appears on the wire with full hardware offload support (checksumming, segmentation, etc.).

-This enables several powerful use cases. **Network stack stress testing**: Flood your system with millions of packets per second to find breaking points in the network stack, driver, or application layer. **XDP program benchmarking**: Measure how many packets per second your XDP program can process under realistic load without external packet generators. **Protocol fuzzing**: Generate malformed packets or unusual protocol sequences to test robustness. **Synthetic traffic generation**: Create realistic traffic patterns for testing load balancers, firewalls, or intrusion detection systems.
+This enables several powerful use cases. Network stack stress testing floods your system with millions of packets per second to find breaking points in the network stack, driver, or application layer. XDP program benchmarking measures how many packets per second your XDP program can process under realistic load without external packet generators. Protocol fuzzing generates malformed packets or unusual protocol sequences to test robustness. Synthetic traffic generation creates realistic traffic patterns for testing load balancers, firewalls, or intrusion detection systems.

 ### The XDP_TX Reflection Loop

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
# Build outputs
*.o
*.bpf.o
*.skel.h
python-stack

# Output directory
.output/

# Editor files
*.swp
*.swo
*~
.vscode/
.idea/

# Temporary files
*.tmp
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
APP := python-stack

THIRD_PARTY_PATH := ../../third_party

# Architecture detection
ARCH := $(shell uname -m | sed 's/x86_64/x86/' | sed 's/aarch64/arm64/')

# VMLINUX header path
VMLINUX_DIR := $(THIRD_PARTY_PATH)/vmlinux/$(ARCH)
VMLINUX_BTF_H := $(VMLINUX_DIR)/vmlinux.h

# Libbpf
LIBBPF_SRC := $(abspath $(THIRD_PARTY_PATH)/libbpf/src)
LIBBPF_OBJ := $(abspath $(THIRD_PARTY_PATH)/libbpf/src/staticobjs/libbpf.a)
LIBBPF_OBJDIR := $(abspath $(THIRD_PARTY_PATH)/libbpf/src/staticobjs)

# BPF Code
CLANG ?= clang
BPFTOOL ?= $(abspath $(THIRD_PARTY_PATH)/bpftool/src/bpftool)

INCLUDES := -I$(LIBBPF_SRC) -I$(THIRD_PARTY_PATH)/bpftool/include/uapi -I$(VMLINUX_DIR)
CFLAGS := -g -Wall

ALL_LDFLAGS := $(LDFLAGS)

APPS = $(APP)

# BPF source
BPF_SRC := $(APP).bpf.c

# BPF object and skeleton
BPF_OBJ := $(APP).bpf.o
BPF_SKEL := $(APP).skel.h

# Userspace source
USER_SRC := $(APP).c
USER_OBJ := $(APP).o

.PHONY: all
all: $(APPS)

# Build libbpf
$(LIBBPF_OBJ): $(wildcard $(LIBBPF_SRC)/*.c) $(wildcard $(LIBBPF_SRC)/*.h)
	@echo "Building libbpf..."
	$(MAKE) -C $(LIBBPF_SRC) BUILD_STATIC_ONLY=1 OBJDIR=$(LIBBPF_OBJDIR)

# Build bpftool
$(BPFTOOL):
	@echo "Building bpftool..."
	$(MAKE) -C $(THIRD_PARTY_PATH)/bpftool/src

# Generate vmlinux.h if needed
$(VMLINUX_BTF_H):
	@if [ ! -f $(VMLINUX_BTF_H) ]; then \
		echo "Generating $(VMLINUX_BTF_H)..."; \
		mkdir -p $(VMLINUX_DIR); \
		$(BPFTOOL) btf dump file /sys/kernel/btf/vmlinux format c > $(VMLINUX_BTF_H); \
	fi

# Build BPF object
$(BPF_OBJ): $(BPF_SRC) $(LIBBPF_OBJ) $(VMLINUX_BTF_H)
	@echo "Building BPF object: $(BPF_OBJ)"
	$(CLANG) -g -O2 -target bpf -D__TARGET_ARCH_$(ARCH) $(INCLUDES) -c $(BPF_SRC) -o $(BPF_OBJ)

# Generate BPF skeleton
$(BPF_SKEL): $(BPF_OBJ) $(BPFTOOL)
	@echo "Generating BPF skeleton: $(BPF_SKEL)"
	$(BPFTOOL) gen skeleton $(BPF_OBJ) > $(BPF_SKEL)

# Build userspace program
$(USER_OBJ): $(USER_SRC) $(BPF_SKEL)
	@echo "Building userspace object: $(USER_OBJ)"
	$(CC) $(CFLAGS) $(INCLUDES) -c $(USER_SRC) -o $(USER_OBJ)

# Link final binary
$(APP): $(USER_OBJ) $(LIBBPF_OBJ)
	@echo "Linking $(APP)..."
	$(CC) $(CFLAGS) $^ $(ALL_LDFLAGS) -lelf -lz -o $@

# Clean
.PHONY: clean
clean:
	rm -f $(BPF_OBJ) $(BPF_SKEL) $(USER_OBJ) $(APP)
	rm -f *.o *.skel.h

# Help
.PHONY: help
help:
	@echo "Makefile for $(APP)"
	@echo ""
	@echo "Targets:"
	@echo " all - Build everything (default)"
	@echo " clean - Remove generated files"
	@echo " help - Show this help"
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
# eBPF Tutorial: Python Stack Profiler

Profile Python applications at the OS level using eBPF to capture native and Python call stacks, helping identify performance bottlenecks in Python programs such as data science workloads, web servers, and ML inference.

> The complete source code: <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/trace/python-stack-profiler>

## Overview

Python profiling traditionally relies on instrumentation (cProfile), sampling the interpreter from a separate process (py-spy), or native profilers such as perf. Each approach has limitations:
- **cProfile**: High overhead from tracing every function call; the program must be launched under the profiler or modified to enable it
- **py-spy**: Samples from userspace, may miss short-lived functions
- **perf**: Captures native stacks but, by default, shows interpreter internals rather than Python function names

This tutorial shows how to use eBPF to capture both native C stacks and Python interpreter stacks, giving you complete visibility into where your Python application spends time.

## What You'll Learn

1. Attaching eBPF probes to Python processes
2. Walking Python interpreter frame structures from kernel space
3. Extracting Python function names, filenames, and line numbers
4. Combining native and Python stacks for complete profiling
5. Generating flamegraphs for Python applications

## Prerequisites

- Linux kernel 5.15+ (BPF ring buffer support was added in 5.8)
- Python 3.8+ running on your system
- Root access (for loading eBPF programs)
- Understanding of stack traces and profiling concepts

## Building and Running

```bash
make
sudo ./python-stack
```

## How It Works

The profiler samples Python processes at a fixed frequency (e.g., 49 Hz, chosen to avoid sampling in lock-step with periodic activity such as the scheduler tick). For each sample:

1. **Capture native stack**: Use BPF stack helpers to get kernel and userspace stacks
2. **Identify Python threads**: Check whether the process is running the Python interpreter
3. **Walk Python frames**: Read the `PyFrameObject` chain from CPython's internal structures
4. **Extract symbols**: Get function names, filenames, and line numbers from `PyCodeObject`
5. **Aggregate data**: Count stack occurrences for flamegraph generation
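
The sampling frequency mentioned above could be configured roughly as follows. This is a minimal sketch that assumes the profiler drives sampling through a perf event, which the text does not confirm; `init_sampling_attr` is a hypothetical helper name, and only the attribute setup is shown, not the `perf_event_open` syscall itself.

```c
#include <linux/perf_event.h>
#include <string.h>

/* Fill a perf_event_attr for frequency-based CPU-clock sampling.
 * Assumption: sampling is driven by a software perf event; the
 * tutorial text does not state which mechanism the profiler uses. */
static void init_sampling_attr(struct perf_event_attr *attr,
                               unsigned long long hz)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_CPU_CLOCK;
    attr->freq = 1;         /* treat sample_freq as a frequency in Hz */
    attr->sample_freq = hz; /* e.g. 49 */
}
```

In a design like this, the filled attribute would typically be passed to `perf_event_open` once per CPU, and the resulting file descriptor attached to the BPF program, for example with libbpf's `bpf_program__attach_perf_event`.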
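
## Python Internals

CPython's frame structure (simplified; this layout matches CPython 3.10 and earlier, while 3.11+ keeps execution frames in an internal `_PyInterpreterFrame` structure):

```c
/* Forward declarations so the simplified structs are self-consistent */
typedef struct _object PyObject;
typedef struct PyCodeObject PyCodeObject;

struct _frame {
    struct _frame *f_back;  // Previous frame
    PyCodeObject *f_code;   // Code object
    int f_lineno;           // Current line number
};

struct PyCodeObject {
    PyObject *co_filename;  // Source filename
    PyObject *co_name;      // Function name
};
```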
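
Walking the `f_back` chain can be illustrated in plain userspace C. The sketch below is not the profiler's code: `mock_frame`, `mock_code`, `walk_frames`, and `MAX_DEPTH` are hypothetical names, and C strings stand in for the `PyObject *` fields. In a real BPF program each pointer dereference would instead be a `bpf_probe_read_user()` call, and the loop must be bounded to satisfy the verifier.

```c
#include <stddef.h>

#define MAX_DEPTH 20 /* hypothetical cap; BPF loops must be bounded */

/* Mock versions of the simplified structs, using C strings in place of
 * PyObject* so the sketch runs without a Python process. */
struct mock_code {
    const char *co_filename;
    const char *co_name;
};

struct mock_frame {
    struct mock_frame *f_back;
    struct mock_code *f_code;
    int f_lineno;
};

/* Walk from the innermost frame toward the root, storing function names.
 * Returns the number of frames visited (at most MAX_DEPTH). */
static int walk_frames(const struct mock_frame *frame,
                       const char *names[MAX_DEPTH])
{
    int depth = 0;

    while (frame != NULL && depth < MAX_DEPTH) {
        names[depth++] = frame->f_code ? frame->f_code->co_name : "<unknown>";
        frame = frame->f_back;
    }
    return depth;
}
```

The names come out innermost first, so a consumer that wants root-first output (as in folded flamegraph stacks) would reverse the array before joining it.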

## Example Output

```
python-script.py:main;process_data;expensive_function 247
python-script.py:main;load_model;torch.load 189
python-script.py:main;preprocess;np.array 156
```

Each line shows a semicolon-separated stack, outermost frame first, followed by the number of samples in which that stack was observed.
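
Counting samples per stack and printing them in this format might look like the sketch below. It is an illustration under assumptions, not the tutorial's implementation: `stack_table`, `record_sample`, and `print_folded` are hypothetical names, and a fixed-size table with linear search stands in for whatever map or hash table a real profiler would use.

```c
#include <stdio.h>
#include <string.h>

#define MAX_STACKS 64     /* hypothetical table capacity */
#define MAX_STACK_LEN 256 /* hypothetical maximum folded-stack length */

struct stack_count {
    char stack[MAX_STACK_LEN];
    unsigned long count;
};

struct stack_table {
    struct stack_count entries[MAX_STACKS];
    int used;
};

/* Record one sample for a folded stack string.
 * Returns the updated count, or 0 if the stack is too long or the table is full. */
static unsigned long record_sample(struct stack_table *t, const char *stack)
{
    if (strlen(stack) >= MAX_STACK_LEN)
        return 0;
    for (int i = 0; i < t->used; i++) {
        if (strcmp(t->entries[i].stack, stack) == 0)
            return ++t->entries[i].count;
    }
    if (t->used == MAX_STACKS)
        return 0;
    strcpy(t->entries[t->used].stack, stack);
    t->entries[t->used].count = 1;
    t->used++;
    return 1;
}

/* Print every entry as "<stack> <count>", one per line. */
static void print_folded(const struct stack_table *t, FILE *out)
{
    for (int i = 0; i < t->used; i++)
        fprintf(out, "%s %lu\n", t->entries[i].stack, t->entries[i].count);
}
```

Output in this shape can be fed to tools that consume folded stacks, such as the `flamegraph.pl` script from the FlameGraph project.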

## Use Cases

- **ML/AI workloads**: Profile PyTorch, TensorFlow, NumPy operations
- **Web servers**: Find bottlenecks in Flask, Django, FastAPI
- **Data processing**: Optimize pandas, polars operations
- **General Python**: Any Python application performance analysis

## Next Steps

- Extend to capture GIL contention
- Add Python object allocation tracking
- Integrate with other eBPF metrics (CPU, memory)
- Build flamegraph visualization

## References

- [CPython Internals](https://realpython.com/cpython-source-code-guide/)
- [Python Frame Objects](https://docs.python.org/3/c-api/frame.html)
- [eBPF Stack Traces](https://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html)
