`src/47-cuda-events/README.md` (2 additions & 0 deletions)
@@ -477,6 +477,8 @@ The `cuda_events` tool supports these options:
## Next Steps
+Beyond tracing, eBPF can also extend GPU driver behavior—see our [gpu_ext project](https://github.com/eunomia-bpf/gpu_ext) for GPU scheduling and memory offloading via BPF struct_ops ([LPC 2024 talk](https://lpc.events/event/19/contributions/2168/)).
+
Once you're comfortable with this basic CUDA tracing tool, you could extend it to:
`src/features/struct_ops/README.md` (3 additions & 1 deletion)
@@ -4,7 +4,7 @@ Have you ever wanted to implement a kernel feature, like a new network protocol
This is the power of **BPF struct_ops**. This advanced eBPF feature allows BPF programs to implement the callbacks for a kernel structure of operations, effectively allowing you to "plug in" BPF code to act as a kernel subsystem. It's a step beyond simple tracing or filtering; it's about implementing core kernel logic in BPF. For instance, we also use it to implement GPU scheduling and memory offloading extensions with eBPF in GPU drivers (see [LPC 2024 talk](https://lpc.events/event/19/contributions/2168/) and the [gpu_ext project](https://github.com/eunomia-bpf/gpu_ext)).
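To make this concrete, here is a minimal sketch of what the BPF side of a struct_ops implementation typically looks like with libbpf. The ops type `my_demo_ops` and its `test_1` callback are hypothetical names, not this tutorial's actual interface: each callback is an ordinary BPF program in a `SEC("struct_ops/...")` section, and a global struct in the `.struct_ops.link` section wires those programs into the ops structure the kernel expects.

```c
// my_ops.bpf.c -- minimal sketch with hypothetical names, not this tutorial's code.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

/* Local mirror of the ops struct the kernel module defines; libbpf matches
 * it against the kernel/module BTF by type and member names. */
struct my_demo_ops {
	int (*test_1)(int arg);
};

/* Each callback is an ordinary BPF program in a "struct_ops/<name>" section. */
SEC("struct_ops/test_1")
int BPF_PROG(test_1, int arg)
{
	/* The "kernel logic" we plug in: trivially return arg + 1. */
	return arg + 1;
}

/* A global struct in .struct_ops.link ties the programs to the ops type;
 * loading and attaching this map registers our implementation with the kernel. */
SEC(".struct_ops.link")
struct my_demo_ops demo = {
	.test_1 = (void *)test_1,
};
```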
-In this tutorial, we will explore how to use `struct_ops` to dynamically implement a kernel subsystem's functionality. We won't be using the common TCP congestion control example. Instead, we'll take a more fundamental approach that mirrors the extensibility seen with kfuncs. We will create a custom kernel module that defines a new, simple subsystem with a set of operations. This module will act as a placeholder, creating new attachment points for our BPF programs. Then, we will write a BPF program to implement the logic for these operations. This demonstrates a powerful pattern: using a minimal kernel module to expose a `struct_ops` interface, and then using BPF to provide the full, complex implementation.
+In this tutorial, we will explore how to use `struct_ops` to dynamically extend kernel subsystem behavior. We won't be using the common TCP congestion control example. Instead, we'll take a more fundamental approach that mirrors the extensibility seen with kfuncs. We will create a custom kernel module that defines a new, simple subsystem with a set of operations. This module will act as a placeholder, creating new attachment points for our BPF programs. Then, we will write a BPF program to implement the logic for these operations. This demonstrates a powerful pattern: using a minimal kernel module to expose a `struct_ops` interface, and then using BPF to provide the full, complex implementation.
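For orientation, the module side of such an interface is, at its core, just a plain C struct of function pointers plus a call site. The sketch below keeps the same hypothetical `my_demo_ops` name and deliberately omits the `bpf_struct_ops` registration boilerplate needed to make the type visible to BPF; it only shows the shape of the contract and how the module would call through it.

```c
/* Hypothetical subsystem interface a minimal kernel module might expose
 * (illustrative names; struct_ops registration boilerplate omitted). */
#include <linux/module.h>
#include <linux/errno.h>

/* The new "subsystem" contract: a plain struct of callbacks. */
struct my_demo_ops {
	int (*test_1)(int arg);
	void (*test_2)(int arg);
};

/* The implementation currently plugged in from BPF; NULL until a BPF
 * struct_ops map for this type is loaded and attached. */
static struct my_demo_ops *active_ops;

/* Wherever the subsystem needs the pluggable behavior, it calls through
 * the ops pointer and falls back gracefully when nothing is attached. */
static int my_demo_run(int arg)
{
	if (active_ops && active_ops->test_1)
		return active_ops->test_1(arg);
	return -EOPNOTSUPP;
}

MODULE_LICENSE("GPL");
```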
> The complete source code for this tutorial can be found here: <https://github.com/eunomia-bpf/bpf-developer-tutorial/tree/main/src/features/struct_ops>
@@ -13,6 +13,7 @@ In this tutorial, we will explore how to use `struct_ops` to dynamically impleme
### The Challenge: Extending Kernel Behavior Safely and Dynamically
Traditionally, adding new functionality to the Linux kernel, such as a new file system, a network protocol, or a scheduler algorithm, requires writing a kernel module. While powerful, kernel modules come with significant challenges:
+
- **Complexity:** Kernel development has a steep learning curve and requires a deep understanding of kernel internals.
- **Safety:** A bug in a kernel module can easily crash the entire system. There are no sandboxing guarantees.
- **Maintenance:** Kernel modules must be maintained and recompiled for different kernel versions, creating a tight coupling with the kernel's internal APIs.
@@ -28,6 +29,7 @@ This is a paradigm shift. It's no longer just about observing or filtering; it's
This approach is similar in spirit to how **kfuncs** allow developers to extend the capabilities of BPF. With kfuncs, we can add custom helper functions to the BPF runtime by defining them in a kernel module. With `struct_ops`, we take this a step further: we define a whole new *set of attach points* for BPF programs, effectively creating a custom, BPF-programmable subsystem within the kernel.
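As a rough reminder of that kfunc pattern, exporting a helper from a kernel module looks roughly like the sketch below. This assumes a recent kernel that has the `BTF_KFUNCS_START`/`BTF_KFUNCS_END` macros (older kernels use `BTF_SET8_START`/`BTF_SET8_END` for the same purpose), and all names here are illustrative.

```c
#include <linux/module.h>
#include <linux/bpf.h>
#include <linux/btf.h>
#include <linux/btf_ids.h>

/* A custom helper made callable from BPF programs. */
__bpf_kfunc int my_add_one(int x)
{
	return x + 1;
}

/* Register the helper's BTF ID so the verifier knows it may be called. */
BTF_KFUNCS_START(my_kfunc_ids)
BTF_ID_FLAGS(func, my_add_one)
BTF_KFUNCS_END(my_kfunc_ids)

static const struct btf_kfunc_id_set my_kfunc_set = {
	.owner = THIS_MODULE,
	.set   = &my_kfunc_ids,
};

static int __init my_mod_init(void)
{
	/* Expose the helper to, for example, tracing BPF programs. */
	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &my_kfunc_set);
}
module_init(my_mod_init);
MODULE_LICENSE("GPL");
```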
The benefits are immense:
+
- **Dynamic Implementation**: You can load, update, and unload the BPF programs implementing the subsystem logic on the fly, without restarting the kernel or the application (a user-space sketch of this follows the list).
- **Safety**: The BPF verifier ensures that the BPF programs are safe to run, preventing common pitfalls like infinite loops, out-of-bounds memory access, and system crashes.
- **Flexibility**: The logic is in the BPF program, which can be developed and updated independently of the kernel module that defines the `struct_ops` interface.
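To illustrate the dynamic loading point, attaching and detaching such an implementation from user space with libbpf takes only a few calls. This is a sketch assuming a skeleton generated from the BPF object shown earlier; the `my_ops_bpf` skeleton name and `demo` map name are placeholders.

```c
#include <stdio.h>
#include <bpf/libbpf.h>
#include "my_ops.skel.h"   /* hypothetical skeleton from: bpftool gen skeleton my_ops.bpf.o */

int main(void)
{
	struct my_ops_bpf *skel;
	struct bpf_link *link;

	skel = my_ops_bpf__open_and_load();
	if (!skel)
		return 1;

	/* Register the BPF-implemented ops with the kernel's struct_ops type. */
	link = bpf_map__attach_struct_ops(skel->maps.demo);
	if (!link) {
		fprintf(stderr, "attach failed\n");
		my_ops_bpf__destroy(skel);
		return 1;
	}

	/* While attached, the kernel module's subsystem calls into our BPF code. */
	getchar();

	/* Detach on the fly: the subsystem falls back to "nothing plugged in". */
	bpf_link__destroy(link);
	my_ops_bpf__destroy(skel);
	return 0;
}
```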
This tutorial focuses on kernel-side GPU driver tracing, which provides visibility into job scheduling, memory management, and driver-firmware communication. However, kernel tracepoints have fundamental limitations. When `drm_run_job` fires, we know a job started executing on GPU hardware, but we cannot observe what happens inside the GPU itself. The execution of thousands of parallel threads, their memory access patterns, branch divergence, warp occupancy, and instruction-level behavior remain invisible. These details are critical for understanding performance bottlenecks - whether memory coalescing is failing, whether thread divergence is killing efficiency, or whether shared memory bank conflicts are stalling execution.
-To achieve fine-grained GPU observability, eBPF programs must run directly on the GPU. This is the direction explored by the eGPU paper and [bpftime GPU examples](https://github.com/eunomia-bpf/bpftime/tree/master/example/gpu). bpftime converts eBPF bytecode to PTX instructions that GPUs can execute, then dynamically patches CUDA binaries at runtime to inject these eBPF programs at kernel entry/exit points. This enables observing GPU-specific information like block indices, thread indices, global timers, and warp-level metrics. Developers can instrument critical paths inside GPU kernels to measure execution behavior and diagnose complex performance issues that kernel-side tracing cannot reach. This GPU-internal observability complements kernel tracepoints - together they provide end-to-end visibility from API calls through kernel drivers to GPU execution.
+To achieve fine-grained GPU observability, eBPF programs must run directly on the GPU. This is the direction explored by the eGPU paper and [bpftime GPU examples](https://github.com/eunomia-bpf/bpftime/tree/master/example/gpu). bpftime converts eBPF bytecode to PTX instructions that GPUs can execute, then dynamically patches CUDA binaries at runtime to inject these eBPF programs at kernel entry/exit points. This enables observing GPU-specific information like block indices, thread indices, global timers, and warp-level metrics. Developers can instrument critical paths inside GPU kernels to measure execution behavior and diagnose complex performance issues that kernel-side tracing cannot reach. This GPU-internal observability complements kernel tracepoints - together they provide end-to-end visibility from API calls through kernel drivers to GPU execution. Beyond tracing, eBPF can also extend GPU driver behavior—see our [gpu_ext project](https://github.com/eunomia-bpf/gpu_ext) for GPU scheduling and memory offloading via BPF struct_ops ([LPC 2024 talk](https://lpc.events/event/19/contributions/2168/)).