mm: memcontrol: Add BPF hooks for memory controller #10853
Move struct bpf_struct_ops_link's definition into bpf.h, where the other custom bpf link definitions live. This is necessary to access its members from outside the generic bpf_struct_ops implementation, which the following patches in the series do. Signed-off-by: Roman Gushchin <[email protected]>
When a struct ops is being attached and a bpf link is created, allow passing a cgroup fd via bpf attr, so that the struct ops can be attached to a cgroup instead of globally. The attached struct ops doesn't hold a reference to the cgroup; it only preserves the cgroup id. Signed-off-by: Roman Gushchin <[email protected]>
Struct oom_control is used to describe the OOM context. Its memcg field defines the scope of the OOM: it's NULL for global OOMs and a valid memcg pointer for memcg-scoped OOMs. Teach the bpf verifier to recognize it as a trusted or NULL pointer. This provides the bpf OOM handler with a trusted memcg pointer, which is required, for example, for iterating the memcg's subtree. Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Kumar Kartikeya Dwivedi <[email protected]>
mem_cgroup_get_from_ino() can be reused by the BPF OOM implementation, but currently depends on CONFIG_SHRINKER_DEBUG. Remove this dependency. Signed-off-by: Roman Gushchin <[email protected]>
Introduce bpf_map__attach_struct_ops_opts(), an extended version of bpf_map__attach_struct_ops(), which takes an additional struct bpf_struct_ops_opts argument. struct bpf_struct_ops_opts has the relative_fd member, which allows passing an additional file descriptor argument. It can be used to attach struct ops maps to cgroups. Signed-off-by: Roman Gushchin <[email protected]>
To support features like allowing overrides in cgroup hierarchies, we need a way to pass flags from userspace to the kernel when attaching a struct_ops. Extend `bpf_struct_ops_link` to include a `flags` field. This field is populated from `attr->link_create.flags` during link creation. This will allow struct_ops implementations, such as the upcoming memory controller ops, to interpret these flags and modify their attachment behavior accordingly. Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Hui Zhu <[email protected]>
Building on the previous change that added flags to the kernel's link creation path, this patch exposes this functionality through libbpf. The `bpf_struct_ops_opts` struct is extended with a `flags` member, which is then passed to the `bpf_link_create` syscall within `bpf_map__attach_struct_ops_opts`. This enables userspace applications to pass flags, such as `BPF_F_ALLOW_OVERRIDE`, when attaching struct_ops to cgroups, providing more control over the attachment behavior in nested hierarchies. Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Hui Zhu <[email protected]>
Introduce BPF struct_ops support to the memory controller, enabling custom and dynamic control over memory pressure. This is achieved through a new struct_ops type, `memcg_bpf_ops`. This new interface allows a BPF program to implement hooks that influence a memory cgroup's behavior.

The `memcg_bpf_ops` struct provides the following hooks:

- `get_high_delay_ms`: Returns a custom throttling delay in milliseconds for a cgroup that has breached its `memory.high` limit. This is the primary mechanism for BPF-driven throttling.
- `below_low`: Overrides the `memory.low` protection check. If this hook returns true, the cgroup is considered to be protected by its `memory.low` setting, regardless of its actual usage.
- `below_min`: Similar to `below_low`, this overrides the `memory.min` protection check.
- `handle_cgroup_online`/`offline`: Callbacks invoked when a cgroup with an attached program comes online or goes offline, allowing for state management.

This patch integrates these hooks into the core memory control logic. The `get_high_delay_ms` value is incorporated into charge paths like `try_charge_memcg` and the high-limit handler `__mem_cgroup_handle_over_high`. The `below_low` and `below_min` hooks are checked within their respective protection functions.

Lifecycle management is handled to ensure BPF programs are correctly inherited by child cgroups and cleaned up on detachment. SRCU is used to protect concurrent access to the `memcg->bpf_ops` pointer.

Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Hui Zhu <[email protected]>
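The hook semantics described in the `memcg_bpf_ops` commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the real struct layout and hook signatures are not given in the commit message, so `struct memcg_model`, `struct memcg_ops_model`, and every function below are hypothetical stand-ins. The fallback to a plain usage comparison when `below_low` returns false is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only. Field names and hook signatures are assumed;
 * they are not taken from the kernel patch. */
struct memcg_model {
    unsigned long usage; /* pages currently charged */
    unsigned long low;   /* memory.low protection, in pages */
};

struct memcg_ops_model {
    unsigned int (*get_high_delay_ms)(const struct memcg_model *memcg);
    bool (*below_low)(const struct memcg_model *memcg);
};

/* Protection check: a true return from the hook forces "protected"
 * regardless of usage. Otherwise fall back to the usage comparison
 * (the fallback is an assumption of this model). */
static bool model_below_low(const struct memcg_model *memcg,
                            const struct memcg_ops_model *ops)
{
    assert(memcg != NULL);
    if (ops && ops->below_low && ops->below_low(memcg))
        return true;
    return memcg->usage <= memcg->low;
}

/* Throttling delay requested by the policy; 0 when no hook is attached. */
static unsigned int model_high_delay_ms(const struct memcg_model *memcg,
                                        const struct memcg_ops_model *ops)
{
    assert(memcg != NULL);
    if (ops && ops->get_high_delay_ms)
        return ops->get_high_delay_ms(memcg);
    return 0;
}

/* Example policy: always report the cgroup as protected. */
static bool always_protected(const struct memcg_model *memcg)
{
    (void)memcg;
    return true;
}

/* Example policy: always request a 100 ms delay. */
static unsigned int fixed_delay(const struct memcg_model *memcg)
{
    (void)memcg;
    return 100;
}
```

With no ops attached, a cgroup whose usage exceeds its `low` value is reported as unprotected and gets no extra delay; attaching `always_protected` and `fixed_delay` flips both results. How the kernel combines the returned delay with its own throttling is not modeled here.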
Add a comprehensive selftest suite for the `memcg_bpf_ops` functionality. These tests validate that BPF programs can correctly influence memory cgroup throttling behavior by implementing the new hooks.

The test suite is added in `prog_tests/memcg_ops.c` and covers several key scenarios:

1. `test_memcg_ops_over_high`: Verifies that a BPF program can trigger throttling on a low-priority cgroup by returning a delay from the `get_high_delay_ms` hook when a high-priority cgroup is under pressure.
2. `test_memcg_ops_below_low_over_high`: Tests the combination of the `below_low` and `get_high_delay_ms` hooks, ensuring they work together as expected.
3. `test_memcg_ops_below_min_over_high`: Validates the interaction between the `below_min` and `get_high_delay_ms` hooks.

The test framework sets up a cgroup hierarchy with high and low priority groups, attaches BPF programs, runs memory-intensive workloads, and asserts that the observed throttling (measured by workload execution time) matches expectations.

The BPF program (`progs/memcg_ops.c`) uses a tracepoint on `memcg:count_memcg_events` (specifically PGFAULT) to detect memory pressure and trigger the appropriate hooks in response. This test suite provides essential validation for the new memory control mechanisms.

Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Hui Zhu <[email protected]>
To allow for more flexible attachment policies in nested cgroup hierarchies, this patch introduces support for the `BPF_F_ALLOW_OVERRIDE` flag for `memcg_bpf_ops`.

When a `memcg_bpf_ops` is attached to a cgroup with this flag, it permits child cgroups to attach their own, different `memcg_bpf_ops`, overriding the parent's inherited program. Without this flag, attaching a BPF program to a cgroup that already has one (either directly or via inheritance) will fail.

The implementation involves:

- Adding a `bpf_ops_flags` field to `struct mem_cgroup`.
- During registration (`bpf_memcg_ops_reg`), checking for existing programs and the `BPF_F_ALLOW_OVERRIDE` flag.
- During unregistration (`bpf_memcg_ops_unreg`), correctly restoring the parent's BPF program to the cgroup hierarchy.
- Ensuring flags are inherited by child cgroups during online events.

This change enables complex, multi-level policy enforcement where different subtrees of the cgroup hierarchy can have distinct memory management BPF programs.

Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Hui Zhu <[email protected]>
Add a new selftest, `test_memcg_ops_hierarchies`, to validate the behavior of attaching `memcg_bpf_ops` in a nested cgroup hierarchy, specifically testing the `BPF_F_ALLOW_OVERRIDE` flag.

The test case performs the following steps:

1. Creates a three-level deep cgroup hierarchy: `/cg`, `/cg/cg`, and `/cg/cg/cg`.
2. Attaches a BPF struct_ops to the top-level cgroup (`/cg`) with the `BPF_F_ALLOW_OVERRIDE` flag.
3. Successfully attaches a new struct_ops to the middle cgroup (`/cg/cg`) without the flag, overriding the inherited one.
4. Asserts that attaching another struct_ops to the deepest cgroup (`/cg/cg/cg`) fails with -EBUSY, because its parent did not specify `BPF_F_ALLOW_OVERRIDE`.

This test ensures that the attachment logic correctly enforces the override rules across a cgroup subtree.

Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Hui Zhu <[email protected]>
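The override rule exercised by `test_memcg_ops_hierarchies` can be sketched as a small userspace model. It is not the kernel implementation: `struct cg_model`, `model_attach()`, and `MODEL_F_ALLOW_OVERRIDE` are hypothetical names, and the model resolves inheritance by walking up to the nearest ancestor with an attachment rather than copying the program into child cgroups as the patch describes. Re-attaching to a cgroup that already carries its own attachment is treated like any other override, which is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_F_ALLOW_OVERRIDE (1u << 0) /* stand-in for BPF_F_ALLOW_OVERRIDE */

/* Hypothetical model of one cgroup in the hierarchy. */
struct cg_model {
    struct cg_model *parent; /* NULL at the top of the modeled subtree */
    bool attached;           /* true if ops were attached directly here */
    int ops_id;              /* identifies the attached ops */
    unsigned int flags;      /* flags passed at attach time */
};

/* Nearest cgroup (self or ancestor) that has ops attached, or NULL. */
static const struct cg_model *effective(const struct cg_model *cg)
{
    for (; cg; cg = cg->parent)
        if (cg->attached)
            return cg;
    return NULL;
}

/* Attach ops to cg. Fails with -EBUSY when ops are already in effect
 * (directly or via inheritance) and were attached without the
 * override flag. */
static int model_attach(struct cg_model *cg, int ops_id, unsigned int flags)
{
    const struct cg_model *cur;

    assert(cg != NULL);
    cur = effective(cg);
    if (cur && !(cur->flags & MODEL_F_ALLOW_OVERRIDE))
        return -EBUSY;

    cg->attached = true;
    cg->ops_id = ops_id;
    cg->flags = flags;
    return 0;
}

/* Id of the ops in effect for cg, or 0 if none. */
static int model_effective_ops(const struct cg_model *cg)
{
    const struct cg_model *cur = effective(cg);

    return cur ? cur->ops_id : 0;
}
```

Replaying the four test steps against this model: attaching to the top cgroup with the flag succeeds, attaching to the middle cgroup without the flag succeeds, and attaching to the deepest cgroup returns `-EBUSY`, leaving the middle cgroup's ops in effect there.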
Add a sample program to demonstrate a practical use case for the `memcg_bpf_ops` feature: priority-based memory throttling.

The sample consists of a BPF program and a userspace loader:

1. memcg.bpf.c: A BPF program that monitors PGFAULT events on a high-priority cgroup. When activity exceeds a threshold, it uses the `get_high_delay_ms`, `below_low`, or `below_min` hooks to apply pressure on a low-priority cgroup.
2. memcg.c: A userspace loader that configures and attaches the BPF program. It takes command-line arguments for the high and low priority cgroup paths, a pressure threshold, and the desired throttling delay (`over_high_ms`).

This provides a clear, working example of how to implement a dynamic, priority-aware memory management policy. A user can create two cgroups, run workloads of different priorities, and observe the low-priority workload being throttled to protect the high-priority one.

Example usage:

    # ./memcg --low_path /sys/fs/cgroup/low \
    #         --high_path /sys/fs/cgroup/high \
    #         --threshold 100 --over_high_ms 1024

Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Hui Zhu <[email protected]>
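The threshold policy that the sample describes can be approximated in plain C. This is a sketch, not the code in memcg.bpf.c: the one-second counting window, `struct policy_model`, and both function names are assumptions made for illustration, and only the delay decision is modeled (the `below_low` and `below_min` variants are left out).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy state; the field names mirror the loader's
 * --threshold and --over_high_ms options. */
struct policy_model {
    unsigned long threshold;        /* PGFAULT events tolerated per window */
    unsigned int over_high_ms;      /* delay applied to the low-priority cgroup */
    unsigned long window_start_s;   /* second in which counting started (assumed window) */
    unsigned long faults_in_window; /* events seen in the current window */
};

/* Record PGFAULT events observed on the high-priority cgroup. */
static void on_high_prio_pgfault(struct policy_model *p,
                                 unsigned long now_s, unsigned long count)
{
    assert(p != NULL);
    if (now_s != p->window_start_s) {
        /* A new window begins: restart the count. */
        p->window_start_s = now_s;
        p->faults_in_window = 0;
    }
    p->faults_in_window += count;
}

/* Delay the low-priority cgroup should see right now: over_high_ms
 * while the current window's count exceeds the threshold, else 0. */
static unsigned int low_prio_delay_ms(const struct policy_model *p,
                                      unsigned long now_s)
{
    assert(p != NULL);
    if (p->window_start_s == now_s && p->faults_in_window > p->threshold)
        return p->over_high_ms;
    return 0;
}
```

With `threshold` set to 100 and `over_high_ms` set to 1024, as in the example usage, 60 events in a window leave the low-priority cgroup unthrottled, 110 events in the same window yield a 1024 ms delay, and the delay drops back to 0 once that window has passed.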
Upstream branch: 8016abd
AI reviewed your patch. Please fix the bug or email reply why it's not a bug. In-Reply-To-Subject: AI-authorship-score: low
AI reviewed your patch. Please fix the bug or email reply why it's not a bug. In-Reply-To-Subject: AI-authorship-score: medium
At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=1047479 expired. Closing PR.
Pull request for series with
subject: mm: memcontrol: Add BPF hooks for memory controller
version: 5
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1047479