|
| 1 | +.. SPDX-License-Identifier: GPL-2.0-only |
| 2 | +.. Copyright (C) 2020 Google LLC. |
| 3 | +
|
| 4 | +=========================== |
| 5 | +BPF_MAP_TYPE_CGROUP_STORAGE |
| 6 | +=========================== |
| 7 | + |
| 8 | +The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fixed-sized |
| 9 | +storage. It is only available with ``CONFIG_CGROUP_BPF``, and to programs that |
| 10 | +attach to cgroups; the programs are made available by the same Kconfig. The |
| 11 | +storage is identified by the cgroup the program is attached to. |
| 12 | + |
| 13 | +The map provides local storage at the cgroup that the BPF program is attached |
| 14 | +to. It provides faster and simpler access than the general-purpose hash |
| 15 | +table, which performs hash table lookups and requires the user to track live |
| 16 | +cgroups on their own. |
| 17 | + |
| 18 | +This document describes the usage and semantics of the |
| 19 | +``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors were changed in |
| 20 | +Linux 5.9, and this document describes the differences. |
| 21 | + |
| 22 | +Usage |
| 23 | +===== |
| 24 | + |
| 25 | +The map uses a key of type either ``__u64 cgroup_inode_id`` or |
| 26 | +``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``:: |
| 27 | + |
| 28 | + struct bpf_cgroup_storage_key { |
| 29 | + __u64 cgroup_inode_id; |
| 30 | + __u32 attach_type; |
| 31 | + }; |
| 32 | + |
| 33 | +``cgroup_inode_id`` is the inode id of the cgroup directory. |
| 34 | +``attach_type`` is the program's attach type. |
| 35 | + |
| 36 | +Linux 5.9 added support for type ``__u64 cgroup_inode_id`` as the key type. |
| 37 | +When this key type is used, all attach types of the particular cgroup and |
| 38 | +map share the same storage. Otherwise, if the type is |
| 39 | +``struct bpf_cgroup_storage_key``, then programs of different attach types |
| 40 | +are isolated and see different storages. |
| 41 | + |
| 42 | +To access the storage in a program, use ``bpf_get_local_storage``:: |
| 43 | + |
| 44 | + void *bpf_get_local_storage(void *map, u64 flags) |
| 45 | + |
| 46 | +``flags`` is reserved for future use and must be 0. |
| 47 | + |
| 48 | +There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE`` |
| 49 | +can be accessed by multiple programs across different CPUs, and users must |
| 50 | +handle synchronization themselves. The BPF infrastructure provides |
| 51 | +``struct bpf_spin_lock`` to synchronize the storage. See |
| 52 | +``tools/testing/selftests/bpf/progs/test_spin_lock.c``. |
| 53 | + |
| 54 | +Examples |
| 55 | +======== |
| 56 | + |
| 57 | +Usage with key type as ``struct bpf_cgroup_storage_key``:: |
| 58 | + |
| 59 | + #include <bpf/bpf.h> |
| 60 | + |
| 61 | + struct { |
| 62 | + __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE); |
| 63 | + __type(key, struct bpf_cgroup_storage_key); |
| 64 | + __type(value, __u32); |
| 65 | + } cgroup_storage SEC(".maps"); |
| 66 | + |
| 67 | + int program(struct __sk_buff *skb) |
| 68 | + { |
| 69 | + __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0); |
| 70 | + __sync_fetch_and_add(ptr, 1); |
| 71 | + |
| 72 | + return 0; |
| 73 | + } |
| 74 | + |
| 75 | +Userspace accessing map declared above:: |
| 76 | + |
| 77 | + #include <bpf/bpf.h> |
| 78 | + #include <bpf/libbpf.h> |
| 79 | + |
| 80 | + __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type) |
| 81 | + { |
| 82 | + struct bpf_cgroup_storage_key key = { |
| 83 | + .cgroup_inode_id = cgrp, |
| 84 | + .attach_type = type, |
| 85 | + }; |
| 86 | + __u32 value; |
| 87 | + bpf_map_lookup_elem(bpf_map__fd(map), &key, &value); |
| 88 | + // error checking omitted |
| 89 | + return value; |
| 90 | + } |
| 91 | + |
| 92 | +Alternatively, using just ``__u64 cgroup_inode_id`` as key type:: |
| 93 | + |
| 94 | + #include <bpf/bpf.h> |
| 95 | + |
| 96 | + struct { |
| 97 | + __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE); |
| 98 | + __type(key, __u64); |
| 99 | + __type(value, __u32); |
| 100 | + } cgroup_storage SEC(".maps"); |
| 101 | + |
| 102 | + int program(struct __sk_buff *skb) |
| 103 | + { |
| 104 | + __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0); |
| 105 | + __sync_fetch_and_add(ptr, 1); |
| 106 | + |
| 107 | + return 0; |
| 108 | + } |
| 109 | + |
| 110 | +And userspace:: |
| 111 | + |
| 112 | + #include <bpf/bpf.h> |
| 113 | + #include <bpf/libbpf.h> |
| 114 | + |
| 115 | + __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type) |
| 116 | + { |
| 117 | + __u32 value; |
| 118 | + bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value); |
| 119 | + // error checking omitted |
| 120 | + return value; |
| 121 | + } |
| 122 | + |
| 123 | +Semantics |
| 124 | +========= |
| 125 | + |
| 126 | +``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. This |
| 127 | +per-CPU variant has a different memory region on each CPU for each |
| 128 | +storage. The non-per-CPU variant has a single memory region for each storage. |
| 129 | + |
| 130 | +Prior to Linux 5.9, the lifetime of a storage was precisely per-attachment, and |
| 131 | +for a single ``CGROUP_STORAGE`` map, at most one loaded program could use |
| 132 | +the map. A program could be attached to multiple cgroups or have |
| 133 | +multiple attach types, and each attach created a fresh zeroed storage. The |
| 134 | +storage was freed upon detach. |
| 135 | + |
| 136 | +There was a one-to-one association between the map of each type (per-CPU and |
| 137 | +non-per-CPU) and the BPF program during load verification time. As a result, |
| 138 | +each map could only be used by one BPF program and each BPF program could only use |
| 139 | +one storage map of each type. Because a map could only be used by one BPF |
| 140 | +program, sharing a cgroup's storage with other BPF programs was |
| 141 | +impossible. |
| 142 | + |
| 143 | +Since Linux 5.9, storage can be shared by multiple programs. When a program is |
| 144 | +attached to a cgroup, the kernel creates a new storage only if the map |
| 145 | +does not already contain an entry for the cgroup and attach type pair; otherwise |
| 146 | +the old storage is reused for the new attachment. If the map is attach type |
| 147 | +shared, then the attach type is simply ignored during comparison. Storage is freed |
| 148 | +only when either the map or the cgroup attached to is being freed. Detaching |
| 149 | +does not directly free the storage, but it may cause the reference count of the map |
| 150 | +to reach zero, indirectly freeing all storage in the map. |
| 151 | + |
| 152 | +The map is no longer associated with any single BPF program, thus making sharing possible. |
| 153 | +However, the BPF program can still only associate with one map of each type |
| 154 | +(per-CPU and non-per-CPU). A BPF program cannot use more than one |
| 155 | +``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one |
| 156 | +``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``. |
| 157 | + |
| 158 | +In all versions, userspace may use the attach parameters of cgroup and |
| 159 | +attach type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map |
| 160 | +APIs to read or update the storage for a given attachment. For attach type |
| 161 | +shared storages in Linux 5.9 and later, only the first value in the struct, cgroup inode |
| 162 | +id, is used during comparison, so userspace may just specify a ``__u64`` |
| 163 | +directly. |
| 164 | + |
| 165 | +The storage is bound at attach time. Even if the program is attached to a parent |
| 166 | +cgroup and triggers in a child cgroup, the storage still belongs to the parent. |
| 167 | + |
| 168 | +Userspace cannot create a new entry in the map or delete an existing entry. |
| 169 | +Program test runs always use a temporary storage. |
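
The attach-time rule in the Semantics section (reuse an existing storage when the key matches, otherwise create a zeroed one, and ignore the attach type for attach type shared maps) can be illustrated with a small userspace model. This is only a sketch in plain C and not the kernel implementation: every ``model_*`` name, the fixed-size array, and the ``MODEL_MAX_ENTRIES`` limit are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct bpf_cgroup_storage_key; not the kernel type. */
struct model_key {
	uint64_t cgroup_inode_id;
	uint32_t attach_type;
};

struct model_storage {
	struct model_key key;
	uint32_t value;
};

#define MODEL_MAX_ENTRIES 8	/* arbitrary limit for this sketch */

struct model_map {
	/* true models a __u64 key, false models the struct key */
	bool attach_type_shared;
	size_t count;
	struct model_storage entries[MODEL_MAX_ENTRIES];
};

/* Keys match on the cgroup id alone for a shared map, on the pair otherwise. */
bool model_key_matches(const struct model_map *map,
		       const struct model_key *a,
		       const struct model_key *b)
{
	if (a->cgroup_inode_id != b->cgroup_inode_id)
		return false;
	return map->attach_type_shared || a->attach_type == b->attach_type;
}

/*
 * Models an attach on Linux 5.9 or later: reuse a matching storage if one
 * exists, otherwise create a fresh zeroed one. Returns NULL when the model
 * map is full.
 */
struct model_storage *model_attach(struct model_map *map, uint64_t cgroup,
				   uint32_t attach_type)
{
	struct model_key key = {
		.cgroup_inode_id = cgroup,
		.attach_type = attach_type,
	};

	assert(map != NULL);

	for (size_t i = 0; i < map->count; i++) {
		if (model_key_matches(map, &map->entries[i].key, &key))
			return &map->entries[i];
	}

	if (map->count == MODEL_MAX_ENTRIES)
		return NULL;

	struct model_storage *storage = &map->entries[map->count++];
	storage->key = key;
	storage->value = 0;
	return storage;
}
```

In this model, two attachments to the same cgroup with different attach types return the same storage when ``attach_type_shared`` is true, and two distinct storages when it is false. Repeating an attach with an identical key always returns the existing storage.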