
Commit 5d9fb42

Merge branch 'support-associating-bpf-programs-with-struct_ops'
Amery Hung says:

====================
Support associating BPF programs with struct_ops

Hi,

This patchset adds a new BPF command, BPF_PROG_ASSOC_STRUCT_OPS, to the bpf() syscall to allow associating a BPF program with a struct_ops. The command is introduced to address an emerging need from struct_ops users. As the number of subsystems adopting struct_ops grows, more users are building their struct_ops-based solutions with help from other BPF programs. For example, scx_layered uses a syscall program as a user-space trigger to refresh layers [0]. It also uses a tracing program to infer whether a task is using the GPU and needs to be prioritized [1]. In these use cases, when there are multiple struct_ops instances, the struct_ops kfuncs called from different BPF programs, whether struct_ops or not, need to be able to refer to a specific instance, which is currently not possible.

The new BPF command allows users to explicitly associate a BPF program with a struct_ops map. The libbpf wrapper can be called after loading programs and before attaching programs and struct_ops. Internally, it sets prog->aux->st_ops_assoc to the struct_ops map. struct_ops kfuncs can then get the associated struct_ops struct by calling bpf_prog_get_assoc_struct_ops() with prog->aux, which can be acquired from a "__prog" argument. The value of this special argument is fixed up by the verifier during verification.

The command conceptually associates the implementation of BPF programs with a struct_ops map, not with its attachment. A non-struct_ops program associated with the map takes a reference on it, and a map's own struct_ops programs are disassociated when the map is freed, so st_ops_assoc always points to a valid struct_ops struct. struct_ops implementers can use the helper bpf_prog_get_assoc_struct_ops() to get the pointer. The returned struct_ops, if not NULL, is guaranteed to be valid, but it is not guaranteed to be initialized or attached. The struct_ops implementer still needs to track and check the state of the struct_ops in kdata if the use case requires an initialized or attached struct_ops.

We could also consider supporting the association of struct_ops links with BPF programs. On one hand, that would make the struct_ops implementer's job easier; on the other, it might complicate the libbpf workflow, and it does not apply to legacy struct_ops attachment.

[0] https://github.com/sched-ext/scx/blob/main/scheds/rust/scx_layered/src/bpf/main.bpf.c#L557
[1] https://github.com/sched-ext/scx/blob/main/scheds/rust/scx_layered/src/bpf/main.bpf.c#L754

---
v7 -> v8
- Fix libbpf return (Andrii)
- Follow kfunc _impl suffix naming convention in selftest (Alexei)
Link: https://lore.kernel.org/bpf/[email protected]/

v6 -> v7
- Drop the guarantee that bpf_prog_get_assoc_struct_ops() will always return an initialized struct_ops (Martin)
- Minor misc. changes in selftests
Link: https://lore.kernel.org/bpf/[email protected]/

v5 -> v6
- Drop refcnt bumping for async callbacks and add RCU annotation (Martin)
- Fix libbpf bug and update comments (Andrii)
- Fix refcount bug in bpf_prog_assoc_struct_ops() (AI)
Link: https://lore.kernel.org/bpf/[email protected]/

v4 -> v5
- Simplify the API for getting associated struct_ops and don't expose struct_ops map lifecycle management (Andrii, Alexei)
Link: https://lore.kernel.org/bpf/[email protected]/

v3 -> v4
- Fix potential dangling pointer in timer callback. Protect st_ops_assoc with RCU. The get helper now needs to be paired with bpf_struct_ops_put()
- The command should only increase refcount once for a program (Andrii)
- Test a struct_ops program reused in two struct_ops maps
- Test getting associated struct_ops in timer callback
Link: https://lore.kernel.org/bpf/[email protected]/

v2 -> v3
- Change the type of st_ops_assoc from void* (i.e., kdata) to bpf_map (Andrii)
- Fix a bug that clears BPF_PTR_POISON when a struct_ops map is freed (Andrii)
- Return NULL if the map is not fully initialized (Martin)
- Move struct_ops map refcount inc/dec into internal helpers (Martin)
- Add libbpf API, bpf_program__assoc_struct_ops (Andrii)
Link: https://lore.kernel.org/bpf/[email protected]/

v1 -> v2
- Poison st_ops_assoc when reusing the program in more than one struct_ops map and add a helper to access the pointer (Andrii)
- Minor style and naming changes (Andrii)
Link: https://lore.kernel.org/bpf/[email protected]/
---
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Andrii Nakryiko <[email protected]>
2 parents 81f88f6 + 0e841d1 commit 5d9fb42
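The cover letter leaves it to struct_ops implementers to track whether an associated struct_ops is actually attached. Below is a minimal user-space sketch of that bookkeeping. Everything in it is hypothetical: `struct my_ops_kdata`, its `attached` flag, `my_ops_reg()`/`my_ops_unreg()`, and the kfunc body `my_kfunc_refresh()` are illustrative names, not kernel APIs. The `assoc` argument stands in for whatever bpf_prog_get_assoc_struct_ops() returned. Real kernel code would also need proper ordering on the flag (e.g. READ_ONCE()/WRITE_ONCE()).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical kdata for an imaginary "my_ops" struct_ops type. The core
 * only guarantees the memory is valid; the implementer owns `attached`. */
struct my_ops_kdata {
	bool attached;          /* kernel code would use READ_ONCE/WRITE_ONCE */
	int (*refresh)(void);   /* example callback slot, may be NULL */
};

/* Hypothetical attach hook: mark the instance live. */
static int my_ops_reg(struct my_ops_kdata *kdata)
{
	kdata->attached = true;
	return 0;
}

/* Hypothetical detach hook: mark the instance inactive again. */
static void my_ops_unreg(struct my_ops_kdata *kdata)
{
	kdata->attached = false;
}

/*
 * Hypothetical kfunc body. `assoc` models the return value of
 * bpf_prog_get_assoc_struct_ops(): NULL when the program has no association
 * (or was poisoned by reuse in several maps), otherwise valid kdata that may
 * or may not be attached.
 */
static int my_kfunc_refresh(void *assoc)
{
	struct my_ops_kdata *kdata = assoc;

	if (!kdata)
		return -ENOENT;        /* nothing usable is associated */
	if (!kdata->attached)
		return -ENODEV;        /* valid memory, but not attached */
	return kdata->refresh ? kdata->refresh() : 0;
}
```

The design point is that validity and attachment are separate questions. The core only answers the first; the second is answered by state the implementer maintains in its own kdata.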


18 files changed (+743, -2 lines)


include/linux/bpf.h

Lines changed: 16 additions & 0 deletions
@@ -1739,6 +1739,8 @@ struct bpf_prog_aux {
 		struct rcu_head rcu;
 	};
 	struct bpf_stream stream[2];
+	struct mutex st_ops_assoc_mutex;
+	struct bpf_map __rcu *st_ops_assoc;
 };
 
 struct bpf_prog {
@@ -2041,6 +2043,9 @@ static inline void bpf_module_put(const void *data, struct module *owner)
 		module_put(owner);
 }
 int bpf_struct_ops_link_create(union bpf_attr *attr);
+int bpf_prog_assoc_struct_ops(struct bpf_prog *prog, struct bpf_map *map);
+void bpf_prog_disassoc_struct_ops(struct bpf_prog *prog);
+void *bpf_prog_get_assoc_struct_ops(const struct bpf_prog_aux *aux);
 u32 bpf_struct_ops_id(const void *kdata);
 
 #ifdef CONFIG_NET
@@ -2088,6 +2093,17 @@ static inline int bpf_struct_ops_link_create(union bpf_attr *attr)
 {
 	return -EOPNOTSUPP;
 }
+static inline int bpf_prog_assoc_struct_ops(struct bpf_prog *prog, struct bpf_map *map)
+{
+	return -EOPNOTSUPP;
+}
+static inline void bpf_prog_disassoc_struct_ops(struct bpf_prog *prog)
+{
+}
+static inline void *bpf_prog_get_assoc_struct_ops(const struct bpf_prog_aux *aux)
+{
+	return NULL;
+}
 static inline void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map)
 {
 }

include/uapi/linux/bpf.h

Lines changed: 17 additions & 0 deletions
@@ -918,6 +918,16 @@ union bpf_iter_link_info {
  *		Number of bytes read from the stream on success, or -1 if an
  *		error occurred (in which case, *errno* is set appropriately).
  *
+ * BPF_PROG_ASSOC_STRUCT_OPS
+ *	Description
+ *		Associate a BPF program with a struct_ops map. The struct_ops
+ *		map is identified by *map_fd* and the BPF program is
+ *		identified by *prog_fd*.
+ *
+ *	Return
+ *		0 on success or -1 if an error occurred (in which case,
+ *		*errno* is set appropriately).
+ *
  * NOTES
  *	eBPF objects (maps and programs) can be shared between processes.
  *
@@ -974,6 +984,7 @@ enum bpf_cmd {
 	BPF_PROG_BIND_MAP,
 	BPF_TOKEN_CREATE,
 	BPF_PROG_STREAM_READ_BY_FD,
+	BPF_PROG_ASSOC_STRUCT_OPS,
 	__MAX_BPF_CMD,
 };
 
@@ -1894,6 +1905,12 @@ union bpf_attr {
 		__u32 prog_fd;
 	} prog_stream_read;
 
+	struct {
+		__u32 map_fd;
+		__u32 prog_fd;
+		__u32 flags;
+	} prog_assoc_struct_ops;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF

kernel/bpf/bpf_struct_ops.c

Lines changed: 88 additions & 0 deletions
@@ -533,6 +533,17 @@ static void bpf_struct_ops_map_put_progs(struct bpf_struct_ops_map *st_map)
 	}
 }
 
+static void bpf_struct_ops_map_dissoc_progs(struct bpf_struct_ops_map *st_map)
+{
+	u32 i;
+
+	for (i = 0; i < st_map->funcs_cnt; i++) {
+		if (!st_map->links[i])
+			break;
+		bpf_prog_disassoc_struct_ops(st_map->links[i]->prog);
+	}
+}
+
 static void bpf_struct_ops_map_free_image(struct bpf_struct_ops_map *st_map)
 {
 	int i;
@@ -801,6 +812,9 @@ static long bpf_struct_ops_map_update_elem(struct bpf_map *map, void *key,
 			goto reset_unlock;
 		}
 
+		/* Poison pointer on error instead of return for backward compatibility */
+		bpf_prog_assoc_struct_ops(prog, &st_map->map);
+
 		link = kzalloc(sizeof(*link), GFP_USER);
 		if (!link) {
 			bpf_prog_put(prog);
@@ -980,6 +994,8 @@ static void bpf_struct_ops_map_free(struct bpf_map *map)
 	if (btf_is_module(st_map->btf))
 		module_put(st_map->st_ops_desc->st_ops->owner);
 
+	bpf_struct_ops_map_dissoc_progs(st_map);
+
 	bpf_struct_ops_map_del_ksyms(st_map);
 
 	/* The struct_ops's function may switch to another struct_ops.
@@ -1396,6 +1412,78 @@ int bpf_struct_ops_link_create(union bpf_attr *attr)
 	return err;
 }
 
+int bpf_prog_assoc_struct_ops(struct bpf_prog *prog, struct bpf_map *map)
+{
+	struct bpf_map *st_ops_assoc;
+
+	guard(mutex)(&prog->aux->st_ops_assoc_mutex);
+
+	st_ops_assoc = rcu_dereference_protected(prog->aux->st_ops_assoc,
+						 lockdep_is_held(&prog->aux->st_ops_assoc_mutex));
+	if (st_ops_assoc && st_ops_assoc == map)
+		return 0;
+
+	if (st_ops_assoc) {
+		if (prog->type != BPF_PROG_TYPE_STRUCT_OPS)
+			return -EBUSY;
+
+		rcu_assign_pointer(prog->aux->st_ops_assoc, BPF_PTR_POISON);
+	} else {
+		/*
+		 * struct_ops map does not track associated non-struct_ops programs.
+		 * Bump the refcount to make sure st_ops_assoc is always valid.
+		 */
+		if (prog->type != BPF_PROG_TYPE_STRUCT_OPS)
+			bpf_map_inc(map);
+
+		rcu_assign_pointer(prog->aux->st_ops_assoc, map);
+	}
+
+	return 0;
+}
+
+void bpf_prog_disassoc_struct_ops(struct bpf_prog *prog)
+{
+	struct bpf_map *st_ops_assoc;
+
+	guard(mutex)(&prog->aux->st_ops_assoc_mutex);
+
+	st_ops_assoc = rcu_dereference_protected(prog->aux->st_ops_assoc,
+						 lockdep_is_held(&prog->aux->st_ops_assoc_mutex));
+	if (!st_ops_assoc || st_ops_assoc == BPF_PTR_POISON)
+		return;
+
+	if (prog->type != BPF_PROG_TYPE_STRUCT_OPS)
+		bpf_map_put(st_ops_assoc);
+
+	RCU_INIT_POINTER(prog->aux->st_ops_assoc, NULL);
+}
+
+/*
+ * Get a reference to the struct_ops struct (i.e., kdata) associated with a
+ * program. Should only be called in BPF program context (e.g., in a kfunc).
+ *
+ * If the returned pointer is not NULL, it must points to a valid struct_ops.
+ * The struct_ops map is not guaranteed to be initialized nor attached.
+ * Kernel struct_ops implementers are responsible for tracking and checking
+ * the state of the struct_ops if the use case requires an initialized or
+ * attached struct_ops.
+ */
+void *bpf_prog_get_assoc_struct_ops(const struct bpf_prog_aux *aux)
+{
+	struct bpf_struct_ops_map *st_map;
+	struct bpf_map *st_ops_assoc;
+
+	st_ops_assoc = rcu_dereference_check(aux->st_ops_assoc, bpf_rcu_lock_held());
+	if (!st_ops_assoc || st_ops_assoc == BPF_PTR_POISON)
+		return NULL;
+
+	st_map = (struct bpf_struct_ops_map *)st_ops_assoc;
+
+	return &st_map->kvalue.data;
+}
+EXPORT_SYMBOL_GPL(bpf_prog_get_assoc_struct_ops);
+
 void bpf_map_struct_ops_info_fill(struct bpf_map_info *info, struct bpf_map *map)
 {
 	struct bpf_struct_ops_map *st_map = (struct bpf_struct_ops_map *)map;
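To make the association rules in bpf_prog_assoc_struct_ops(), bpf_prog_disassoc_struct_ops() and bpf_prog_get_assoc_struct_ops() easier to follow, here is a single-threaded user-space model of them. It is a sketch, not kernel code. `struct model_map`, `struct model_prog` and `MODEL_PTR_POISON` are hypothetical stand-ins, the mutex and RCU annotations are dropped, and a plain int replaces the map refcount. It reproduces three outcomes:

- Re-associating with the same map is a no-op.
- An already associated non-struct_ops program gets -EBUSY.
- A struct_ops program reused in a second map is poisoned, so the getter returns NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a struct_ops map. */
struct model_map {
	int refcnt;   /* plays the role of the bpf_map refcount */
	int kdata;    /* plays the role of st_map->kvalue.data */
};

/* Hypothetical stand-in for the relevant parts of struct bpf_prog. */
struct model_prog {
	bool is_struct_ops;             /* prog->type == BPF_PROG_TYPE_STRUCT_OPS */
	struct model_map *st_ops_assoc; /* prog->aux->st_ops_assoc */
};

/* Sentinel address playing the role of BPF_PTR_POISON. */
static struct model_map model_poison;
#define MODEL_PTR_POISON (&model_poison)

static int model_assoc(struct model_prog *prog, struct model_map *map)
{
	struct model_map *cur = prog->st_ops_assoc;

	if (cur && cur == map)
		return 0;                       /* same map: nothing to do */

	if (cur) {
		if (!prog->is_struct_ops)
			return -EBUSY;          /* already tied to another map */
		prog->st_ops_assoc = MODEL_PTR_POISON; /* reused in 2+ maps */
	} else {
		if (!prog->is_struct_ops)
			map->refcnt++;          /* map does not track this prog */
		prog->st_ops_assoc = map;
	}
	return 0;
}

static void model_disassoc(struct model_prog *prog)
{
	struct model_map *cur = prog->st_ops_assoc;

	if (!cur || cur == MODEL_PTR_POISON)
		return;                         /* poison is left in place */
	if (!prog->is_struct_ops)
		cur->refcnt--;                  /* drop the ref taken in assoc */
	prog->st_ops_assoc = NULL;
}

static void *model_get_assoc(const struct model_prog *prog)
{
	struct model_map *cur = prog->st_ops_assoc;

	if (!cur || cur == MODEL_PTR_POISON)
		return NULL;
	return &cur->kdata;
}
```

The asymmetry in refcounting mirrors the diff. Struct_ops programs are reachable from the map's `links[]` and are disassociated in bpf_struct_ops_map_free(), so they need no extra reference. Other programs are invisible to the map and must pin it themselves.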

kernel/bpf/core.c

Lines changed: 3 additions & 0 deletions
@@ -136,6 +136,7 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
 	mutex_init(&fp->aux->used_maps_mutex);
 	mutex_init(&fp->aux->ext_mutex);
 	mutex_init(&fp->aux->dst_mutex);
+	mutex_init(&fp->aux->st_ops_assoc_mutex);
 
 #ifdef CONFIG_BPF_SYSCALL
 	bpf_prog_stream_init(fp);
@@ -286,6 +287,7 @@ void __bpf_prog_free(struct bpf_prog *fp)
 	if (fp->aux) {
 		mutex_destroy(&fp->aux->used_maps_mutex);
 		mutex_destroy(&fp->aux->dst_mutex);
+		mutex_destroy(&fp->aux->st_ops_assoc_mutex);
 		kfree(fp->aux->poke_tab);
 		kfree(fp->aux);
 	}
@@ -2896,6 +2898,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 #endif
 	bpf_free_used_maps(aux);
 	bpf_free_used_btfs(aux);
+	bpf_prog_disassoc_struct_ops(aux->prog);
 	if (bpf_prog_is_dev_bound(aux))
 		bpf_prog_dev_bound_destroy(aux->prog);
 #ifdef CONFIG_PERF_EVENTS

kernel/bpf/syscall.c

Lines changed: 46 additions & 0 deletions
@@ -6122,6 +6122,49 @@ static int prog_stream_read(union bpf_attr *attr)
 	return ret;
 }
 
+#define BPF_PROG_ASSOC_STRUCT_OPS_LAST_FIELD prog_assoc_struct_ops.prog_fd
+
+static int prog_assoc_struct_ops(union bpf_attr *attr)
+{
+	struct bpf_prog *prog;
+	struct bpf_map *map;
+	int ret;
+
+	if (CHECK_ATTR(BPF_PROG_ASSOC_STRUCT_OPS))
+		return -EINVAL;
+
+	if (attr->prog_assoc_struct_ops.flags)
+		return -EINVAL;
+
+	prog = bpf_prog_get(attr->prog_assoc_struct_ops.prog_fd);
+	if (IS_ERR(prog))
+		return PTR_ERR(prog);
+
+	if (prog->type == BPF_PROG_TYPE_STRUCT_OPS) {
+		ret = -EINVAL;
+		goto put_prog;
+	}
+
+	map = bpf_map_get(attr->prog_assoc_struct_ops.map_fd);
+	if (IS_ERR(map)) {
+		ret = PTR_ERR(map);
+		goto put_prog;
+	}
+
+	if (map->map_type != BPF_MAP_TYPE_STRUCT_OPS) {
+		ret = -EINVAL;
+		goto put_map;
+	}
+
+	ret = bpf_prog_assoc_struct_ops(prog, map);
+
+put_map:
+	bpf_map_put(map);
+put_prog:
+	bpf_prog_put(prog);
+	return ret;
+}
+
 static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size)
 {
 	union bpf_attr attr;
@@ -6261,6 +6304,9 @@ static int __sys_bpf(enum bpf_cmd cmd, bpfptr_t uattr, unsigned int size)
 	case BPF_PROG_STREAM_READ_BY_FD:
 		err = prog_stream_read(&attr);
 		break;
+	case BPF_PROG_ASSOC_STRUCT_OPS:
+		err = prog_assoc_struct_ops(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
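The order of checks in prog_assoc_struct_ops() can be modeled in user space. In this sketch, `union model_attr` is a hypothetical, truncated stand-in for `union bpf_attr`. `attr_tail_nonzero()` re-implements the idea behind CHECK_ATTR() (every byte after the command's LAST_FIELD must be zero), not the kernel macro itself. Because BPF_PROG_ASSOC_STRUCT_OPS_LAST_FIELD is `prog_fd` in this diff, `flags` lies in the checked tail. A non-zero `flags` therefore appears to be rejected by CHECK_ATTR() before the explicit `flags` test is reached, and both paths return -EINVAL. File-descriptor lookups via bpf_prog_get()/bpf_map_get() are not modeled; the program and map types are passed in directly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, truncated stand-in for union bpf_attr. */
union model_attr {
	struct {
		uint32_t map_fd;
		uint32_t prog_fd;
		uint32_t flags;
	} prog_assoc_struct_ops;
	uint8_t raw[32];                /* byte view of the whole union */
} __attribute__((aligned(8)));

/* End offset of LAST_FIELD (prog_assoc_struct_ops.prog_fd), as in the diff. */
#define MODEL_ASSOC_LAST_FIELD_END \
	(offsetof(union model_attr, prog_assoc_struct_ops.prog_fd) + sizeof(uint32_t))

/* Idea behind CHECK_ATTR(): any non-zero byte after LAST_FIELD is an error. */
static bool attr_tail_nonzero(const union model_attr *attr, size_t last_end)
{
	for (size_t i = last_end; i < sizeof(*attr); i++)
		if (attr->raw[i])
			return true;
	return false;
}

enum model_prog_type { MODEL_PROG_SYSCALL, MODEL_PROG_STRUCT_OPS };
enum model_map_type { MODEL_MAP_HASH, MODEL_MAP_STRUCT_OPS };

/* Same check order as prog_assoc_struct_ops(), minus the fd lookups. */
static int model_prog_assoc_struct_ops(const union model_attr *attr,
				       enum model_prog_type prog_type,
				       enum model_map_type map_type)
{
	if (attr_tail_nonzero(attr, MODEL_ASSOC_LAST_FIELD_END))
		return -EINVAL;         /* CHECK_ATTR(): also catches flags */
	if (attr->prog_assoc_struct_ops.flags)
		return -EINVAL;         /* explicit flags check */
	if (prog_type == MODEL_PROG_STRUCT_OPS)
		return -EINVAL;         /* struct_ops progs associate via map update */
	if (map_type != MODEL_MAP_STRUCT_OPS)
		return -EINVAL;         /* only struct_ops maps can be targets */
	return 0;                       /* would call bpf_prog_assoc_struct_ops() */
}
```

Rejecting BPF_PROG_TYPE_STRUCT_OPS here matches the diff: struct_ops programs are associated implicitly in bpf_struct_ops_map_update_elem().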

kernel/bpf/verifier.c

Lines changed: 1 addition & 2 deletions
@@ -22493,8 +22493,7 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 
 	if (!bpf_jit_supports_far_kfunc_call())
 		insn->imm = BPF_CALL_IMM(desc->addr);
-	if (insn->off)
-		return 0;
+
 	if (desc->func_id == special_kfunc_list[KF_bpf_obj_new_impl] ||
 	    desc->func_id == special_kfunc_list[KF_bpf_percpu_obj_new_impl]) {
 		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
tools/include/uapi/linux/bpf.h

Lines changed: 17 additions & 0 deletions
@@ -918,6 +918,16 @@ union bpf_iter_link_info {
  *		Number of bytes read from the stream on success, or -1 if an
  *		error occurred (in which case, *errno* is set appropriately).
  *
+ * BPF_PROG_ASSOC_STRUCT_OPS
+ *	Description
+ *		Associate a BPF program with a struct_ops map. The struct_ops
+ *		map is identified by *map_fd* and the BPF program is
+ *		identified by *prog_fd*.
+ *
+ *	Return
+ *		0 on success or -1 if an error occurred (in which case,
+ *		*errno* is set appropriately).
+ *
  * NOTES
  *	eBPF objects (maps and programs) can be shared between processes.
  *
@@ -974,6 +984,7 @@ enum bpf_cmd {
 	BPF_PROG_BIND_MAP,
 	BPF_TOKEN_CREATE,
 	BPF_PROG_STREAM_READ_BY_FD,
+	BPF_PROG_ASSOC_STRUCT_OPS,
 	__MAX_BPF_CMD,
 };
 
@@ -1894,6 +1905,12 @@ union bpf_attr {
 		__u32 prog_fd;
 	} prog_stream_read;
 
+	struct {
+		__u32 map_fd;
+		__u32 prog_fd;
+		__u32 flags;
+	} prog_assoc_struct_ops;
+
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF

tools/lib/bpf/bpf.c

Lines changed: 19 additions & 0 deletions
@@ -1397,3 +1397,22 @@ int bpf_prog_stream_read(int prog_fd, __u32 stream_id, void *buf, __u32 buf_len,
 	err = sys_bpf(BPF_PROG_STREAM_READ_BY_FD, &attr, attr_sz);
 	return libbpf_err_errno(err);
 }
+
+int bpf_prog_assoc_struct_ops(int prog_fd, int map_fd,
+			      struct bpf_prog_assoc_struct_ops_opts *opts)
+{
+	const size_t attr_sz = offsetofend(union bpf_attr, prog_assoc_struct_ops);
+	union bpf_attr attr;
+	int err;
+
+	if (!OPTS_VALID(opts, bpf_prog_assoc_struct_ops_opts))
+		return libbpf_err(-EINVAL);
+
+	memset(&attr, 0, attr_sz);
+	attr.prog_assoc_struct_ops.map_fd = map_fd;
+	attr.prog_assoc_struct_ops.prog_fd = prog_fd;
+	attr.prog_assoc_struct_ops.flags = OPTS_GET(opts, flags, 0);
+
+	err = sys_bpf(BPF_PROG_ASSOC_STRUCT_OPS, &attr, attr_sz);
+	return libbpf_err_errno(err);
+}
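The libbpf wrapper zeroes and sends only the prefix of `union bpf_attr` that ends with the new member, sized with offsetofend(). The user-space sketch below illustrates that sizing. It uses a hypothetical `union model_attr` in place of the real union and a hypothetical `model_fill_assoc_attr()` in place of the library function. The `offsetofend` macro is defined locally with the usual "offset plus member size" meaning. For three consecutive 32-bit fields, the prefix is 12 bytes. Note the argument order: the C API takes `prog_fd` first, while the attribute struct stores `map_fd` first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* End offset of a member: offset of FIELD plus its size. */
#define offsetofend(TYPE, FIELD) \
	(offsetof(TYPE, FIELD) + sizeof(((TYPE *)0)->FIELD))

/* Hypothetical, truncated stand-in for union bpf_attr. */
union model_attr {
	struct {
		uint32_t map_fd;
		uint32_t prog_fd;
		uint32_t flags;
	} prog_assoc_struct_ops;
	uint64_t other_commands[16];    /* placeholder for larger members */
} __attribute__((aligned(8)));

/* Mirrors the wrapper body up to the syscall: fill the attr prefix and
 * return the size that would be passed to bpf(). */
static size_t model_fill_assoc_attr(union model_attr *attr, int prog_fd,
				    int map_fd, uint32_t flags)
{
	const size_t attr_sz = offsetofend(union model_attr, prog_assoc_struct_ops);

	memset(attr, 0, attr_sz);       /* only the prefix handed to the kernel */
	attr->prog_assoc_struct_ops.map_fd = (uint32_t)map_fd;
	attr->prog_assoc_struct_ops.prog_fd = (uint32_t)prog_fd;
	attr->prog_assoc_struct_ops.flags = flags;
	return attr_sz;
}
```

Passing the smallest sufficient size lets older kernels, whose `union bpf_attr` may be smaller, still accept the call as long as they know the command.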

tools/lib/bpf/bpf.h

Lines changed: 21 additions & 0 deletions
@@ -733,6 +733,27 @@ struct bpf_prog_stream_read_opts {
 LIBBPF_API int bpf_prog_stream_read(int prog_fd, __u32 stream_id, void *buf, __u32 buf_len,
 				    struct bpf_prog_stream_read_opts *opts);
 
+struct bpf_prog_assoc_struct_ops_opts {
+	size_t sz;
+	__u32 flags;
+	size_t :0;
+};
+#define bpf_prog_assoc_struct_ops_opts__last_field flags
+
+/**
+ * @brief **bpf_prog_assoc_struct_ops** associates a BPF program with a
+ * struct_ops map.
+ *
+ * @param prog_fd FD for the BPF program
+ * @param map_fd FD for the struct_ops map to be associated with the BPF program
+ * @param opts optional options, can be NULL
+ *
+ * @return 0 on success; negative error code, otherwise (errno is also set to
+ * the error code)
+ */
+LIBBPF_API int bpf_prog_assoc_struct_ops(int prog_fd, int map_fd,
+					 struct bpf_prog_assoc_struct_ops_opts *opts);
+
 #ifdef __cplusplus
 } /* extern "C" */
 #endif
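The opts struct follows libbpf's size-based extensibility convention. Callers set `sz`, and `__last_field` tells OPTS_VALID() which bytes this library version understands. The sketch below is a simplified re-implementation of the idea behind OPTS_VALID() and OPTS_GET(), not libbpf's own code. `struct assoc_opts_future` is a hypothetical newer caller-side struct with one extra field. Real callers would normally declare the struct with `LIBBPF_OPTS(bpf_prog_assoc_struct_ops_opts, opts);`, which zero-initializes it and sets `sz`. The sketch uses memset() for the same reason: padding bytes fall inside the validated range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define offsetofend(TYPE, FIELD) \
	(offsetof(TYPE, FIELD) + sizeof(((TYPE *)0)->FIELD))

/* Same layout as bpf_prog_assoc_struct_ops_opts in the diff. */
struct assoc_opts {
	size_t sz;
	uint32_t flags;
	size_t :0;
};

/* Hypothetical newer caller struct with one extra field appended. */
struct assoc_opts_future {
	size_t sz;
	uint32_t flags;
	uint32_t new_field;
	size_t :0;
};

static bool mem_is_zeroed(const char *p, ptrdiff_t len)
{
	while (len-- > 0)               /* no-op when len <= 0 */
		if (*p++)
			return false;
	return true;
}

/* Idea behind OPTS_VALID(): NULL is fine; otherwise sz must at least cover
 * itself, and bytes past the last known field must all be zero. */
static bool opts_valid(const void *opts)
{
	const size_t known_sz = offsetofend(struct assoc_opts, flags);
	size_t user_sz;

	if (!opts)
		return true;
	memcpy(&user_sz, opts, sizeof(user_sz));
	if (user_sz < sizeof(size_t))
		return false;
	return mem_is_zeroed((const char *)opts + known_sz,
			     (ptrdiff_t)user_sz - (ptrdiff_t)known_sz);
}

/* Idea behind OPTS_GET(opts, flags, 0): read flags only if sz covers it. */
static uint32_t opts_get_flags(const void *opts)
{
	size_t user_sz;
	uint32_t flags;

	if (!opts)
		return 0;
	memcpy(&user_sz, opts, sizeof(user_sz));
	if (user_sz < offsetofend(struct assoc_opts, flags))
		return 0;               /* older caller: use the default */
	memcpy(&flags, (const char *)opts + offsetof(struct assoc_opts, flags),
	       sizeof(flags));
	return flags;
}
```

This is why the wrapper can reject unknown non-zero options with -EINVAL instead of silently ignoring them, while older callers with a smaller `sz` still get the default `flags` of 0.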
