Commit 9536e2c

ameryhung authored and Kernel Patches Daemon committed
selftests/bpf: Introduce task local data
Task local data defines an abstract storage type for storing task-specific data (TLD). This patch provides user space and bpf implementations as header-only libraries for accessing task local data.

Task local data is a bpf task local storage map with two UPTRs:

1) u_tld_metadata, shared by all tasks of the same process, consists of the total count of TLDs and an array of metadata of TLDs. The metadata of a TLD comprises the size and the name. The name is used to identify a specific TLD in bpf.

2) u_tld_data points to a task-specific memory region for storing TLDs.

Below is the core task local data API:

              User space                            BPF
Define TLD    TLD_DEFINE_KEY(), tld_create_key()    -
Get data      tld_get_data()                        tld_get_data()

A TLD is first defined by the user space with TLD_DEFINE_KEY() or tld_create_key(). TLD_DEFINE_KEY() defines a TLD statically and allocates just enough memory during initialization. tld_create_key() allows creating TLDs on the fly, but has a fixed memory budget, TLD_DYN_DATA_SIZE.

Internally, both go through the metadata array to check if the TLD can be added. The total TLD size needs to fit into a page (limited by UPTR), and no two TLDs can have the same name. The walk also calculates the offset, the next available space in u_tld_data, by summing the sizes of existing TLDs. If the TLD can be added, it increases the count using cmpxchg, as there may be other concurrent tld_create_key() calls. After a successful cmpxchg, the last metadata slot belongs to the calling thread and will be updated. tld_create_key() returns the offset encapsulated in an opaque key object to prevent user misuse.

Then, user space can pass the key to tld_get_data() to get a pointer to the TLD. The pointer will remain valid for the lifetime of the thread.

BPF programs can also locate the TLD with tld_get_data(), but with both name and key. The first time tld_get_data() is called, the name is used to look up the metadata. Then, the key is saved to a task_local_data map, tld_keys_map. Subsequent calls to tld_get_data() use the key to quickly locate the data.

The user space task local data library uses a lightweight approach to ensure thread safety (i.e., atomic operations plus compiler and memory barriers). While a metadata entry is being updated, other threads may also try to read it. To prevent them from seeing incomplete data, metadata::size is used to signal the completion of the update, where 0 means the update is still ongoing. Threads wait until they see a non-zero size before reading a metadata entry.

Signed-off-by: Amery Hung <[email protected]>
Reviewed-by: Emil Tsalapatis <[email protected]>
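The slot-claiming protocol described in the commit message can be illustrated in isolation. The following is a minimal sketch, not the header's code: every `demo_`/`DEMO_` name and constant is hypothetical, the byte budget is a fixed constant rather than a page-derived limit, and the zero-size check is an extra guard added for the sketch (a zero size would otherwise never look "published"). It shows the three steps from the description: walk the published slots summing 8-byte-rounded sizes, claim the next slot with a compare-and-swap on the count, then publish the entry by storing a non-zero size last.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define DEMO_NAME_LEN 16	/* hypothetical; smaller than TLD_NAME_LEN */
#define DEMO_MAX_CNT  4		/* hypothetical number of slots */
#define DEMO_BUDGET   64	/* hypothetical byte budget for all entries */
#define DEMO_ROUND_UP8(x) (((x) + 7) & ~(size_t)7)

static_assert(DEMO_BUDGET % 8 == 0, "budget must hold whole 8-byte units");

struct demo_slot {
	char name[DEMO_NAME_LEN];
	_Atomic unsigned short size;	/* 0 means "still being written" */
};

struct demo_table {
	_Atomic unsigned char cnt;	/* number of claimed slots */
	struct demo_slot slot[DEMO_MAX_CNT];
};

/* Returns the byte offset of the new entry, or a negative errno value. */
static int demo_create_key(struct demo_table *t, const char *name, size_t size)
{
	unsigned short sz;
	unsigned char cnt;
	size_t off = 0;
	int i;

	if (size == 0)		/* sketch-only guard, see lead-in */
		return -EINVAL;

	for (i = 0; i < DEMO_MAX_CNT; i++) {
retry:
		cnt = atomic_load(&t->cnt);
		if (i < cnt) {
			/* Slot i is claimed: wait until its owner publishes a size. */
			while (!(sz = atomic_load(&t->slot[i].size)))
				sched_yield();

			if (!strncmp(t->slot[i].name, name, DEMO_NAME_LEN))
				return -EEXIST;

			off += DEMO_ROUND_UP8(sz);
			continue;
		}

		if (off + DEMO_ROUND_UP8(size) > DEMO_BUDGET)
			return -E2BIG;

		/* Claim slot i. If another thread won, re-examine slot i. */
		if (!atomic_compare_exchange_strong(&t->cnt, &cnt, cnt + 1))
			goto retry;

		strncpy(t->slot[i].name, name, DEMO_NAME_LEN);
		/* Storing the size last is what makes the slot visible. */
		atomic_store(&t->slot[i].size, (unsigned short)size);
		return (int)off;
	}

	return -ENOSPC;
}
```

Called on a zero-initialized `struct demo_table`, creating "a" with size 4 and then "b" with size 13 would yield offsets 0 and 8, since each size is rounded up to a multiple of 8 before being added. A second request for "a" is rejected with -EEXIST.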
1 parent 1661322 commit 9536e2c

2 files changed, +615 -0 lines changed
Lines changed: 388 additions & 0 deletions
@@ -0,0 +1,388 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef __TASK_LOCAL_DATA_H
#define __TASK_LOCAL_DATA_H

#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/types.h>

#ifdef TLD_FREE_DATA_ON_THREAD_EXIT
#include <pthread.h>
#endif

#include <bpf/bpf.h>

/*
 * OPTIONS
 *
 *   Define the option before including the header
 *
 *   TLD_FREE_DATA_ON_THREAD_EXIT - Frees memory on thread exit automatically
 *
 *   Thread-specific memory for storing TLD is allocated lazily on the first call to
 *   tld_get_data(). The thread that calls it must also call tld_free() on thread exit
 *   to prevent memory leak. Pthread will be included if the option is defined. A pthread
 *   key will be registered with a destructor that calls tld_free().
 *
 *
 *   TLD_DYN_DATA_SIZE - The maximum size of memory allocated for TLDs created dynamically
 *   (default: 64 bytes)
 *
 *   A TLD can be defined statically using TLD_DEFINE_KEY() or created on the fly using
 *   tld_create_key(). As the total size of TLDs created with tld_create_key() cannot
 *   be known statically, a memory area of size TLD_DYN_DATA_SIZE will be allocated
 *   for these TLDs. This additional memory is allocated for every thread that calls
 *   tld_get_data() even if tld_create_key() is never called, so be mindful of
 *   potential memory wastage. Use TLD_DEFINE_KEY() whenever possible as just enough memory
 *   will be allocated for TLDs created with it.
 *
 *
 *   TLD_NAME_LEN - The maximum length of the name of a TLD (default: 62)
 *
 *   Setting TLD_NAME_LEN will affect the maximum number of TLDs a process can store,
 *   TLD_MAX_DATA_CNT.
 *
 *
 *   TLD_DATA_USE_ALIGNED_ALLOC - Always use aligned_alloc() instead of malloc()
 *
 *   When allocating the memory for storing TLDs, we need to make sure there is a memory
 *   region of X bytes within a single page. This is due to the limit posed by UPTR: memory
 *   pinned to the kernel cannot exceed a page nor can it cross the page boundary. The
 *   library normally calls malloc(2*X) given X bytes of total TLDs, and only uses
 *   aligned_alloc(PAGE_SIZE, X) when X >= PAGE_SIZE / 2. This is to reduce memory wastage
 *   as not all memory allocators can use the exact amount of memory requested to fulfill
 *   aligned_alloc(). For example, some may round the size up to the alignment. Enable the
 *   option to always use aligned_alloc() if the implementation has low memory overhead.
 */

#define TLD_PIDFD_THREAD O_EXCL

#define TLD_PAGE_SIZE getpagesize()
#define TLD_PAGE_MASK (~(TLD_PAGE_SIZE - 1))

#define TLD_ROUND_MASK(x, y) ((__typeof__(x))((y) - 1))
#define TLD_ROUND_UP(x, y) ((((x) - 1) | TLD_ROUND_MASK(x, y)) + 1)

#define TLD_READ_ONCE(x) (*(volatile typeof(x) *)&(x))

#ifndef TLD_DYN_DATA_SIZE
#define TLD_DYN_DATA_SIZE 64
#endif

#define TLD_MAX_DATA_CNT (TLD_PAGE_SIZE / sizeof(struct tld_metadata) - 1)

#ifndef TLD_NAME_LEN
#define TLD_NAME_LEN 62
#endif

#ifdef __cplusplus
extern "C" {
#endif

typedef struct {
	__s16 off;
} tld_key_t;

struct tld_metadata {
	char name[TLD_NAME_LEN];
	_Atomic __u16 size;
};

struct u_tld_metadata {
	_Atomic __u8 cnt;
	__u16 size;
	struct tld_metadata metadata[];
};

struct u_tld_data {
	__u64 start; /* offset of u_tld_data->data in a page */
	char data[];
};

struct tld_map_value {
	void *data;
	struct u_tld_metadata *metadata;
};

struct u_tld_metadata * _Atomic tld_metadata_p __attribute__((weak));
__thread struct u_tld_data *tld_data_p __attribute__((weak));
__thread void *tld_data_alloc_p __attribute__((weak));

#ifdef TLD_FREE_DATA_ON_THREAD_EXIT
pthread_key_t tld_pthread_key __attribute__((weak));

static void tld_free(void);

static void __tld_thread_exit_handler(void *unused)
{
	tld_free();
}
#endif

static int __tld_init_metadata(void)
{
	struct u_tld_metadata *meta, *uninit = NULL;
	int err = 0;

	meta = (struct u_tld_metadata *)aligned_alloc(TLD_PAGE_SIZE, TLD_PAGE_SIZE);
	if (!meta) {
		err = -ENOMEM;
		goto out;
	}

	memset(meta, 0, TLD_PAGE_SIZE);
	meta->size = TLD_DYN_DATA_SIZE;

	if (!atomic_compare_exchange_strong(&tld_metadata_p, &uninit, meta)) {
		free(meta);
		goto out;
	}

#ifdef TLD_FREE_DATA_ON_THREAD_EXIT
	pthread_key_create(&tld_pthread_key, __tld_thread_exit_handler);
#endif
out:
	return err;
}

static int __tld_init_data(int map_fd)
{
	bool use_aligned_alloc = false;
	struct tld_map_value map_val;
	struct u_tld_data *data;
	int err, tid_fd = -1;
	void *d = NULL;

	tid_fd = syscall(SYS_pidfd_open, gettid(), TLD_PIDFD_THREAD);
	if (tid_fd < 0) {
		err = -errno;
		goto out;
	}

#ifdef TLD_DATA_USE_ALIGNED_ALLOC
	use_aligned_alloc = true;
#endif

	/*
	 * tld_metadata_p->size = TLD_DYN_DATA_SIZE +
	 *          total size of TLDs defined via TLD_DEFINE_KEY()
	 */
	if (use_aligned_alloc || tld_metadata_p->size >= TLD_PAGE_SIZE / 2)
		d = aligned_alloc(TLD_PAGE_SIZE, tld_metadata_p->size);
	else
		d = malloc(tld_metadata_p->size * 2);
	if (!d) {
		err = -ENOMEM;
		goto out;
	}

	/*
	 * Always pass a page-aligned address to UPTR since the size of tld_map_value::data
	 * is a page in BTF. If d spans across two pages, use the page that contains large
	 * enough memory.
	 */
	if (TLD_PAGE_SIZE - (~TLD_PAGE_MASK & (intptr_t)d) >= tld_metadata_p->size) {
		map_val.data = (void *)(TLD_PAGE_MASK & (intptr_t)d);
		data = d;
		data->start = (~TLD_PAGE_MASK & (intptr_t)d) + offsetof(struct u_tld_data, data);
	} else {
		map_val.data = (void *)(TLD_ROUND_UP((intptr_t)d, TLD_PAGE_SIZE));
		data = (void *)(TLD_ROUND_UP((intptr_t)d, TLD_PAGE_SIZE));
		data->start = offsetof(struct u_tld_data, data);
	}
	map_val.metadata = TLD_READ_ONCE(tld_metadata_p);

	err = bpf_map_update_elem(map_fd, &tid_fd, &map_val, 0);
	if (err) {
		free(d);
		goto out;
	}

	tld_data_p = (struct u_tld_data *)data;
	tld_data_alloc_p = d;
#ifdef TLD_FREE_DATA_ON_THREAD_EXIT
	pthread_setspecific(tld_pthread_key, (void *)1);
#endif
out:
	if (tid_fd >= 0)
		close(tid_fd);
	return err;
}

static tld_key_t __tld_create_key(const char *name, size_t size, bool dyn_data)
{
	int err, i, sz, off = 0;
	__u8 cnt;

	if (!TLD_READ_ONCE(tld_metadata_p)) {
		err = __tld_init_metadata();
		if (err)
			return (tld_key_t){err};
	}

	for (i = 0; i < TLD_MAX_DATA_CNT; i++) {
retry:
		cnt = atomic_load(&tld_metadata_p->cnt);
		if (i < cnt) {
			/* A metadata is not ready until size is updated with a non-zero value */
			while (!(sz = atomic_load(&tld_metadata_p->metadata[i].size)))
				sched_yield();

			if (!strncmp(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN))
				return (tld_key_t){-EEXIST};

			off += TLD_ROUND_UP(sz, 8);
			continue;
		}

		/*
		 * TLD_DEFINE_KEY() is given memory up to a page while at most
		 * TLD_DYN_DATA_SIZE is allocated for tld_create_key()
		 */
		if (dyn_data) {
			if (off + TLD_ROUND_UP(size, 8) > tld_metadata_p->size)
				return (tld_key_t){-E2BIG};
		} else {
			if (off + TLD_ROUND_UP(size, 8) > TLD_PAGE_SIZE - sizeof(struct u_tld_data))
				return (tld_key_t){-E2BIG};
			tld_metadata_p->size += TLD_ROUND_UP(size, 8);
		}

		/*
		 * Only one tld_create_key() can increase the current cnt by one and
		 * take the latest available slot. Other threads will check again if a new
		 * TLD can still be added, and then compete for the new slot after the
		 * succeeding thread updates the size.
		 */
		if (!atomic_compare_exchange_strong(&tld_metadata_p->cnt, &cnt, cnt + 1))
			goto retry;

		strncpy(tld_metadata_p->metadata[i].name, name, TLD_NAME_LEN);
		atomic_store(&tld_metadata_p->metadata[i].size, size);
		return (tld_key_t){(__s16)off};
	}

	return (tld_key_t){-ENOSPC};
}

/**
 * TLD_DEFINE_KEY() - Define a TLD and a global variable key associated with the TLD.
 *
 * @name: The name of the TLD
 * @size: The size of the TLD
 * @key: The variable name of the key. Cannot exceed TLD_NAME_LEN
 *
 * The macro can only be used in file scope.
 *
 * A global variable key of opaque type, tld_key_t, will be declared and initialized before
 * main() starts. Use tld_key_is_err() or tld_key_err_or_zero() later to check if the key
 * creation succeeded. Pass the key to tld_get_data() to get a pointer to the TLD.
 * bpf programs can also fetch the same key by name.
 *
 * The total size of TLDs created using TLD_DEFINE_KEY() cannot exceed a page. Just
 * enough memory will be allocated for each thread on the first call to tld_get_data().
 */
#define TLD_DEFINE_KEY(key, name, size)			\
tld_key_t key;						\
							\
__attribute__((constructor))				\
void __tld_define_key_##key(void)			\
{							\
	key = __tld_create_key(name, size, false);	\
}

/**
 * tld_create_key() - Create a TLD and return a key associated with the TLD.
 *
 * @name: The name of the TLD
 * @size: The size of the TLD
 *
 * Return an opaque object key. Use tld_key_is_err() or tld_key_err_or_zero() to check
 * if the key creation succeeded. Pass the key to tld_get_data() to get a pointer to
 * the TLD. bpf programs can also fetch the same key by name.
 *
 * Use tld_create_key() only when a TLD needs to be created dynamically (e.g., @name is
 * not known statically or a TLD needs to be created conditionally)
 *
 * An additional TLD_DYN_DATA_SIZE bytes are allocated per-thread to accommodate TLDs
 * created dynamically with tld_create_key(). Since only a user page is pinned to the
 * kernel, when TLDs created with TLD_DEFINE_KEY() use more than TLD_PAGE_SIZE -
 * TLD_DYN_DATA_SIZE, the buffer size will be limited to the rest of the page.
 */
__attribute__((unused))
static tld_key_t tld_create_key(const char *name, size_t size)
{
	return __tld_create_key(name, size, true);
}

__attribute__((unused))
static inline bool tld_key_is_err(tld_key_t key)
{
	return key.off < 0;
}

__attribute__((unused))
static inline int tld_key_err_or_zero(tld_key_t key)
{
	return tld_key_is_err(key) ? key.off : 0;
}

/**
 * tld_get_data() - Get a pointer to the TLD associated with the given key of the
 * calling thread.
 *
 * @map_fd: A file descriptor of tld_data_map, the underlying BPF task local storage map
 * of task local data.
 * @key: A key object created by TLD_DEFINE_KEY() or tld_create_key().
 *
 * Return a pointer to the TLD if the key is valid; NULL if not enough memory for TLD
 * for this thread, or the key is invalid. The returned pointer is guaranteed to be 8-byte
 * aligned.
 *
 * Threads that call tld_get_data() must call tld_free() on exit to prevent
 * memory leak if TLD_FREE_DATA_ON_THREAD_EXIT is not defined.
 */
__attribute__((unused))
static void *tld_get_data(int map_fd, tld_key_t key)
{
	if (!TLD_READ_ONCE(tld_metadata_p))
		return NULL;

	/* tld_data_p is allocated on the first invocation of tld_get_data() */
	if (!tld_data_p && __tld_init_data(map_fd))
		return NULL;

	return tld_data_p->data + key.off;
}

/**
 * tld_free() - Free task local data memory of the calling thread
 *
 * For the calling thread, all pointers to TLDs acquired before will become invalid.
 *
 * Users must call tld_free() on thread exit to prevent memory leak. Alternatively,
 * define TLD_FREE_DATA_ON_THREAD_EXIT and a thread exit handler will be registered
 * to free the memory automatically.
 */
__attribute__((unused))
static void tld_free(void)
{
	if (tld_data_alloc_p) {
		free(tld_data_alloc_p);
		tld_data_alloc_p = NULL;
		tld_data_p = NULL;
	}
}

#ifdef __cplusplus
} /* extern "C" */
#endif

#endif /* __TASK_LOCAL_DATA_H */
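
The page-selection step in __tld_init_data() is easier to follow as plain address arithmetic. The sketch below is an illustration under stated assumptions rather than the header's code: `demo_placement` and `demo_place()` are hypothetical names, the page size is passed in as a parameter instead of coming from getpagesize(), and the `start` field bookkeeping is left out. It assumes, as the header's malloc(2*X) path does, that the caller allocated at least twice the requested size, so the region either fits in the page containing the allocation's first byte or fits at the start of the next page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_placement {
	uintptr_t page;		/* page-aligned address that would be pinned */
	uintptr_t base;		/* address where the data region would begin */
};

/* page_size must be a power of two and size must not exceed page_size. */
static struct demo_placement demo_place(uintptr_t d, size_t size, size_t page_size)
{
	uintptr_t mask = ~((uintptr_t)page_size - 1);
	uintptr_t in_page = d & ~mask;	/* offset of d within its page */
	struct demo_placement p;

	assert(page_size && !(page_size & (page_size - 1)) && size <= page_size);

	if (page_size - in_page >= size) {
		/* Enough room before the end of the page that contains d. */
		p.page = d & mask;
		p.base = d;
	} else {
		/* Not enough room: start at the beginning of the next page. */
		p.page = (d + page_size - 1) & mask;
		p.base = p.page;
	}
	return p;
}
```

With a 4096-byte page, an allocation at 0x1064 needing 200 bytes stays in the page at 0x1000. An allocation at 0x1fa0 has only 96 bytes left in its page, so the region moves to 0x2000. In that second case the region still ends inside a 400-byte allocation, which is why doubling the request is sufficient.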