Commit bdf56c7

Merge tag 'slab-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
 "This time it's mostly refactoring and improving APIs for slab users in
  the kernel, along with some debugging improvements.

  - kmem_cache_create() refactoring (Christian Brauner)

    Over the years, kmem_cache_create() has been growing new parameters,
    most of which are needed only by a small number of caches - most
    recently the rcu_freeptr_offset parameter. To avoid adding new
    parameters to kmem_cache_create() and adjusting all its callers, or
    creating new wrappers such as kmem_cache_create_rcu(), extra
    parameters can now be passed using the new struct kmem_cache_args.
    Fields that are not explicitly initialized default to values
    interpreted as unused. kmem_cache_create() is for now a wrapper that
    works with both the new form:

      kmem_cache_create(name, object_size, args, flags)

    and the legacy form:

      kmem_cache_create(name, object_size, align, flags, ctor)

  - kmem_cache_destroy() waits for in-flight kfree_rcu() calls
    (Vlastimil Babka, Uladzislau Rezki)

    Since SLOB removal, kfree() is allowed for freeing objects allocated
    by kmem_cache_create(). By extension kfree_rcu() is allowed as well,
    which makes it possible to convert simple call_rcu() callbacks that
    only do kmem_cache_free(), as there was never a kmem_cache_free_rcu()
    variant. However, for caches that can be destroyed e.g. on module
    removal, the cache owners knew to issue rcu_barrier() first to wait
    for the pending call_rcu()'s, and this is not sufficient for pending
    kfree_rcu()'s due to its internal batching optimizations. Ulad has
    provided a new kvfree_rcu_barrier() and, to make the usage less
    error-prone, kmem_cache_destroy() calls it. Additionally, destroying
    SLAB_TYPESAFE_BY_RCU caches now again issues rcu_barrier()
    synchronously instead of using an async work, because the past
    motivation for the async work no longer applies. Users of custom
    call_rcu() callbacks should however keep calling rcu_barrier() before
    cache destruction.

  - Debugging use-after-free in SLAB_TYPESAFE_BY_RCU caches (Jann Horn)

    Currently, KASAN cannot catch UAFs in such caches as it is legal to
    access them within a grace period, and we only track the grace period
    when trying to free the underlying slab page. The new
    CONFIG_SLUB_RCU_DEBUG option changes the freeing of individual
    objects to be RCU-delayed, after which KASAN can poison them.

  - Delayed memcg charging (Shakeel Butt)

    In some cases, the memcg is unknown at allocation time, such as when
    receiving network packets in softirq context. With kmem_cache_charge()
    these may now be charged later, once the user and its memcg are known.

  - Misc fixes and improvements (Pedro Falcato, Axel Rasmussen,
    Christoph Lameter, Yan Zhen, Peng Fan, Xavier)"

* tag 'slab-for-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (34 commits)
  mm, slab: restore kerneldoc for kmem_cache_create()
  io_uring: port to struct kmem_cache_args
  slab: make __kmem_cache_create() static inline
  slab: make kmem_cache_create_usercopy() static inline
  slab: remove kmem_cache_create_rcu()
  file: port to struct kmem_cache_args
  slab: create kmem_cache_create() compatibility layer
  slab: port KMEM_CACHE_USERCOPY() to struct kmem_cache_args
  slab: port KMEM_CACHE() to struct kmem_cache_args
  slab: remove rcu_freeptr_offset from struct kmem_cache
  slab: pass struct kmem_cache_args to do_kmem_cache_create()
  slab: pull kmem_cache_open() into do_kmem_cache_create()
  slab: pass struct kmem_cache_args to create_cache()
  slab: port kmem_cache_create_usercopy() to struct kmem_cache_args
  slab: port kmem_cache_create_rcu() to struct kmem_cache_args
  slab: port kmem_cache_create() to struct kmem_cache_args
  slab: add struct kmem_cache_args
  slab: s/__kmem_cache_create/do_kmem_cache_create/g
  memcg: add charging of already allocated slab objects
  mm/slab: Optimize the code logic in find_mergeable()
  ...
2 parents efdfcd4 + ecc4d6a commit bdf56c7
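As an illustration of the two call forms described above, here is a minimal, hypothetical cache-creation sketch; struct my_obj, my_cachep and my_cache_init are made-up names and not part of this merge:

/* Hypothetical user of the refactored API; not from this merge. */
struct my_obj {
	int a;
};

static struct kmem_cache *my_cachep;

static int __init my_cache_init(void)
{
	/* New form: less common parameters travel in struct kmem_cache_args. */
	struct kmem_cache_args args = {
		.align = __alignof__(struct my_obj),
	};

	my_cachep = kmem_cache_create("my_obj", sizeof(struct my_obj),
				      &args, SLAB_HWCACHE_ALIGN);

	/*
	 * Legacy form still compiles via the _Generic() compatibility macro:
	 * kmem_cache_create("my_obj", sizeof(struct my_obj),
	 *		     __alignof__(struct my_obj), SLAB_HWCACHE_ALIGN, NULL);
	 */
	return my_cachep ? 0 : -ENOMEM;
}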

15 files changed: +934, -452 lines


fs/file_table.c

Lines changed: 8 additions & 3 deletions
@@ -521,9 +521,14 @@ EXPORT_SYMBOL(__fput_sync);
 
 void __init files_init(void)
 {
-	filp_cachep = kmem_cache_create_rcu("filp", sizeof(struct file),
-				offsetof(struct file, f_freeptr),
-				SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
+	struct kmem_cache_args args = {
+		.use_freeptr_offset = true,
+		.freeptr_offset = offsetof(struct file, f_freeptr),
+	};
+
+	filp_cachep = kmem_cache_create("filp", sizeof(struct file), &args,
+				SLAB_HWCACHE_ALIGN | SLAB_PANIC |
+				SLAB_ACCOUNT | SLAB_TYPESAFE_BY_RCU);
 	percpu_counter_init(&nr_files, 0, GFP_KERNEL);
 }

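The filp conversion above also exercises the in-object free pointer feature. A minimal sketch of the same pattern for a hypothetical SLAB_TYPESAFE_BY_RCU cache follows; struct my_rcu_obj and its freeptr_t field are illustrative assumptions mirroring struct file's f_freeptr, not code from this merge:

/* Hypothetical RCU-typesafe cache keeping the free pointer inside the object. */
struct my_rcu_obj {
	unsigned long state;
	freeptr_t free_area;	/* space the allocator may reuse once freed */
};

static struct kmem_cache *my_rcu_cachep;

static void __init my_rcu_cache_init(void)
{
	struct kmem_cache_args args = {
		.use_freeptr_offset = true,
		.freeptr_offset = offsetof(struct my_rcu_obj, free_area),
	};

	my_rcu_cachep = kmem_cache_create("my_rcu_obj", sizeof(struct my_rcu_obj),
					  &args, SLAB_TYPESAFE_BY_RCU);
}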
include/linux/kasan.h

Lines changed: 58 additions & 5 deletions
@@ -175,13 +175,59 @@ static __always_inline void * __must_check kasan_init_slab_obj(
 	return (void *)object;
 }
 
-bool __kasan_slab_free(struct kmem_cache *s, void *object,
-			unsigned long ip, bool init);
+bool __kasan_slab_pre_free(struct kmem_cache *s, void *object,
+			unsigned long ip);
+/**
+ * kasan_slab_pre_free - Check whether freeing a slab object is safe.
+ * @object: Object to be freed.
+ *
+ * This function checks whether freeing the given object is safe. It may
+ * check for double-free and invalid-free bugs and report them.
+ *
+ * This function is intended only for use by the slab allocator.
+ *
+ * @Return true if freeing the object is unsafe; false otherwise.
+ */
+static __always_inline bool kasan_slab_pre_free(struct kmem_cache *s,
+						void *object)
+{
+	if (kasan_enabled())
+		return __kasan_slab_pre_free(s, object, _RET_IP_);
+	return false;
+}
+
+bool __kasan_slab_free(struct kmem_cache *s, void *object, bool init,
+		       bool still_accessible);
+/**
+ * kasan_slab_free - Poison, initialize, and quarantine a slab object.
+ * @object: Object to be freed.
+ * @init: Whether to initialize the object.
+ * @still_accessible: Whether the object contents are still accessible.
+ *
+ * This function informs that a slab object has been freed and is not
+ * supposed to be accessed anymore, except when @still_accessible is set
+ * (indicating that the object is in a SLAB_TYPESAFE_BY_RCU cache and an RCU
+ * grace period might not have passed yet).
+ *
+ * For KASAN modes that have integrated memory initialization
+ * (kasan_has_integrated_init() == true), this function also initializes
+ * the object's memory. For other modes, the @init argument is ignored.
+ *
+ * This function might also take ownership of the object to quarantine it.
+ * When this happens, KASAN will defer freeing the object to a later
+ * stage and handle it internally until then. The return value indicates
+ * whether KASAN took ownership of the object.
+ *
+ * This function is intended only for use by the slab allocator.
+ *
+ * @Return true if KASAN took ownership of the object; false otherwise.
+ */
 static __always_inline bool kasan_slab_free(struct kmem_cache *s,
-						void *object, bool init)
+						void *object, bool init,
+						bool still_accessible)
 {
 	if (kasan_enabled())
-		return __kasan_slab_free(s, object, _RET_IP_, init);
+		return __kasan_slab_free(s, object, init, still_accessible);
 	return false;
 }
 
@@ -371,7 +417,14 @@ static inline void *kasan_init_slab_obj(struct kmem_cache *cache,
 {
 	return (void *)object;
 }
-static inline bool kasan_slab_free(struct kmem_cache *s, void *object, bool init)
+
+static inline bool kasan_slab_pre_free(struct kmem_cache *s, void *object)
+{
+	return false;
+}
+
+static inline bool kasan_slab_free(struct kmem_cache *s, void *object,
+				   bool init, bool still_accessible)
 {
 	return false;
 }

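The kernel-doc added here splits the free-time KASAN work into a check step and a poison step. A rough sketch of the call order an allocator free path would follow, based only on the documented return-value semantics; the surrounding function is hypothetical, not the actual SLUB code:

/* Hypothetical free-path excerpt illustrating the documented call order. */
static bool my_cache_free_one(struct kmem_cache *s, void *object,
			      bool init, bool still_accessible)
{
	/* 1. Report double-free / invalid-free before the object is touched. */
	if (kasan_slab_pre_free(s, object))
		return false;			/* unsafe to free */

	/* 2. Poison the object; KASAN may quarantine it and own the free. */
	if (kasan_slab_free(s, object, init, still_accessible))
		return false;			/* KASAN took ownership */

	return true;				/* allocator may free it now */
}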
include/linux/rcutiny.h

Lines changed: 5 additions & 0 deletions
@@ -111,6 +111,11 @@ static inline void __kvfree_call_rcu(struct rcu_head *head, void *ptr)
 	kvfree(ptr);
 }
 
+static inline void kvfree_rcu_barrier(void)
+{
+	rcu_barrier();
+}
+
 #ifdef CONFIG_KASAN_GENERIC
 void kvfree_call_rcu(struct rcu_head *head, void *ptr);
 #else

include/linux/rcutree.h

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ static inline void rcu_virt_note_context_switch(void)
 
 void synchronize_rcu_expedited(void);
 void kvfree_call_rcu(struct rcu_head *head, void *ptr);
+void kvfree_rcu_barrier(void);
 
 void rcu_barrier(void);
 void rcu_momentary_eqs(void);
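kvfree_rcu_barrier() (a plain rcu_barrier() in the Tiny RCU case above) is what kmem_cache_destroy() now uses to drain batched kfree_rcu() requests. A minimal teardown sketch under that model, with hypothetical names; caches freed through custom call_rcu() callbacks would still need their own rcu_barrier():

/* Hypothetical module exit for a cache whose objects are freed via kfree_rcu(). */
static struct kmem_cache *my_cachep;

static void __exit my_module_exit(void)
{
	/*
	 * kmem_cache_destroy() internally calls kvfree_rcu_barrier(), so any
	 * kfree_rcu() requests still sitting in the batching machinery are
	 * flushed before the cache goes away.
	 */
	kmem_cache_destroy(my_cachep);

	/*
	 * Only needed if objects were queued through a custom call_rcu()
	 * callback rather than kfree_rcu():
	 * rcu_barrier();
	 */
}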

include/linux/slab.h

Lines changed: 208 additions & 20 deletions
@@ -240,17 +240,173 @@ struct mem_cgroup;
  */
 bool slab_is_available(void);
 
-struct kmem_cache *kmem_cache_create(const char *name, unsigned int size,
-			unsigned int align, slab_flags_t flags,
-			void (*ctor)(void *));
-struct kmem_cache *kmem_cache_create_usercopy(const char *name,
-			unsigned int size, unsigned int align,
-			slab_flags_t flags,
-			unsigned int useroffset, unsigned int usersize,
-			void (*ctor)(void *));
-struct kmem_cache *kmem_cache_create_rcu(const char *name, unsigned int size,
-			unsigned int freeptr_offset,
-			slab_flags_t flags);
+/**
+ * struct kmem_cache_args - Less common arguments for kmem_cache_create()
+ *
+ * Any uninitialized fields of the structure are interpreted as unused. The
+ * exception is @freeptr_offset where %0 is a valid value, so
+ * @use_freeptr_offset must be also set to %true in order to interpret the field
+ * as used. For @useroffset %0 is also valid, but only with non-%0
+ * @usersize.
+ *
+ * When %NULL args is passed to kmem_cache_create(), it is equivalent to all
+ * fields unused.
+ */
+struct kmem_cache_args {
+	/**
+	 * @align: The required alignment for the objects.
+	 *
+	 * %0 means no specific alignment is requested.
+	 */
+	unsigned int align;
+	/**
+	 * @useroffset: Usercopy region offset.
+	 *
+	 * %0 is a valid offset, when @usersize is non-%0
+	 */
+	unsigned int useroffset;
+	/**
+	 * @usersize: Usercopy region size.
+	 *
+	 * %0 means no usercopy region is specified.
+	 */
+	unsigned int usersize;
+	/**
+	 * @freeptr_offset: Custom offset for the free pointer
+	 * in &SLAB_TYPESAFE_BY_RCU caches
+	 *
+	 * By default &SLAB_TYPESAFE_BY_RCU caches place the free pointer
+	 * outside of the object. This might cause the object to grow in size.
+	 * Cache creators that have a reason to avoid this can specify a custom
+	 * free pointer offset in their struct where the free pointer will be
+	 * placed.
+	 *
+	 * Note that placing the free pointer inside the object requires the
+	 * caller to ensure that no fields are invalidated that are required to
+	 * guard against object recycling (See &SLAB_TYPESAFE_BY_RCU for
+	 * details).
+	 *
+	 * Using %0 as a value for @freeptr_offset is valid. If @freeptr_offset
+	 * is specified, %use_freeptr_offset must be set %true.
+	 *
+	 * Note that @ctor currently isn't supported with custom free pointers
+	 * as a @ctor requires an external free pointer.
+	 */
+	unsigned int freeptr_offset;
+	/**
+	 * @use_freeptr_offset: Whether a @freeptr_offset is used.
+	 */
+	bool use_freeptr_offset;
+	/**
+	 * @ctor: A constructor for the objects.
+	 *
+	 * The constructor is invoked for each object in a newly allocated slab
+	 * page. It is the cache user's responsibility to free object in the
+	 * same state as after calling the constructor, or deal appropriately
+	 * with any differences between a freshly constructed and a reallocated
+	 * object.
+	 *
+	 * %NULL means no constructor.
+	 */
+	void (*ctor)(void *);
+};
+
+struct kmem_cache *__kmem_cache_create_args(const char *name,
+					    unsigned int object_size,
+					    struct kmem_cache_args *args,
+					    slab_flags_t flags);
+static inline struct kmem_cache *
+__kmem_cache_create(const char *name, unsigned int size, unsigned int align,
+		    slab_flags_t flags, void (*ctor)(void *))
+{
+	struct kmem_cache_args kmem_args = {
+		.align	= align,
+		.ctor	= ctor,
+	};
+
+	return __kmem_cache_create_args(name, size, &kmem_args, flags);
+}
+
+/**
+ * kmem_cache_create_usercopy - Create a kmem cache with a region suitable
+ * for copying to userspace.
+ * @name: A string which is used in /proc/slabinfo to identify this cache.
+ * @size: The size of objects to be created in this cache.
+ * @align: The required alignment for the objects.
+ * @flags: SLAB flags
+ * @useroffset: Usercopy region offset
+ * @usersize: Usercopy region size
+ * @ctor: A constructor for the objects, or %NULL.
+ *
+ * This is a legacy wrapper, new code should use either KMEM_CACHE_USERCOPY()
+ * if whitelisting a single field is sufficient, or kmem_cache_create() with
+ * the necessary parameters passed via the args parameter (see
+ * &struct kmem_cache_args)
+ *
+ * Return: a pointer to the cache on success, NULL on failure.
+ */
+static inline struct kmem_cache *
+kmem_cache_create_usercopy(const char *name, unsigned int size,
+			   unsigned int align, slab_flags_t flags,
+			   unsigned int useroffset, unsigned int usersize,
+			   void (*ctor)(void *))
+{
+	struct kmem_cache_args kmem_args = {
+		.align		= align,
+		.ctor		= ctor,
+		.useroffset	= useroffset,
+		.usersize	= usersize,
+	};
+
+	return __kmem_cache_create_args(name, size, &kmem_args, flags);
+}
+
+/* If NULL is passed for @args, use this variant with default arguments. */
+static inline struct kmem_cache *
+__kmem_cache_default_args(const char *name, unsigned int size,
+			  struct kmem_cache_args *args,
+			  slab_flags_t flags)
+{
+	struct kmem_cache_args kmem_default_args = {};
+
+	/* Make sure we don't get passed garbage. */
+	if (WARN_ON_ONCE(args))
+		return ERR_PTR(-EINVAL);
+
+	return __kmem_cache_create_args(name, size, &kmem_default_args, flags);
+}
+
+/**
+ * kmem_cache_create - Create a kmem cache.
+ * @__name: A string which is used in /proc/slabinfo to identify this cache.
+ * @__object_size: The size of objects to be created in this cache.
+ * @__args: Optional arguments, see &struct kmem_cache_args. Passing %NULL
+ *	    means defaults will be used for all the arguments.
+ *
+ * This is currently implemented as a macro using ``_Generic()`` to call
+ * either the new variant of the function, or a legacy one.
+ *
+ * The new variant has 4 parameters:
+ * ``kmem_cache_create(name, object_size, args, flags)``
+ *
+ * See __kmem_cache_create_args() which implements this.
+ *
+ * The legacy variant has 5 parameters:
+ * ``kmem_cache_create(name, object_size, align, flags, ctor)``
+ *
+ * The align and ctor parameters map to the respective fields of
+ * &struct kmem_cache_args
+ *
+ * Context: Cannot be called within a interrupt, but can be interrupted.
+ *
+ * Return: a pointer to the cache on success, NULL on failure.
+ */
+#define kmem_cache_create(__name, __object_size, __args, ...)		\
+	_Generic((__args),						\
+		struct kmem_cache_args *: __kmem_cache_create_args,	\
+		void *: __kmem_cache_default_args,			\
+		default: __kmem_cache_create)(__name, __object_size, __args, __VA_ARGS__)
+
 void kmem_cache_destroy(struct kmem_cache *s);
 int kmem_cache_shrink(struct kmem_cache *s);
 
@@ -262,20 +418,23 @@ int kmem_cache_shrink(struct kmem_cache *s);
  * f.e. add ____cacheline_aligned_in_smp to the struct declaration
  * then the objects will be properly aligned in SMP configurations.
  */
-#define KMEM_CACHE(__struct, __flags)					\
-		kmem_cache_create(#__struct, sizeof(struct __struct),	\
-			__alignof__(struct __struct), (__flags), NULL)
+#define KMEM_CACHE(__struct, __flags)					\
+	__kmem_cache_create_args(#__struct, sizeof(struct __struct),	\
+			&(struct kmem_cache_args) {			\
+				.align = __alignof__(struct __struct),	\
+			}, (__flags))
 
 /*
  * To whitelist a single field for copying to/from usercopy, use this
  * macro instead for KMEM_CACHE() above.
  */
-#define KMEM_CACHE_USERCOPY(__struct, __flags, __field)			\
-	kmem_cache_create_usercopy(#__struct,				\
-			sizeof(struct __struct),			\
-			__alignof__(struct __struct), (__flags),	\
-			offsetof(struct __struct, __field),		\
-			sizeof_field(struct __struct, __field), NULL)
+#define KMEM_CACHE_USERCOPY(__struct, __flags, __field)			\
+	__kmem_cache_create_args(#__struct, sizeof(struct __struct),	\
+			&(struct kmem_cache_args) {			\
+				.align = __alignof__(struct __struct),	\
+				.useroffset = offsetof(struct __struct, __field), \
+				.usersize = sizeof_field(struct __struct, __field), \
+			}, (__flags))
 
 /*
  * Common kmalloc functions provided by all allocators
@@ -556,6 +715,35 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
 			   gfp_t gfpflags) __assume_slab_alignment __malloc;
 #define kmem_cache_alloc_lru(...)	alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
 
+/**
+ * kmem_cache_charge - memcg charge an already allocated slab memory
+ * @objp: address of the slab object to memcg charge
+ * @gfpflags: describe the allocation context
+ *
+ * kmem_cache_charge allows charging a slab object to the current memcg,
+ * primarily in cases where charging at allocation time might not be possible
+ * because the target memcg is not known (i.e. softirq context)
+ *
+ * The objp should be pointer returned by the slab allocator functions like
+ * kmalloc (with __GFP_ACCOUNT in flags) or kmem_cache_alloc. The memcg charge
+ * behavior can be controlled through gfpflags parameter, which affects how the
+ * necessary internal metadata can be allocated. Including __GFP_NOFAIL denotes
+ * that overcharging is requested instead of failure, but is not applied for the
+ * internal metadata allocation.
+ *
+ * There are several cases where it will return true even if the charging was
+ * not done:
+ * More specifically:
+ *
+ * 1. For !CONFIG_MEMCG or cgroup_disable=memory systems.
+ * 2. Already charged slab objects.
+ * 3. For slab objects from KMALLOC_NORMAL caches - allocated by kmalloc()
+ *    without __GFP_ACCOUNT
+ * 4. Allocating internal metadata has failed
+ *
+ * Return: true if charge was successful otherwise false.
+ */
+bool kmem_cache_charge(void *objp, gfp_t gfpflags);
 void kmem_cache_free(struct kmem_cache *s, void *objp);
 
 kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,

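The kmem_cache_charge() kernel-doc above implies an allocate-now, charge-later pattern. A minimal sketch under those assumptions; struct my_pkt, my_pkt_cachep and both functions are hypothetical names, not code from this merge:

/* Hypothetical deferred memcg charging, e.g. for objects allocated in softirq. */
struct my_pkt {
	unsigned int len;
};

static struct kmem_cache *my_pkt_cachep;

static struct my_pkt *my_pkt_alloc(void)
{
	/* Softirq context: the memcg that will own this object is not known yet. */
	return kmem_cache_alloc(my_pkt_cachep, GFP_ATOMIC);
}

static int my_pkt_deliver(struct my_pkt *pkt)
{
	/*
	 * Now running on behalf of the receiving task, so charge the object to
	 * the current memcg. Per the kernel-doc, false is returned only when a
	 * charge was actually attempted and failed.
	 */
	if (!kmem_cache_charge(pkt, GFP_KERNEL))
		return -ENOMEM;
	return 0;
}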
io_uring/io_uring.c

Lines changed: 8 additions & 6 deletions
@@ -3755,6 +3755,11 @@ SYSCALL_DEFINE2(io_uring_setup, u32, entries,
 
 static int __init io_uring_init(void)
 {
+	struct kmem_cache_args kmem_args = {
+		.useroffset = offsetof(struct io_kiocb, cmd.data),
+		.usersize = sizeof_field(struct io_kiocb, cmd.data),
+	};
+
 #define __BUILD_BUG_VERIFY_OFFSET_SIZE(stype, eoffset, esize, ename) do { \
 	BUILD_BUG_ON(offsetof(stype, ename) != eoffset); \
 	BUILD_BUG_ON(sizeof_field(stype, ename) != esize); \
@@ -3839,12 +3844,9 @@ static int __init io_uring_init(void)
 	 * range, and HARDENED_USERCOPY will complain if we haven't
 	 * correctly annotated this range.
 	 */
-	req_cachep = kmem_cache_create_usercopy("io_kiocb",
-				sizeof(struct io_kiocb), 0,
-				SLAB_HWCACHE_ALIGN | SLAB_PANIC |
-				SLAB_ACCOUNT | SLAB_TYPESAFE_BY_RCU,
-				offsetof(struct io_kiocb, cmd.data),
-				sizeof_field(struct io_kiocb, cmd.data), NULL);
+	req_cachep = kmem_cache_create("io_kiocb", sizeof(struct io_kiocb), &kmem_args,
+				SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT |
+				SLAB_TYPESAFE_BY_RCU);
 	io_buf_cachep = KMEM_CACHE(io_buffer,
 			SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
