
Commit 76d9b92

Merge tag 'slab-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
 "The most prominent change this time is the kmem_buckets based hardening of
  kmalloc() allocations from Kees Cook. We have also extended the kmalloc()
  alignment guarantees for non-power-of-two sizes in a way that benefits rust.
  The rest are various cleanups and non-critical fixups.

   - Dedicated bucket allocator (Kees Cook)

     This series [1] enhances the probabilistic defense against heap
     spraying/grooming of CONFIG_RANDOM_KMALLOC_CACHES from last year.
     kmalloc() users that are known to be useful for exploits can get a
     completely separate set of kmalloc caches that can't be shared with
     other users. The first converted users are alloc_msg() and
     memdup_user().

     The hardening is enabled by CONFIG_SLAB_BUCKETS.

   - Extended kmalloc() alignment guarantees (Vlastimil Babka)

     For years now we have guaranteed natural alignment for power-of-two
     allocations, but nothing was defined for other sizes (in practice, we
     have two such buckets, kmalloc-96 and kmalloc-192).

     To avoid unnecessary padding in the rust layer due to its alignment
     rules, extend the guarantee so that the alignment is at least the
     largest power-of-two divisor of the requested size. This fits what rust
     needs, is a superset of the existing power-of-two guarantee, and does
     not in practice change the layout (and thus does not add overhead due
     to padding) of the kmalloc-96 and kmalloc-192 caches, unless slab
     debugging is enabled for them.

   - Cleanups and non-critical fixups (Chengming Zhou, Suren Baghdasaryan,
     Matthew Wilcox, Alex Shi, and Vlastimil Babka)

     Various tweaks related to the new alloc profiling code, folio
     conversion, debugging and more leftovers after SLAB"

Link: https://lore.kernel.org/all/[email protected]/ [1]

* tag 'slab-for-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  mm/memcg: alignment memcg_data define condition
  mm, slab: move prepare_slab_obj_exts_hook under CONFIG_MEM_ALLOC_PROFILING
  mm, slab: move allocation tagging code in the alloc path into a hook
  mm/util: Use dedicated slab buckets for memdup_user()
  ipc, msg: Use dedicated slab buckets for alloc_msg()
  mm/slab: Introduce kmem_buckets_create() and family
  mm/slab: Introduce kvmalloc_buckets_node() that can take kmem_buckets argument
  mm/slab: Plumb kmem_buckets into __do_kmalloc_node()
  mm/slab: Introduce kmem_buckets typedef
  slab, rust: extend kmalloc() alignment guarantees to remove Rust padding
  slab: delete useless RED_INACTIVE and RED_ACTIVE
  slab: don't put freepointer outside of object if only orig_size
  slab: make check_object() more consistent
  mm: Reduce the number of slab->folio casts
  mm, slab: don't wrap internal functions with alloc_hooks()
2 parents b2fc97c + 436381e commit 76d9b92
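As a concrete illustration of the dedicated-bucket interface described in the pull message above, here is a minimal, hedged sketch of how a subsystem could adopt it. It is modeled on the alloc_msg() conversion included in this merge and uses only the API added to include/linux/slab.h (kmem_buckets_create(), kmem_buckets_alloc()); the "foo" names and the useroffset/usersize values are hypothetical and not part of this commit.

	/* Hypothetical example; not code from this merge. */
	#include <linux/init.h>
	#include <linux/slab.h>

	struct foo_msg {
		size_t len;
		char data[];		/* user-controlled contents */
	};

	static kmem_buckets *foo_buckets __ro_after_init;

	static int __init foo_buckets_init(void)
	{
		/* One dedicated set of kmalloc caches, created once at boot. */
		foo_buckets = kmem_buckets_create("foo_msg", SLAB_ACCOUNT,
						  sizeof(struct foo_msg),	/* useroffset */
						  PAGE_SIZE,			/* usersize */
						  NULL);
		return 0;
	}
	subsys_initcall(foo_buckets_init);

	static struct foo_msg *foo_msg_alloc(size_t len)
	{
		/* With CONFIG_SLAB_BUCKETS=n this degrades to a plain kmalloc(). */
		return kmem_buckets_alloc(foo_buckets, sizeof(struct foo_msg) + len,
					  GFP_KERNEL);
	}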


17 files changed: +369, -175 lines changed


Documentation/core-api/memory-allocation.rst

Lines changed: 4 additions & 2 deletions
@@ -144,8 +144,10 @@ configuration, but it is a good practice to use `kmalloc` for objects
 smaller than page size.
 
 The address of a chunk allocated with `kmalloc` is aligned to at least
-ARCH_KMALLOC_MINALIGN bytes. For sizes which are a power of two, the
-alignment is also guaranteed to be at least the respective size.
+ARCH_KMALLOC_MINALIGN bytes. For sizes which are a power of two, the
+alignment is also guaranteed to be at least the respective size. For other
+sizes, the alignment is guaranteed to be at least the largest power-of-two
+divisor of the size.
 
 Chunks allocated with kmalloc() can be resized with krealloc(). Similarly
 to kmalloc_array(): a helper for resizing arrays is provided in the form of
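As a quick illustration of the extended guarantee documented in this hunk (an editor's sketch, not code from the commit): 96 = 32 * 3 and 192 = 64 * 3, so the two non-power-of-two kmalloc buckets now advertise 32- and 64-byte alignment respectively, while power-of-two sizes keep their stronger natural-alignment guarantee.

	void *p = kmalloc(96, GFP_KERNEL);	/* alignment guaranteed >= 32 (largest power-of-two divisor of 96) */
	void *q = kmalloc(192, GFP_KERNEL);	/* alignment guaranteed >= 64 */
	void *r = kmalloc(256, GFP_KERNEL);	/* power of two: alignment guaranteed >= 256, as before */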

include/linux/mm.h

Lines changed: 3 additions & 3 deletions
@@ -1110,7 +1110,7 @@ static inline unsigned int compound_order(struct page *page)
  *
  * Return: The order of the folio.
  */
-static inline unsigned int folio_order(struct folio *folio)
+static inline unsigned int folio_order(const struct folio *folio)
 {
 	if (!folio_test_large(folio))
 		return 0;
@@ -2150,7 +2150,7 @@ static inline struct folio *folio_next(struct folio *folio)
  * it from being split. It is not necessary for the folio to be locked.
  * Return: The base-2 logarithm of the size of this folio.
  */
-static inline unsigned int folio_shift(struct folio *folio)
+static inline unsigned int folio_shift(const struct folio *folio)
 {
 	return PAGE_SHIFT + folio_order(folio);
 }
@@ -2163,7 +2163,7 @@ static inline unsigned int folio_shift(struct folio *folio)
  * it from being split. It is not necessary for the folio to be locked.
  * Return: The number of bytes in this folio.
  */
-static inline size_t folio_size(struct folio *folio)
+static inline size_t folio_size(const struct folio *folio)
 {
 	return PAGE_SIZE << folio_order(folio);
 }

include/linux/mm_types.h

Lines changed: 7 additions & 2 deletions
@@ -169,8 +169,10 @@ struct page {
 		/* Usage count. *DO NOT USE DIRECTLY*. See page_ref.h */
 		atomic_t _refcount;
 
-#ifdef CONFIG_SLAB_OBJ_EXT
+#ifdef CONFIG_MEMCG
 	unsigned long memcg_data;
+#elif defined(CONFIG_SLAB_OBJ_EXT)
+	unsigned long _unused_slab_obj_exts;
 #endif
 
 	/*
@@ -298,6 +300,7 @@ typedef struct {
  * @_hugetlb_cgroup_rsvd: Do not use directly, use accessor in hugetlb_cgroup.h.
  * @_hugetlb_hwpoison: Do not use directly, call raw_hwp_list_head().
  * @_deferred_list: Folios to be split under memory pressure.
+ * @_unused_slab_obj_exts: Placeholder to match obj_exts in struct slab.
  *
  * A folio is a physically, virtually and logically contiguous set
  * of bytes. It is a power-of-two in size, and it is aligned to that
@@ -332,8 +335,10 @@ struct folio {
 		};
 		atomic_t _mapcount;
 		atomic_t _refcount;
-#ifdef CONFIG_SLAB_OBJ_EXT
+#ifdef CONFIG_MEMCG
 	unsigned long memcg_data;
+#elif defined(CONFIG_SLAB_OBJ_EXT)
+	unsigned long _unused_slab_obj_exts;
 #endif
 #if defined(WANT_PAGE_VIRTUAL)
 	void *virtual;

include/linux/poison.h

Lines changed: 2 additions & 5 deletions
@@ -38,11 +38,8 @@
  * Magic nums for obj red zoning.
  * Placed in the first word before and the first word after an obj.
  */
-#define RED_INACTIVE	0x09F911029D74E35BULL	/* when obj is inactive */
-#define RED_ACTIVE	0xD84156C5635688C0ULL	/* when obj is active */
-
-#define SLUB_RED_INACTIVE	0xbb
-#define SLUB_RED_ACTIVE		0xcc
+#define SLUB_RED_INACTIVE	0xbb	/* when obj is inactive */
+#define SLUB_RED_ACTIVE		0xcc	/* when obj is active */
 
 /* ...and for poisoning */
 #define POISON_INUSE	0x5a	/* for use-uninitialised poisoning */

include/linux/slab.h

Lines changed: 65 additions & 32 deletions
@@ -426,8 +426,9 @@ enum kmalloc_cache_type {
 	NR_KMALLOC_TYPES
 };
 
-extern struct kmem_cache *
-kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1];
+typedef struct kmem_cache * kmem_buckets[KMALLOC_SHIFT_HIGH + 1];
+
+extern kmem_buckets kmalloc_caches[NR_KMALLOC_TYPES];
 
 /*
  * Define gfp bits that should not be set for KMALLOC_NORMAL.
@@ -528,9 +529,6 @@ static_assert(PAGE_SHIFT <= 20);
 
 #include <linux/alloc_tag.h>
 
-void *__kmalloc_noprof(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_size(1);
-#define __kmalloc(...)		alloc_hooks(__kmalloc_noprof(__VA_ARGS__))
-
 /**
  * kmem_cache_alloc - Allocate an object
  * @cachep: The cache to allocate from.
@@ -551,6 +549,10 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
 
 void kmem_cache_free(struct kmem_cache *s, void *objp);
 
+kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
+				  unsigned int useroffset, unsigned int usersize,
+				  void (*ctor)(void *));
+
 /*
  * Bulk allocation and freeing operations. These are accelerated in an
  * allocator specific way to avoid taking locks repeatedly or building
@@ -568,31 +570,49 @@ static __always_inline void kfree_bulk(size_t size, void **p)
 	kmem_cache_free_bulk(NULL, size, p);
 }
 
-void *__kmalloc_node_noprof(size_t size, gfp_t flags, int node) __assume_kmalloc_alignment
-			__alloc_size(1);
-#define __kmalloc_node(...)	alloc_hooks(__kmalloc_node_noprof(__VA_ARGS__))
-
 void *kmem_cache_alloc_node_noprof(struct kmem_cache *s, gfp_t flags,
 				   int node) __assume_slab_alignment __malloc;
 #define kmem_cache_alloc_node(...)	alloc_hooks(kmem_cache_alloc_node_noprof(__VA_ARGS__))
 
-void *kmalloc_trace_noprof(struct kmem_cache *s, gfp_t flags, size_t size)
-			__assume_kmalloc_alignment __alloc_size(3);
+/*
+ * These macros allow declaring a kmem_buckets * parameter alongside size, which
+ * can be compiled out with CONFIG_SLAB_BUCKETS=n so that a large number of call
+ * sites don't have to pass NULL.
+ */
+#ifdef CONFIG_SLAB_BUCKETS
+#define DECL_BUCKET_PARAMS(_size, _b)	size_t (_size), kmem_buckets *(_b)
+#define PASS_BUCKET_PARAMS(_size, _b)	(_size), (_b)
+#define PASS_BUCKET_PARAM(_b)		(_b)
+#else
+#define DECL_BUCKET_PARAMS(_size, _b)	size_t (_size)
+#define PASS_BUCKET_PARAMS(_size, _b)	(_size)
+#define PASS_BUCKET_PARAM(_b)		NULL
+#endif
+
+/*
+ * The following functions are not to be used directly and are intended only
+ * for internal use from kmalloc() and kmalloc_node()
+ * with the exception of kunit tests
+ */
+
+void *__kmalloc_noprof(size_t size, gfp_t flags)
+			__assume_kmalloc_alignment __alloc_size(1);
 
-void *kmalloc_node_trace_noprof(struct kmem_cache *s, gfp_t gfpflags,
-		int node, size_t size) __assume_kmalloc_alignment
-			__alloc_size(4);
-#define kmalloc_trace(...)	alloc_hooks(kmalloc_trace_noprof(__VA_ARGS__))
+void *__kmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node)
+			__assume_kmalloc_alignment __alloc_size(1);
 
-#define kmalloc_node_trace(...)	alloc_hooks(kmalloc_node_trace_noprof(__VA_ARGS__))
+void *__kmalloc_cache_noprof(struct kmem_cache *s, gfp_t flags, size_t size)
+			__assume_kmalloc_alignment __alloc_size(3);
 
-void *kmalloc_large_noprof(size_t size, gfp_t flags) __assume_page_alignment
-			__alloc_size(1);
-#define kmalloc_large(...)	alloc_hooks(kmalloc_large_noprof(__VA_ARGS__))
+void *__kmalloc_cache_node_noprof(struct kmem_cache *s, gfp_t gfpflags,
+				  int node, size_t size)
+			__assume_kmalloc_alignment __alloc_size(4);
 
-void *kmalloc_large_node_noprof(size_t size, gfp_t flags, int node) __assume_page_alignment
-			__alloc_size(1);
-#define kmalloc_large_node(...)	alloc_hooks(kmalloc_large_node_noprof(__VA_ARGS__))
+void *__kmalloc_large_noprof(size_t size, gfp_t flags)
+			__assume_page_alignment __alloc_size(1);
+
+void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
+			__assume_page_alignment __alloc_size(1);
 
 /**
  * kmalloc - allocate kernel memory
@@ -604,7 +624,8 @@ void *kmalloc_large_node_noprof(size_t size, gfp_t flags, int node) __assume_pag
  *
  * The allocated object address is aligned to at least ARCH_KMALLOC_MINALIGN
  * bytes. For @size of power of two bytes, the alignment is also guaranteed
- * to be at least to the size.
+ * to be at least to the size. For other sizes, the alignment is guaranteed to
+ * be at least the largest power-of-two divisor of @size.
  *
  * The @flags argument may be one of the GFP flags defined at
  * include/linux/gfp_types.h and described at
@@ -654,31 +675,37 @@ static __always_inline __alloc_size(1) void *kmalloc_noprof(size_t size, gfp_t f
 		unsigned int index;
 
 		if (size > KMALLOC_MAX_CACHE_SIZE)
-			return kmalloc_large_noprof(size, flags);
+			return __kmalloc_large_noprof(size, flags);
 
 		index = kmalloc_index(size);
-		return kmalloc_trace_noprof(
+		return __kmalloc_cache_noprof(
 				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
 				flags, size);
 	}
 	return __kmalloc_noprof(size, flags);
 }
 #define kmalloc(...)	alloc_hooks(kmalloc_noprof(__VA_ARGS__))
 
+#define kmem_buckets_alloc(_b, _size, _flags)	\
+	alloc_hooks(__kmalloc_node_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE))
+
+#define kmem_buckets_alloc_track_caller(_b, _size, _flags)	\
+	alloc_hooks(__kmalloc_node_track_caller_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE, _RET_IP_))
+
 static __always_inline __alloc_size(1) void *kmalloc_node_noprof(size_t size, gfp_t flags, int node)
 {
 	if (__builtin_constant_p(size) && size) {
 		unsigned int index;
 
 		if (size > KMALLOC_MAX_CACHE_SIZE)
-			return kmalloc_large_node_noprof(size, flags, node);
+			return __kmalloc_large_node_noprof(size, flags, node);
 
 		index = kmalloc_index(size);
-		return kmalloc_node_trace_noprof(
+		return __kmalloc_cache_node_noprof(
 				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
 				flags, node, size);
 	}
-	return __kmalloc_node_noprof(size, flags, node);
+	return __kmalloc_node_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node);
 }
 #define kmalloc_node(...)	alloc_hooks(kmalloc_node_noprof(__VA_ARGS__))
 
@@ -729,8 +756,10 @@ static inline __realloc_size(2, 3) void * __must_check krealloc_array_noprof(voi
  */
 #define kcalloc(n, size, flags)	kmalloc_array(n, size, (flags) | __GFP_ZERO)
 
-void *kmalloc_node_track_caller_noprof(size_t size, gfp_t flags, int node,
-				       unsigned long caller) __alloc_size(1);
+void *__kmalloc_node_track_caller_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node,
+					 unsigned long caller) __alloc_size(1);
+#define kmalloc_node_track_caller_noprof(size, flags, node, caller) \
+	__kmalloc_node_track_caller_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node, caller)
 #define kmalloc_node_track_caller(...)	\
 	alloc_hooks(kmalloc_node_track_caller_noprof(__VA_ARGS__, _RET_IP_))
 
@@ -756,7 +785,7 @@ static inline __alloc_size(1, 2) void *kmalloc_array_node_noprof(size_t n, size_
 		return NULL;
 	if (__builtin_constant_p(n) && __builtin_constant_p(size))
 		return kmalloc_node_noprof(bytes, flags, node);
-	return __kmalloc_node_noprof(bytes, flags, node);
+	return __kmalloc_node_noprof(PASS_BUCKET_PARAMS(bytes, NULL), flags, node);
 }
 #define kmalloc_array_node(...)	alloc_hooks(kmalloc_array_node_noprof(__VA_ARGS__))
 
@@ -780,14 +809,18 @@ static inline __alloc_size(1) void *kzalloc_noprof(size_t size, gfp_t flags)
 #define kzalloc(...)	alloc_hooks(kzalloc_noprof(__VA_ARGS__))
 #define kzalloc_node(_size, _flags, _node)	kmalloc_node(_size, (_flags)|__GFP_ZERO, _node)
 
-extern void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node) __alloc_size(1);
+void *__kvmalloc_node_noprof(DECL_BUCKET_PARAMS(size, b), gfp_t flags, int node) __alloc_size(1);
+#define kvmalloc_node_noprof(size, flags, node)	\
+	__kvmalloc_node_noprof(PASS_BUCKET_PARAMS(size, NULL), flags, node)
 #define kvmalloc_node(...)	alloc_hooks(kvmalloc_node_noprof(__VA_ARGS__))
 
 #define kvmalloc(_size, _flags)	kvmalloc_node(_size, _flags, NUMA_NO_NODE)
 #define kvmalloc_noprof(_size, _flags)	kvmalloc_node_noprof(_size, _flags, NUMA_NO_NODE)
 #define kvzalloc(_size, _flags)	kvmalloc(_size, (_flags)|__GFP_ZERO)
 
 #define kvzalloc_node(_size, _flags, _node)	kvmalloc_node(_size, (_flags)|__GFP_ZERO, _node)
+#define kmem_buckets_valloc(_b, _size, _flags)	\
+	alloc_hooks(__kvmalloc_node_noprof(PASS_BUCKET_PARAMS(_size, _b), _flags, NUMA_NO_NODE))
 
 static inline __alloc_size(1, 2) void *
 kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node)
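The DECL_BUCKET_PARAMS()/PASS_BUCKET_PARAMS() pair introduced in this header is what keeps the bucket plumbing free when the feature is disabled. A rough sketch of the resulting prototypes and calls after preprocessing (an illustration by the editor, not text from the header; attributes omitted):

	/* With CONFIG_SLAB_BUCKETS=y, the bucket pointer is a real parameter: */
	void *__kmalloc_node_noprof(size_t size, kmem_buckets *b, gfp_t flags, int node);
	/* and kmalloc_node_noprof() expands PASS_BUCKET_PARAMS(size, NULL) to: */
	/*	return __kmalloc_node_noprof(size, NULL, flags, node); */

	/* With CONFIG_SLAB_BUCKETS=n, the parameter disappears entirely: */
	void *__kmalloc_node_noprof(size_t size, gfp_t flags, int node);
	/*	return __kmalloc_node_noprof(size, flags, node); */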

ipc/msgutil.c

Lines changed: 12 additions & 1 deletion
@@ -42,6 +42,17 @@ struct msg_msgseg {
 #define DATALEN_MSG	((size_t)PAGE_SIZE-sizeof(struct msg_msg))
 #define DATALEN_SEG	((size_t)PAGE_SIZE-sizeof(struct msg_msgseg))
 
+static kmem_buckets *msg_buckets __ro_after_init;
+
+static int __init init_msg_buckets(void)
+{
+	msg_buckets = kmem_buckets_create("msg_msg", SLAB_ACCOUNT,
+					  sizeof(struct msg_msg),
+					  DATALEN_MSG, NULL);
+
+	return 0;
+}
+subsys_initcall(init_msg_buckets);
 
 static struct msg_msg *alloc_msg(size_t len)
 {
@@ -50,7 +61,7 @@ static struct msg_msg *alloc_msg(size_t len)
 	size_t alen;
 
 	alen = min(len, DATALEN_MSG);
-	msg = kmalloc(sizeof(*msg) + alen, GFP_KERNEL_ACCOUNT);
+	msg = kmem_buckets_alloc(msg_buckets, sizeof(*msg) + alen, GFP_KERNEL);
 	if (msg == NULL)
 		return NULL;
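The commit log also mentions a matching conversion of memdup_user() in mm/util.c, which is not included in this excerpt. Below is a hedged sketch of what such a conversion could look like, using kmem_buckets_create() and kmem_buckets_alloc_track_caller() from the header changes above; the bucket name, flags, and sizes here are assumptions for illustration, not the actual mm/util.c hunk.

	static kmem_buckets *user_buckets __ro_after_init;

	static int __init init_user_buckets(void)
	{
		/* Assumed parameters; shown only to illustrate the pattern. */
		user_buckets = kmem_buckets_create("memdup_user", 0, 0, INT_MAX, NULL);
		return 0;
	}
	subsys_initcall(init_user_buckets);

	void *memdup_user(const void __user *src, size_t len)
	{
		void *p;

		/* User-controlled size and contents: allocate from dedicated buckets. */
		p = kmem_buckets_alloc_track_caller(user_buckets, len,
						    GFP_USER | __GFP_NOWARN);
		if (!p)
			return ERR_PTR(-ENOMEM);

		if (copy_from_user(p, src, len)) {
			kfree(p);
			return ERR_PTR(-EFAULT);
		}

		return p;
	}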

kernel/configs/hardening.config

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ CONFIG_RANDOMIZE_MEMORY=y
 # Randomize allocator freelists, harden metadata.
 CONFIG_SLAB_FREELIST_RANDOM=y
 CONFIG_SLAB_FREELIST_HARDENED=y
+CONFIG_SLAB_BUCKETS=y
 CONFIG_SHUFFLE_PAGE_ALLOCATOR=y
 CONFIG_RANDOM_KMALLOC_CACHES=y

lib/fortify_kunit.c

Lines changed: 0 additions & 2 deletions
@@ -233,8 +233,6 @@ static void fortify_test_alloc_size_##allocator##_dynamic(struct kunit *test) \
 			  kfree(p)); \
 	checker(expected_size, \
 		kmalloc_array_node(alloc_size, 1, gfp, NUMA_NO_NODE), \
-			  kfree(p)); \
-	checker(expected_size, __kmalloc(alloc_size, gfp), \
 			  kfree(p)); \
 	\
 	orig = kmalloc(alloc_size, gfp); \

lib/slub_kunit.c

Lines changed: 1 addition & 1 deletion
@@ -140,7 +140,7 @@ static void test_kmalloc_redzone_access(struct kunit *test)
 {
 	struct kmem_cache *s = test_kmem_cache_create("TestSlub_RZ_kmalloc", 32,
 				SLAB_KMALLOC|SLAB_STORE_USER|SLAB_RED_ZONE);
-	u8 *p = kmalloc_trace(s, GFP_KERNEL, 18);
+	u8 *p = __kmalloc_cache_noprof(s, GFP_KERNEL, 18);
 
 	kasan_disable_current();

mm/Kconfig

Lines changed: 17 additions & 0 deletions
@@ -273,6 +273,23 @@ config SLAB_FREELIST_HARDENED
 	  sacrifices to harden the kernel slab allocator against common
 	  freelist exploit methods.
 
+config SLAB_BUCKETS
+	bool "Support allocation from separate kmalloc buckets"
+	depends on !SLUB_TINY
+	default SLAB_FREELIST_HARDENED
+	help
+	  Kernel heap attacks frequently depend on being able to create
+	  specifically-sized allocations with user-controlled contents
+	  that will be allocated into the same kmalloc bucket as a
+	  target object. To avoid sharing these allocation buckets,
+	  provide an explicitly separated set of buckets to be used for
+	  user-controlled allocations. This may very slightly increase
+	  memory fragmentation, though in practice it's only a handful
+	  of extra pages since the bulk of user-controlled allocations
+	  are relatively long-lived.
+
+	  If unsure, say Y.
+
 config SLUB_STATS
 	default n
 	bool "Enable performance statistics"
