Commit 1472690

Merge branch 'akpm' (patches from Andrew)

Merge misc updates from Andrew Morton:
 "173 patches.

  Subsystems affected by this series: ia64, ocfs2, block, and mm (debug,
  pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, bootmem,
  sparsemem, vmalloc, kasan, pagealloc, memory-failure, hugetlb, userfaultfd,
  vmscan, compaction, mempolicy, memblock, oom-kill, migration, ksm, percpu,
  vmstat, and madvise"

* emailed patches from Andrew Morton <[email protected]>: (173 commits)
  mm/madvise: add MADV_WILLNEED to process_madvise()
  mm/vmstat: remove unneeded return value
  mm/vmstat: simplify the array size calculation
  mm/vmstat: correct some wrong comments
  mm/percpu,c: remove obsolete comments of pcpu_chunk_populated()
  selftests: vm: add COW time test for KSM pages
  selftests: vm: add KSM merging time test
  mm: KSM: fix data type
  selftests: vm: add KSM merging across nodes test
  selftests: vm: add KSM zero page merging test
  selftests: vm: add KSM unmerge test
  selftests: vm: add KSM merge test
  mm/migrate: correct kernel-doc notation
  mm: wire up syscall process_mrelease
  mm: introduce process_mrelease system call
  memblock: make memblock_find_in_range method private
  mm/mempolicy.c: use in_task() in mempolicy_slab_node()
  mm/mempolicy: unify the create() func for bind/interleave/prefer-many policies
  mm/mempolicy: advertise new MPOL_PREFERRED_MANY
  mm/hugetlb: add support for mempolicy MPOL_PREFERRED_MANY
  ...

2 parents a9c9a6f + d5fffc5

File tree

171 files changed: +3525 −1729 lines changed

Lines changed: 24 additions & 0 deletions

@@ -0,0 +1,24 @@
+What:		/sys/kernel/mm/numa/
+Date:		June 2021
+Contact:	Linux memory management mailing list <[email protected]>
+Description:	Interface for NUMA
+
+What:		/sys/kernel/mm/numa/demotion_enabled
+Date:		June 2021
+Contact:	Linux memory management mailing list <[email protected]>
+Description:	Enable/disable demoting pages during reclaim
+
+		Page migration during reclaim is intended for systems
+		with tiered memory configurations.  These systems have
+		multiple types of memory with varied performance
+		characteristics instead of plain NUMA systems where
+		the same kind of memory is found at varied distances.
+		Allowing page migration during reclaim enables these
+		systems to migrate pages from fast tiers to slow tiers
+		when the fast tier is under pressure.  This migration
+		is performed before swap.  It may move data to a NUMA
+		node that does not fall into the cpuset of the
+		allocating process which might be construed to violate
+		the guarantees of cpusets.  This should not be enabled
+		on systems which need strict cpuset location
+		guarantees.

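The knob described above is a plain sysfs boolean, so it can be driven from user space with an ordinary file write. A minimal C sketch, assuming a kernel with this series applied and root privileges; `write_sysfs()` and `DEMOTION_KNOB` are illustrative helpers for this example, not kernel or libc APIs:

```c
#include <stdio.h>

/* Path from the ABI entry above; requires a kernel with demotion support. */
#define DEMOTION_KNOB "/sys/kernel/mm/numa/demotion_enabled"

/* Write a string value to a sysfs-style file; returns 0 on success. */
static int write_sysfs(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = (fputs(value, f) >= 0);
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Usage (as root): write_sysfs(DEMOTION_KNOB, "1") enables demotion,
 * write_sysfs(DEMOTION_KNOB, "0") disables it again. */
```

As the ABI text warns, enabling this can place pages outside the allocating task's cpuset, so it is best left off where strict cpuset placement matters.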
Documentation/admin-guide/mm/numa_memory_policy.rst

Lines changed: 11 additions & 4 deletions

@@ -245,6 +245,13 @@ MPOL_INTERLEAVED
 	address range or file.  During system boot up, the temporary
 	interleaved system default policy works in this mode.

+MPOL_PREFERRED_MANY
+	This mode specifies that the allocation should be preferably
+	satisfied from the nodemask specified in the policy.  If there is
+	a memory pressure on all nodes in the nodemask, the allocation
+	can fall back to all existing numa nodes.  This is effectively
+	MPOL_PREFERRED allowed for a mask rather than a single node.
+
 NUMA memory policy supports the following optional mode flags:

 MPOL_F_STATIC_NODES
@@ -253,10 +260,10 @@ MPOL_F_STATIC_NODES
 	nodes changes after the memory policy has been defined.

 	Without this flag, any time a mempolicy is rebound because of a
-	change in the set of allowed nodes, the node (Preferred) or
-	nodemask (Bind, Interleave) is remapped to the new set of
-	allowed nodes.  This may result in nodes being used that were
-	previously undesired.
+	change in the set of allowed nodes, the preferred nodemask (Preferred
+	Many), preferred node (Preferred) or nodemask (Bind, Interleave) is
+	remapped to the new set of allowed nodes.  This may result in nodes
+	being used that were previously undesired.

 	With this flag, if the user-specified nodes overlap with the
 	nodes allowed by the task's cpuset, then the memory policy is

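The new mode is selected through set_mempolicy(2) like the existing ones. A hedged C sketch: the numeric value 5 for MPOL_PREFERRED_MANY is assumed from the uapi headers of this series (check your numaif.h), and on kernels without the patch the call fails with EINVAL:

```c
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY 5   /* assumed uapi value; verify locally */
#endif

/* Build a single-word nodemask preferring the given NUMA node ids
 * (all ids must fit in one unsigned long for this sketch). */
static unsigned long nodemask_of(const int *nodes, int n)
{
    unsigned long mask = 0;
    for (int i = 0; i < n; i++)
        mask |= 1UL << nodes[i];
    return mask;
}

/* Ask the kernel to prefer allocations from `mask`; falls back to all
 * nodes under pressure, per the documentation above. Returns 0 on
 * success, otherwise errno (EINVAL on pre-PREFERRED_MANY kernels). */
static int try_preferred_many(unsigned long mask)
{
    /* maxnode counts bits; one word plus one is enough here. */
    if (syscall(SYS_set_mempolicy, MPOL_PREFERRED_MANY, &mask,
                8 * sizeof(mask) + 1) == 0)
        return 0;
    return errno;
}
```

For example, `try_preferred_many(nodemask_of((int[]){0, 1}, 2))` prefers nodes 0 and 1 while still allowing fallback, which is exactly the "Preferred, but for a mask" semantics the text describes.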
Documentation/admin-guide/sysctl/vm.rst

Lines changed: 2 additions & 1 deletion

@@ -118,7 +118,8 @@ compaction_proactiveness

 This tunable takes a value in the range [0, 100] with a default value of
 20.  This tunable determines how aggressively compaction is done in the
-background.  Setting it to 0 disables proactive compaction.
+background.  Writing a non-zero value to this tunable will immediately
+trigger proactive compaction.  Setting it to 0 disables proactive compaction.

 Note that compaction has a non-trivial system-wide impact as pages
 belonging to different processes are moved around, which could also lead

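Since any non-zero write now kicks off a compaction round immediately, setting the tunable doubles as a manual trigger. A small C sketch under those assumptions (the procfs path is the standard sysctl mount; `set_proactiveness()` is an illustrative helper and writing requires root):

```c
#include <stdio.h>

#define PROACTIVENESS_KNOB "/proc/sys/vm/compaction_proactiveness"

/* Clamp a requested value to the documented [0, 100] range. */
static int clamp_proactiveness(int v)
{
    return v < 0 ? 0 : (v > 100 ? 100 : v);
}

/* Write the tunable; per the doc change above, a non-zero value also
 * triggers a proactive compaction round right away. 0 on success. */
static int set_proactiveness(int value)
{
    FILE *f = fopen(PROACTIVENESS_KNOB, "w");
    if (!f)
        return -1;                      /* not root, or knob missing */
    int ok = (fprintf(f, "%d\n", clamp_proactiveness(value)) > 0);
    return (fclose(f) == 0 && ok) ? 0 : -1;
}
```

Re-writing the current value (e.g. `set_proactiveness(20)`) is a cheap way to force a round without permanently changing the policy.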
Documentation/core-api/cachetlb.rst

Lines changed: 37 additions & 49 deletions

@@ -271,10 +271,15 @@ maps this page at its virtual address.

 ``void flush_dcache_page(struct page *page)``

-	Any time the kernel writes to a page cache page, _OR_
-	the kernel is about to read from a page cache page and
-	user space shared/writable mappings of this page potentially
-	exist, this routine is called.
+	This routine must be called when:
+
+	  a) the kernel did write to a page that is in the page cache page
+	     and / or in high memory
+	  b) the kernel is about to read from a page cache page and user space
+	     shared/writable mappings of this page potentially exist.  Note
+	     that {get,pin}_user_pages{_fast} already call flush_dcache_page
+	     on any page found in the user address space and thus driver
+	     code rarely needs to take this into account.

 .. note::

@@ -284,38 +289,34 @@ maps this page at its virtual address.
 	handling vfs symlinks in the page cache need not call
 	this interface at all.

-	The phrase "kernel writes to a page cache page" means,
-	specifically, that the kernel executes store instructions
-	that dirty data in that page at the page->virtual mapping
-	of that page.  It is important to flush here to handle
-	D-cache aliasing, to make sure these kernel stores are
-	visible to user space mappings of that page.
-
-	The corollary case is just as important, if there are users
-	which have shared+writable mappings of this file, we must make
-	sure that kernel reads of these pages will see the most recent
-	stores done by the user.
-
-	If D-cache aliasing is not an issue, this routine may
-	simply be defined as a nop on that architecture.
-
-	There is a bit set aside in page->flags (PG_arch_1) as
-	"architecture private".  The kernel guarantees that,
-	for pagecache pages, it will clear this bit when such
-	a page first enters the pagecache.
-
-	This allows these interfaces to be implemented much more
-	efficiently.  It allows one to "defer" (perhaps indefinitely)
-	the actual flush if there are currently no user processes
-	mapping this page.  See sparc64's flush_dcache_page and
-	update_mmu_cache implementations for an example of how to go
-	about doing this.
-
-	The idea is, first at flush_dcache_page() time, if
-	page->mapping->i_mmap is an empty tree, just mark the architecture
-	private page flag bit.  Later, in update_mmu_cache(), a check is
-	made of this flag bit, and if set the flush is done and the flag
-	bit is cleared.
+	The phrase "kernel writes to a page cache page" means, specifically,
+	that the kernel executes store instructions that dirty data in that
+	page at the page->virtual mapping of that page.  It is important to
+	flush here to handle D-cache aliasing, to make sure these kernel stores
+	are visible to user space mappings of that page.
+
+	The corollary case is just as important, if there are users which have
+	shared+writable mappings of this file, we must make sure that kernel
+	reads of these pages will see the most recent stores done by the user.
+
+	If D-cache aliasing is not an issue, this routine may simply be defined
+	as a nop on that architecture.
+
+	There is a bit set aside in page->flags (PG_arch_1) as "architecture
+	private".  The kernel guarantees that, for pagecache pages, it will
+	clear this bit when such a page first enters the pagecache.
+
+	This allows these interfaces to be implemented much more efficiently.
+	It allows one to "defer" (perhaps indefinitely) the actual flush if
+	there are currently no user processes mapping this page.  See sparc64's
+	flush_dcache_page and update_mmu_cache implementations for an example
+	of how to go about doing this.
+
+	The idea is, first at flush_dcache_page() time, if page_file_mapping()
+	returns a mapping, and mapping_mapped on that mapping returns %false,
+	just mark the architecture private page flag bit.  Later, in
+	update_mmu_cache(), a check is made of this flag bit, and if set the
+	flush is done and the flag bit is cleared.

 .. important::

@@ -351,19 +352,6 @@ maps this page at its virtual address.
 	architectures).  For incoherent architectures, it should flush
 	the cache of the page at vmaddr.

-``void flush_kernel_dcache_page(struct page *page)``
-
-	When the kernel needs to modify a user page is has obtained
-	with kmap, it calls this function after all modifications are
-	complete (but before kunmapping it) to bring the underlying
-	page up to date.  It is assumed here that the user has no
-	incoherent cached copies (i.e. the original page was obtained
-	from a mechanism like get_user_pages()).  The default
-	implementation is a nop and should remain so on all coherent
-	architectures.  On incoherent architectures, this should flush
-	the kernel cache for page (using page_address(page)).
-
 ``void flush_icache_range(unsigned long start, unsigned long end)``

 	When the kernel stores into addresses that it will execute

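The deferred-flush scheme the documentation describes (mark PG_arch_1 while no user mapping exists, do the real flush in update_mmu_cache()) can be modeled in plain userspace C. Everything below — `struct toy_page`, the `flushes_done` counter, the `toy_` functions — is an illustrative model of that state machine, not kernel code:

```c
#include <stdbool.h>

struct toy_page {
    bool pg_arch_1;  /* "architecture private" deferred-flush bit */
    bool mapped;     /* stands in for mapping_mapped(page_file_mapping()) */
};

static int flushes_done;  /* counts real (expensive) D-cache flushes */

static void do_real_flush(struct toy_page *p)
{
    (void)p;
    flushes_done++;
}

/* Kernel wrote to the page: flush now, or defer if nobody maps it. */
static void toy_flush_dcache_page(struct toy_page *p)
{
    if (!p->mapped)
        p->pg_arch_1 = true;    /* defer: just mark the private bit */
    else
        do_real_flush(p);
}

/* A user mapping is being established: settle any deferred flush. */
static void toy_update_mmu_cache(struct toy_page *p)
{
    p->mapped = true;
    if (p->pg_arch_1) {
        do_real_flush(p);
        p->pg_arch_1 = false;
    }
}
```

The payoff is visible in the model: any number of kernel writes to an unmapped page cost zero flushes, and a single flush happens only once user space actually maps the page.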
Documentation/dev-tools/kasan.rst

Lines changed: 8 additions & 5 deletions

@@ -181,9 +181,16 @@ By default, KASAN prints a bug report only for the first invalid memory access.
 With ``kasan_multi_shot``, KASAN prints a report on every invalid access.  This
 effectively disables ``panic_on_warn`` for KASAN reports.

+Alternatively, independent of ``panic_on_warn`` the ``kasan.fault=`` boot
+parameter can be used to control panic and reporting behaviour:
+
+- ``kasan.fault=report`` or ``=panic`` controls whether to only print a KASAN
+  report or also panic the kernel (default: ``report``).  The panic happens even
+  if ``kasan_multi_shot`` is enabled.
+
 Hardware tag-based KASAN mode (see the section about various modes below) is
 intended for use in production as a security mitigation.  Therefore, it supports
-boot parameters that allow disabling KASAN or controlling its features.
+additional boot parameters that allow disabling KASAN or controlling features:

 - ``kasan=off`` or ``=on`` controls whether KASAN is enabled (default: ``on``).

@@ -199,10 +206,6 @@ boot parameters that allow disabling KASAN or controlling its features.
 - ``kasan.stacktrace=off`` or ``=on`` disables or enables alloc and free stack
   traces collection (default: ``on``).

-- ``kasan.fault=report`` or ``=panic`` controls whether to only print a KASAN
-  report or also panic the kernel (default: ``report``).  The panic happens even
-  if ``kasan_multi_shot`` is enabled.
-
 Implementation details
 ----------------------

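Taken together, the parameters this hunk documents compose on the kernel command line. A hedged example (all three names come from the text above; whether to panic on the first report is a deployment policy choice, and `report` is the default):

```
kasan=on kasan.stacktrace=off kasan.fault=panic
```

This is the production-leaning combination the text implies for hardware tag-based mode: KASAN enabled, stack trace collection off to reduce overhead, and a panic rather than just a report on the first fault.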
Documentation/translations/zh_CN/core-api/cachetlb.rst

Lines changed: 0 additions & 9 deletions

@@ -298,15 +298,6 @@ HyperSparc cpu就是这样一个具有这种属性的cpu。
 	用。默认的实现是nop(对于所有相干的架构应该保持这样)。对于不一致性
 	的架构,它应该刷新vmaddr处的页面缓存。

-``void flush_kernel_dcache_page(struct page *page)``
-
-	当内核需要修改一个用kmap获得的用户页时,它会在所有修改完成后(但在
-	kunmapping之前)调用这个函数,以使底层页面达到最新状态。这里假定用
-	户没有不一致性的缓存副本(即原始页面是从类似get_user_pages()的机制
-	中获得的)。默认的实现是一个nop,在所有相干的架构上都应该如此。在不
-	一致性的架构上,这应该刷新内核缓存中的页面(使用page_address(page))。
-
 ``void flush_icache_range(unsigned long start, unsigned long end)``

 	当内核存储到它将执行的地址中时(例如在加载模块时),这个函数被调用。

(This hunk drops the zh_CN translation of the ``flush_kernel_dcache_page()`` entry, keeping the translation in sync with the removal of that interface from the English Documentation/core-api/cachetlb.rst.)

Documentation/vm/hwpoison.rst

Lines changed: 0 additions & 1 deletion

@@ -180,7 +180,6 @@ Limitations
 ===========
 - Not all page types are supported and never will.  Most kernel internal
   objects cannot be recovered, only LRU pages for now.
-- Right now hugepage support is missing.

 ---
 Andi Kleen, Oct 2009

arch/alpha/kernel/syscalls/syscall.tbl

Lines changed: 2 additions & 0 deletions

@@ -486,3 +486,5 @@
 554	common	landlock_create_ruleset		sys_landlock_create_ruleset
 555	common	landlock_add_rule		sys_landlock_add_rule
 556	common	landlock_restrict_self		sys_landlock_restrict_self
+# 557 reserved for memfd_secret
+558	common	process_mrelease		sys_process_mrelease

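The new syscall takes a pidfd of a dying process and reaps its address space. A hedged C sketch: the fallback number 448 below is the generic value introduced by this series (alpha uses 558, as the table above shows) and should be verified against your headers; kernels without the patch return ENOSYS:

```c
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef __NR_process_mrelease
#define __NR_process_mrelease 448  /* assumed generic value; verify locally */
#endif

/* Ask the kernel to release the memory of the (dying) process behind
 * `pidfd`.  Returns 0 on success, otherwise the errno from the kernel:
 * EBADF for a bad pidfd, EINVAL if the task is not dying, ENOSYS on
 * kernels that predate this series. */
static int try_process_mrelease(int pidfd)
{
    if (syscall(__NR_process_mrelease, pidfd, 0) == 0)
        return 0;
    return errno;
}
```

A typical caller (e.g. a userspace OOM killer) would open the victim with pidfd_open(2), send SIGKILL through the pidfd, then call this to reclaim the memory without waiting for the victim to get scheduled.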
arch/arm/include/asm/cacheflush.h

Lines changed: 1 addition & 3 deletions

@@ -291,6 +291,7 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr
 #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
 extern void flush_dcache_page(struct page *);

+#define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
 static inline void flush_kernel_vmap_range(void *addr, int size)
 {
 	if ((cache_is_vivt() || cache_is_vipt_aliasing()))
@@ -312,9 +313,6 @@ static inline void flush_anon_page(struct vm_area_struct *vma,
 	__flush_anon_page(vma, page, vmaddr);
 }

-#define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
-extern void flush_kernel_dcache_page(struct page *);
-
 #define flush_dcache_mmap_lock(mapping)		xa_lock_irq(&mapping->i_pages)
 #define flush_dcache_mmap_unlock(mapping)	xa_unlock_irq(&mapping->i_pages)

arch/arm/kernel/setup.c

Lines changed: 7 additions & 13 deletions

@@ -1012,31 +1012,25 @@ static void __init reserve_crashkernel(void)
 		unsigned long long lowmem_max = __pa(high_memory - 1) + 1;
 		if (crash_max > lowmem_max)
 			crash_max = lowmem_max;
-		crash_base = memblock_find_in_range(CRASH_ALIGN, crash_max,
-						    crash_size, CRASH_ALIGN);
+
+		crash_base = memblock_phys_alloc_range(crash_size, CRASH_ALIGN,
+						       CRASH_ALIGN, crash_max);
 		if (!crash_base) {
 			pr_err("crashkernel reservation failed - No suitable area found.\n");
 			return;
 		}
 	} else {
+		unsigned long long crash_max = crash_base + crash_size;
 		unsigned long long start;

-		start = memblock_find_in_range(crash_base,
-					       crash_base + crash_size,
-					       crash_size, SECTION_SIZE);
-		if (start != crash_base) {
+		start = memblock_phys_alloc_range(crash_size, SECTION_SIZE,
+						  crash_base, crash_max);
+		if (!start) {
 			pr_err("crashkernel reservation failed - memory is in use.\n");
 			return;
 		}
 	}

-	ret = memblock_reserve(crash_base, crash_size);
-	if (ret < 0) {
-		pr_warn("crashkernel reservation failed - memory is in use (0x%lx)\n",
-			(unsigned long)crash_base);
-		return;
-	}
-
 	pr_info("Reserving %ldMB of memory at %ldMB for crashkernel (System RAM: %ldMB)\n",
 		(unsigned long)(crash_size >> 20),
 		(unsigned long)(crash_base >> 20),

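The conversion above replaces a find-then-reserve pair with memblock_phys_alloc_range(), which searches and reserves in one step and returns 0 on failure, so the caller keeps only the `if (!base)` check and the separate memblock_reserve() call disappears. A toy userspace model of that contract (a single free span, invented `toy_` names; this is an illustration of the calling convention, not the kernel's memblock):

```c
typedef unsigned long long phys_t;

struct toy_memblock {
    phys_t free_start;  /* single free region, [free_start, free_end) */
    phys_t free_end;
};

/* Round x up to a multiple of a; assumes a is a power of two. */
static phys_t align_up(phys_t x, phys_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Allocate `size` bytes aligned to `align` within [start, end).
 * Returns the base on success, 0 on failure -- mirroring how the new
 * API lets callers drop the separate reserve step and range check. */
static phys_t toy_phys_alloc_range(struct toy_memblock *mb, phys_t size,
                                   phys_t align, phys_t start, phys_t end)
{
    phys_t lo = align_up(mb->free_start > start ? mb->free_start : start,
                         align);
    phys_t hi = mb->free_end < end ? mb->free_end : end;

    if (lo >= hi || hi - lo < size)
        return 0;                /* no suitable area: the !base case */
    mb->free_start = lo + size;  /* "reserve" by shrinking the free span */
    return lo;
}
```

The toy captures why the arm setup.c hunk shrinks: allocation and reservation are atomic in one call, and failure is a single sentinel value instead of two separate error paths.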