Commit 3822a7c

Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add F_SEAL_EXEC") which permits the setting of the memfd execute bit at memfd creation time, with the option of sealing the state of the X bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare") which addresses a rare race condition related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series "mm/damon/core: implement damos filter". These filters provide users with finer-grained control over DAMOS's actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It adds to MGLRU an LRU of memcgs, to improve the scalability of global reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and generalization of pte management and of pte debugging in his series "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of writeable+executable mappings. The previous BPF-based approach had shortcomings. See "mm: In-kernel support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error statistics reporting, mainly by presenting the statistics on a per-node basis. See the series "Introduce per NUMA node memory error statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog regression in compaction via his series "Fix excessive CPU usage during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in this series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our vm_flags handling in the series "introduce vm_flags modifier functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's series "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and /proc/kcore better represent the real state of things in his series "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest of the kernel in the series "Simplify the external interface for GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface over to its sysfs interface. To support this, we'll temporarily be printing warnings when people use the debugfs interface. See the series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2 parents e4bc158 + f9366f4 commit 3822a7c

496 files changed: +11496 -6870 lines changed

Documentation/ABI/stable/sysfs-devices-node

Lines changed: 39 additions & 0 deletions
@@ -182,3 +182,42 @@ Date: November 2021
 Contact: Jarkko Sakkinen <[email protected]>
 Description:
 The total amount of SGX physical memory in bytes.
+
+What: /sys/devices/system/node/nodeX/memory_failure/total
+Date: January 2023
+Contact: Jiaqi Yan <[email protected]>
+Description:
+The total number of raw poisoned pages (pages containing
+corrupted data due to memory errors) on a NUMA node.
+
+What: /sys/devices/system/node/nodeX/memory_failure/ignored
+Date: January 2023
+Contact: Jiaqi Yan <[email protected]>
+Description:
+Of the raw poisoned pages on a NUMA node, how many pages are
+ignored by memory error recovery attempt, usually because
+support for this type of pages is unavailable, and kernel
+gives up the recovery.
+
+What: /sys/devices/system/node/nodeX/memory_failure/failed
+Date: January 2023
+Contact: Jiaqi Yan <[email protected]>
+Description:
+Of the raw poisoned pages on a NUMA node, how many pages are
+failed by memory error recovery attempt. This usually means
+a key recovery operation failed.
+
+What: /sys/devices/system/node/nodeX/memory_failure/delayed
+Date: January 2023
+Contact: Jiaqi Yan <[email protected]>
+Description:
+Of the raw poisoned pages on a NUMA node, how many pages are
+delayed by memory error recovery attempt. Delayed poisoned
+pages usually will be retried by kernel.
+
+What: /sys/devices/system/node/nodeX/memory_failure/recovered
+Date: January 2023
+Contact: Jiaqi Yan <[email protected]>
+Description:
+Of the raw poisoned pages on a NUMA node, how many pages are
+recovered by memory error recovery attempt.
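
The five counters documented in this hunk are plain files under each node's `memory_failure/` directory. The following is a minimal sketch of collecting them, assuming each file holds a single decimal integer. The function name `read_memory_failure_stats` and its `root` parameter are illustrative and not part of any kernel interface.

```python
import os
import re

# Counter file names taken from the ABI entries in the hunk above.
COUNTERS = ("total", "ignored", "failed", "delayed", "recovered")


def read_memory_failure_stats(root="/sys/devices/system/node"):
    """Return {node_name: {counter: int}} for every nodeX directory under root.

    Assumes each counter file contains one decimal integer. Nodes without a
    memory_failure/ directory (for example on kernels that lack this series)
    are skipped rather than reported as zero.
    """
    stats = {}
    for entry in sorted(os.listdir(root)):
        if not re.fullmatch(r"node\d+", entry):
            continue
        mf_dir = os.path.join(root, entry, "memory_failure")
        if not os.path.isdir(mf_dir):
            continue
        node_stats = {}
        for name in COUNTERS:
            with open(os.path.join(mf_dir, name)) as f:
                node_stats[name] = int(f.read().strip())
        stats[entry] = node_stats
    return stats
```

Calling `read_memory_failure_stats()` with no argument reads the live sysfs tree; passing another directory makes the helper usable against a captured copy of the tree.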

Documentation/ABI/testing/sysfs-kernel-mm-damon

Lines changed: 29 additions & 0 deletions
@@ -258,6 +258,35 @@ Contact: SeongJae Park <[email protected]>
 Description: Writing to and reading from this file sets and gets the low
 watermark of the scheme in permil.
 
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/nr_filters
+Date: Dec 2022
+Contact: SeongJae Park <[email protected]>
+Description: Writing a number 'N' to this file creates the number of
+directories for setting filters of the scheme named '0' to
+'N-1' under the filters/ directory.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/type
+Date: Dec 2022
+Contact: SeongJae Park <[email protected]>
+Description: Writing to and reading from this file sets and gets the type of
+the memory of the interest. 'anon' for anonymous pages, or
+'memcg' for specific memory cgroup can be written and read.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/memcg_path
+Date: Dec 2022
+Contact: SeongJae Park <[email protected]>
+Description: If 'memcg' is written to the 'type' file, writing to and
+reading from this file sets and gets the path to the memory
+cgroup of the interest.
+
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/filters/<F>/matching
+Date: Dec 2022
+Contact: SeongJae Park <[email protected]>
+Description: Writing 'Y' or 'N' to this file sets whether to filter out
+pages that do or do not match to the 'type' and 'memcg_path',
+respectively. Filter out means the action of the scheme will
+not be applied to.
+
 What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/stats/nr_tried
 Date: Mar 2022
 Contact: SeongJae Park <[email protected]>

Documentation/admin-guide/cgroup-v1/memory.rst

Lines changed: 11 additions & 2 deletions
@@ -87,6 +87,8 @@ Brief summary of control files.
 memory.swappiness set/show swappiness parameter of vmscan
 (See sysctl's vm.swappiness)
 memory.move_charge_at_immigrate set/show controls of moving charges
+This knob is deprecated and shouldn't be
+used.
 memory.oom_control set/show oom controls.
 memory.numa_stat show the number of memory usage per numa
 node
@@ -727,8 +729,15 @@ If we want to change this to 1G, we can at any time use::
 
 .. _cgroup-v1-memory-move-charges:
 
-8. Move charges at task migration
-=================================
+8. Move charges at task migration (DEPRECATED!)
+===============================================
+
+THIS IS DEPRECATED!
+
+It's expensive and unreliable! It's better practice to launch workload
+tasks directly from inside their target cgroup. Use dedicated workload
+cgroups to allow fine-grained policy adjustments without having to
+move physical pages between control domains.
 
 Users can move charges associated with a task along with task migration, that
 is, uncharge task's pages from the old cgroup and charge them to the new cgroup.

Documentation/admin-guide/mm/damon/reclaim.rst

Lines changed: 9 additions & 0 deletions
@@ -205,6 +205,15 @@ The end physical address of memory region that DAMON_RECLAIM will do work
 against. That is, DAMON_RECLAIM will find cold memory regions in this region
 and reclaims. By default, biggest System RAM is used as the region.
 
+skip_anon
+---------
+
+Skip anonymous pages reclamation.
+
+If this parameter is set as ``Y``, DAMON_RECLAIM does not reclaim anonymous
+pages. By default, ``N``.
+
+
 kdamond_pid
 -----------
 
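
The new `skip_anon` parameter takes `Y` or `N`. The sketch below toggles it from user space, assuming DAMON_RECLAIM's parameters are exposed as module parameters under `/sys/module/damon_reclaim/parameters`. The helper names and the `param_dir` argument are illustrative only.

```python
import os

# Assumption: DAMON_RECLAIM exposes its parameters as module parameter files
# in this directory; adjust if your kernel is configured differently.
PARAM_DIR = "/sys/module/damon_reclaim/parameters"


def set_skip_anon(enabled, param_dir=PARAM_DIR):
    """Write Y or N to the skip_anon parameter file (needs root on a real system)."""
    with open(os.path.join(param_dir, "skip_anon"), "w") as f:
        f.write("Y" if enabled else "N")


def get_skip_anon(param_dir=PARAM_DIR):
    """Return True if skip_anon currently reads back as Y."""
    with open(os.path.join(param_dir, "skip_anon")) as f:
        return f.read().strip() == "Y"
```

The sketch only writes the parameter file. Whether a running DAMON_RECLAIM instance picks up the change without further steps is not covered by the hunk above, so it is left open here.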

Documentation/admin-guide/mm/damon/usage.rst

Lines changed: 85 additions & 20 deletions
@@ -25,10 +25,12 @@ DAMON provides below interfaces for different users.
 interface provides only simple :ref:`statistics <damos_stats>` for the
 monitoring results. For detailed monitoring results, DAMON provides a
 :ref:`tracepoint <tracepoint>`.
-- *debugfs interface.*
+- *debugfs interface. (DEPRECATED!)*
 :ref:`This <debugfs_interface>` is almost identical to :ref:`sysfs interface
-<sysfs_interface>`. This will be removed after next LTS kernel is released,
-so users should move to the :ref:`sysfs interface <sysfs_interface>`.
+<sysfs_interface>`. This is deprecated, so users should move to the
+:ref:`sysfs interface <sysfs_interface>`. If you depend on this and cannot
+move, please report your usecase to [email protected] and
+
 - *Kernel Space Programming Interface.*
 :doc:`This </mm/damon/api>` is for kernel space programmers. Using this,
 users can utilize every feature of DAMON most flexibly and efficiently by
@@ -87,6 +89,8 @@ comma (","). ::
 │ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms
 │ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
 │ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low
+│ │ │ │ │ │ │ filters/nr_filters
+│ │ │ │ │ │ │ │ 0/type,matching,memcg_id
 │ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
 │ │ │ │ │ │ │ tried_regions/
 │ │ │ │ │ │ │ │ 0/start,end,nr_accesses,age
@@ -151,6 +155,8 @@ number (``N``) to the file creates the number of child directories named as
 moment, only one context per kdamond is supported, so only ``0`` or ``1`` can
 be written to the file.
 
+.. _sysfs_contexts:
+
 contexts/<N>/
 -------------
 
@@ -268,21 +274,32 @@ schemes/<N>/
 ------------
 
 In each scheme directory, five directories (``access_pattern``, ``quotas``,
-``watermarks``, ``stats``, and ``tried_regions``) and one file (``action``)
-exist.
+``watermarks``, ``filters``, ``stats``, and ``tried_regions``) and one file
+(``action``) exist.
 
 The ``action`` file is for setting and getting what action you want to apply to
 memory regions having specific access pattern of the interest. The keywords
 that can be written to and read from the file and their meaning are as below.
 
-- ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``
-- ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``
-- ``pageout``: Call ``madvise()`` for the region with ``MADV_PAGEOUT``
-- ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``
-- ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``
+Note that support of each action depends on the running DAMON operations set
+`implementation <sysfs_contexts>`.
+
+- ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
+Supported by ``vaddr`` and ``fvaddr`` operations set.
+- ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``.
+Supported by ``vaddr`` and ``fvaddr`` operations set.
+- ``pageout``: Call ``madvise()`` for the region with ``MADV_PAGEOUT``.
+Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
+- ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``.
+Supported by ``vaddr`` and ``fvaddr`` operations set.
+- ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``.
+Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``lru_prio``: Prioritize the region on its LRU lists.
+Supported by ``paddr`` operations set.
 - ``lru_deprio``: Deprioritize the region on its LRU lists.
-- ``stat``: Do nothing but count the statistics
+Supported by ``paddr`` operations set.
+- ``stat``: Do nothing but count the statistics.
+Supported by all operations sets.
 
 schemes/<N>/access_pattern/
 ---------------------------
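
The per-action support notes added in the hunk above amount to a small lookup table. The sketch below transcribes them so a tool can check an action before writing it to the `action` file. It assumes "all operations sets" means exactly `vaddr`, `fvaddr` and `paddr`, the three sets named in this hunk. The names `SUPPORTED_OPS`, `action_supported` and `actions_for` are illustrative.

```python
# Support matrix transcribed from the action list in the hunk above.
# Assumption: "all operations sets" for ``stat`` means vaddr, fvaddr and paddr.
SUPPORTED_OPS = {
    "willneed": {"vaddr", "fvaddr"},
    "cold": {"vaddr", "fvaddr"},
    "pageout": {"vaddr", "fvaddr", "paddr"},
    "hugepage": {"vaddr", "fvaddr"},
    "nohugepage": {"vaddr", "fvaddr"},
    "lru_prio": {"paddr"},
    "lru_deprio": {"paddr"},
    "stat": {"vaddr", "fvaddr", "paddr"},
}


def action_supported(action, operations):
    """Return True if the DAMOS action is documented for the operations set."""
    if action not in SUPPORTED_OPS:
        raise ValueError(f"unknown DAMOS action: {action!r}")
    return operations in SUPPORTED_OPS[action]


def actions_for(operations):
    """Return the sorted list of actions documented for an operations set."""
    return sorted(a for a, ops in SUPPORTED_OPS.items() if operations in ops)
```

For example, `actions_for("paddr")` lists only `lru_deprio`, `lru_prio`, `pageout` and `stat`, which matches the notes in the hunk.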
@@ -347,6 +364,46 @@ as below.
 
 The ``interval`` should written in microseconds unit.
 
+schemes/<N>/filters/
+--------------------
+
+Users could know something more than the kernel for specific types of memory.
+In the case, users could do their own management for the memory and hence
+doesn't want DAMOS bothers that. Users could limit DAMOS by setting the access
+pattern of the scheme and/or the monitoring regions for the purpose, but that
+can be inefficient in some cases. In such cases, users could set non-access
+pattern driven filters using files in this directory.
+
+In the beginning, this directory has only one file, ``nr_filters``. Writing a
+number (``N``) to the file creates the number of child directories named ``0``
+to ``N-1``. Each directory represents each filter. The filters are evaluated
+in the numeric order.
+
+Each filter directory contains three files, namely ``type``, ``matcing``, and
+``memcg_path``. You can write one of two special keywords, ``anon`` for
+anonymous pages, or ``memcg`` for specific memory cgroup filtering. In case of
+the memory cgroup filtering, you can specify the memory cgroup of the interest
+by writing the path of the memory cgroup from the cgroups mount point to
+``memcg_path`` file. You can write ``Y`` or ``N`` to ``matching`` file to
+filter out pages that does or does not match to the type, respectively. Then,
+the scheme's action will not be applied to the pages that specified to be
+filtered out.
+
+For example, below restricts a DAMOS action to be applied to only non-anonymous
+pages of all memory cgroups except ``/having_care_already``.::
+
+# echo 2 > nr_filters
+# # filter out anonymous pages
+echo anon > 0/type
+echo Y > 0/matching
+# # further filter out all cgroups except one at '/having_care_already'
+echo memcg > 1/type
+echo /having_care_already > 1/memcg_path
+echo N > 1/matching
+
+Note that filters are currently supported only when ``paddr``
+`implementation <sysfs_contexts>` is being used.
+
 .. _sysfs_schemes_stats:
 
 schemes/<N>/stats/
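
The shell example in the hunk above can also be driven from a script. The following is a rough sketch of the same sequence: write `nr_filters`, then fill in `type`, `memcg_path` and `matching` for each filter. It assumes the caller passes the path of a scheme's `filters/` directory and that the kernel creates the `0` to `N-1` directories once `nr_filters` is written. `install_filters` and `EXAMPLE_FILTERS` are hypothetical names, not part of DAMON.

```python
import os

VALID_TYPES = ("anon", "memcg")


def _write(path, value):
    """Write a single value to a sysfs-style file."""
    with open(path, "w") as f:
        f.write(str(value))


def install_filters(filters_dir, filters):
    """Configure DAMOS filters under a scheme's filters/ directory.

    `filters` is a list of dicts with keys "type" ("anon" or "memcg"),
    "matching" (bool) and, for "memcg" filters, "memcg_path".
    On real sysfs, writing nr_filters makes the kernel create the
    0..N-1 directories; this helper does not create them itself.
    """
    # Validate everything first so a bad entry leaves the files untouched.
    for flt in filters:
        if flt["type"] not in VALID_TYPES:
            raise ValueError(f"unknown filter type: {flt['type']!r}")
        if flt["type"] == "memcg" and not flt.get("memcg_path"):
            raise ValueError("memcg filter needs a memcg_path")

    _write(os.path.join(filters_dir, "nr_filters"), len(filters))
    for index, flt in enumerate(filters):
        fdir = os.path.join(filters_dir, str(index))
        _write(os.path.join(fdir, "type"), flt["type"])
        if flt["type"] == "memcg":
            _write(os.path.join(fdir, "memcg_path"), flt["memcg_path"])
        _write(os.path.join(fdir, "matching"), "Y" if flt["matching"] else "N")


# The two filters from the documentation example in the hunk above.
EXAMPLE_FILTERS = [
    {"type": "anon", "matching": True},
    {"type": "memcg", "memcg_path": "/having_care_already", "matching": False},
]
```

Filters are applied in numeric order, so the list order passed to `install_filters` is significant. As the hunk notes, filters only take effect with the `paddr` operations set.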
@@ -432,13 +489,17 @@ the files as above. Above is only for an example.
 
 .. _debugfs_interface:
 
-debugfs Interface
-=================
+debugfs Interface (DEPRECATED!)
+===============================
 
 .. note::
 
-DAMON debugfs interface will be removed after next LTS kernel is released, so
-users should move to the :ref:`sysfs interface <sysfs_interface>`.
+THIS IS DEPRECATED!
+
+DAMON debugfs interface is deprecated, so users should move to the
+:ref:`sysfs interface <sysfs_interface>`. If you depend on this and cannot
+move, please report your usecase to [email protected] and
+
 
 DAMON exports eight files, ``attrs``, ``target_ids``, ``init_regions``,
 ``schemes``, ``monitor_on``, ``kdamond_pid``, ``mk_contexts`` and
@@ -574,11 +635,15 @@ The ``<action>`` is a predefined integer for memory management actions, which
 DAMON will apply to the regions having the target access pattern. The
 supported numbers and their meanings are as below.
 
-- 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``
-- 1: Call ``madvise()`` for the region with ``MADV_COLD``
-- 2: Call ``madvise()`` for the region with ``MADV_PAGEOUT``
-- 3: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``
-- 4: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``
+- 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``. Ignored if
+``target`` is ``paddr``.
+- 1: Call ``madvise()`` for the region with ``MADV_COLD``. Ignored if
+``target`` is ``paddr``.
+- 2: Call ``madvise()`` for the region with ``MADV_PAGEOUT``.
+- 3: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``. Ignored if
+``target`` is ``paddr``.
+- 4: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``. Ignored if
+``target`` is ``paddr``.
 - 5: Do nothing but count the statistics
 
 Quota

Documentation/admin-guide/mm/hugetlbpage.rst

Lines changed: 3 additions & 3 deletions
@@ -459,13 +459,13 @@ Examples
 .. _map_hugetlb:
 
 ``map_hugetlb``
-see tools/testing/selftests/vm/map_hugetlb.c
+see tools/testing/selftests/mm/map_hugetlb.c
 
 ``hugepage-shm``
-see tools/testing/selftests/vm/hugepage-shm.c
+see tools/testing/selftests/mm/hugepage-shm.c
 
 ``hugepage-mmap``
-see tools/testing/selftests/vm/hugepage-mmap.c
+see tools/testing/selftests/mm/hugepage-mmap.c
 
 The `libhugetlbfs`_ library provides a wide range of userspace tools
 to help with huge page usability, environment setup, and control.

Documentation/admin-guide/mm/idle_page_tracking.rst

Lines changed: 1 addition & 1 deletion
@@ -63,7 +63,7 @@ workload one should:
 are not reclaimable, he or she can filter them out using
 ``/proc/kpageflags``.
 
-The page-types tool in the tools/vm directory can be used to assist in this.
+The page-types tool in the tools/mm directory can be used to assist in this.
 If the tool is run initially with the appropriate option, it will mark all the
 queried pages as idle. Subsequent runs of the tool can then show which pages have
 their idle flag cleared in the interim.

Documentation/admin-guide/mm/numaperf.rst

Lines changed: 4 additions & 4 deletions
@@ -1,4 +1,7 @@
-=============
+=======================
+NUMA Memory Performance
+=======================
+
 NUMA Locality
 =============
 
@@ -59,7 +62,6 @@ that are CPUs and hence suitable for generic task scheduling, and
 IO initiators such as GPUs and NICs. Unlike access class 0, only
 nodes containing CPUs are considered.
 
-================
 NUMA Performance
 ================
 
@@ -94,7 +96,6 @@ for the platform.
 Access class 1 takes the same form but only includes values for CPU to
 memory activity.
 
-==========
 NUMA Cache
 ==========
 
@@ -168,7 +169,6 @@ The "size" is the number of bytes provided by this cache level.
 The "write_policy" will be 0 for write-back, and non-zero for
 write-through caching.
 
-========
 See Also
 ========
 

Documentation/admin-guide/mm/pagemap.rst

Lines changed: 2 additions & 2 deletions
@@ -44,7 +44,7 @@ There are four components to pagemap:
 * ``/proc/kpagecount``. This file contains a 64-bit count of the number of
 times each page is mapped, indexed by PFN.
 
-The page-types tool in the tools/vm directory can be used to query the
+The page-types tool in the tools/mm directory can be used to query the
 number of times a page is mapped.
 
 * ``/proc/kpageflags``. This file contains a 64-bit set of flags for each
@@ -170,7 +170,7 @@ LRU related page flags
 14 - SWAPBACKED
 The page is backed by swap/RAM.
 
-The page-types tool in the tools/vm directory can be used to query the
+The page-types tool in the tools/mm directory can be used to query the
 above flags.
 
 Using pagemap to do something useful
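
The context lines in this diff describe `/proc/kpagecount` as a file of 64-bit counts indexed by PFN. The sketch below shows how such a file could be read directly instead of through the page-types tool, assuming the entries are stored in the machine's native byte order. `read_kpagecount` and its `path` parameter are illustrative names.

```python
import struct

ENTRY_SIZE = 8  # each entry is a 64-bit count, one per page frame number


def read_kpagecount(pfn, path="/proc/kpagecount"):
    """Return the map count stored for page frame number `pfn`.

    Assumes native byte order for the 64-bit entries. Reading the real
    file requires root privileges.
    """
    with open(path, "rb") as f:
        f.seek(pfn * ENTRY_SIZE)
        data = f.read(ENTRY_SIZE)
    if len(data) != ENTRY_SIZE:
        raise ValueError(f"no entry for PFN {pfn}")
    # "=Q" is an unsigned 64-bit integer in native byte order.
    return struct.unpack("=Q", data)[0]
```

The `path` argument makes it possible to point the helper at a saved copy of the file for offline inspection.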
