Commit 27bc50f

Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
   linux-next for a couple of months without, to my knowledge, any
   negative reports (or any positive ones, come to that).

 - Also the Maple Tree from Liam Howlett. An overlapping range-based
   tree for vmas. It is apparently slightly more efficient in its own
   right, but is mainly targeted at enabling work to reduce mmap_lock
   contention.

   Liam has identified a number of other tree users in the kernel which
   could be beneficially converted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   at [1]. This has yet to be addressed due to Liam's unfortunately
   timed vacation. He is now back and we'll get this fixed up.

 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
   clang-generated instrumentation to detect used-uninitialized bugs
   down to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.

 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
   support file/shmem-backed pages.

 - userfaultfd updates from Axel Rasmussen

 - zsmalloc cleanups from Alexey Romanov

 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
   memory-failure

 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.

 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.

 - memcg cleanups from Kairui Song.

 - memcg fixes and cleanups from Johannes Weiner.

 - Vishal Moola provides more folio conversions

 - Zhang Yi removed ll_rw_block() :(

 - migration enhancements from Peter Xu

 - migration error-path bugfixes from Huang Ying

 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths. For optimizations by PMEM drivers, DRM
   drivers, etc.

 - vma merging improvements from Jakub Matěn.

 - NUMA hinting cleanups from David Hildenbrand.

 - xu xin added additional userspace visibility into KSM merging
   activity.

 - THP & KSM code consolidation from Qi Zheng.

 - more folio work from Matthew Wilcox.

 - KASAN updates from Andrey Konovalov.

 - DAMON cleanups from Kaixu Xia.

 - DAMON work from SeongJae Park: fixes, cleanups.

 - hugetlb sysfs cleanups from Muchun Song.

 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]

* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
  hugetlb: allocate vma lock for all sharable vmas
  hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
  hugetlb: fix vma lock handling during split vma and range unmapping
  mglru: mm/vmscan.c: fix imprecise comments
  mm/mglru: don't sync disk for each aging cycle
  mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
  mm: memcontrol: use do_memsw_account() in a few more places
  mm: memcontrol: deprecate swapaccounting=0 mode
  mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
  mm/secretmem: remove reduntant return value
  mm/hugetlb: add available_huge_pages() func
  mm: remove unused inline functions from include/linux/mm_inline.h
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
  selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: dedup THP helpers
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  ...
2 parents 70442fc + bbff39c commit 27bc50f

409 files changed, +65792 -8034 lines changed

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+What: /sys/devices/virtual/memory_tiering/
+Date: August 2022
+Contact: Linux memory management mailing list <[email protected]>
+Description: A collection of all the memory tiers allocated.
+
+Individual memory tier details are contained in subdirectories
+named by the abstract distance of the memory tier.
+
+/sys/devices/virtual/memory_tiering/memory_tierN/
+
+
+What: /sys/devices/virtual/memory_tiering/memory_tierN/
+/sys/devices/virtual/memory_tiering/memory_tierN/nodes
+Date: August 2022
+Contact: Linux memory management mailing list <[email protected]>
+Description: Directory with details of a specific memory tier
+
+This is the directory containing information about a particular
+memory tier, memtierN, where N is derived based on abstract distance.
+
+A smaller value of N implies a higher (faster) memory tier in the
+hierarchy.
+
+nodes: NUMA nodes that are part of this memory tier.
+
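
To illustrate the ABI described in this hunk, the following Python sketch walks the ``memory_tiering`` directory and maps each tier number N to the NUMA nodes listed in its ``nodes`` file. It assumes that ``nodes`` holds a kernel-style node list such as ``0,2-3``; that format and the helper names ``parse_node_list`` and ``read_memory_tiers`` are assumptions made for illustration and are not part of the commit.

```python
from pathlib import Path

# Path taken from the ABI text above.
SYSFS_ROOT = Path("/sys/devices/virtual/memory_tiering")


def parse_node_list(text):
    """Expand a list such as "0,2-3" into [0, 2, 3] (assumed file format)."""
    nodes = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            nodes.extend(range(int(low), int(high) + 1))
        else:
            nodes.append(int(part))
    return nodes


def read_memory_tiers(root=SYSFS_ROOT):
    """Return {N: [node, ...]} for every memory_tierN directory under root."""
    tiers = {}
    for entry in Path(root).glob("memory_tier*"):
        suffix = entry.name[len("memory_tier"):]
        nodes_file = entry / "nodes"
        if suffix.isdigit() and nodes_file.is_file():
            tiers[int(suffix)] = parse_node_list(nodes_file.read_text())
    # Smaller N means a higher (faster) tier, so sort ascending.
    return dict(sorted(tiers.items()))
```

Sorting by N puts the fastest tier first, following the rule stated in the description that a smaller N implies a higher tier.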

Documentation/accounting/delay-accounting.rst

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ a) waiting for a CPU (while being runnable)
 b) completion of synchronous block I/O initiated by the task
 c) swapping in pages
 d) memory reclaim
-e) thrashing page cache
+e) thrashing
 f) direct compact
 g) write-protect copy

Documentation/admin-guide/cgroup-v1/memory.rst

Lines changed: 1 addition & 3 deletions
@@ -299,7 +299,7 @@ Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by
 lruvec->lru_lock; PG_lru bit of page->flags is cleared before
 isolating a page from its LRU under lruvec->lru_lock.

-2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
+2.7 Kernel Memory Extension
 -----------------------------------------------

 With the Kernel memory extension, the Memory Controller is able to limit
@@ -386,8 +386,6 @@ U != 0, K >= U:

 a. Enable CONFIG_CGROUPS
 b. Enable CONFIG_MEMCG
-c. Enable CONFIG_MEMCG_SWAP (to use swap extension)
-d. Enable CONFIG_MEMCG_KMEM (to use kmem extension)

 3.1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
 -------------------------------------------------------------------

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1469,6 +1469,14 @@
14691469
Permit 'security.evm' to be updated regardless of
14701470
current integrity status.
14711471

1472+
early_page_ext [KNL] Enforces page_ext initialization to earlier
1473+
stages so cover more early boot allocations.
1474+
Please note that as side effect some optimizations
1475+
might be disabled to achieve that (e.g. parallelized
1476+
memory initialization is disabled) so the boot process
1477+
might take longer, especially on systems with a lot of
1478+
memory. Available with CONFIG_PAGE_EXTENSION=y.
1479+
14721480
failslab=
14731481
fail_usercopy=
14741482
fail_page_alloc=
@@ -6041,12 +6049,6 @@
60416049
This parameter controls use of the Protected
60426050
Execution Facility on pSeries.
60436051

6044-
swapaccount= [KNL]
6045-
Format: [0|1]
6046-
Enable accounting of swap in memory resource
6047-
controller if no parameter or 1 is given or disable
6048-
it if 0 is given (See Documentation/admin-guide/cgroup-v1/memory.rst)
6049-
60506052
swiotlb= [ARM,IA-64,PPC,MIPS,X86]
60516053
Format: { <int> [,<int>] | force | noforce }
60526054
<int> -- Number of I/O TLB slabs

Documentation/admin-guide/mm/cma_debugfs.rst

Lines changed: 5 additions & 5 deletions
@@ -5,10 +5,10 @@ CMA Debugfs Interface
 The CMA debugfs interface is useful to retrieve basic information out of the
 different CMA areas and to test allocation/release in each of the areas.

-Each CMA zone represents a directory under <debugfs>/cma/, indexed by the
-kernel's CMA index. So the first CMA zone would be:
+Each CMA area represents a directory under <debugfs>/cma/, represented by
+its CMA name like below:

-<debugfs>/cma/cma-0
+<debugfs>/cma/<cma_name>

 The structure of the files created under that directory is as follows:

@@ -18,8 +18,8 @@ The structure of the files created under that directory is as follows:
 - [RO] bitmap: The bitmap of page states in the zone.
 - [WO] alloc: Allocate N pages from that CMA area. For example::

-echo 5 > <debugfs>/cma/cma-2/alloc
+echo 5 > <debugfs>/cma/<cma_name>/alloc

-would try to allocate 5 pages from the cma-2 area.
+would try to allocate 5 pages from the 'cma_name' area.

 - [WO] free: Free N pages from that CMA area, similar to the above.
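
The ``echo`` example in this hunk can also be scripted. The sketch below is a hypothetical Python wrapper around the same files: ``list_cma_areas`` and ``cma_request`` are illustrative names, and the default mount point ``/sys/kernel/debug`` is an assumption about where debugfs is mounted on the system.

```python
from pathlib import Path

# Assumption: debugfs is mounted at its conventional location.
DEFAULT_DEBUGFS = "/sys/kernel/debug"


def list_cma_areas(debugfs=DEFAULT_DEBUGFS):
    """Return the names of the per-area directories under <debugfs>/cma/."""
    cma_root = Path(debugfs) / "cma"
    return sorted(p.name for p in cma_root.iterdir() if p.is_dir())


def cma_request(cma_name, action, pages, debugfs=DEFAULT_DEBUGFS):
    """Write a page count to <debugfs>/cma/<cma_name>/alloc or .../free."""
    if action not in ("alloc", "free"):
        raise ValueError("action must be 'alloc' or 'free'")
    if pages <= 0:
        raise ValueError("pages must be a positive integer")
    target = Path(debugfs) / "cma" / cma_name / action
    # Same effect as: echo <pages> > <debugfs>/cma/<cma_name>/<action>
    target.write_text(str(pages))
    return target
```

Calling ``cma_request(name, "alloc", 5)`` for a name returned by ``list_cma_areas()`` mirrors the ``echo 5`` command shown in the diff; writing to debugfs normally requires root privileges.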

Documentation/admin-guide/mm/damon/index.rst

Lines changed: 3 additions & 3 deletions
@@ -1,8 +1,8 @@
 .. SPDX-License-Identifier: GPL-2.0

-========================
-Monitoring Data Accesses
-========================
+==========================
+DAMON: Data Access MONitor
+==========================

 :doc:`DAMON </mm/damon/index>` allows light-weight data access monitoring.
 Using DAMON, users can analyze the memory access patterns of their systems and

Documentation/admin-guide/mm/damon/start.rst

Lines changed: 3 additions & 10 deletions
@@ -29,16 +29,9 @@ called DAMON Operator (DAMO). It is available at
 https://github.com/awslabs/damo. The examples below assume that ``damo`` is on
 your ``$PATH``. It's not mandatory, though.

-Because DAMO is using the debugfs interface (refer to :doc:`usage` for the
-detail) of DAMON, you should ensure debugfs is mounted. Mount it manually as
-below::
-
-# mount -t debugfs none /sys/kernel/debug/
-
-or append the following line to your ``/etc/fstab`` file so that your system
-can automatically mount debugfs upon booting::
-
-debugfs /sys/kernel/debug debugfs defaults 0 0
+Because DAMO is using the sysfs interface (refer to :doc:`usage` for the
+detail) of DAMON, you should ensure :doc:`sysfs </filesystems/sysfs>` is
+mounted.


 Recording Data Access Patterns

Documentation/admin-guide/mm/damon/usage.rst

Lines changed: 5 additions & 0 deletions
@@ -393,6 +393,11 @@ the files as above. Above is only for an example.
 debugfs Interface
 =================

+.. note::
+
+DAMON debugfs interface will be removed after next LTS kernel is released, so
+users should move to the :ref:`sysfs interface <sysfs_interface>`.
+
 DAMON exports eight files, ``attrs``, ``target_ids``, ``init_regions``,
 ``schemes``, ``monitor_on``, ``kdamond_pid``, ``mk_contexts`` and
 ``rm_contexts`` under its debugfs directory, ``<debugfs>/damon/``.

Documentation/admin-guide/mm/index.rst

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ the Linux memory management.
 idle_page_tracking
 ksm
 memory-hotplug
+multigen_lru
 nommu-mmap
 numa_memory_policy
 numaperf

Documentation/admin-guide/mm/ksm.rst

Lines changed: 36 additions & 0 deletions
@@ -184,6 +184,42 @@ The maximum possible ``pages_sharing/pages_shared`` ratio is limited by the
 ``max_page_sharing`` tunable. To increase the ratio ``max_page_sharing`` must
 be increased accordingly.

+Monitoring KSM profit
+=====================
+
+KSM can save memory by merging identical pages, but also can consume
+additional memory, because it needs to generate a number of rmap_items to
+save each scanned page's brief rmap information. Some of these pages may
+be merged, but some may not be abled to be merged after being checked
+several times, which are unprofitable memory consumed.
+
+1) How to determine whether KSM save memory or consume memory in system-wide
+range? Here is a simple approximate calculation for reference::
+
+general_profit =~ pages_sharing * sizeof(page) - (all_rmap_items) *
+sizeof(rmap_item);
+
+where all_rmap_items can be easily obtained by summing ``pages_sharing``,
+``pages_shared``, ``pages_unshared`` and ``pages_volatile``.
+
+2) The KSM profit inner a single process can be similarly obtained by the
+following approximate calculation::
+
+process_profit =~ ksm_merging_pages * sizeof(page) -
+ksm_rmap_items * sizeof(rmap_item).
+
+where ksm_merging_pages is shown under the directory ``/proc/<pid>/``,
+and ksm_rmap_items is shown in ``/proc/<pid>/ksm_stat``.
+
+From the perspective of application, a high ratio of ``ksm_rmap_items`` to
+``ksm_merging_pages`` means a bad madvise-applied policy, so developers or
+administrators have to rethink how to change madvise policy. Giving an example
+for reference, a page's size is usually 4K, and the rmap_item's size is
+separately 32B on 32-bit CPU architecture and 64B on 64-bit CPU architecture.
+so if the ``ksm_rmap_items/ksm_merging_pages`` ratio exceeds 64 on 64-bit CPU
+or exceeds 128 on 32-bit CPU, then the app's madvise policy should be dropped,
+because the ksm profit is approximately zero or negative.
+
 Monitoring KSM events
 =====================
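
The two approximate formulas added by this hunk can be sketched in Python as follows. The constants use the example figures from the text (4 KiB pages, 64-byte ``rmap_item`` on a 64-bit CPU) and are assumptions about the running system; the function names are illustrative only. The reader assumes the system-wide counters are exposed as one integer per file under ``/sys/kernel/mm/ksm``.

```python
from pathlib import Path

PAGE_SIZE = 4096      # assumption: 4 KiB pages, as in the text's example
RMAP_ITEM_SIZE = 64   # assumption: 64-bit CPU, as in the text's example

COUNTERS = ("pages_sharing", "pages_shared", "pages_unshared", "pages_volatile")


def read_ksm_counters(ksm_dir="/sys/kernel/mm/ksm"):
    """Read the four system-wide counters, one integer per file."""
    return {name: int((Path(ksm_dir) / name).read_text()) for name in COUNTERS}


def general_profit(counters, page_size=PAGE_SIZE, rmap_item_size=RMAP_ITEM_SIZE):
    """pages_sharing * sizeof(page) - all_rmap_items * sizeof(rmap_item)."""
    all_rmap_items = sum(counters[name] for name in COUNTERS)
    return counters["pages_sharing"] * page_size - all_rmap_items * rmap_item_size


def process_profit(ksm_merging_pages, ksm_rmap_items,
                   page_size=PAGE_SIZE, rmap_item_size=RMAP_ITEM_SIZE):
    """ksm_merging_pages * sizeof(page) - ksm_rmap_items * sizeof(rmap_item)."""
    return ksm_merging_pages * page_size - ksm_rmap_items * rmap_item_size


if __name__ == "__main__":
    # A ksm_rmap_items/ksm_merging_pages ratio of 64 is the break-even
    # point for these sizes, so this prints 0.
    print(process_profit(ksm_merging_pages=100, ksm_rmap_items=6400))
```

Both results are in bytes; a value at or below zero corresponds to the case the text describes as approximately zero or negative profit.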
