Commit ee01c4d
Merge branch 'akpm' (patches from Andrew)

Merge more updates from Andrew Morton:
 "More mm/ work, plenty more to come

  Subsystems affected by this patch series: slub, memcg, gup, kasan,
  pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
  thp, mmap, kconfig"

* akpm: (131 commits)
  arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
  riscv: support DEBUG_WX
  mm: add DEBUG_WX support
  drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
  mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
  powerpc/mm: drop platform defined pmd_mknotpresent()
  mm: thp: don't need to drain lru cache when splitting and mlocking THP
  hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
  sparc32: register memory occupied by kernel as memblock.memory
  include/linux/memblock.h: fix minor typo and unclear comment
  mm, mempolicy: fix up gup usage in lookup_node
  tools/vm/page_owner_sort.c: filter out unneeded line
  mm: swap: memcg: fix memcg stats for huge pages
  mm: swap: fix vmstats for huge pages
  mm: vmscan: limit the range of LRU type balancing
  mm: vmscan: reclaim writepage is IO cost
  mm: vmscan: determine anon/file pressure balance at the reclaim root
  mm: balance LRU lists based on relative thrashing
  mm: only count actual rotations as LRU reclaim cost
  ...
2 parents c444eb5 + 09587a0 commit ee01c4d

File tree

147 files changed: +3443 −2670 lines changed


Documentation/admin-guide/cgroup-v1/memory.rst

Lines changed: 7 additions & 12 deletions

@@ -199,11 +199,11 @@ An RSS page is unaccounted when it's fully unmapped. A PageCache page is
 unaccounted when it's removed from radix-tree. Even if RSS pages are fully
 unmapped (by kswapd), they may exist as SwapCache in the system until they
 are really freed. Such SwapCaches are also accounted.
-A swapped-in page is not accounted until it's mapped.
+A swapped-in page is accounted after adding into swapcache.
 
 Note: The kernel does swapin-readahead and reads multiple swaps at once.
-This means swapped-in pages may contain pages for other tasks than a task
-causing page fault. So, we avoid accounting at swap-in I/O.
+Since page's memcg recorded into swap whatever memsw enabled, the page will
+be accounted after swapin.
 
 At page migration, accounting information is kept.
 
@@ -222,18 +222,13 @@ the cgroup that brought it in -- this will happen on memory pressure).
 But see section 8.2: when moving a task to another cgroup, its pages may
 be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
 
-Exception: If CONFIG_MEMCG_SWAP is not used.
-When you do swapoff and make swapped-out pages of shmem(tmpfs) to
-be backed into memory in force, charges for pages are accounted against the
-caller of swapoff rather than the users of shmem.
-
-2.4 Swap Extension (CONFIG_MEMCG_SWAP)
+2.4 Swap Extension
 --------------------------------------
 
-Swap Extension allows you to record charge for swap. A swapped-in page is
-charged back to original page allocator if possible.
+Swap usage is always recorded for each of cgroup. Swap Extension allows you to
+read and limit it.
 
-When swap is accounted, following files are added.
+When CONFIG_SWAP is enabled, following files are added.
 
 - memory.memsw.usage_in_bytes.
 - memory.memsw.limit_in_bytes.

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 27 additions & 13 deletions

@@ -834,12 +834,15 @@
 	See also Documentation/networking/decnet.rst.
 
 	default_hugepagesz=
-			[same as hugepagesz=] The size of the default
-			HugeTLB page size. This is the size represented by
-			the legacy /proc/ hugepages APIs, used for SHM, and
-			default size when mounting hugetlbfs filesystems.
-			Defaults to the default architecture's huge page size
-			if not specified.
+			[HW] The size of the default HugeTLB page. This is
+			the size represented by the legacy /proc/ hugepages
+			APIs. In addition, this is the default hugetlb size
+			used for shmget(), mmap() and mounting hugetlbfs
+			filesystems. If not specified, defaults to the
+			architecture's default huge page size. Huge page
+			sizes are architecture dependent. See also
+			Documentation/admin-guide/mm/hugetlbpage.rst.
+			Format: size[KMG]
 
 	deferred_probe_timeout=
 			[KNL] Debugging option to set a timeout in seconds for

@@ -1484,13 +1487,24 @@
 			hugepages using the cma allocator. If enabled, the
 			boot-time allocation of gigantic hugepages is skipped.
 
-	hugepages=	[HW,X86-32,IA-64] HugeTLB pages to allocate at boot.
-	hugepagesz=	[HW,IA-64,PPC,X86-64] The size of the HugeTLB pages.
-			On x86-64 and powerpc, this option can be specified
-			multiple times interleaved with hugepages= to reserve
-			huge pages of different sizes. Valid pages sizes on
-			x86-64 are 2M (when the CPU supports "pse") and 1G
-			(when the CPU supports the "pdpe1gb" cpuinfo flag).
+	hugepages=	[HW] Number of HugeTLB pages to allocate at boot.
+			If this follows hugepagesz (below), it specifies
+			the number of pages of hugepagesz to be allocated.
+			If this is the first HugeTLB parameter on the command
+			line, it specifies the number of pages to allocate for
+			the default huge page size. See also
+			Documentation/admin-guide/mm/hugetlbpage.rst.
+			Format: <integer>
+
+	hugepagesz=
+			[HW] The size of the HugeTLB pages. This is used in
+			conjunction with hugepages (above) to allocate huge
+			pages of a specific size at boot. The pair
+			hugepagesz=X hugepages=Y can be specified once for
+			each supported huge page size. Huge page sizes are
+			architecture dependent. See also
+			Documentation/admin-guide/mm/hugetlbpage.rst.
+			Format: size[KMG]
 
 	hung_task_panic=
 			[KNL] Should the hung task detector generate panics.
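To make the pair semantics concrete, a boot command line can interleave the two parameters once per supported size. A hypothetical GRUB config fragment (the sizes and counts below are illustrative, not from this patch) might look like:

```shell
# Hypothetical /etc/default/grub fragment: make 1G the default hugetlb size,
# reserve four 1G pages and 512 2M pages at boot. Each hugepages= count
# applies to the hugepagesz= that precedes it.
GRUB_CMDLINE_LINUX="default_hugepagesz=1G hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=512"
```

After regenerating the GRUB config and rebooting, the reservations show up under /sys/kernel/mm/hugepages/.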

Documentation/admin-guide/mm/hugetlbpage.rst

Lines changed: 35 additions & 0 deletions

@@ -100,6 +100,41 @@ with a huge page size selection parameter "hugepagesz=<size>". <size> must
 be specified in bytes with optional scale suffix [kKmMgG]. The default huge
 page size may be selected with the "default_hugepagesz=<size>" boot parameter.
 
+Hugetlb boot command line parameter semantics
+hugepagesz - Specify a huge page size. Used in conjunction with hugepages
+	parameter to preallocate a number of huge pages of the specified
+	size. Hence, hugepagesz and hugepages are typically specified in
+	pairs such as:
+		hugepagesz=2M hugepages=512
+	hugepagesz can only be specified once on the command line for a
+	specific huge page size. Valid huge page sizes are architecture
+	dependent.
+hugepages - Specify the number of huge pages to preallocate. This typically
+	follows a valid hugepagesz or default_hugepagesz parameter. However,
+	if hugepages is the first or only hugetlb command line parameter it
+	implicitly specifies the number of huge pages of default size to
+	allocate. If the number of huge pages of default size is implicitly
+	specified, it can not be overwritten by a hugepagesz,hugepages
+	parameter pair for the default size.
+	For example, on an architecture with 2M default huge page size:
+		hugepages=256 hugepagesz=2M hugepages=512
+	will result in 256 2M huge pages being allocated and a warning message
+	indicating that the hugepages=512 parameter is ignored. If a hugepages
+	parameter is preceded by an invalid hugepagesz parameter, it will
+	be ignored.
+default_hugepagesz - Specify the default huge page size. This parameter can
+	only be specified once on the command line. default_hugepagesz can
+	optionally be followed by the hugepages parameter to preallocate a
+	specific number of huge pages of default size. The number of default
+	sized huge pages to preallocate can also be implicitly specified as
+	mentioned in the hugepages section above. Therefore, on an
+	architecture with 2M default huge page size:
+		hugepages=256
+		default_hugepagesz=2M hugepages=256
+		hugepages=256 default_hugepagesz=2M
+	will all result in 256 2M huge pages being allocated. Valid default
+	huge page size is architecture dependent.
+
 When multiple huge page sizes are supported, ``/proc/sys/vm/nr_hugepages``
 indicates the current number of pre-allocated huge pages of the default size.
 Thus, one can use the following command to dynamically allocate/deallocate
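The "cannot be overwritten" rule in the semantics above can be sketched in plain shell. This is a hypothetical userspace simulation of the parsing order, not kernel code; it assumes a 2M default huge page size and hardcodes the example command line hugepages=256 hugepagesz=2M hugepages=512:

```shell
# Simulate hugetlb command line parsing: the first (implicit) default-size
# hugepages= count wins; a later pair for the same size is ignored.
default_sz="2M"    # assumed architecture default huge page size
default_count=""
cur_sz=""
for arg in "hugepages=256" "hugepagesz=2M" "hugepages=512"; do
  case "$arg" in
    hugepagesz=*) cur_sz="${arg#hugepagesz=}" ;;
    hugepages=*)
      n="${arg#hugepages=}"
      sz="${cur_sz:-$default_sz}"   # no hugepagesz yet => implicit default size
      if [ "$sz" = "$default_sz" ] && [ -n "$default_count" ]; then
        echo "warning: hugepages=$n ignored for size $sz" >&2
      elif [ "$sz" = "$default_sz" ]; then
        default_count="$n"
      fi
      ;;
  esac
done
echo "default-size pages to allocate: $default_count"
```

As in the documented example, the simulation allocates 256 default-size pages and warns that hugepages=512 is ignored.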

Documentation/admin-guide/mm/transhuge.rst

Lines changed: 7 additions & 0 deletions

@@ -220,6 +220,13 @@ memory. A lower value can prevent THPs from being
 collapsed, resulting fewer pages being collapsed into
 THPs, and lower memory access performance.
 
+``max_ptes_shared`` specifies how many pages can be shared across multiple
+processes. Exceeding the number would block the collapse::
+
+	/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
+
+A higher value may increase memory footprint for some workloads.
+
 Boot parameter
 ==============

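For scale: with 4K base pages and 2M PMD-sized THP, a PMD maps 512 PTEs, and the default for this knob is half of that (an assumption based on the patch series setting the default to HPAGE_PMD_NR/2; both numbers are architecture dependent, so verify against your kernel's sysfs file):

```shell
# 2M PMD-sized THP divided by 4K base pages = 512 PTEs per THP.
# Assumed default: max_ptes_shared = HPAGE_PMD_NR / 2 (hypothetical x86_64 numbers).
ptes_per_pmd=$(( (2 * 1024 * 1024) / (4 * 1024) ))
echo "PTEs per PMD: $ptes_per_pmd"
echo "assumed default max_ptes_shared: $(( ptes_per_pmd / 2 ))"
```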
Documentation/admin-guide/sysctl/vm.rst

Lines changed: 18 additions & 5 deletions

@@ -831,14 +831,27 @@ tooling to work, you can do::
 swappiness
 ==========
 
-This control is used to define how aggressive the kernel will swap
-memory pages. Higher values will increase aggressiveness, lower values
-decrease the amount of swap. A value of 0 instructs the kernel not to
-initiate swap until the amount of free and file-backed pages is less
-than the high water mark in a zone.
+This control is used to define the rough relative IO cost of swapping
+and filesystem paging, as a value between 0 and 200. At 100, the VM
+assumes equal IO cost and will thus apply memory pressure to the page
+cache and swap-backed pages equally; lower values signify more
+expensive swap IO, higher values indicates cheaper.
+
+Keep in mind that filesystem IO patterns under memory pressure tend to
+be more efficient than swap's random IO. An optimal value will require
+experimentation and will also be workload-dependent.
 
 The default value is 60.
 
+For in-memory swap, like zram or zswap, as well as hybrid setups that
+have swap on faster devices than the filesystem, values beyond 100 can
+be considered. For example, if the random IO against the swap device
+is on average 2x faster than IO from the filesystem, swappiness should
+be 133 (x + 2x = 200, 2x = 133.33).
+
+At 0, the kernel will not initiate swap until the amount of free and
+file-backed pages is less than the high watermark in a zone.
+
 
 unprivileged_userfaultfd
 ========================
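The 133 figure in the swappiness text above falls out of a simple proportion; a quick sketch of the arithmetic, where r is the assumed swap-vs-filesystem speed ratio (r=2 matches the documented example):

```shell
# If swap IO is r times faster than filesystem IO, split the total pressure
# budget of 200 as x + r*x = 200, giving swappiness = 200*r/(r+1).
r=2
swappiness=$(awk -v r="$r" 'BEGIN { printf "%d", 200 * r / (r + 1) }')
echo "suggested vm.swappiness = $swappiness"
```

With r=2 this prints 133, matching the worked example; the value would then be applied with sysctl vm.swappiness, and tuned experimentally per workload.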

Documentation/core-api/padata.rst

Lines changed: 31 additions & 10 deletions

@@ -4,23 +4,26 @@
 The padata parallel execution mechanism
 =======================================
 
-:Date: December 2019
+:Date: May 2020
 
 Padata is a mechanism by which the kernel can farm jobs out to be done in
-parallel on multiple CPUs while retaining their ordering. It was developed for
-use with the IPsec code, which needs to be able to perform encryption and
-decryption on large numbers of packets without reordering those packets. The
-crypto developers made a point of writing padata in a sufficiently general
-fashion that it could be put to other uses as well.
+parallel on multiple CPUs while optionally retaining their ordering.
 
-Usage
-=====
+It was originally developed for IPsec, which needs to perform encryption and
+decryption on large numbers of packets without reordering those packets. This
+is currently the sole consumer of padata's serialized job support.
+
+Padata also supports multithreaded jobs, splitting up the job evenly while load
+balancing and coordinating between threads.
+
+Running Serialized Jobs
+=======================
 
 Initializing
 ------------
 
-The first step in using padata is to set up a padata_instance structure for
-overall control of how jobs are to be run::
+The first step in using padata to run serialized jobs is to set up a
+padata_instance structure for overall control of how jobs are to be run::
 
 	#include <linux/padata.h>
 
@@ -162,6 +165,24 @@ functions that correspond to the allocation in reverse::
 It is the user's responsibility to ensure all outstanding jobs are complete
 before any of the above are called.
 
+Running Multithreaded Jobs
+==========================
+
+A multithreaded job has a main thread and zero or more helper threads, with the
+main thread participating in the job and then waiting until all helpers have
+finished. padata splits the job into units called chunks, where a chunk is a
+piece of the job that one thread completes in one call to the thread function.
+
+A user has to do three things to run a multithreaded job. First, describe the
+job by defining a padata_mt_job structure, which is explained in the Interface
+section. This includes a pointer to the thread function, which padata will
+call each time it assigns a job chunk to a thread. Then, define the thread
+function, which accepts three arguments, ``start``, ``end``, and ``arg``, where
+the first two delimit the range that the thread operates on and the last is a
+pointer to the job's shared state, if any. Prepare the shared state, which is
+typically allocated on the main thread's stack. Last, call
+padata_do_multithreaded(), which will return once the job is finished.
+
 Interface
 =========

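The main-thread-plus-helpers model described in the padata hunk above can be mimicked in userspace. This is a hedged shell analogue with made-up numbers, not the padata API itself: a range is split into chunks delimited by start/end, each chunk is processed by one background worker, and the "main thread" waits for all of them before combining the shared state:

```shell
# Userspace analogue of chunked multithreaded work: sum the range [0, 1000)
# in 4 workers, one chunk each, then combine the partial results.
total=1000; nworkers=4; chunk=$(( total / nworkers ))
tmp=$(mktemp -d)
for i in 0 1 2 3; do
  start=$(( i * chunk )); end=$(( start + chunk ))
  # Each worker gets a (start, end) chunk, like padata's thread function.
  ( s=0; j=$start
    while [ "$j" -lt "$end" ]; do s=$(( s + j )); j=$(( j + 1 )); done
    echo "$s" > "$tmp/$i" ) &
done
wait   # the main thread waits until all helpers have finished
sum=0
for i in 0 1 2 3; do sum=$(( sum + $(cat "$tmp/$i") )); done
echo "sum = $sum"
rm -rf "$tmp"
```

The combined result is the sum 0+1+...+999 = 499500, regardless of how the chunks were scheduled.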
Documentation/features/vm/numa-memblock/arch-support.txt

Lines changed: 0 additions & 34 deletions
This file was deleted.

Documentation/vm/memory-model.rst

Lines changed: 4 additions & 5 deletions

@@ -46,11 +46,10 @@ maps the entire physical memory. For most architectures, the holes
 have entries in the `mem_map` array. The `struct page` objects
 corresponding to the holes are never fully initialized.
 
-To allocate the `mem_map` array, architecture specific setup code
-should call :c:func:`free_area_init_node` function or its convenience
-wrapper :c:func:`free_area_init`. Yet, the mappings array is not
-usable until the call to :c:func:`memblock_free_all` that hands all
-the memory to the page allocator.
+To allocate the `mem_map` array, architecture specific setup code should
+call :c:func:`free_area_init` function. Yet, the mappings array is not
+usable until the call to :c:func:`memblock_free_all` that hands all the
+memory to the page allocator.
 
 If an architecture enables `CONFIG_ARCH_HAS_HOLES_MEMORYMODEL` option,
 it may free parts of the `mem_map` array that do not cover the

Documentation/vm/page_owner.rst

Lines changed: 1 addition & 2 deletions

@@ -83,8 +83,7 @@ Usage
 4) Analyze information from page owner::
 
 	cat /sys/kernel/debug/page_owner > page_owner_full.txt
-	grep -v ^PFN page_owner_full.txt > page_owner.txt
-	./page_owner_sort page_owner.txt sorted_page_owner.txt
+	./page_owner_sort page_owner_full.txt sorted_page_owner.txt
 
 See the result about who allocated each page
 in the ``sorted_page_owner.txt``.

arch/alpha/mm/init.c

Lines changed: 6 additions & 10 deletions

@@ -243,21 +243,17 @@ callback_init(void * kernel_end)
  */
 void __init paging_init(void)
 {
-	unsigned long zones_size[MAX_NR_ZONES] = {0, };
-	unsigned long dma_pfn, high_pfn;
+	unsigned long max_zone_pfn[MAX_NR_ZONES] = {0, };
+	unsigned long dma_pfn;
 
 	dma_pfn = virt_to_phys((char *)MAX_DMA_ADDRESS) >> PAGE_SHIFT;
-	high_pfn = max_pfn = max_low_pfn;
+	max_pfn = max_low_pfn;
 
-	if (dma_pfn >= high_pfn)
-		zones_size[ZONE_DMA] = high_pfn;
-	else {
-		zones_size[ZONE_DMA] = dma_pfn;
-		zones_size[ZONE_NORMAL] = high_pfn - dma_pfn;
-	}
+	max_zone_pfn[ZONE_DMA] = dma_pfn;
+	max_zone_pfn[ZONE_NORMAL] = max_pfn;
 
 	/* Initialize mem_map[]. */
-	free_area_init(zones_size);
+	free_area_init(max_zone_pfn);
 
 	/* Initialize the kernel's ZERO_PGE. */
 	memset((void *)ZERO_PGE, 0, PAGE_SIZE);
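The conversion above changes the array's meaning from per-zone sizes (in pages) to cumulative upper PFN bounds. With made-up example PFNs (these numbers are illustrative, not from the patch), the two encodings relate like this:

```shell
# Hypothetical layout: DMA-capable memory ends at pfn 4096, all memory at pfn 32768.
dma_pfn=4096
max_pfn=32768
# Old interface: zones_size[] holds each zone's size in pages.
zone_dma_size=$dma_pfn
zone_normal_size=$(( max_pfn - dma_pfn ))
# New interface: max_zone_pfn[] holds the highest PFN each zone may reach.
zone_dma_limit=$dma_pfn
zone_normal_limit=$max_pfn
echo "zones_size = {$zone_dma_size, $zone_normal_size}"
echo "max_zone_pfn = {$zone_dma_limit, $zone_normal_limit}"
```

Because the new encoding is cumulative, the caller no longer needs the old code's dma_pfn >= high_pfn branch: the core allocator clamps each zone to the actual end of memory.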
