Commit f56caed (2 parents: a33f5c3 + 76fd028)

Merge branch 'akpm' (patches from Andrew)

Merge misc updates from Andrew Morton:
 "146 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts, ntfs,
  squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak, dax, kasan,
  debug, pagecache, gup, shmem, frontswap, memremap, memcg, selftests,
  pagemap, dma, vmalloc, memory-failure, hugetlb, userfaultfd, vmscan,
  mempolicy, oom-kill, hugetlbfs, migration, thp, ksm, page-poison, percpu,
  rmap, zswap, zram, cleanups, hmm, and damon)"

* emailed patches from Andrew Morton <[email protected]>: (146 commits)
  mm/damon: hide kernel pointer from tracepoint event
  mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
  mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
  mm/damon/dbgfs: remove an unnecessary variable
  mm/damon: move the implementation of damon_insert_region to damon.h
  mm/damon: add access checking for hugetlb pages
  Docs/admin-guide/mm/damon/usage: update for schemes statistics
  mm/damon/dbgfs: support all DAMOS stats
  Docs/admin-guide/mm/damon/reclaim: document statistics parameters
  mm/damon/reclaim: provide reclamation statistics
  mm/damon/schemes: account how many times quota limit has exceeded
  mm/damon/schemes: account scheme actions that successfully applied
  mm/damon: remove a mistakenly added comment for a future feature
  Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
  Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
  Docs/admin-guide/mm/damon/usage: remove redundant information
  Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
  mm/damon: convert macro functions to static inline functions
  mm/damon: modify damon_rand() macro to static inline function
  mm/damon: move damon_rand() definition into damon.h
  ...

File tree: 211 files changed, +3829 −1608 lines changed

Documentation/admin-guide/cgroup-v1/hugetlb.rst (4 additions, 0 deletions)

@@ -29,12 +29,14 @@ Brief summary of control files::
   hugetlb.<hugepagesize>.max_usage_in_bytes  # show max "hugepagesize" hugetlb usage recorded
   hugetlb.<hugepagesize>.usage_in_bytes      # show current usage for "hugepagesize" hugetlb
   hugetlb.<hugepagesize>.failcnt             # show the number of allocation failures due to HugeTLB usage limit
+  hugetlb.<hugepagesize>.numa_stat           # show the numa information of the hugetlb memory charged to this cgroup
 
 For a system supporting three hugepage sizes (64k, 32M and 1G), the control
 files include::
 
   hugetlb.1GB.limit_in_bytes
   hugetlb.1GB.max_usage_in_bytes
+  hugetlb.1GB.numa_stat
   hugetlb.1GB.usage_in_bytes
   hugetlb.1GB.failcnt
   hugetlb.1GB.rsvd.limit_in_bytes
@@ -43,6 +45,7 @@ files include::
   hugetlb.1GB.rsvd.failcnt
   hugetlb.64KB.limit_in_bytes
   hugetlb.64KB.max_usage_in_bytes
+  hugetlb.64KB.numa_stat
   hugetlb.64KB.usage_in_bytes
   hugetlb.64KB.failcnt
   hugetlb.64KB.rsvd.limit_in_bytes
@@ -51,6 +54,7 @@ files include::
   hugetlb.64KB.rsvd.failcnt
   hugetlb.32MB.limit_in_bytes
   hugetlb.32MB.max_usage_in_bytes
+  hugetlb.32MB.numa_stat
   hugetlb.32MB.usage_in_bytes
   hugetlb.32MB.failcnt
   hugetlb.32MB.rsvd.limit_in_bytes
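The new per-cgroup ``numa_stat`` file reports per-node byte counts. A minimal sketch of parsing one of its lines, assuming it follows the flat ``total=... N0=... N1=...`` layout used by ``memory.numa_stat`` (the sample string below is illustrative, not real output):

```python
# Sketch: parse one line of a hugetlb numa_stat file into a total and a
# per-node mapping.  The "total=... N0=... N1=..." layout is assumed from
# the memory.numa_stat convention; values are in bytes.
def parse_numa_stat(line):
    """Return (total_bytes, {node_id: bytes}) for one numa_stat line."""
    total = None
    per_node = {}
    for field in line.split():
        key, _, value = field.partition("=")
        if key == "total":
            total = int(value)
        elif key.startswith("N"):
            per_node[int(key[1:])] = int(value)
    return total, per_node

total, nodes = parse_numa_stat("total=4194304 N0=4194304 N1=0")
print(total, nodes)  # → 4194304 {0: 4194304, 1: 0}
```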

Documentation/admin-guide/cgroup-v2.rst (11 additions, 0 deletions)

@@ -1268,6 +1268,9 @@ PAGE_SIZE multiple when read back.
 	  The number of processes belonging to this cgroup
 	  killed by any kind of OOM killer.
 
+	oom_group_kill
+	  The number of times a group OOM has occurred.
+
   memory.events.local
 	Similar to memory.events but the fields in the file are local
 	to the cgroup i.e. not hierarchical. The file modified event
@@ -1311,6 +1314,9 @@ PAGE_SIZE multiple when read back.
 	  sock (npn)
 		Amount of memory used in network transmission buffers
 
+	  vmalloc (npn)
+		Amount of memory used for vmap backed memory.
+
 	  shmem
 		Amount of cached filesystem data that is swap-backed,
 		such as tmpfs, shm segments, shared anonymous mmap()s
@@ -2260,6 +2266,11 @@ HugeTLB Interface Files
 	are local to the cgroup i.e. not hierarchical. The file modified event
 	generated on this file reflects only the local events.
 
+  hugetlb.<hugepagesize>.numa_stat
+	Similar to memory.numa_stat, it shows the numa information of the
+	hugetlb pages of <hugepagesize> in this cgroup.  Only active in
+	use hugetlb pages are included.  The per-node values are in bytes.
+
 Misc
 ----
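The new ``oom_group_kill`` entry lands in the flat-keyed ``memory.events`` file. A small sketch of reading it, assuming the documented "name count" line format (the sample content is made up for illustration):

```python
# Sketch: extract event counters from cgroup v2 memory.events content.
# The file uses the flat-keyed "name count" format documented for cgroup v2.
def parse_memory_events(text):
    """Return a dict of event name -> count from memory.events content."""
    events = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if value:
            events[name] = int(value)
    return events

sample = "low 0\nhigh 12\nmax 3\noom 1\noom_kill 1\noom_group_kill 1\n"
print(parse_memory_events(sample)["oom_group_kill"])  # → 1
```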

Documentation/admin-guide/mm/damon/reclaim.rst (25 additions, 0 deletions)

@@ -208,6 +208,31 @@ PID of the DAMON thread.
 If DAMON_RECLAIM is enabled, this becomes the PID of the worker thread.  Else,
 -1.
 
+nr_reclaim_tried_regions
+------------------------
+
+Number of memory regions that DAMON_RECLAIM has tried to reclaim.
+
+bytes_reclaim_tried_regions
+---------------------------
+
+Total bytes of the memory regions that DAMON_RECLAIM has tried to reclaim.
+
+nr_reclaimed_regions
+--------------------
+
+Number of memory regions that DAMON_RECLAIM has successfully reclaimed.
+
+bytes_reclaimed_regions
+-----------------------
+
+Total bytes of the memory regions that DAMON_RECLAIM has successfully
+reclaimed.
+
+nr_quota_exceeds
+----------------
+
+Number of times that the time/space quota limits have been exceeded.
+
 Example
 =======
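The tried/reclaimed byte counters above combine naturally into a success ratio for tuning. A minimal sketch (the counter values are made-up sample readings, not real output of the module parameters):

```python
# Sketch: derive a reclamation success ratio from the DAMON_RECLAIM
# statistics documented above (bytes tried vs. bytes actually reclaimed).
def reclaim_success_ratio(bytes_tried, bytes_reclaimed):
    """Fraction of tried bytes that were actually reclaimed."""
    if bytes_tried == 0:
        return 0.0
    return bytes_reclaimed / bytes_tried

# Sample readings: tried 1 MiB, reclaimed 256 KiB.
print(reclaim_success_ratio(1_048_576, 262_144))  # → 0.25
```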

Documentation/admin-guide/mm/damon/usage.rst (176 additions, 49 deletions)
@@ -7,37 +7,40 @@ Detailed Usages
 DAMON provides below three interfaces for different users.
 
 - *DAMON user space tool.*
-  This is for privileged people such as system administrators who want a
-  just-working human-friendly interface.  Using this, users can use the DAMON’s
-  major features in a human-friendly way.  It may not be highly tuned for
-  special cases, though.  It supports both virtual and physical address spaces
-  monitoring.
+  `This <https://github.com/awslabs/damo>`_ is for privileged people such as
+  system administrators who want a just-working human-friendly interface.
+  Using this, users can use the DAMON’s major features in a human-friendly way.
+  It may not be highly tuned for special cases, though.  It supports both
+  virtual and physical address spaces monitoring.  For more detail, please
+  refer to its `usage document
+  <https://github.com/awslabs/damo/blob/next/USAGE.md>`_.
 - *debugfs interface.*
-  This is for privileged user space programmers who want more optimized use of
-  DAMON.  Using this, users can use DAMON’s major features by reading
-  from and writing to special debugfs files.  Therefore, you can write and use
-  your personalized DAMON debugfs wrapper programs that reads/writes the
-  debugfs files instead of you.  The DAMON user space tool is also a reference
-  implementation of such programs.  It supports both virtual and physical
-  address spaces monitoring.
+  :ref:`This <debugfs_interface>` is for privileged user space programmers who
+  want more optimized use of DAMON.  Using this, users can use DAMON’s major
+  features by reading from and writing to special debugfs files.  Therefore,
+  you can write and use your personalized DAMON debugfs wrapper programs that
+  reads/writes the debugfs files instead of you.  The `DAMON user space tool
+  <https://github.com/awslabs/damo>`_ is one example of such programs.  It
+  supports both virtual and physical address spaces monitoring.  Note that this
+  interface provides only simple :ref:`statistics <damos_stats>` for the
+  monitoring results.  For detailed monitoring results, DAMON provides a
+  :ref:`tracepoint <tracepoint>`.
 - *Kernel Space Programming Interface.*
-  This is for kernel space programmers.  Using this, users can utilize every
-  feature of DAMON most flexibly and efficiently by writing kernel space
-  DAMON application programs for you.  You can even extend DAMON for various
-  address spaces.
+  :doc:`This </vm/damon/api>` is for kernel space programmers.  Using this,
+  users can utilize every feature of DAMON most flexibly and efficiently by
+  writing kernel space DAMON application programs for you.  You can even extend
+  DAMON for various address spaces.  For detail, please refer to the interface
+  :doc:`document </vm/damon/api>`.
 
-Nevertheless, you could write your own user space tool using the debugfs
-interface.  A reference implementation is available at
-https://github.com/awslabs/damo.  If you are a kernel programmer, you could
-refer to :doc:`/vm/damon/api` for the kernel space programming interface.  For
-the reason, this document describes only the debugfs interface
+
+.. _debugfs_interface:
 
 debugfs Interface
 =================
 
-DAMON exports five files, ``attrs``, ``target_ids``, ``init_regions``,
-``schemes`` and ``monitor_on`` under its debugfs directory,
-``<debugfs>/damon/``.
+DAMON exports eight files, ``attrs``, ``target_ids``, ``init_regions``,
+``schemes``, ``monitor_on``, ``kdamond_pid``, ``mk_contexts`` and
+``rm_contexts`` under its debugfs directory, ``<debugfs>/damon/``.
 
 
 Attributes
@@ -131,24 +134,38 @@ Schemes
 
 For usual DAMON-based data access aware memory management optimizations, users
 would simply want the system to apply a memory management action to a memory
-region of a specific size having a specific access frequency for a specific
-time.  DAMON receives such formalized operation schemes from the user and
-applies those to the target processes.  It also counts the total number and
-size of regions that each scheme is applied.  This statistics can be used for
-online analysis or tuning of the schemes.
+region of a specific access pattern.  DAMON receives such formalized operation
+schemes from the user and applies those to the target processes.
 
 Users can get and set the schemes by reading from and writing to ``schemes``
 debugfs file.  Reading the file also shows the statistics of each scheme.  To
-the file, each of the schemes should be represented in each line in below form:
+the file, each of the schemes should be represented in each line in below
+form::
+
+    <target access pattern> <action> <quota> <watermarks>
+
+You can disable schemes by simply writing an empty string to the file.
+
+Target Access Pattern
+~~~~~~~~~~~~~~~~~~~~~
+
+The ``<target access pattern>`` is constructed with three ranges in below
+form::
+
+    min-size max-size min-acc max-acc min-age max-age
 
-    min-size max-size min-acc max-acc min-age max-age action
+Specifically, bytes for the size of regions (``min-size`` and ``max-size``),
+number of monitored accesses per aggregate interval for access frequency
+(``min-acc`` and ``max-acc``), and number of aggregate intervals for the age
+of regions (``min-age`` and ``max-age``) are specified.  Note that the ranges
+are closed intervals.
 
-Note that the ranges are closed interval.  Bytes for the size of regions
-(``min-size`` and ``max-size``), number of monitored accesses per aggregate
-interval for access frequency (``min-acc`` and ``max-acc``), number of
-aggregate intervals for the age of regions (``min-age`` and ``max-age``), and a
-predefined integer for memory management actions should be used.  The supported
-numbers and their meanings are as below.
+Action
+~~~~~~
+
+The ``<action>`` is a predefined integer for memory management actions, which
+DAMON will apply to the regions having the target access pattern.  The
+supported numbers and their meanings are as below.
 
 - 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``
 - 1: Call ``madvise()`` for the region with ``MADV_COLD``
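The single-line scheme format above is easy to mis-order by hand. A small sketch of serializing the ``<target access pattern> <action>`` prefix from named fields, following the documented field order (the helper name and constant are ours):

```python
# Sketch: build the "<target access pattern> <action>" prefix of a schemes
# line.  Field order follows the format documented above; the helper name
# and the action constant are our own.
DAMOS_PAGEOUT = 2  # action code 2: madvise(MADV_PAGEOUT), per the table above

def damos_pattern(min_size, max_size, min_acc, max_acc,
                  min_age, max_age, action):
    """Serialize the six closed-interval range fields and the action code."""
    return " ".join(str(v) for v in
                    (min_size, max_size, min_acc, max_acc,
                     min_age, max_age, action))

# Regions of 4KiB-8KiB, 0-5 accesses per aggregate interval, age of 10-20
# aggregate intervals: page them out.
print(damos_pattern(4096, 8192, 0, 5, 10, 20, DAMOS_PAGEOUT))
# → 4096 8192 0 5 10 20 2
```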
@@ -157,20 +174,82 @@ numbers and their meanings are as below.
 - 4: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``
 - 5: Do nothing but count the statistics
 
-You can disable schemes by simply writing an empty string to the file.  For
-example, below commands applies a scheme saying "If a memory region of size in
-[4KiB, 8KiB] is showing accesses per aggregate interval in [0, 5] for aggregate
-interval in [10, 20], page out the region", check the entered scheme again, and
-finally remove the scheme. ::
+Quota
+~~~~~
 
-    # cd <debugfs>/damon
-    # echo "4096 8192 0 5 10 20 2" > schemes
-    # cat schemes
-    4096 8192 0 5 10 20 2 0 0
-    # echo > schemes
+The optimal ``target access pattern`` for each ``action`` is workload
+dependent, so it is not easy to find.  Worse yet, setting a scheme's action
+too aggressively can cause severe overhead.  To avoid such overhead, users can
+limit the time and size quota for the scheme via the ``<quota>`` in below
+form::
+
+    <ms> <sz> <reset interval> <priority weights>
+
+This makes DAMON try to use only up to ``<ms>`` milliseconds for applying the
+action to memory regions of the ``target access pattern`` within the ``<reset
+interval>`` milliseconds, and to apply the action to only up to ``<sz>`` bytes
+of memory regions within the ``<reset interval>``.  Setting both ``<ms>`` and
+``<sz>`` to zero disables the quota limits.
+
+When the quota limit is expected to be exceeded, DAMON prioritizes found memory
+regions of the ``target access pattern`` based on their size, access frequency,
+and age.  For personalized prioritization, users can set the weights for the
+three properties in ``<priority weights>`` in below form::
+
+    <size weight> <access frequency weight> <age weight>
+
+Watermarks
+~~~~~~~~~~
+
+Some schemes would need to run based on the current value of a specific system
+metric like the free memory ratio.  For such cases, users can specify
+watermarks for the condition::
+
+    <metric> <check interval> <high mark> <middle mark> <low mark>
+
+``<metric>`` is a predefined integer for the metric to be checked.  The
+supported numbers and their meanings are as below.
+
+- 0: Ignore the watermarks
+- 1: System's free memory rate (per thousand)
+
+The value of the metric is checked every ``<check interval>`` microseconds.
+
+If the value is higher than ``<high mark>`` or lower than ``<low mark>``, the
+scheme is deactivated.  If the value is lower than ``<mid mark>``, the scheme
+is activated.
+
+.. _damos_stats:
+
+Statistics
+~~~~~~~~~~
+
+DAMON counts the total number and bytes of the regions that each scheme has
+tried to be applied to, the same two numbers for the regions that each scheme
+has successfully been applied to, and the total number of quota limit exceeds.
+These statistics can be used for online analysis or tuning of the schemes.
+
+The statistics can be shown by reading the ``schemes`` file.  Reading the file
+will show each scheme you entered in each line, and the five numbers for the
+statistics will be added at the end of each line.
 
-The last two integers in the 4th line of above example is the total number and
-the total size of the regions that the scheme is applied.
+Example
+~~~~~~~
+
+Below commands apply a scheme saying "If a memory region of size in [4KiB,
+8KiB] is showing accesses per aggregate interval in [0, 5] for aggregate
+interval in [10, 20], page out the region.  For the paging out, use only up to
+10ms per second, and also don't page out more than 1GiB per second.  Under the
+limitation, page out memory regions having longer age first.  Also, check the
+free memory rate of the system every 5 seconds, start the monitoring and
+paging out when the free memory rate becomes lower than 50%, but stop it if
+the free memory rate becomes larger than 60%, or lower than 30%"::
+
+    # cd <debugfs>/damon
+    # scheme="4096 8192 0 5 10 20 2"            # target access pattern and action
+    # scheme+=" 10 $((1024*1024*1024)) 1000"    # quotas
+    # scheme+=" 0 0 100"                        # prioritization weights
+    # scheme+=" 1 5000000 600 500 300"          # watermarks
+    # echo "$scheme" > schemes
 
 
 Turning On/Off
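The watermark rule above can be sketched as a small decision function. Note the "otherwise keep the current state" branch is our reading of the rule for values between the middle and high marks, which the text leaves implicit:

```python
# Sketch of the watermark rule described above: deactivate above the high
# mark or below the low mark, activate below the middle mark, and (our
# assumption) otherwise keep the current state.  Values are in the metric's
# own unit, e.g. free-memory rate per thousand.
def watermark_active(value, high, mid, low, currently_active):
    """Return whether the scheme should be active for the sampled value."""
    if value > high or value < low:
        return False
    if value < mid:
        return True
    return currently_active

# With marks 600/500/300 (per thousand), a 45% free memory rate activates:
print(watermark_active(450, 600, 500, 300, False))  # → True
```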
@@ -195,6 +274,54 @@ the monitoring is turned on.  If you write to the files while DAMON is running,
 an error code such as ``-EBUSY`` will be returned.
 
 
+Monitoring Thread PID
+---------------------
+
+DAMON does requested monitoring with a kernel thread called ``kdamond``.  You
+can get the pid of the thread by reading the ``kdamond_pid`` file.  When the
+monitoring is turned off, reading the file returns ``none``. ::
+
+    # cd <debugfs>/damon
+    # cat monitor_on
+    off
+    # cat kdamond_pid
+    none
+    # echo on > monitor_on
+    # cat kdamond_pid
+    18594
+
+
+Using Multiple Monitoring Threads
+---------------------------------
+
+One ``kdamond`` thread is created for each monitoring context.  You can create
+and remove monitoring contexts for use cases requiring multiple ``kdamond``
+threads using the ``mk_contexts`` and ``rm_contexts`` files.
+
+Writing the name of the new context to the ``mk_contexts`` file creates a
+directory of the name on the DAMON debugfs directory.  The directory will have
+DAMON debugfs files for the context. ::
+
+    # cd <debugfs>/damon
+    # ls foo
+    # ls: cannot access 'foo': No such file or directory
+    # echo foo > mk_contexts
+    # ls foo
+    # attrs  init_regions  kdamond_pid  schemes  target_ids
+
+If the context is not needed anymore, you can remove it and the corresponding
+directory by putting the name of the context to the ``rm_contexts`` file. ::
+
+    # echo foo > rm_contexts
+    # ls foo
+    # ls: cannot access 'foo': No such file or directory
+
+Note that the ``mk_contexts``, ``rm_contexts``, and ``monitor_on`` files are
+in the root directory only.
+
+
+.. _tracepoint:
+
 Tracepoint for Monitoring Results
 =================================
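Reading the ``schemes`` file back yields the scheme fields followed by the five statistics counters. A small sketch of splitting such a line, assuming the counter order tried-regions, tried-bytes, applied-regions, applied-bytes, quota-exceeds (the field names and sample line are ours):

```python
# Sketch: split one line read back from the ``schemes`` file into the scheme
# fields and the five trailing statistics counters described above.  The
# counter names and the sample line are our own.
def parse_scheme_line(line):
    """Return (scheme_fields, stats_dict) for one schemes-file line."""
    parts = line.split()
    fields, stats = parts[:-5], [int(v) for v in parts[-5:]]
    names = ("nr_tried", "bytes_tried", "nr_applied", "bytes_applied",
             "qt_exceeds")
    return fields, dict(zip(names, stats))

sample = ("4096 8192 0 5 10 20 2 10 1073741824 1000 0 0 100 "
          "1 5000000 600 500 300 25 4194304 10 2097152 0")
fields, stats = parse_scheme_line(sample)
print(stats["nr_tried"], stats["qt_exceeds"])  # → 25 0
```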

Documentation/admin-guide/mm/numa_memory_policy.rst (15 additions, 1 deletion)

@@ -408,7 +408,7 @@ follows:
 Memory Policy APIs
 ==================
 
-Linux supports 3 system calls for controlling memory policy. These APIS
+Linux supports 4 system calls for controlling memory policy.  These APIs
 always affect only the calling task, the calling task's address space, or
 some shared object mapped into the calling task's address space.
 
@@ -460,6 +460,20 @@ requested via the 'flags' argument.
 
 See the mbind(2) man page for more details.
 
+Set home node for a Range of Task's Address Space::
+
+	long sys_set_mempolicy_home_node(unsigned long start, unsigned long len,
+					 unsigned long home_node,
+					 unsigned long flags);
+
+sys_set_mempolicy_home_node sets the home node for a VMA policy present in the
+task's address range.  The system call updates the home node only for the
+existing mempolicy range; other address ranges are ignored.  A home node is
+the NUMA node closest to which page allocations will come from.  Specifying
+the home node overrides the default allocation policy of allocating memory
+close to the local node of the executing CPU.
+
 Memory Policy Command Line Interface
 ====================================
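Since glibc provides no wrapper for this new system call, user space has to invoke it via the raw syscall interface. A hedged sketch via ctypes, assuming syscall number 450 (the x86-64 value; other architectures may differ) and a kernel of 5.17 or later:

```python
# Sketch: invoking set_mempolicy_home_node(2) through the raw syscall
# interface.  The syscall number 450 is our assumption for x86-64; older
# kernels return ENOSYS, and glibc has no wrapper for this call.
import ctypes
import errno

NR_SET_MEMPOLICY_HOME_NODE = 450  # assumed: x86-64, Linux >= 5.17

libc = ctypes.CDLL(None, use_errno=True)

def set_mempolicy_home_node(start, length, home_node, flags=0):
    """Return 'ok', 'unsupported', or 'failed: <errno name>'."""
    ctypes.set_errno(0)
    ret = libc.syscall(ctypes.c_long(NR_SET_MEMPOLICY_HOME_NODE),
                       ctypes.c_ulong(start), ctypes.c_ulong(length),
                       ctypes.c_ulong(home_node), ctypes.c_ulong(flags))
    if ret == 0:
        return "ok"
    err = ctypes.get_errno()
    if err == errno.ENOSYS:
        return "unsupported"  # kernel older than 5.17
    return "failed: " + errno.errorcode.get(err, str(err))

# Demo call on an empty range; on kernels without the syscall this prints
# "unsupported", otherwise "ok" or an errno name.
print("set_mempolicy_home_node:", set_mempolicy_home_node(0, 0, 0))
```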
