Commit 1b631a8

Merge remote-tracking branch 'upstream/staging' into staging
2 parents caac060 + d165cf2 commit 1b631a8

310 files changed (+9922 -6401 lines changed)

.clang-format

Lines changed: 0 additions & 1 deletion
@@ -323,7 +323,6 @@ ForEachMacros:
  - 'protocol_for_each_card'
  - 'protocol_for_each_dev'
  - 'queue_for_each_hw_ctx'
- - 'radix_tree_for_each_contig'
  - 'radix_tree_for_each_slot'
  - 'radix_tree_for_each_tagged'
  - 'rbtree_postorder_for_each_entry_safe'

.mailmap

Lines changed: 7 additions & 0 deletions
@@ -119,6 +119,13 @@ Mark Brown <broonie@sirena.org.uk>
  Mark Yao <markyao0591@gmail.com> <mark.yao@rock-chips.com>
  Martin Kepplinger <martink@posteo.de> <martin.kepplinger@theobroma-systems.com>
  Martin Kepplinger <martink@posteo.de> <martin.kepplinger@ginzinger.com>
+ Matthew Wilcox <willy@infradead.org> <matthew.r.wilcox@intel.com>
+ Matthew Wilcox <willy@infradead.org> <matthew@wil.cx>
+ Matthew Wilcox <willy@infradead.org> <mawilcox@linuxonhyperv.com>
+ Matthew Wilcox <willy@infradead.org> <mawilcox@microsoft.com>
+ Matthew Wilcox <willy@infradead.org> <willy@debian.org>
+ Matthew Wilcox <willy@infradead.org> <willy@linux.intel.com>
+ Matthew Wilcox <willy@infradead.org> <willy@parisc-linux.org>
  Matthieu CASTET <castet.matthieu@free.fr>
  Mauro Carvalho Chehab <mchehab@kernel.org> <mchehab@brturbo.com.br>
  Mauro Carvalho Chehab <mchehab@kernel.org> <maurochehab@gmail.com>

Documentation/admin-guide/mm/userfaultfd.rst

Lines changed: 81 additions & 33 deletions
@@ -63,50 +63,98 @@ the generic ioctl available.

  The uffdio_api.features bitmask returned by the UFFDIO_API ioctl
  defines what memory types are supported by the userfaultfd and what
- events, except page fault notifications, may be generated.
-
- If the kernel supports registering userfaultfd ranges on hugetlbfs
- virtual memory areas, UFFD_FEATURE_MISSING_HUGETLBFS will be set in
- uffdio_api.features. Similarly, UFFD_FEATURE_MISSING_SHMEM will be
- set if the kernel supports registering userfaultfd ranges on shared
- memory (covering all shmem APIs, i.e. tmpfs, IPCSHM, /dev/zero
- MAP_SHARED, memfd_create, etc).
-
- The userland application that wants to use userfaultfd with hugetlbfs
- or shared memory need to set the corresponding flag in
- uffdio_api.features to enable those features.
-
- If the userland desires to receive notifications for events other than
- page faults, it has to verify that uffdio_api.features has appropriate
- UFFD_FEATURE_EVENT_* bits set. These events are described in more
- detail below in "Non-cooperative userfaultfd" section.
-
- Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should
- be invoked (if present in the returned uffdio_api.ioctls bitmask) to
- register a memory range in the userfaultfd by setting the
+ events, except page fault notifications, may be generated:
+
+ - The UFFD_FEATURE_EVENT_* flags indicate that various other events
+ other than page faults are supported. These events are described in more
+ detail below in the Non-cooperative userfaultfd section.
+
+ - UFFD_FEATURE_MISSING_HUGETLBFS and UFFD_FEATURE_MISSING_SHMEM
+ indicate that the kernel supports UFFDIO_REGISTER_MODE_MISSING
+ registrations for hugetlbfs and shared memory (covering all shmem APIs,
+ i.e. tmpfs, IPCSHM, /dev/zero, MAP_SHARED, memfd_create,
+ etc) virtual memory areas, respectively.
+
+ - UFFD_FEATURE_MINOR_HUGETLBFS indicates that the kernel supports
+ UFFDIO_REGISTER_MODE_MINOR registration for hugetlbfs virtual memory
+ areas. UFFD_FEATURE_MINOR_SHMEM is the analogous feature indicating
+ support for shmem virtual memory areas.
+
+ The userland application should set the feature flags it intends to use
+ when invoking the UFFDIO_API ioctl, to request that those features be
+ enabled if supported.
+
+ Once the userfaultfd API has been enabled the UFFDIO_REGISTER
+ ioctl should be invoked (if present in the returned uffdio_api.ioctls
+ bitmask) to register a memory range in the userfaultfd by setting the
  uffdio_register structure accordingly. The uffdio_register.mode
  bitmask will specify to the kernel which kind of faults to track for
- the range (UFFDIO_REGISTER_MODE_MISSING would track missing
- pages). The UFFDIO_REGISTER ioctl will return the
+ the range. The UFFDIO_REGISTER ioctl will return the
  uffdio_register.ioctls bitmask of ioctls that are suitable to resolve
  userfaults on the range registered. Not all ioctls will necessarily be
- supported for all memory types depending on the underlying virtual
- memory backend (anonymous memory vs tmpfs vs real filebacked
- mappings).
+ supported for all memory types (e.g. anonymous memory vs. shmem vs.
+ hugetlbfs), or all types of intercepted faults.

  Userland can use the uffdio_register.ioctls to manage the virtual
  address space in the background (to add or potentially also remove
  memory from the userfaultfd registered range). This means a userfault
  could be triggering just before userland maps in the background the
  user-faulted page.

- The primary ioctl to resolve userfaults is UFFDIO_COPY. That
- atomically copies a page into the userfault registered range and wakes
- up the blocked userfaults (unless uffdio_copy.mode &
- UFFDIO_COPY_MODE_DONTWAKE is set). Other ioctl works similarly to
- UFFDIO_COPY. They're atomic as in guaranteeing that nothing can see an
- half copied page since it'll keep userfaulting until the copy has
- finished.
+ Resolving Userfaults
+ --------------------
+
+ There are three basic ways to resolve userfaults:
+
+ - UFFDIO_COPY atomically copies some existing page contents from
+ userspace.
+
+ - UFFDIO_ZEROPAGE atomically zeros the new page.
+
+ - UFFDIO_CONTINUE maps an existing, previously-populated page.
+
+ These operations are atomic in the sense that they guarantee nothing can
+ see a half-populated page, since readers will keep userfaulting until the
+ operation has finished.
+
+ By default, these wake up userfaults blocked on the range in question.
+ They support a UFFDIO_*_MODE_DONTWAKE mode flag, which indicates
+ that waking will be done separately at some later time.
+
+ Which ioctl to choose depends on the kind of page fault, and what we'd
+ like to do to resolve it:
+
+ - For UFFDIO_REGISTER_MODE_MISSING faults, the fault needs to be
+ resolved by either providing a new page (UFFDIO_COPY), or mapping
+ the zero page (UFFDIO_ZEROPAGE). By default, the kernel would map
+ the zero page for a missing fault. With userfaultfd, userspace can
+ decide what content to provide before the faulting thread continues.
+
+ - For UFFDIO_REGISTER_MODE_MINOR faults, there is an existing page (in
+ the page cache). Userspace has the option of modifying the page's
+ contents before resolving the fault. Once the contents are correct
+ (modified or not), userspace asks the kernel to map the page and let the
+ faulting thread continue with UFFDIO_CONTINUE.
+
+ Notes:
+
+ - You can tell which kind of fault occurred by examining
+ pagefault.flags within the uffd_msg, checking for the
+ UFFD_PAGEFAULT_FLAG_* flags.
+
+ - None of the page-delivering ioctls default to the range that you
+ registered with. You must fill in all fields for the appropriate
+ ioctl struct including the range.
+
+ - You get the address of the access that triggered the missing page
+ event out of a struct uffd_msg that you read in the thread from the
+ uffd. You can supply as many pages as you want with these IOCTLs.
+ Keep in mind that unless you used DONTWAKE then the first of any of
+ those IOCTLs wakes up the faulting thread.
+
+ - Be sure to test for all errors including
+ (pollfd[0].revents & POLLERR). This can happen, e.g. when ranges
+ supplied were incorrect.

  QEMU/KVM
  ========
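
Note on the userfaultfd.rst hunk: the "Resolving Userfaults" text it adds says to pick the resolving ioctl from pagefault.flags and to fill in every field of the ioctl struct, including the range. The C sketch below is one possible reading of that advice, not code from this commit. The helper names (choose_resolution, prepare_copy) and the enum are hypothetical. It assumes the range was registered only in MISSING and/or MINOR mode, that the page size is a power of two, and that the linux/userfaultfd.h UAPI header is installed. It issues no ioctl itself.

```c
#include <assert.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>

/* Older UAPI headers predate minor-fault support; this fallback
 * mirrors the value used by the kernel UAPI header. */
#ifndef UFFD_PAGEFAULT_FLAG_MINOR
#define UFFD_PAGEFAULT_FLAG_MINOR (1 << 2)
#endif

/* Hypothetical names, for illustration only. */
enum resolve_action {
    RESOLVE_COPY_OR_ZEROPAGE, /* missing fault: provide or zero a page */
    RESOLVE_CONTINUE          /* minor fault: map the existing page */
};

/* Pick a resolution strategy from uffd_msg.arg.pagefault.flags.
 * Assumes only MISSING and MINOR registrations are in use. */
static inline enum resolve_action choose_resolution(uint64_t pagefault_flags)
{
    if (pagefault_flags & UFFD_PAGEFAULT_FLAG_MINOR)
        return RESOLVE_CONTINUE;
    return RESOLVE_COPY_OR_ZEROPAGE;
}

/* Fill in every field of struct uffdio_copy for a single page.
 * The kernel does not default to the registered range, so the
 * destination is derived from the faulting address. */
static inline struct uffdio_copy prepare_copy(uint64_t fault_addr,
                                              const void *src_page,
                                              uint64_t page_size)
{
    struct uffdio_copy copy;

    /* Page-aligning with a mask only works for power-of-two sizes. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    memset(&copy, 0, sizeof(copy));
    copy.dst = fault_addr & ~(page_size - 1); /* page-aligned target */
    copy.src = (uint64_t)(uintptr_t)src_page; /* userspace source buffer */
    copy.len = page_size;
    copy.mode = 0; /* no DONTWAKE: the ioctl also wakes the faulter */
    return copy;
}
```

A handler would then pass the prepared struct to the UFFDIO_COPY ioctl on the userfaultfd. With mode left at 0, that call also wakes the faulting thread, as the added documentation describes.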

Documentation/arm64/tagged-address-abi.rst

Lines changed: 18 additions & 8 deletions
@@ -45,14 +45,24 @@ how the user addresses are used by the kernel:

  1. User addresses not accessed by the kernel but used for address space
  management (e.g. ``mprotect()``, ``madvise()``). The use of valid
- tagged pointers in this context is allowed with the exception of
- ``brk()``, ``mmap()`` and the ``new_address`` argument to
- ``mremap()`` as these have the potential to alias with existing
- user addresses.
-
- NOTE: This behaviour changed in v5.6 and so some earlier kernels may
- incorrectly accept valid tagged pointers for the ``brk()``,
- ``mmap()`` and ``mremap()`` system calls.
+ tagged pointers in this context is allowed with these exceptions:
+
+ - ``brk()``, ``mmap()`` and the ``new_address`` argument to
+ ``mremap()`` as these have the potential to alias with existing
+ user addresses.
+
+ NOTE: This behaviour changed in v5.6 and so some earlier kernels may
+ incorrectly accept valid tagged pointers for the ``brk()``,
+ ``mmap()`` and ``mremap()`` system calls.
+
+ - The ``range.start``, ``start`` and ``dst`` arguments to the
+ ``UFFDIO_*`` ``ioctl()``s used on a file descriptor obtained from
+ ``userfaultfd()``, as fault addresses subsequently obtained by reading
+ the file descriptor will be untagged, which may otherwise confuse
+ tag-unaware programs.
+
+ NOTE: This behaviour changed in v5.14 and so some earlier kernels may
+ incorrectly accept valid tagged pointers for this system call.

  2. User addresses accessed by the kernel (e.g. ``write()``). This ABI
  relaxation is disabled by default and the application thread needs to
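
Note on the tagged-address-abi.rst hunk: under the new exception, a program that holds tagged pointers has to strip the tag before filling in range.start, start or dst for a UFFDIO_* ioctl. The C sketch below shows one way to do that and is not code from this commit. It assumes the AArch64 top-byte tag layout (tag in bits 63:56 of a user address). The helper name untag_user_address and the TAG_* macros are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: the tag occupies the top byte of a user pointer. */
#define TAG_SHIFT 56
#define TAG_MASK  (0xffULL << TAG_SHIFT)

/* Clear the tag byte so the address matches the untagged fault
 * addresses later read back from the userfaultfd. */
static inline uint64_t untag_user_address(uint64_t addr)
{
    return addr & ~TAG_MASK;
}
```

An address that carries no tag passes through unchanged, so the helper can be applied unconditionally before the struct fields are filled in.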

Documentation/core-api/index.rst

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ Core utilities
  local_ops
  workqueue
  genericirq
+ xarray
  flexible-arrays
  librs
  genalloc
