@@ -63,50 +63,98 @@ the generic ioctl available.

 The uffdio_api.features bitmask returned by the UFFDIO_API ioctl
 defines what memory types are supported by the userfaultfd and what
-events, except page fault notifications, may be generated.
-
-If the kernel supports registering userfaultfd ranges on hugetlbfs
-virtual memory areas, UFFD_FEATURE_MISSING_HUGETLBFS will be set in
-uffdio_api.features. Similarly, UFFD_FEATURE_MISSING_SHMEM will be
-set if the kernel supports registering userfaultfd ranges on shared
-memory (covering all shmem APIs, i.e. tmpfs, IPCSHM, /dev/zero
-MAP_SHARED, memfd_create, etc).
-
-The userland application that wants to use userfaultfd with hugetlbfs
-or shared memory need to set the corresponding flag in
-uffdio_api.features to enable those features.
-
-If the userland desires to receive notifications for events other than
-page faults, it has to verify that uffdio_api.features has appropriate
-UFFD_FEATURE_EVENT_* bits set. These events are described in more
-detail below in "Non-cooperative userfaultfd" section.
-
-Once the userfaultfd has been enabled the UFFDIO_REGISTER ioctl should
-be invoked (if present in the returned uffdio_api.ioctls bitmask) to
-register a memory range in the userfaultfd by setting the
+events, except page fault notifications, may be generated:
+
+- The UFFD_FEATURE_EVENT_* flags indicate that various events
+  other than page faults are supported. These events are described in more
+  detail below in the Non-cooperative userfaultfd section.
+
+- UFFD_FEATURE_MISSING_HUGETLBFS and UFFD_FEATURE_MISSING_SHMEM
+  indicate that the kernel supports UFFDIO_REGISTER_MODE_MISSING
+  registrations for hugetlbfs and shared memory (covering all shmem APIs,
+  e.g. tmpfs, IPCSHM, /dev/zero, MAP_SHARED, memfd_create,
+  etc.) virtual memory areas, respectively.
+
+- UFFD_FEATURE_MINOR_HUGETLBFS indicates that the kernel supports
+  UFFDIO_REGISTER_MODE_MINOR registration for hugetlbfs virtual memory
+  areas. UFFD_FEATURE_MINOR_SHMEM is the analogous feature indicating
+  support for shmem virtual memory areas.
+
+The userland application should set the feature flags it intends to use
+when invoking the UFFDIO_API ioctl, to request that those features be
+enabled if supported.
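As an illustration of this feature negotiation, here is a minimal C sketch, assuming Linux kernel headers that define the MINOR feature bits and a caller that is allowed to create a userfaultfd. The helper names uffd_pick_features, uffd_open_api and uffd_open_with_features are hypothetical. The ioctl_userfaultfd(2) man page documents that requesting an unsupported feature bit makes UFFDIO_API fail with EINVAL, and that the fd should be closed and reopened between the probing call and the enabling call, so the sketch probes on a throwaway fd and then requests only the supported subset on a fresh one.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Keep only the wanted feature bits that the kernel reported as supported. */
uint64_t uffd_pick_features(uint64_t supported, uint64_t wanted)
{
	return supported & wanted;
}

/* Open a userfaultfd and issue UFFDIO_API requesting `features`.
 * On success the kernel fills in api->features and api->ioctls.
 * Returns the new fd, or -1 on failure. */
static int uffd_open_api(uint64_t features, struct uffdio_api *api)
{
	int fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	if (fd < 0)
		return -1;
	api->api = UFFD_API;
	api->features = features;
	api->ioctls = 0;
	if (ioctl(fd, UFFDIO_API, api) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Two-step handshake: probe with features == 0 on a throwaway fd, then
 * request only the supported subset of `wanted` on a fresh fd.
 * On success, *enabled (if non-NULL) receives the requested feature set. */
int uffd_open_with_features(uint64_t wanted, uint64_t *enabled)
{
	struct uffdio_api api;
	int fd = uffd_open_api(0, &api);

	if (fd < 0)
		return -1;
	/* Reopen before enabling features, as ioctl_userfaultfd(2) advises. */
	close(fd);

	uint64_t request = uffd_pick_features(api.features, wanted);

	fd = uffd_open_api(request, &api);
	if (fd >= 0 && enabled)
		*enabled = request;
	return fd;
}
```

A caller wanting shmem minor faults might call uffd_open_with_features(UFFD_FEATURE_MINOR_SHMEM, &on) and check on & UFFD_FEATURE_MINOR_SHMEM before attempting a UFFDIO_REGISTER_MODE_MINOR registration. Depending on privileges and the vm.unprivileged_userfaultfd sysctl, the userfaultfd syscall itself may fail with EPERM.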
+
+Once the userfaultfd API has been enabled, the UFFDIO_REGISTER
+ioctl should be invoked (if present in the returned uffdio_api.ioctls
+bitmask) to register a memory range in the userfaultfd by setting the
 uffdio_register structure accordingly. The uffdio_register.mode
 bitmask will specify to the kernel which kind of faults to track for
-the range (UFFDIO_REGISTER_MODE_MISSING would track missing
-pages). The UFFDIO_REGISTER ioctl will return the
+the range. The UFFDIO_REGISTER ioctl will return the
 uffdio_register.ioctls bitmask of ioctls that are suitable to resolve
 userfaults on the range registered. Not all ioctls will necessarily be
-supported for all memory types depending on the underlying virtual
-memory backend (anonymous memory vs tmpfs vs real filebacked
-mappings).
+supported for all memory types (e.g. anonymous memory vs. shmem vs.
+hugetlbfs) or for all types of intercepted faults.

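The registration step can be sketched as follows, as a hedged illustration rather than a reference implementation. uffd_register_range and uffd_ioctl_available are hypothetical helpers; the sketch relies on the ioctls bitmask using the _UFFDIO_* ioctl numbers as bit positions, which is how linux/userfaultfd.h builds masks such as UFFD_API_RANGE_IOCTLS.

```c
#include <linux/userfaultfd.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* True if bit `nr` (an _UFFDIO_* ioctl number, < 64) is set in an ioctls
 * bitmask such as uffdio_register.ioctls or uffdio_api.ioctls. */
bool uffd_ioctl_available(uint64_t ioctls, unsigned int nr)
{
	return (ioctls >> nr) & 1;
}

/* Register [addr, addr + len) with the given UFFDIO_REGISTER_MODE_* bits.
 * addr and len must be page aligned (huge-page aligned for hugetlbfs).
 * On success, stores the bitmask of ioctls usable on the range in *ioctls. */
int uffd_register_range(int uffd, void *addr, uint64_t len, uint64_t mode,
			uint64_t *ioctls)
{
	struct uffdio_register reg = {
		.range = { .start = (uint64_t)(uintptr_t)addr, .len = len },
		.mode = mode,
	};

	if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
		return -1;
	*ioctls = reg.ioctls;
	return 0;
}
```

After registering a shmem range with UFFDIO_REGISTER_MODE_MINOR, for example, a caller would check uffd_ioctl_available(ioctls, _UFFDIO_CONTINUE) before relying on UFFDIO_CONTINUE for that range.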
 Userland can use the uffdio_register.ioctls to manage the virtual
 address space in the background (to add or potentially also remove
 memory from the userfaultfd registered range). This means a userfault
 could be triggering just before userland maps in the background the
 user-faulted page.

-The primary ioctl to resolve userfaults is UFFDIO_COPY. That
-atomically copies a page into the userfault registered range and wakes
-up the blocked userfaults (unless uffdio_copy.mode &
-UFFDIO_COPY_MODE_DONTWAKE is set). Other ioctl works similarly to
-UFFDIO_COPY. They're atomic as in guaranteeing that nothing can see an
-half copied page since it'll keep userfaulting until the copy has
-finished.
+Resolving Userfaults
+--------------------
+
+There are three basic ways to resolve userfaults:
+
+- UFFDIO_COPY atomically copies some existing page contents from
+  userspace.
+
+- UFFDIO_ZEROPAGE atomically zeros the new page.
+
+- UFFDIO_CONTINUE maps an existing, previously populated page.
+
+These operations are atomic in the sense that they guarantee nothing can
+see a half-populated page, since readers will keep userfaulting until the
+operation has finished.
+
+By default, these wake up userfaults blocked on the range in question.
+They support a UFFDIO_*_MODE_DONTWAKE mode flag, which indicates
+that waking will be done separately at some later time.
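A minimal sketch of the wake-up behaviour, assuming page-aligned destinations inside an already registered range: each page is populated with UFFDIO_COPY_MODE_DONTWAKE, and a single UFFDIO_WAKE on the whole range releases the blocked threads afterwards. The function names are hypothetical; the request structs, mode flags and ioctls come from linux/userfaultfd.h.

```c
#include <linux/userfaultfd.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Build a UFFDIO_COPY request; wake == false sets the DONTWAKE flag. */
struct uffdio_copy uffd_copy_req(uint64_t dst, uint64_t src, uint64_t len,
				 bool wake)
{
	struct uffdio_copy req = {
		.dst = dst,
		.src = src,
		.len = len,
		.mode = wake ? 0 : UFFDIO_COPY_MODE_DONTWAKE,
	};
	return req;
}

/* Same idea for UFFDIO_ZEROPAGE, which takes a range instead of dst/src. */
struct uffdio_zeropage uffd_zeropage_req(uint64_t start, uint64_t len,
					 bool wake)
{
	struct uffdio_zeropage req = {
		.range = { .start = start, .len = len },
		.mode = wake ? 0 : UFFDIO_ZEROPAGE_MODE_DONTWAKE,
	};
	return req;
}

/* Copy `npages` pages from `src` into the registered range at `dst`
 * without waking anyone, then wake all blocked threads with one
 * UFFDIO_WAKE. Any failure (including a partial copy) returns -1; the
 * caller would then have to retry or wake the range itself. */
int uffd_fill_then_wake(int uffd, uint64_t dst, const char *src,
			uint64_t page_size, unsigned int npages)
{
	for (unsigned int i = 0; i < npages; i++) {
		uint64_t off = (uint64_t)i * page_size;
		struct uffdio_copy copy = uffd_copy_req(
			dst + off, (uint64_t)(uintptr_t)(src + off), page_size,
			false);

		if (ioctl(uffd, UFFDIO_COPY, &copy) < 0)
			return -1;
	}

	struct uffdio_range range = {
		.start = dst,
		.len = (uint64_t)npages * page_size,
	};
	return ioctl(uffd, UFFDIO_WAKE, &range);
}
```

Deferring the wakeup this way can avoid waking a thread that would immediately fault again on the next, not yet populated page.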
+
+Which ioctl to choose depends on the kind of page fault and on what we'd
+like to do to resolve it:
+
+- For UFFDIO_REGISTER_MODE_MISSING faults, the fault needs to be
+  resolved by either providing a new page (UFFDIO_COPY) or mapping
+  the zero page (UFFDIO_ZEROPAGE). Without userfaultfd, the kernel would
+  supply a zero-filled page. With userfaultfd, userspace can decide
+  what content to provide before the faulting thread continues.
+
+- For UFFDIO_REGISTER_MODE_MINOR faults, there is an existing page (in
+  the page cache). Userspace has the option of modifying the page's
+  contents before resolving the fault. Once the contents are correct
+  (modified or not), userspace asks the kernel to map the page and let the
+  faulting thread continue with UFFDIO_CONTINUE.
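The choice above can be expressed as a small dispatch function. This is a sketch, assuming kernel headers recent enough to define UFFD_PAGEFAULT_FLAG_MINOR and UFFDIO_CONTINUE (Linux 5.13 or later); the enum, its values and the function names are hypothetical, and write-protect faults are deliberately left out because this section does not cover them.

```c
#include <linux/userfaultfd.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/ioctl.h>

/* Hypothetical classification of how to resolve one page fault. */
enum uffd_resolution {
	RESOLVE_COPY,     /* MISSING fault, userspace supplies the contents */
	RESOLVE_ZEROPAGE, /* MISSING fault, a zero-filled page is acceptable */
	RESOLVE_CONTINUE, /* MINOR fault, map the existing page cache page */
	RESOLVE_OTHER,    /* e.g. write-protect faults, not handled here */
};

/* Pick a resolution from uffd_msg.arg.pagefault.flags. */
enum uffd_resolution uffd_choose(uint64_t pagefault_flags, bool have_contents)
{
	if (pagefault_flags & UFFD_PAGEFAULT_FLAG_WP)
		return RESOLVE_OTHER;
	if (pagefault_flags & UFFD_PAGEFAULT_FLAG_MINOR)
		return RESOLVE_CONTINUE;
	/* Neither MINOR nor WP: a MISSING fault (WRITE may also be set). */
	return have_contents ? RESOLVE_COPY : RESOLVE_ZEROPAGE;
}

/* Resolve a MINOR fault once the page contents are correct, for example
 * after editing them through a second, unregistered mapping of the same
 * shmem or hugetlbfs file. mode == 0 wakes the faulting thread(s). */
int uffd_continue_range(int uffd, uint64_t start, uint64_t len)
{
	struct uffdio_continue cont = {
		.range = { .start = start, .len = len },
		.mode = 0,
	};

	return ioctl(uffd, UFFDIO_CONTINUE, &cont);
}
```

Whether RESOLVE_ZEROPAGE is usable for a given range still depends on the uffdio_register.ioctls bitmask returned when that range was registered.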
+
+Notes:
+
+- You can tell which kind of fault occurred by examining
+  pagefault.flags within the uffd_msg, checking for the
+  UFFD_PAGEFAULT_FLAG_* flags.
+
+- None of the page-delivering ioctls default to the range that you
+  registered with. You must fill in all fields for the appropriate
+  ioctl struct, including the range.
+
+- The address of the access that triggered the page fault event is in
+  the struct uffd_msg that the handler thread reads from the uffd. You
+  can supply as many pages as you want with these ioctls. Keep in mind
+  that, unless you use DONTWAKE, the first of these ioctls wakes up the
+  faulting thread.
+
+- Be sure to test for all errors, including
+  (pollfd[0].revents & POLLERR). This can happen, e.g., when the
+  supplied ranges were incorrect.

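Putting these notes together, one iteration of a fault-handling thread might look like the sketch below. It assumes the userfaultfd was opened with O_NONBLOCK, that only MISSING faults are resolved here, that src points to one page of prepared contents, and that the headers define UFFD_PAGEFAULT_FLAG_MINOR; uffd_handle_one and uffd_page_base are hypothetical names. The fault address is rounded down to a page boundary because the resolving ioctls require page-aligned ranges, and every uffdio_copy field is filled in explicitly.

```c
#include <errno.h>
#include <linux/userfaultfd.h>
#include <poll.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Round an address down to the start of its page (page_size: power of 2). */
uint64_t uffd_page_base(uint64_t addr, uint64_t page_size)
{
	return addr & ~(page_size - 1);
}

/* Wait for one message on `uffd`; resolve MISSING faults by copying one
 * page from `src`. Returns 1 if a fault was resolved, 0 if nothing was
 * ready to read, 2 if *msg holds a message left for the caller (another
 * event type, or a MINOR/WP fault), and -1 on error. */
int uffd_handle_one(int uffd, const void *src, uint64_t page_size,
		    struct uffd_msg *msg)
{
	struct pollfd pollfd[1] = { { .fd = uffd, .events = POLLIN } };

	if (poll(pollfd, 1, -1) < 0)
		return -1;
	if (pollfd[0].revents & POLLERR)
		return -1;

	ssize_t n = read(uffd, msg, sizeof(*msg));
	if (n < 0)
		return errno == EAGAIN ? 0 : -1;   /* fd is O_NONBLOCK */
	if (n != (ssize_t)sizeof(*msg))
		return -1;
	if (msg->event != UFFD_EVENT_PAGEFAULT ||
	    (msg->arg.pagefault.flags &
	     (UFFD_PAGEFAULT_FLAG_MINOR | UFFD_PAGEFAULT_FLAG_WP)))
		return 2;

	/* Nothing defaults to the registered range: fill in every field. */
	struct uffdio_copy copy = {
		.dst = uffd_page_base(msg->arg.pagefault.address, page_size),
		.src = (uint64_t)(uintptr_t)src,
		.len = page_size,
		.mode = 0,   /* no DONTWAKE: a successful copy wakes the thread */
	};
	if (ioctl(uffd, UFFDIO_COPY, &copy) < 0) {
		if (errno != EEXIST)
			return -1;
		/* Already populated (e.g. by a racing resolver): just wake. */
		struct uffdio_range range = { .start = copy.dst, .len = page_size };
		if (ioctl(uffd, UFFDIO_WAKE, &range) < 0)
			return -1;
	}
	return 1;
}
```

A caller would typically run this in a loop on a dedicated thread, dispatching *msg itself whenever the function returns 2.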
 QEMU/KVM
 ========