
Commit f64e67e

lorenzo-stoakes authored and akpm00 committed
fork: do not invoke uffd on fork if error occurs
Patch series "fork: do not expose incomplete mm on fork".

During fork we may place the virtual memory address space into an inconsistent state before the fork operation is complete. In addition, we may encounter an error during the fork operation that indicates that the virtual memory address space is invalidated.

As a result, we should not be exposing it in any way to external machinery that might interact with the mm or VMAs, machinery that is not designed to deal with incomplete state.

We specifically update the fork logic to defer khugepaged and ksm to the end of the operation and only to be invoked if no error arose, and disallow uffd from observing fork events should an error have occurred.

This patch (of 2):

Currently on fork we expose the virtual address space of a process to userland unconditionally if uffd is registered in VMAs, regardless of whether an error arose in the fork.

This is performed in dup_userfaultfd_complete(), which is invoked unconditionally and performs two duties - invoking registered handlers for the UFFD_EVENT_FORK event via dup_fctx(), and clearing down userfaultfd_fork_ctx objects established in dup_userfaultfd().

This is problematic, because the virtual address space may not yet be correctly initialised if an error arose.

The change in commit d240629 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()") makes this more pertinent as we may be in a state where entries in the maple tree are not yet consistent.

We address this by, on fork error, ensuring that we roll back state that we would otherwise expect to clean up through the event being handled by userland, and perform the memory freeing duty otherwise performed by dup_userfaultfd_complete(). We do this by implementing a new function, dup_userfaultfd_fail(), which performs the same loop, only decrementing reference counts.

Note that we perform mmgrab() on the parent and child mm's; however, userfaultfd_ctx_put() will mmdrop() this once the reference count drops to zero, so we will avoid memory leaks correctly here.

Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/d3691d58bb58712b6fb3df2be441d175bd3cdf07.1729014377.git.lorenzo.stoakes@oracle.com
Fixes: d240629 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
Signed-off-by: Lorenzo Stoakes <[email protected]>
Reported-by: Jann Horn <[email protected]>
Reviewed-by: Jann Horn <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 parent 7c18d48 commit f64e67e
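
For orientation, below is a simplified sketch (paraphrased, not the verbatim fs/userfaultfd.c source) of the per-VMA setup that dup_userfaultfd() performs when the parent context has UFFD_FEATURE_EVENT_FORK enabled. These are the allocations and reference counts that the new dup_userfaultfd_fail() has to unwind when dup_mmap() fails:

	/*
	 * Simplified sketch of the dup_userfaultfd() setup path (abridged);
	 * annotations show which cleanup action undoes each step.
	 */
	fctx = kmalloc(sizeof(*fctx), GFP_KERNEL);	/* undone by kfree() in the cleanup loop */
	ctx = kmem_cache_alloc(userfaultfd_ctx_cachep, GFP_KERNEL);
	refcount_set(&ctx->refcount, 1);		/* dropped via userfaultfd_ctx_put(ctx) */
	ctx->mm = vma->vm_mm;
	mmgrab(ctx->mm);				/* mmdrop() happens inside userfaultfd_ctx_put() */
	atomic_inc(&octx->mmap_changing);		/* undone by atomic_dec() in dup_userfaultfd_fail() */
	userfaultfd_ctx_get(octx);			/* dropped via userfaultfd_ctx_put(octx) */
	fctx->orig = octx;
	fctx->new = ctx;
	list_add_tail(&fctx->list, fcs);		/* list later walked by _complete() or _fail() */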

3 files changed: +37 -1 lines changed

fs/userfaultfd.c

Lines changed: 28 additions & 0 deletions
@@ -692,6 +692,34 @@ void dup_userfaultfd_complete(struct list_head *fcs)
 	}
 }
 
+void dup_userfaultfd_fail(struct list_head *fcs)
+{
+	struct userfaultfd_fork_ctx *fctx, *n;
+
+	/*
+	 * An error has occurred on fork, we will tear memory down, but have
+	 * allocated memory for fctx's and raised reference counts for both the
+	 * original and child contexts (and on the mm for each as a result).
+	 *
+	 * These would ordinarily be taken care of by a user handling the event,
+	 * but we are no longer doing so, so manually clean up here.
+	 *
+	 * mm tear down will take care of cleaning up VMA contexts.
+	 */
+	list_for_each_entry_safe(fctx, n, fcs, list) {
+		struct userfaultfd_ctx *octx = fctx->orig;
+		struct userfaultfd_ctx *ctx = fctx->new;
+
+		atomic_dec(&octx->mmap_changing);
+		VM_BUG_ON(atomic_read(&octx->mmap_changing) < 0);
+		userfaultfd_ctx_put(octx);
+		userfaultfd_ctx_put(ctx);
+
+		list_del(&fctx->list);
+		kfree(fctx);
+	}
+}
+
 void mremap_userfaultfd_prep(struct vm_area_struct *vma,
 			     struct vm_userfaultfd_ctx *vm_ctx)
 {
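
For comparison, the success-path counterpart dup_userfaultfd_complete() runs essentially the same list walk, but delivers the fork event to the parent context rather than dropping references directly (approximate shape, paraphrased rather than quoted from fs/userfaultfd.c):

void dup_userfaultfd_complete(struct list_head *fcs)
{
	struct userfaultfd_fork_ctx *fctx, *n;

	list_for_each_entry_safe(fctx, n, fcs, list) {
		/*
		 * Queues UFFD_EVENT_FORK to the parent context; the
		 * event-handling path then takes care of releasing the
		 * references raised in dup_userfaultfd().
		 */
		dup_fctx(fctx);
		list_del(&fctx->list);
		kfree(fctx);
	}
}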

include/linux/userfaultfd_k.h

Lines changed: 5 additions & 0 deletions
@@ -249,6 +249,7 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma,
 
 extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
 extern void dup_userfaultfd_complete(struct list_head *);
+void dup_userfaultfd_fail(struct list_head *);
 
 extern void mremap_userfaultfd_prep(struct vm_area_struct *,
 				    struct vm_userfaultfd_ctx *);
@@ -351,6 +352,10 @@ static inline void dup_userfaultfd_complete(struct list_head *l)
 {
 }
 
+static inline void dup_userfaultfd_fail(struct list_head *l)
+{
+}
+
 static inline void mremap_userfaultfd_prep(struct vm_area_struct *vma,
 					   struct vm_userfaultfd_ctx *ctx)
 {

kernel/fork.c

Lines changed: 4 additions & 1 deletion
@@ -775,7 +775,10 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
 	mmap_write_unlock(mm);
 	flush_tlb_mm(oldmm);
 	mmap_write_unlock(oldmm);
-	dup_userfaultfd_complete(&uf);
+	if (!retval)
+		dup_userfaultfd_complete(&uf);
+	else
+		dup_userfaultfd_fail(&uf);
 fail_uprobe_end:
 	uprobe_end_dup_mmap();
 	return retval;
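
For context, here is an abridged, heavily condensed view of how dup_mmap() reaches this branch (paraphrased, not the verbatim kernel/fork.c source): retval is set when duplicating a VMA fails, the copy loop bails out with the child mm only partially populated, and the unlock path now routes the collected uffd fork contexts to the failure handler instead of publishing the fork event.

	/* Abridged dup_mmap() flow; error labels and bookkeeping omitted. */
	for_each_vma(vmi, mpnt) {
		tmp = vm_area_dup(mpnt);
		if (!tmp) {
			retval = -ENOMEM;	/* child mm is now partially populated */
			goto loop_out;
		}
		retval = dup_userfaultfd(tmp, &uf);	/* may add a fork ctx to the uf list */
		if (retval)
			goto loop_out;
		/* ... copy page tables, link the VMA into the child mm, etc. ... */
	}
loop_out:
	/* ... */
	mmap_write_unlock(mm);
	flush_tlb_mm(oldmm);
	mmap_write_unlock(oldmm);
	if (!retval)
		dup_userfaultfd_complete(&uf);	/* deliver UFFD_EVENT_FORK to userland */
	else
		dup_userfaultfd_fail(&uf);	/* drop refs; never expose the incomplete mm */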
