Skip to content

Commit ab87f9a

Browse files
cypharAl Viro
authored andcommitted
namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
Allow LOOKUP_BENEATH and LOOKUP_IN_ROOT to safely permit ".." resolution (in the case of LOOKUP_BENEATH the resolution will still fail if ".." resolution would resolve a path outside of the root -- while LOOKUP_IN_ROOT will chroot(2)-style scope it). Magic-link jumps are still disallowed entirely[*]. As Jann explains[1,2], the need for this patch (and the original no-".." restriction) is explained by observing there is a fairly easy-to-exploit race condition with chroot(2) (and thus by extension LOOKUP_IN_ROOT and LOOKUP_BENEATH if ".." is allowed) where a rename(2) of a path can be used to "skip over" nd->root and thus escape to the filesystem above nd->root. thread1 [attacker]: for (;;) renameat2(AT_FDCWD, "/a/b/c", AT_FDCWD, "/a/d", RENAME_EXCHANGE); thread2 [victim]: for (;;) openat2(dirb, "b/c/../../etc/shadow", { .flags = O_PATH, .resolve = RESOLVE_IN_ROOT } ); With fairly significant regularity, thread2 will resolve to "/etc/shadow" rather than "/a/b/etc/shadow". There is also a similar (though somewhat more privileged) attack using MS_MOVE. With this patch, such cases will be detected *during* ".." resolution and will return -EAGAIN for userspace to decide to either retry or abort the lookup. It should be noted that ".." is the weak point of chroot(2) -- walking *into* a subdirectory tautologically cannot result in you walking *outside* nd->root (except through a bind-mount or magic-link). There is also no other way for a directory's parent to change (which is the primary worry with ".." resolution here) other than a rename or MS_MOVE. The primary reason for deferring to userspace with -EAGAIN is that an in-kernel retry loop (or doing a path_is_under() check after re-taking the relevant seqlocks) can become unreasonably expensive on machines with lots of VFS activity (nfsd can cause lots of rename_lock updates). Thus it should be up to userspace how many times they wish to retry the lookup -- the selftests for this attack indicate that there is a ~35% chance of the lookup succeeding on the first try even with an attacker thrashing rename_lock. A variant of the above attack is included in the selftests for openat2(2) later in this patch series. I've run this test on several machines for several days and no instances of a breakout were detected. While this is not concrete proof that this is safe, when combined with the above argument it should lend some trustworthiness to this construction. [*] It may be acceptable in the future to do a path_is_under() check for magic-links after they are resolved. However this seems unlikely to be a feature that people *really* need -- it can be added later if it turns out a lot of people want it. [1]: https://lore.kernel.org/lkml/CAG48ez1jzNvxB+bfOBnERFGp=oMM0vHWuLD6EULmne3R6xa53w@mail.gmail.com/ [2]: https://lore.kernel.org/lkml/CAG48ez30WJhbsro2HOc_DR7V91M+hNFzBP5ogRMZaxbAORvqzg@mail.gmail.com/ Cc: Christian Brauner <[email protected]> Suggested-by: Jann Horn <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Aleksa Sarai <[email protected]> Signed-off-by: Al Viro <[email protected]>
1 parent 8db52c7 commit ab87f9a

File tree

1 file changed

+27
-16
lines changed

1 file changed

+27
-16
lines changed

fs/namei.c

Lines changed: 27 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -491,7 +491,7 @@ struct nameidata {
491491
struct path root;
492492
struct inode *inode; /* path.dentry.d_inode */
493493
unsigned int flags;
494-
unsigned seq, m_seq;
494+
unsigned seq, m_seq, r_seq;
495495
int last_type;
496496
unsigned depth;
497497
int total_link_count;
@@ -1793,22 +1793,31 @@ static inline int handle_dots(struct nameidata *nd, int type)
17931793
if (type == LAST_DOTDOT) {
17941794
int error = 0;
17951795

1796-
/*
1797-
* Scoped-lookup flags resolving ".." is not currently safe --
1798-
* races can cause our parent to have moved outside of the root
1799-
* and us to skip over it.
1800-
*/
1801-
if (unlikely(nd->flags & LOOKUP_IS_SCOPED))
1802-
return -EXDEV;
18031796
if (!nd->root.mnt) {
18041797
error = set_root(nd);
18051798
if (error)
18061799
return error;
18071800
}
1808-
if (nd->flags & LOOKUP_RCU) {
1809-
return follow_dotdot_rcu(nd);
1810-
} else
1811-
return follow_dotdot(nd);
1801+
if (nd->flags & LOOKUP_RCU)
1802+
error = follow_dotdot_rcu(nd);
1803+
else
1804+
error = follow_dotdot(nd);
1805+
if (error)
1806+
return error;
1807+
1808+
if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
1809+
/*
1810+
* If there was a racing rename or mount along our
1811+
* path, then we can't be sure that ".." hasn't jumped
1812+
* above nd->root (and so userspace should retry or use
1813+
* some fallback).
1814+
*/
1815+
smp_rmb();
1816+
if (unlikely(__read_seqcount_retry(&mount_lock.seqcount, nd->m_seq)))
1817+
return -EAGAIN;
1818+
if (unlikely(__read_seqcount_retry(&rename_lock.seqcount, nd->r_seq)))
1819+
return -EAGAIN;
1820+
}
18121821
}
18131822
return 0;
18141823
}
@@ -2273,6 +2282,11 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
22732282
nd->last_type = LAST_ROOT; /* if there are only slashes... */
22742283
nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT;
22752284
nd->depth = 0;
2285+
2286+
nd->m_seq = __read_seqcount_begin(&mount_lock.seqcount);
2287+
nd->r_seq = __read_seqcount_begin(&rename_lock.seqcount);
2288+
smp_rmb();
2289+
22762290
if (flags & LOOKUP_ROOT) {
22772291
struct dentry *root = nd->root.dentry;
22782292
struct inode *inode = root->d_inode;
@@ -2281,9 +2295,8 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
22812295
nd->path = nd->root;
22822296
nd->inode = inode;
22832297
if (flags & LOOKUP_RCU) {
2284-
nd->seq = __read_seqcount_begin(&nd->path.dentry->d_seq);
2298+
nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
22852299
nd->root_seq = nd->seq;
2286-
nd->m_seq = read_seqbegin(&mount_lock);
22872300
} else {
22882301
path_get(&nd->path);
22892302
}
@@ -2294,8 +2307,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
22942307
nd->path.mnt = NULL;
22952308
nd->path.dentry = NULL;
22962309

2297-
nd->m_seq = read_seqbegin(&mount_lock);
2298-
22992310
/* Absolute pathname -- fetch the root (LOOKUP_IN_ROOT uses nd->dfd). */
23002311
if (*s == '/' && !(flags & LOOKUP_IN_ROOT)) {
23012312
error = nd_jump_root(nd);

0 commit comments

Comments
 (0)