Commit b55eef8

cyphar authored and Al Viro committed
Documentation: path-lookup: include new LOOKUP flags

Now that we have new LOOKUP flags, we should document them in the
relevant path-walking documentation. And now that we've settled on a
common name for nd_jump_link() style symlinks ("magic links"), use that
term where magic-link semantics are described.

Signed-off-by: Aleksa Sarai <[email protected]>
Signed-off-by: Al Viro <[email protected]>
1 parent b28a10a commit b55eef8

1 file changed: 62 additions, 6 deletions

Documentation/filesystems/path-lookup.rst

Lines changed: 62 additions & 6 deletions
@@ -13,6 +13,7 @@ It has subsequently been updated to reflect changes in the kernel
 including:
 
 - per-directory parallel name lookup.
+- ``openat2()`` resolution restriction flags.
 
 Introduction to pathname lookup
 ===============================
@@ -235,6 +236,13 @@ renamed. If ``d_lookup`` finds that a rename happened while it
 unsuccessfully scanned a chain in the hash table, it simply tries
 again.
 
+``rename_lock`` is also used to detect and defend against potential attacks
+against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where
+the parent directory is moved outside the root, bypassing the ``path_equal()``
+check). If ``rename_lock`` is updated during the lookup and the path encounters
+a "..", a potential attack occurred and ``handle_dots()`` will bail out with
+``-EAGAIN``.
+
 inode->i_rwsem
 ~~~~~~~~~~~~~~
 
@@ -348,6 +356,13 @@ any changes to any mount points while stepping up. This locking is
 needed to stabilize the link to the mounted-on dentry, which the
 refcount on the mount itself doesn't ensure.
 
+``mount_lock`` is also used to detect and defend against potential attacks
+against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where
+the parent directory is moved outside the root, bypassing the ``path_equal()``
+check). If ``mount_lock`` is updated during the lookup and the path encounters
+a "..", a potential attack occurred and ``handle_dots()`` will bail out with
+``-EAGAIN``.
+
 RCU
 ~~~
 
@@ -405,6 +420,10 @@ is requested. Keeping a reference in the ``nameidata`` ensures that
 only one root is in effect for the entire path walk, even if it races
 with a ``chroot()`` system call.
 
+It should be noted that in the case of ``LOOKUP_IN_ROOT`` or
+``LOOKUP_BENEATH``, the effective root becomes the directory file descriptor
+passed to ``openat2()`` (which exposes these ``LOOKUP_`` flags).
+
 The root is needed when either of two conditions holds: (1) either the
 pathname or a symbolic link starts with a "'/'", or (2) a "``..``"
 component is being handled, since "``..``" from the root must always stay
@@ -1149,7 +1168,7 @@ so ``NULL`` is returned to indicate that the symlink can be released and
 the stack frame discarded.
 
 The other case involves things in ``/proc`` that look like symlinks but
-aren't really::
+aren't really (and are therefore commonly referred to as "magic-links")::
 
 $ ls -l /proc/self/fd/1
 lrwx------ 1 neilb neilb 64 Jun 13 10:19 /proc/self/fd/1 -> /dev/pts/4
@@ -1286,7 +1305,9 @@ A few flags
 A suitable way to wrap up this tour of pathname walking is to list
 the various flags that can be stored in the ``nameidata`` to guide the
 lookup process. Many of these are only meaningful on the final
-component, others reflect the current state of the pathname lookup.
+component, others reflect the current state of the pathname lookup, and some
+apply restrictions to all path components encountered in the path lookup.
+
 And then there is ``LOOKUP_EMPTY``, which doesn't fit conceptually with
 the others. If this is not set, an empty pathname causes an error
 very early on. If it is set, empty pathnames are not considered to be
@@ -1310,13 +1331,48 @@ longer needed.
 ``LOOKUP_JUMPED`` means that the current dentry was chosen not because
 it had the right name but for some other reason. This happens when
 following "``..``", following a symlink to ``/``, crossing a mount point
-or accessing a "``/proc/$PID/fd/$FD``" symlink. In this case the
-filesystem has not been asked to revalidate the name (with
-``d_revalidate()``). In such cases the inode may still need to be
-revalidated, so ``d_op->d_weak_revalidate()`` is called if
+or accessing a "``/proc/$PID/fd/$FD``" symlink (also known as a "magic
+link"). In this case the filesystem has not been asked to revalidate the
+name (with ``d_revalidate()``). In such cases the inode may still need
+to be revalidated, so ``d_op->d_weak_revalidate()`` is called if
 ``LOOKUP_JUMPED`` is set when the look completes - which may be at the
 final component or, when creating, unlinking, or renaming, at the penultimate component.
 
+Resolution-restriction flags
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In order to allow userspace to protect itself against certain race conditions
+and attack scenarios involving changing path components, a series of flags are
+available which apply restrictions to all path components encountered during
+path lookup. These flags are exposed through ``openat2()``'s ``resolve`` field.
+
+``LOOKUP_NO_SYMLINKS`` blocks all symlink traversals (including magic-links).
+This is distinctly different from ``LOOKUP_FOLLOW``, because the latter only
+relates to restricting the following of trailing symlinks.
+
+``LOOKUP_NO_MAGICLINKS`` blocks all magic-link traversals. Filesystems must
+ensure that they return errors from ``nd_jump_link()``, because that is how
+``LOOKUP_NO_MAGICLINKS`` and other magic-link restrictions are implemented.
+
+``LOOKUP_NO_XDEV`` blocks all ``vfsmount`` traversals (this includes both
+bind-mounts and ordinary mounts). Note that the ``vfsmount`` which contains the
+lookup is determined by the first mountpoint the path lookup reaches --
+absolute paths start with the ``vfsmount`` of ``/``, and relative paths start
+with the ``dfd``'s ``vfsmount``. Magic-links are only permitted if the
+``vfsmount`` of the path is unchanged.
+
+``LOOKUP_BENEATH`` blocks any path components which resolve outside the
+starting point of the resolution. This is done by blocking ``nd_jump_root()``
+as well as blocking ".." if it would jump outside the starting point.
+``rename_lock`` and ``mount_lock`` are used to detect attacks against the
+resolution of "..". Magic-links are also blocked.
+
+``LOOKUP_IN_ROOT`` resolves all path components as though the starting point
+were the filesystem root. ``nd_jump_root()`` brings the resolution back to
+the starting point, and ".." at the starting point will act as a no-op. As with
+``LOOKUP_BENEATH``, ``rename_lock`` and ``mount_lock`` are used to detect
+attacks against ".." resolution. Magic-links are also blocked.
+
 Final-component flags
 ~~~~~~~~~~~~~~~~~~~~~
 
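Illustrative note (not part of the patch): the ``rename_lock`` and ``mount_lock`` paragraphs added above describe a "snapshot, then compare when '..' is seen" pattern. The toy Python model below sketches only that idea. ``SeqCounter`` and ``check_dotdot`` are hypothetical names invented for this note; the kernel's seqlock protocol and ``handle_dots()`` are considerably more involved.

```python
import errno


class SeqCounter:
    """Toy stand-in for the sequence number behind a seqlock (hypothetical)."""

    def __init__(self):
        self.seq = 0

    def read_begin(self):
        # Snapshot taken when the lookup starts.
        return self.seq

    def changed_since(self, snapshot):
        # True if any write happened after the snapshot was taken.
        return self.seq != snapshot

    def write(self):
        # Models a rename or a mount change; real seqlocks do more than this.
        self.seq += 1


def check_dotdot(rename_lock, mount_lock, r_seq, m_seq):
    """Model of the extra check made when a scoped lookup meets ".."."""
    if rename_lock.changed_since(r_seq) or mount_lock.changed_since(m_seq):
        # The tree may have been rearranged under us, so give up and let
        # the caller retry instead of trusting the ".." result.
        raise OSError(errno.EAGAIN, "tree changed during lookup; retry")
```

In this model a lookup would call ``read_begin()`` on both counters once at the start and ``check_dotdot()`` each time it steps through "..", so any concurrent rename or mount change surfaces as ``EAGAIN`` rather than as a possible escape.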

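Illustrative note (not part of the patch): the difference between ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` described in the new "Resolution-restriction flags" section can be sketched with a purely lexical toy model. It ignores symlinks, magic-links, mounts and concurrent renames, so it is not how the kernel resolves paths. ``resolve``, ``BENEATH`` and ``IN_ROOT`` are hypothetical names, and the choice of ``EXDEV`` for an escape is an assumption taken from the ``openat2(2)`` manual page rather than from this patch.

```python
import errno

BENEATH = "beneath"  # stands in for LOOKUP_BENEATH
IN_ROOT = "in_root"  # stands in for LOOKUP_IN_ROOT


def resolve(path, mode):
    """Return the components below the starting point that `path` names.

    Toy model only: no symlinks, no mounts, no concurrent renames.
    """
    stack = []  # components walked so far, relative to the starting point
    if path.startswith("/"):
        if mode == BENEATH:
            # Jumping to the root would leave the starting point.
            raise OSError(errno.EXDEV, "absolute path escapes starting point")
        # IN_ROOT: "/" means the starting point itself, so stack stays empty.
    for comp in path.split("/"):
        if comp in ("", "."):
            continue
        if comp == "..":
            if stack:
                stack.pop()
            elif mode == BENEATH:
                raise OSError(errno.EXDEV, "'..' escapes starting point")
            # IN_ROOT: ".." at the starting point is a no-op.
            continue
        stack.append(comp)
    return stack
```

With this model, ``resolve("/etc/../../x", IN_ROOT)`` stays inside the starting point and names ``x`` directly below it, while ``resolve("../x", BENEATH)`` is rejected.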