@@ -13,6 +13,7 @@ It has subsequently been updated to reflect changes in the kernel
including:

- per-directory parallel name lookup.
+ - ``openat2()`` resolution restriction flags.

Introduction to pathname lookup
===============================
@@ -235,6 +236,13 @@ renamed. If ``d_lookup`` finds that a rename happened while it
unsuccessfully scanned a chain in the hash table, it simply tries
again.

+ ``rename_lock`` is also used to detect and defend against potential attacks
+ against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where
+ the parent directory is moved outside the root, bypassing the ``path_equal()``
+ check). If ``rename_lock`` is updated during the lookup and the path encounters
+ a "..", a potential attack occurred and ``handle_dots()`` will bail out with
+ ``-EAGAIN``.
+
inode->i_rwsem
~~~~~~~~~~~~~~
@@ -348,6 +356,13 @@ any changes to any mount points while stepping up. This locking is
needed to stabilize the link to the mounted-on dentry, which the
refcount on the mount itself doesn't ensure.

+ ``mount_lock`` is also used to detect and defend against potential attacks
+ against ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` when resolving ".." (where
+ the parent directory is moved outside the root, bypassing the ``path_equal()``
+ check). If ``mount_lock`` is updated during the lookup and the path encounters
+ a "..", a potential attack occurred and ``handle_dots()`` will bail out with
+ ``-EAGAIN``.
+
RCU
~~~
@@ -405,6 +420,10 @@ is requested. Keeping a reference in the ``nameidata`` ensures that
only one root is in effect for the entire path walk, even if it races
with a ``chroot()`` system call.

+ It should be noted that in the case of ``LOOKUP_IN_ROOT`` or
+ ``LOOKUP_BENEATH``, the effective root becomes the directory file descriptor
+ passed to ``openat2()`` (which exposes these ``LOOKUP_`` flags).
+
The root is needed when either of two conditions holds: (1) either the
pathname or a symbolic link starts with a "'/'", or (2) a "``..``"
component is being handled, since "``..``" from the root must always stay
@@ -1149,7 +1168,7 @@ so ``NULL`` is returned to indicate that the symlink can be released and
the stack frame discarded.

The other case involves things in ``/proc`` that look like symlinks but
- aren't really::
+ aren't really (and are therefore commonly referred to as "magic-links")::

     $ ls -l /proc/self/fd/1
     lrwx------ 1 neilb neilb 64 Jun 13 10:19 /proc/self/fd/1 -> /dev/pts/4
@@ -1286,7 +1305,9 @@ A few flags
A suitable way to wrap up this tour of pathname walking is to list
the various flags that can be stored in the ``nameidata`` to guide the
lookup process. Many of these are only meaningful on the final
- component, others reflect the current state of the pathname lookup.
+ component, others reflect the current state of the pathname lookup, and some
+ apply restrictions to all path components encountered in the path lookup.
+
And then there is ``LOOKUP_EMPTY``, which doesn't fit conceptually with
the others. If this is not set, an empty pathname causes an error
very early on. If it is set, empty pathnames are not considered to be
@@ -1310,13 +1331,48 @@ longer needed.
``LOOKUP_JUMPED`` means that the current dentry was chosen not because
it had the right name but for some other reason. This happens when
following "``..``", following a symlink to ``/``, crossing a mount point
- or accessing a "``/proc/$PID/fd/$FD``" symlink. In this case the
- filesystem has not been asked to revalidate the name (with
- ``d_revalidate()``). In such cases the inode may still need to be
- revalidated, so ``d_op->d_weak_revalidate()`` is called if
+ or accessing a "``/proc/$PID/fd/$FD``" symlink (also known as a "magic
+ link"). In this case the filesystem has not been asked to revalidate the
+ name (with ``d_revalidate()``). In such cases the inode may still need
+ to be revalidated, so ``d_op->d_weak_revalidate()`` is called if
``LOOKUP_JUMPED`` is set when the look completes - which may be at the
final component or, when creating, unlinking, or renaming, at the penultimate component.

+ Resolution-restriction flags
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ In order to allow userspace to protect itself against certain race conditions
+ and attack scenarios involving changing path components, a series of flags are
+ available which apply restrictions to all path components encountered during
+ path lookup. These flags are exposed through ``openat2()``'s ``resolve`` field.
+
+ ``LOOKUP_NO_SYMLINKS`` blocks all symlink traversals (including magic-links).
+ This is distinctly different from ``LOOKUP_FOLLOW``, because the latter only
+ relates to restricting the following of trailing symlinks.
+
+ ``LOOKUP_NO_MAGICLINKS`` blocks all magic-link traversals. Filesystems must
+ ensure that they return errors from ``nd_jump_link()``, because that is how
+ ``LOOKUP_NO_MAGICLINKS`` and other magic-link restrictions are implemented.
+
+ ``LOOKUP_NO_XDEV`` blocks all ``vfsmount`` traversals (this includes both
+ bind-mounts and ordinary mounts). Note that the ``vfsmount`` which contains the
+ lookup is determined by the first mountpoint the path lookup reaches --
+ absolute paths start with the ``vfsmount`` of ``/``, and relative paths start
+ with the ``dfd``'s ``vfsmount``. Magic-links are only permitted if the
+ ``vfsmount`` of the path is unchanged.
+
+ ``LOOKUP_BENEATH`` blocks any path components which resolve outside the
+ starting point of the resolution. This is done by blocking ``nd_jump_root()``
+ as well as blocking ".." if it would jump outside the starting point.
+ ``rename_lock`` and ``mount_lock`` are used to detect attacks against the
+ resolution of "..". Magic-links are also blocked.
+
+ ``LOOKUP_IN_ROOT`` resolves all path components as though the starting point
+ were the filesystem root. ``nd_jump_root()`` brings the resolution back to
+ the starting point, and ".." at the starting point will act as a no-op. As with
+ ``LOOKUP_BENEATH``, ``rename_lock`` and ``mount_lock`` are used to detect
+ attacks against ".." resolution. Magic-links are also blocked.
+
Final-component flags
~~~~~~~~~~~~~~~~~~~~~
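
Note on the ``LOOKUP_BENEATH`` and ``LOOKUP_IN_ROOT`` descriptions in the last hunk: the difference in how the two flags treat ".." and a jump to the root can be sketched with a small userspace toy model. This is not kernel code. The names ``toy_walk``, ``TOY_BENEATH`` and ``TOY_IN_ROOT`` are hypothetical, the walk is purely lexical (no symlinks, mounts, or rename races, so the ``-EAGAIN`` path is not modelled), and the use of ``-EXDEV`` for an escape attempt is an assumption based on how ``openat2()`` reports such errors to userspace.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical modes for this toy model; not the kernel's LOOKUP_* values. */
enum toy_mode { TOY_BENEATH, TOY_IN_ROOT };

/*
 * Lexically walk `path` from a starting directory and return the final
 * depth below that directory (0 is the starting point itself), or
 * -EXDEV (assumed error code) if the walk would escape under TOY_BENEATH.
 */
int toy_walk(const char *path, enum toy_mode mode)
{
    int depth = 0;
    const char *p = path;

    assert(path != NULL);

    /* A leading '/' models a jump to the root. */
    if (*p == '/') {
        if (mode == TOY_BENEATH)
            return -EXDEV;  /* jumping to the root is blocked */
        depth = 0;          /* IN_ROOT: the root is the starting point */
    }

    while (*p) {
        size_t len;

        while (*p == '/')   /* skip separators */
            p++;
        len = strcspn(p, "/");
        if (len == 0)
            break;          /* trailing slash or end of path */

        if (len == 2 && p[0] == '.' && p[1] == '.') {
            if (depth > 0)
                depth--;            /* still beneath the starting point */
            else if (mode == TOY_BENEATH)
                return -EXDEV;      /* ".." would leave the starting point */
            /* IN_ROOT at depth 0: ".." is a no-op */
        } else if (!(len == 1 && p[0] == '.')) {
            depth++;                /* ordinary component */
        }
        p += len;
    }
    return depth;
}
```

In this model ``toy_walk("a/../..", TOY_BENEATH)`` fails, while the same path under ``TOY_IN_ROOT`` simply stays at the starting point. The real implementation must additionally recheck ``rename_lock`` and ``mount_lock``, as the patch describes, because a directory can be moved while the walk is in progress.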