@@ -448,15 +448,17 @@ described. If it finds a ``LAST_NORM`` component it first calls
448
448
filesystem to revalidate the result if it is that sort of filesystem.
449
449
If that doesn't get a good result, it calls "``lookup_slow() ``" which
450
450
takes ``i_rwsem ``, rechecks the cache, and then asks the filesystem
451
- to find a definitive answer. Each of these will call
452
- ``follow_managed() `` (as described below) to handle any mount points.
453
-
454
- In the absence of symbolic links, ``walk_component() `` creates a new
455
- ``struct path `` containing a counted reference to the new dentry and a
456
- reference to the new ``vfsmount `` which is only counted if it is
457
- different from the previous ``vfsmount ``. It then calls
458
- ``path_to_nameidata() `` to install the new ``struct path `` in the
459
- ``struct nameidata `` and drop the unneeded references.
451
+ to find a definitive answer.
452
+
453
+ As the last step of walk_component(), step_into() will be called either
454
+ directly from walk_component() or from handle_dots(). It calls
455
+ handle_mounts(), to check and handle mount points, in which a new
456
+ ``struct path `` is created containing a counted reference to the new dentry and
457
+ a reference to the new ``vfsmount `` which is only counted if it is
458
+ different from the previous ``vfsmount ``. Then if there is
459
+ a symbolic link, step_into() calls pick_link() to deal with it,
460
+ otherwise it installs the new ``struct path `` in the ``struct nameidata ``, and
461
+ drops the unneeded references.
460
462
461
463
This "hand-over-hand" sequencing of getting a reference to the new
462
464
dentry before dropping the reference to the previous dentry may
@@ -470,8 +472,8 @@ Handling the final component
470
472
``nd->last_type `` to refer to the final component of the path. It does
471
473
not call ``walk_component() `` that last time. Handling that final
472
474
component remains for the caller to sort out. Those callers are
473
- `` path_lookupat() ``, `` path_parentat() ``, `` path_mountpoint() `` and
474
- `` path_openat() `` each of which handles the differing requirements of
475
+ path_lookupat(), path_parentat() and
476
+ path_openat() each of which handles the differing requirements of
475
477
different system calls.
476
478
477
479
``path_parentat() `` is clearly the simplest - it just wraps a little bit
@@ -486,20 +488,18 @@ perform their operation.
486
488
object is wanted such as by ``stat() `` or ``chmod() ``. It essentially just
487
489
calls ``walk_component() `` on the final component through a call to
488
490
``lookup_last() ``. ``path_lookupat() `` returns just the final dentry.
489
-
490
- ``path_mountpoint() `` handles the special case of unmounting which must
491
- not try to revalidate the mounted filesystem. It effectively
492
- contains, through a call to ``mountpoint_last() ``, an alternate
493
- implementation of ``lookup_slow() `` which skips that step. This is
494
- important when unmounting a filesystem that is inaccessible, such as
491
+ It is worth noting that when flag ``LOOKUP_MOUNTPOINT `` is set,
492
+ path_lookupat() will unset LOOKUP_JUMPED in nameidata so that in the
493
+ subsequent path traversal d_weak_revalidate() won't be called.
494
+ This is important when unmounting a filesystem that is inaccessible, such as
495
495
one provided by a dead NFS server.
496
496
497
497
Finally ``path_openat() `` is used for the ``open() `` system call; it
498
- contains, in support functions starting with "`` do_last() `` ", all the
498
+ contains, in support functions starting with "open_last_lookups()", all the
499
499
complexity needed to handle the different subtleties of O_CREAT (with
500
500
or without O_EXCL), final "``/ ``" characters, and trailing symbolic
501
501
links. We will revisit this in the final part of this series, which
502
- focuses on those symbolic links. "`` do_last() `` " will sometimes, but
502
+ focuses on those symbolic links. "open_last_lookups()" will sometimes, but
503
503
not always, take ``i_rwsem ``, depending on what it finds.
504
504
505
505
Each of these, or the functions which call them, need to be alert to
@@ -535,8 +535,7 @@ covered in greater detail in autofs.txt in the Linux documentation
535
535
tree, but a few notes specifically related to path lookup are in order
536
536
here.
537
537
538
- The Linux VFS has a concept of "managed" dentries which is reflected
539
- in function names such as "``follow_managed() ``". There are three
538
+ The Linux VFS has a concept of "managed" dentries. There are three
540
539
potentially interesting things about these dentries corresponding
541
540
to three different flags that might be set in ``dentry->d_flags ``:
542
541
@@ -652,10 +651,10 @@ RCU-walk finds it cannot stop gracefully, it simply gives up and
652
651
restarts from the top with REF-walk.
653
652
654
653
This pattern of "try RCU-walk, if that fails try REF-walk" can be
655
- clearly seen in functions like `` filename_lookup() `` ,
656
- `` filename_parentat() ``, `` filename_mountpoint() `` ,
657
- `` do_filp_open() `` , and `` do_file_open_root() `` . These five
658
- correspond roughly to the four ``path_*() `` functions we met earlier,
654
+ clearly seen in functions like filename_lookup(),
655
+ filename_parentat(),
656
+ do_filp_open(), and do_file_open_root(). These four
657
+ correspond roughly to the three ``path_*() `` functions we met earlier,
659
658
each of which calls ``link_path_walk() ``. The ``path_*() `` functions are
660
659
called using different mode flags until a mode is found which works.
661
660
They are first called with ``LOOKUP_RCU `` set to request "RCU-walk". If
@@ -993,8 +992,8 @@ is 4096. There are a number of reasons for this limit; not letting the
993
992
kernel spend too much time on just one path is one of them. With
994
993
symbolic links you can effectively generate much longer paths so some
995
994
sort of limit is needed for the same reason. Linux imposes a limit of
996
- at most 40 symlinks in any one path lookup. It previously imposed a
997
- further limit of eight on the maximum depth of recursion, but that was
995
+ at most 40 (MAXSYMLINKS) symlinks in any one path lookup. It previously imposed
996
+ a further limit of eight on the maximum depth of recursion, but that was
998
997
raised to 40 when a separate stack was implemented, so there is now
999
998
just the one limit.
1000
999
@@ -1061,42 +1060,26 @@ filesystem cannot successfully get a reference in RCU-walk mode, it
1061
1060
must return ``-ECHILD `` and ``unlazy_walk() `` will be called to return to
1062
1061
REF-walk mode in which the filesystem is allowed to sleep.
1063
1062
1064
- The place for all this to happen is the ``i_op->follow_link() `` inode
1065
- method. In the present mainline code this is never actually called in
1066
- RCU-walk mode as the rewrite is not quite complete. It is likely that
1067
- in a future release this method will be passed an ``inode `` pointer when
1068
- called in RCU-walk mode so it both (1) knows to be careful, and (2) has the
1069
- validated pointer. Much like the ``i_op->permission() `` method we
1070
- looked at previously, ``->follow_link() `` would need to be careful that
1063
+ The place for all this to happen is the ``i_op->get_link() `` inode
1064
+ method. This is called both in RCU-walk and REF-walk. In RCU-walk the
1065
+ ``dentry*`` argument is NULL, and ``->get_link()`` can return -ECHILD to drop out of
1066
+ RCU-walk. Much like the ``i_op->permission() `` method we
1067
+ looked at previously, ``->get_link() `` would need to be careful that
1071
1068
all the data structures it references are safe to be accessed while
1072
- holding no counted reference, only the RCU lock. Though getting a
1073
- reference with ``->follow_link() `` is not yet done in RCU-walk mode, the
1074
- code is ready to release the reference when that does happen.
1075
-
1076
- This need to drop the reference to a symlink adds significant
1077
- complexity. It requires a reference to the inode so that the
1078
- ``i_op->put_link() `` inode operation can be called. In REF-walk, that
1079
- reference is kept implicitly through a reference to the dentry, so
1080
- keeping the ``struct path `` of the symlink is easiest. For RCU-walk,
1081
- the pointer to the inode is kept separately. To allow switching from
1082
- RCU-walk back to REF-walk in the middle of processing nested symlinks
1083
- we also need the seq number for the dentry so we can confirm that
1084
- switching back was safe.
1085
-
1086
- Finally, when providing a reference to a symlink, the filesystem also
1087
- provides an opaque "cookie" that must be passed to ``->put_link() `` so that it
1088
- knows what to free. This might be the allocated memory area, or a
1089
- pointer to the ``struct page `` in the page cache, or something else
1090
- completely. Only the filesystem knows what it is.
1069
+ holding no counted reference, only the RCU lock. A callback
1070
+ ``struct delayed_call`` will be passed to ``->get_link()``:
1071
+ file systems can set their own put_link function and argument through
1072
+ set_delayed_call(). Later on, when the VFS wants to put the link, it will call
1073
+ do_delayed_call() to invoke that callback function with the argument.
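
The callback hand-off described above can be modelled in plain userspace C. This is only a sketch: the ``struct delayed_call``, set_delayed_call() and do_delayed_call() definitions below are stand-ins written for illustration, and ``demo_get_link()``, ``demo_put_link()``, ``demo_follow_once()`` and the ``cleanups_run`` counter are hypothetical names that do not exist in the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the callback record passed to ->get_link(). */
struct delayed_call {
    void (*fn)(void *);
    void *arg;
};

/* Record which cleanup function to run later, and with what argument. */
void set_delayed_call(struct delayed_call *call, void (*fn)(void *), void *arg)
{
    call->fn = fn;
    call->arg = arg;
}

/* Run the recorded cleanup, if the filesystem registered one. */
void do_delayed_call(struct delayed_call *call)
{
    if (call->fn)
        call->fn(call->arg);
}

/* Hypothetical counter so the demo can observe that cleanup happened. */
static int cleanups_run;

/* Hypothetical "put link" function: frees the buffer holding the link body. */
static void demo_put_link(void *arg)
{
    free(arg);
    cleanups_run++;
}

/* Hypothetical ->get_link(): allocates the link body and registers cleanup. */
static const char *demo_get_link(struct delayed_call *done)
{
    static const char target[] = "/tmp/target";
    char *body = malloc(sizeof(target));

    if (!body)
        return NULL;
    memcpy(body, target, sizeof(target));
    set_delayed_call(done, demo_put_link, body);
    return body;
}

/* Follow one link, then release it; returns how many cleanups ran. */
int demo_follow_once(void)
{
    struct delayed_call done = { NULL, NULL };
    int before = cleanups_run;
    const char *link = demo_get_link(&done);

    if (!link)
        return -1;
    assert(strcmp(link, "/tmp/target") == 0);
    printf("following %s\n", link);
    do_delayed_call(&done);   /* the walker is finished with the link body */
    return cleanups_run - before;
}
```

The point of the design is that the walker never needs to know what the filesystem allocated; it only stores the function and argument and invokes them when it is done with the link.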
1091
1074
1092
1075
In order for the reference to each symlink to be dropped when the walk completes,
1093
1076
whether in RCU-walk or REF-walk, the symlink stack needs to contain,
1094
1077
along with the path remnants:
1095
1078
1096
- - the ``struct path `` to provide a reference to the inode in REF-walk
1097
- - the ``struct inode * `` to provide a reference to the inode in RCU-walk
1079
+ - the ``struct path `` to provide a reference to the previous path
1080
+ - the ``const char *`` to provide a reference to the previous name
1098
1081
- the ``seq `` to allow the path to be safely switched from RCU-walk to REF-walk
1099
- - the ``cookie `` that tells `` ->put_path() `` what to put .
1082
+ - the ``struct delayed_call`` for later invocation.
1100
1083
1101
1084
This means that each entry in the symlink stack needs to hold five
1102
1085
pointers and an integer instead of just one pointer (the path
@@ -1120,12 +1103,10 @@ doesn't need to notice. Getting this ``name`` variable on and off the
1120
1103
stack is very straightforward; pushing and popping the references is
1121
1104
a little more complex.
1122
1105
1123
- When a symlink is found, ``walk_component() `` returns the value ``1 ``
1124
- (``0 `` is returned for any other sort of success, and a negative number
1125
- is, as usual, an error indicator). This causes ``get_link() `` to be
1126
- called; it then gets the link from the filesystem. Providing that
1127
- operation is successful, the old path ``name `` is placed on the stack,
1128
- and the new value is used as the ``name `` for a while. When the end of
1106
+ When a symlink is found, walk_component() calls pick_link() via step_into()
1107
+ which returns the link from the filesystem.
1108
+ Providing that operation is successful, the old path ``name `` is placed on the
1109
+ stack, and the new value is used as the ``name `` for a while. When the end of
1129
1110
the path is found (i.e. ``*name `` is ``'\0' ``) the old ``name `` is restored
1130
1111
off the stack and path walking continues.
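
The push-and-restore handling of ``name`` can be illustrated with a toy walker in userspace C. This is a sketch under loud assumptions: ``toy_links``, ``toy_readlink()`` and ``toy_walk()`` are hypothetical, the "filesystem" is a fixed table, and absolute symlinks, ".." and reference counting are ignored entirely. Only the idea of stacking the unwalked remainder of the old name, and the limit of 40 symlinks per lookup, come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXSYMLINKS 40  /* limit on symlinks followed in one lookup */

/* Hypothetical table standing in for symlinks stored by a filesystem. */
struct toy_link {
    const char *name;
    const char *body;
};

static const struct toy_link toy_links[] = {
    { "a",    "b/c"  },
    { "c",    "d"    },
    { "loop", "loop" },  /* points at itself, so it never resolves */
};

/* Return the link body if the component is a symlink, else NULL. */
static const char *toy_readlink(const char *comp, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof(toy_links) / sizeof(toy_links[0]); i++)
        if (strlen(toy_links[i].name) == len &&
            strncmp(toy_links[i].name, comp, len) == 0)
            return toy_links[i].body;
    return NULL;
}

/*
 * Walk `name`, writing the non-symlink components into `out` separated
 * by '/'.  Returns 0 on success, -1 on too many symlinks or a full buffer.
 */
int toy_walk(const char *name, char *out, size_t outsz)
{
    const char *stack[MAXSYMLINKS];  /* remainders of interrupted names */
    int depth = 0, followed = 0;
    size_t used = 0, len;
    const char *body;

    if (outsz == 0)
        return -1;
    out[0] = '\0';
    for (;;) {
        while (*name == '/')
            name++;
        if (*name == '\0') {
            if (depth == 0)
                return 0;            /* nothing left anywhere: done */
            name = stack[--depth];   /* restore the old name */
            continue;
        }
        len = strcspn(name, "/");
        body = toy_readlink(name, len);
        if (body) {
            if (++followed > MAXSYMLINKS)
                return -1;
            stack[depth++] = name + len;  /* push rest of the old name */
            name = body;                  /* walk the link body for a while */
            continue;
        }
        if (used + len + 2 > outsz)
            return -1;
        if (used)
            out[used++] = '/';
        memcpy(out + used, name, len);
        used += len;
        out[used] = '\0';
        name += len;
    }
}
```

With this table, walking ``x/a/y`` visits ``x``, then the body of ``a`` (``b`` and the nested link ``c``, whose body is ``d``), and finally resumes the original name at ``y``.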
1131
1112
@@ -1142,23 +1123,23 @@ stack in ``walk_component()`` immediately when the symlink is found;
1142
1123
old symlink as it walks that last component. So it is quite
1143
1124
convenient for ``walk_component() `` to release the old symlink and pop
1144
1125
the references just before pushing the reference information for the
1145
- new symlink. It is guided in this by two flags; ``WALK_GET ``, which
1146
- gives it permission to follow a symlink if it finds one, and
1147
- `` WALK_PUT ``, which tells it to release the current symlink after it has been
1148
- followed. `` WALK_PUT `` is tested first, leading to a call to
1149
- `` put_link() ``. `` WALK_GET `` is tested subsequently (by
1150
- `` should_follow_link() ``) leading to a call to `` pick_link () `` which sets
1151
- up the stack frame .
1126
+ new symlink. It is guided in this by three flags: ``WALK_NOFOLLOW `` which
1127
+ forbids it from following a symlink if it finds one, `` WALK_MORE ``
1128
+ which indicates that it is yet too early to release the
1129
+ current symlink, and `` WALK_TRAILING `` which indicates that it is on the final
1130
+ component of the lookup, so we will check userspace flag `` LOOKUP_FOLLOW `` to
1131
+ decide whether to follow it when it is a symlink and call ``may_follow_link()`` to
1132
+ check if we have privilege to follow it.
1152
1133
1153
1134
Symlinks with no final component
1154
1135
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1155
1136
1156
1137
A pair of special-case symlinks deserve a little further explanation.
1157
1138
Both result in a new ``struct path `` (with mount and dentry) being set
1158
- up in the ``nameidata ``, and result in `` get_link() `` returning ``NULL ``.
1139
+ up in the ``nameidata ``, and result in pick_link() returning ``NULL ``.
1159
1140
1160
1141
The more obvious case is a symlink to "``/ ``". All symlinks starting
1161
- with "``/ ``" are detected in `` get_link() `` which resets the ``nameidata ``
1142
+ with "``/ ``" are detected in pick_link() which resets the ``nameidata ``
1162
1143
to point to the effective filesystem root. If the symlink only
1163
1144
contains "``/ ``" then there is nothing more to do, no components at all,
1164
1145
so ``NULL `` is returned to indicate that the symlink can be released and
@@ -1175,12 +1156,11 @@ something that looks like a symlink. It is really a reference to the
1175
1156
target file, not just the name of it. When you ``readlink `` these
1176
1157
objects you get a name that might refer to the same file - unless it
1177
1158
has been unlinked or mounted over. When ``walk_component() `` follows
1178
- one of these, the ``->follow_link() `` method in "procfs" doesn't return
1179
- a string name, but instead calls ``nd_jump_link() `` which updates the
1180
- ``nameidata `` in place to point to that target. ``->follow_link() `` then
1181
- returns ``NULL ``. Again there is no final component and ``get_link() ``
1182
- reports this by leaving the ``last_type `` field of ``nameidata `` as
1183
- ``LAST_BIND ``.
1159
+ one of these, the ``->get_link() `` method in "procfs" doesn't return
1160
+ a string name, but instead calls nd_jump_link() which updates the
1161
+ ``nameidata `` in place to point to that target. ``->get_link() `` then
1162
+ returns ``NULL ``. Again there is no final component and pick_link()
1163
+ returns ``NULL ``.
1184
1164
1185
1165
Following the symlink in the final component
1186
1166
--------------------------------------------
@@ -1197,42 +1177,38 @@ potentially need to call ``link_path_walk()`` again and again on
1197
1177
successive symlinks until one is found that doesn't point to another
1198
1178
symlink.
1199
1179
1200
- This case is handled by the relevant caller of ``link_path_walk() ``, such as
1201
- ``path_lookupat() `` using a loop that calls ``link_path_walk() ``, and then
1202
- handles the final component. If the final component is a symlink
1203
- that needs to be followed, then ``trailing_symlink() `` is called to set
1204
- things up properly and the loop repeats, calling ``link_path_walk() ``
1205
- again. This could loop as many as 40 times if the last component of
1206
- each symlink is another symlink.
1207
-
1208
- The various functions that examine the final component and possibly
1209
- report that it is a symlink are ``lookup_last() ``, ``mountpoint_last() ``
1210
- and ``do_last() ``, each of which use the same convention as
1211
- ``walk_component() `` of returning ``1 `` if a symlink was found that needs
1212
- to be followed.
1213
-
1214
- Of these, ``do_last() `` is the most interesting as it is used for
1215
- opening a file. Part of ``do_last() `` runs with ``i_rwsem `` held and this
1216
- part is in a separate function: ``lookup_open() ``.
1217
-
1218
- Explaining ``do_last() `` completely is beyond the scope of this article,
1219
- but a few highlights should help those interested in exploring the
1220
- code.
1221
-
1222
- 1. Rather than just finding the target file, ``do_last() `` needs to open
1180
+ This case is handled by relevant callers of link_path_walk(), such as
1181
+ path_lookupat(), path_openat() using a loop that calls link_path_walk(),
1182
+ and then handles the final component by calling open_last_lookups() or
1183
+ lookup_last(). If it is a symlink that needs to be followed,
1184
+ open_last_lookups() or lookup_last() will set things up properly and
1185
+ return the path so that the loop repeats, calling
1186
+ link_path_walk() again. This could loop as many as 40 times if the last
1187
+ component of each symlink is another symlink.
1188
+
1189
+ Of the various functions that examine the final component,
1190
+ open_last_lookups() is the most interesting as it works in tandem
1191
+ with do_open() for opening a file. Part of open_last_lookups() runs
1192
+ with ``i_rwsem `` held and this part is in a separate function: lookup_open().
1193
+
1194
+ Explaining open_last_lookups() and do_open() completely is beyond the scope
1195
+ of this article, but a few highlights should help those interested in exploring
1196
+ the code.
1197
+
1198
+ 1. Rather than just finding the target file, do_open() is used after
1199
+ open_last_lookups() to open
1223
1200
it. If the file was found in the dcache, then ``vfs_open() `` is used for
1224
1201
this. If not, then ``lookup_open() `` will either call ``atomic_open() `` (if
1225
1202
the filesystem provides it) to combine the final lookup with the open, or
1226
- will perform the separate ``lookup_real () `` and ``vfs_create () `` steps
1203
+ will perform the separate ``i_op->lookup()`` and ``i_op->create()`` steps
1227
1204
directly. In the latter case the actual "open" of this newly found or
1228
- created file will be performed by `` vfs_open() `` , just as if the name
1205
+ created file will be performed by vfs_open(), just as if the name
1229
1206
were found in the dcache.
1230
1207
1231
- 2. ``vfs_open() `` can fail with ``-EOPENSTALE `` if the cached information
1232
- wasn't quite current enough. Rather than restarting the lookup from
1233
- the top with ``LOOKUP_REVAL `` set, ``lookup_open() `` is called instead,
1234
- giving the filesystem a chance to resolve small inconsistencies.
1235
- If that doesn't work, only then is the lookup restarted from the top.
1208
+ 2. vfs_open() can fail with ``-EOPENSTALE `` if the cached information
1209
+ wasn't quite current enough. If it's in RCU-walk, ``-ECHILD`` will be returned;
1210
+ otherwise ``-ESTALE `` is returned. When ``-ESTALE `` is returned, the caller may
1211
+ retry with ``LOOKUP_REVAL `` flag set.
1236
1212
1237
1213
3. An open with O_CREAT **does** follow a symlink in the final component,
1238
1214
unlike other creation system calls (like ``mkdir ``). So the sequence::
@@ -1242,8 +1218,8 @@ code.
1242
1218
1243
1219
will create a file called ``/tmp/bar ``. This is not permitted if
1244
1220
``O_EXCL `` is set but otherwise is handled for an O_CREAT open much
1245
- like for a non-creating open: `` should_follow_link() `` returns `` 1 ``, and
1246
- so does `` do_last() `` so that `` trailing_symlink() `` gets called and the
1221
+ like for a non-creating open: lookup_last() or open_last_lookups()
1222
+ returns a non-``NULL`` value, and link_path_walk() gets called and the
1247
1223
open process continues on the symlink that was found.
1248
1224
1249
1225
Updating the access time