-
Notifications
You must be signed in to change notification settings - Fork 5
tree-in-dcache stuff #6386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tree-in-dcache stuff #6386
Conversation
|
Upstream branch: 4722981 |
|
Upstream branch: 4722981 |
9c00e1b to
178ff96
Compare
86f62c3 to
c8a7e22
Compare
|
Upstream branch: 7dc211c |
178ff96 to
b144003
Compare
c8a7e22 to
c919396
Compare
|
Upstream branch: ec12ab2 |
b144003 to
2c7d1f4
Compare
c919396 to
73c6b0b
Compare
fuse_ctl_remove_conn() used to decrement the link count of root manually; that got subsumed by simple_recursive_removal(), but in case when subdirectory creation has failed the latter won't get called. Just move the modification of parent's link count into fuse_ctl_add_dentry() to keep the things simple. Allows to get rid of the nlink argument as well... Fixes: fcaac5b "fuse_ctl: use simple_recursive_removal()" Acked-by: Miklos Szeredi <[email protected]> Signed-off-by: Al Viro <[email protected]>
If we have LOCKDOWN_TRACEFS, the function bails out - *after* having locked the parent directory and without bothering to undo that. Just check it before tracefs_start_creating()... Fixes: e247094 "tracefs/eventfs: Add missing lockdown checks" Acked-by: Steven Rostedt (Google) <[email protected]> Signed-off-by: Al Viro <[email protected]>
simple_recursive_removal(), but instead of victim dentry it takes parent + name. Used to be open-coded in fs/fuse/control.c, but there's no need to expose the guts of that thing there and there are other potential users, so let's lift it into libfs... Acked-by: Miklos Szeredi <[email protected]> Signed-off-by: Al Viro <[email protected]>
should be paired with simple_start_creating() - unlocks parent and drops dentry reference. Signed-off-by: Al Viro <[email protected]>
Some filesystems use a kinda-sorta controlled dentry refcount leak to pin dentries of created objects in dcache (and undo it when removing those). Reference is grabbed and not released, but it's not actually _stored_ anywhere. That works, but it's hard to follow and verify; among other things, we have no way to tell _which_ of the increments is intended to be an unpaired one. Worse, on removal we need to decide whether the reference had already been dropped, which can be non-trivial if that removal is on umount and we need to figure out if this dentry is pinned due to e.g. unlink() not done. Usually that is handled by using kill_litter_super() as ->kill_sb(), but there are open-coded special cases of the same (consider e.g. /proc/self). Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set claims responsibility for +1 in refcount. The end result this series is aiming for: * get these unbalanced dget() and dput() replaced with new primitives that would, in addition to adjusting refcount, set and clear persistency flag. * instead of having kill_litter_super() mess with removing the remaining "leaked" references (e.g. for all tmpfs files that hadn't been removed prior to umount), have the regular shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries, dropping the corresponding reference if it had been set. After that kill_litter_super() becomes an equivalent of kill_anon_super(). Doing that in a single step is not feasible - it would affect too many places in too many filesystems. It has to be split into a series. Here we * introduce the new flag * teach shrink_dcache_for_umount() to handle it (i.e. remove and drop refcount on anything that survives to umount with that flag still set) * teach kill_litter_super() that anything with that flag does *not* need to be unpinned. Next commits will add primitives for maintaing that flag and convert the common helpers to those. After that - a long series of per-filesystem patches converting to those primitives. Signed-off-by: Al Viro <[email protected]>
* d_make_persistent(dentry, inode) - bump refcount, mark persistent and make hashed positive. Return value is a borrowed reference to dentry; it can be used until something removes persistency (at the very least, until the parent gets unlocked, but some filesystems may have stronger exclusion). * d_make_discardable() - remove persistency mark and drop reference. d_make_persistent() is similar to combination of d_instantiate(), dget() and setting flag. The only difference is that unlike d_instantiate() it accepts hashed and unhashed negatives alike. It is always called in strong locking environment (parent held exclusive, or, in some cases, dentry coming from d_alloc_name()); if we ever start using it with parent held only shared and dentry coming from d_alloc_parallel(), we'll need to copy the in-lookup logics from __d_add(). d_make_discardable() is eqiuvalent to combination of removing flag and dput(); since flag removal requires ->d_lock, there's no point trying to avoid taking that for refcount decrement as fast_dput() does. The slow path of dput() has been taken into a helper and reused in d_make_discardable() instead. Signed-off-by: Al Viro <[email protected]>
Note that simple_unlink() et.al. are used by many filesystems; for now they can not assume that persistency mark will have been set back when the object got created. Once all conversions are done we'll have them complain if called for something that had not been marked persistent. Signed-off-by: Al Viro <[email protected]>
Quite a bit is already done by infrastructure changes (simple_link(),
simple_unlink()) - all that is left is replacing d_instantiate() +
pinning dget() (in ->symlink() and ->mknod()) with d_make_persistent(),
and, in case of shmem, using simple_unlink() and simple_link() in
->unlink() and ->link() resp., instead of open-coding those there.
Since d_make_persistent() accepts (and hashes) unhashed ones, shmem
situation gets simpler - we no longer care whether ->lookup() has hashed
the sucker.
With that done, we don't need kill_litter_super() for these filesystems
anymore - by the umount time all remaining dentries will be marked
persistent and kill_litter_super() will boil down to call of
kill_anon_super().
The same goes for devtmpfs and rootfs - they are handled by
ramfs or by shmem, depending upon config.
NB: strictly speaking, both devtmpfs and rootfs ought to use
ramfs_kill_sb() if they end up using ramfs; that's a separate
story and the only impact of "just use kill_{litter,anon}_super()"
is that we fail to free their sb->s_fs_info... on reboot.
That's orthogonal to the changes in this series - kill_litter_super()
is identical to kill_anon_super() for those at this point.
Signed-off-by: Al Viro <[email protected]>
... and there's no need to remember those pointers anywhere - ->kill_sb() no longer needs to bother since kill_anon_super() will take care of them anyway and proc_pid_readdir() only wants the inumbers, which we had in a couple of static variables all along. Signed-off-by: Al Viro <[email protected]>
These are guaranteed to be empty by the time they are shut down; both are single-instance and there is an internal mount maintained for as long as there is any contents. Both have that internal mount pinned by every object in root. In other words, kill_litter_super() boils down to kill_anon_super() for those. Reviewed-by: Joel Becker <[email protected]> Acked-by: Paul Moore <paul@paul-moore> (LSM) Acked-by: Andreas Hindborg <[email protected]> (configfs) Signed-off-by: Al Viro <[email protected]>
entirely static tree, populated by simple_fill_super(). Can switch to kill_anon_super() without any other changes. Signed-off-by: Al Viro <[email protected]>
Entirely static tree populated by simple_fill_super(). Can use kill_anon_super() as-is. Acked-by: Casey Schaufler <[email protected]> Signed-off-by: Al Viro <[email protected]>
Very much ramfs-like; dget()+d_instantiate() -> d_make_persistent() (in two places) is all it takes. NB: might make sense to turn its ->put_super() into ->kill_sb(). Signed-off-by: Al Viro <[email protected]>
All modifications via normal VFS codepaths; just take care of making persistent in in mqueue_create_attr() and discardable in mqueue_unlink() and it doesn't need kill_litter_super() at all. mqueue_unlink() side is best handled by having it call simple_unlink() rather than duplicating its guts... Signed-off-by: Al Viro <[email protected]>
object creation goes through the normal VFS paths or approximation
thereof (user_path_create()/done_path_create() in case of bpf_obj_do_pin(),
open-coded simple_{start,done}_creating() in bpf_iter_link_pin_kernel()
at mount time), removals go entirely through the normal VFS paths (and
->unlink() is simple_unlink() there).
Enough to have bpf_dentry_finalize() use d_make_persistent() instead
of dget() and we are done.
Convert bpf_iter_link_pin_kernel() to simple_{start,done}_creating(),
while we are at it.
Signed-off-by: Al Viro <[email protected]>
All modifications via normal VFS codepaths; just take care of making persistent in ->create() and ->mkdir() and that's it (removal side doesn't need any changes, since it uses simple_rmdir() for ->rmdir() and calls simple_unlink() from ->unlink()). Signed-off-by: Al Viro <[email protected]>
creation/removal is via normal VFS paths; make ->mkdir() and ->symlink() use d_make_persistent(); ->rmdir() and ->unlink() - d_make_discardable() instead of dput() and that's it. d_make_persistent() works for unhashed just fine... Note that only persistent dentries are ever hashed there; unusual absense of ->d_delete() in dentry_operations is due to that - anything that has refcount reach 0 will be unhashed there, so it won't get to checking ->d_delete anyway. Signed-off-by: Al Viro <[email protected]>
removals are done with locked_recursive_removal(); switch creations to simple_start_creating()/d_make_persistent()/simple_done_creating() and take them to a helper (add_entry()), while we are at it - simpler control flow that way. Signed-off-by: Al Viro <[email protected]>
Don't bother to store the dentry of /policy_capabilities - it belongs to invariant part of tree and we only use it to populate that directory, so there's no reason to keep it around afterwards. Same situation as with /avc, /ss, etc. There are two directories that get replaced on policy load - /class and /booleans. These we need to stash (and update the pointers on policy reload); /policy_capabilities is not in the same boat. Acked-by: Paul Moore <[email protected]> Reviewed-by: Stephen Smalley <[email protected]> Tested-by: Stephen Smalley <[email protected]> Signed-off-by: Al Viro <[email protected]>
allocating dentry after the inode has been set up reduces the amount of boilerplate - "attach this inode under that name and this parent or drop inode in case of failure" simplifies quite a few places. Acked-by: Paul Moore <[email protected]> Reviewed-by: Stephen Smalley <[email protected]> Tested-by: Stephen Smalley <[email protected]> Signed-off-by: Al Viro <[email protected]>
Tree has invariant part + two subtrees that get replaced upon each policy load. Invariant parts stay for the lifetime of filesystem, these two subdirs - from policy load to policy load (serialized on lock_rename(root, ...)). All object creations are via d_alloc_name()+d_add() inside selinuxfs, all removals are via simple_recursive_removal(). Turn those d_add() into d_make_persistent()+dput() and that's mostly it. Acked-by: Paul Moore <[email protected]> Reviewed-by: Stephen Smalley <[email protected]> Tested-by: Stephen Smalley <[email protected]> Signed-off-by: Al Viro <[email protected]>
ffs_data_closed() has a seriously confusing logics in it: in addition to the normal "decrement a counter and do some work if it hits zero" there's "... and if it has somehow become negative, do that" bit. It's not a race, despite smelling rather fishy. What really happens is that in addition to "call that on close of files there, to match the increments of counter on opens" there's one call in ->kill_sb(). Counter starts at 0 and never goes negative over the lifetime of filesystem (or we have much worse problems everywhere - ->release() call of some file somehow unpaired with successful ->open() of the same). At the filesystem shutdown it will be 0 or, again, we have much worse problems - filesystem instance destroyed with files on it still open. In other words, at that call and at that call alone the decrement would go from 0 to -1, hitting that chunk (and not hitting the "if it hits 0" part). So that check is a weirdly spelled "called from ffs_kill_sb()". Just expand the call in the latter and kill the misplaced chunk in ffs_data_closed(). Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Al Viro <[email protected]>
A reference is held by the superblock (it's dropped in ffs_kill_sb()) and filesystem will not get to ->kill_sb() while there are any opened files, TYVM... Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Al Viro <[email protected]>
... otherwise we just might free ffs with ffs->reset_work still on queue. That needs to be done after ffs_data_reset() - that's the cutoff point for configfs accesses (serialized on gadget_info->lock), which is where the schedule_work() would come from. Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Al Viro <[email protected]>
ffs_epfile_open() can race with removal, ending up with file->private_data pointing to freed object. There is a total count of opened files on functionfs (both ep0 and dynamic ones) and when it hits zero, dynamic files get removed. Unfortunately, that removal can happen while another thread is in ffs_epfile_open(), but has not incremented the count yet. In that case open will succeed, leaving us with UAF on any subsequent read() or write(). The root cause is that ffs->opened is misused; atomic_dec_and_test() vs. atomic_add_return() is not a good idea, when object remains visible all along. To untangle that * serialize openers on ffs->mutex (both for ep0 and for dynamic files) * have dynamic ones use atomic_inc_not_zero() and fail if we had zero ->opened; in that case the file we are opening is doomed. * have the inodes of dynamic files marked on removal (from the callback of simple_recursive_removal()) - clear ->i_private there. * have open of dynamic ones verify they hadn't been already removed, along with checking that state is FFS_ACTIVE. Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Al Viro <[email protected]>
No need to return dentry from ffs_sb_create_file() or keep it around afterwards. To avoid subtle issues with getting to ffs from epfiles in ffs_epfiles_destroy(), pass the superblock as explicit argument. Callers have it anyway. Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Al Viro <[email protected]>
All files are regular; ep0 is there all along, other ep* may appear and go away during the filesystem lifetime; all of those are guaranteed to be gone by the time we umount it. Object creation is in ffs_sb_create_file(), removals - at ->kill_sb() time (for ep0) or by simple_remove_by_name() from ffs_epfiles_destroy() (for the rest of them). Switch ffs_sb_create_file() to simple_start_creating()/d_make_persistent()/ simple_done_creating() and that's it. Signed-off-by: Al Viro <[email protected]>
No need to return dentry from gadgetfs_create_file() or keep it around afterwards. Signed-off-by: Al Viro <[email protected]>
same as functionfs Signed-off-by: Al Viro <[email protected]>
hypfs dentries end up with refcount 2 when they are not busy. Refcount 1 is enough to keep them pinned, and going that way allows to simplify things nicely: * don't need to drop an extra reference before the call of kill_litter_super() in ->kill_sb(); all we need there is to reset the cleanup list - everything on it will be taken out automatically. * we can make use of simple_recursive_removal() on tree rebuilds; just make sure that only children of root end up in the cleanup list and hypfs_delete_tree() becomes much simpler Signed-off-by: Al Viro <[email protected]>
Every single caller only cares about PTR_ERR_OR_ZERO() of return value... Signed-off-by: Al Viro <[email protected]>
same story as for hypfs_create_str() Signed-off-by: Al Viro <[email protected]>
just have hypfs_create_file() do the usual simple_start_creating()/ d_make_persistent()/simple_done_creating() and that's it Signed-off-by: Al Viro <[email protected]>
Just use d_make_persistent() + dput() (and fold the latter into simple_finish_creating()) and that's it... NOTE: pipe->dentry is a borrowed reference - it does not contribute to dentry refcount. Signed-off-by: Al Viro <[email protected]>
One instance per net-ns. There's a fixed subset (several files in root, an optional symlink in root + initially empty /clients/) + per-client subdirectory in /clients/. Clients can appear only after the filesystem is there and they are all gone before it gets through ->kill_sb(). Fixed subset created in fill_super(), regular files by simple_fill_super(), then a subdirectory and a symlink - manually. It is removed by kill_litter_super(). Per-client subdirectories are created by nfsd_client_mkdir() (populated with client-supplied list of files in them). Removed by nfsd_client_rmdir(), which is simple_recursive_removal(). All dentries except for the ones from simple_fill_super() come from * nfsd_mkdir() (subdirectory, dentry from simple_start_creating()). Called from fill_super() (creates initially empty /clients) and from nfsd_client_mkdir (creates a per-client subdirectory in /clients). * _nfsd_symlink() (symlink, dentry from simple_start_creating()), called from fill_super(). * nfsdfs_create_files() (regulars, dentry from simple_start_creating()), called only from nfsd_client_mkdir(). Turn d_instatiate() + inode_unlock() into d_make_persistent() + simple_done_creating() in nfsd_mkdir(), _nfsd_symlink() and nfsdfs_create_files() and we are done. Signed-off-by: Al Viro <[email protected]>
Parallel to binderfs stuff: * use simple_start_creating()/simple_done_creating()/d_make_persistent() instead of manual inode_lock()/lookup_noperm()/d_instanitate()/inode_unlock(). * allocate inode first - simpler cleanup that way. * use simple_recursive_removal() instead of open-coding it. * switch to kill_anon_super() Signed-off-by: Al Viro <[email protected]>
Not used anymore. Signed-off-by: Al Viro <[email protected]>
securityfs uses simple_recursive_removal(), but does not bother to mark
dentries persistent. This is the only place where it still happens; get
rid of that irregularity.
* use simple_{start,done}_creating() and d_make_persitent(); kill_litter_super()
use was already gone, since we empty the filesystem instance before it gets
shut down.
Acked-by: Paul Moore <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
Upstream branch: d6ec090 |
it's an unused alias for securityfs_remove() Acked-by: Paul Moore <[email protected]> Signed-off-by: Al Viro <[email protected]>
At this point there are very few call chains that might lead to d_make_discardable() on a dentry that hadn't been made persistent: calls of simple_unlink() and simple_rmdir() in configfs and apparmorfs. Both filesystems do pin (part of) their contents in dcache, but they are currently playing very unusual games with that. Converting them to more usual patterns might be possible, but it's definitely going to be a long series of changes in both cases. For now the easiest solution is to have both stop using simple_unlink() and simple_rmdir() - that allows to make d_make_discardable() warn when given a non-persistent dentry. Rather than giving them full-blown private copies (with calls of d_make_discardable() replaced with dput()), let's pull the parts of simple_unlink() and simple_rmdir() that deal with timestamps and link counts into separate helpers (__simple_unlink() and __simple_rmdir() resp.) and have those used by configfs and apparmorfs. Signed-off-by: Al Viro <[email protected]>
2c7d1f4 to
4c1aade
Compare
|
At least one diff in series https://patchwork.kernel.org/project/netdevbpf/list/?series=1024644 irrelevant now. Closing PR. |
Pull request for series with
subject: tree-in-dcache stuff
version: 4
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1024644