Skip to content

Commit 28eceed

Browse files
jankarabrauner
authored andcommitted
fs: Lock moved directories
When a directory is moved to a different directory, some filesystems (udf, ext4, ocfs2, f2fs, and likely gfs2, reiserfs, and others) need to update their pointer to the parent and this must not race with other operations on the directory. Lock the directories when they are moved. Although not all filesystems need this locking, we perform it in vfs_rename() because getting the lock ordering right is really difficult and we don't want to expose these locking details to filesystems. CC: [email protected] Signed-off-by: Jan Kara <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
1 parent f23ce75 commit 28eceed

File tree

2 files changed

+28
-20
lines changed

2 files changed

+28
-20
lines changed

Documentation/filesystems/directory-locking.rst

Lines changed: 14 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -22,12 +22,11 @@ exclusive.
2222
3) object removal. Locking rules: caller locks parent, finds victim,
2323
locks victim and calls the method. Locks are exclusive.
2424

25-
4) rename() that is _not_ cross-directory. Locking rules: caller locks
26-
the parent and finds source and target. In case of exchange (with
27-
RENAME_EXCHANGE in flags argument) lock both. In any case,
28-
if the target already exists, lock it. If the source is a non-directory,
29-
lock it. If we need to lock both, lock them in inode pointer order.
30-
Then call the method. All locks are exclusive.
25+
4) rename() that is _not_ cross-directory. Locking rules: caller locks the
26+
parent and finds source and target. We lock both (provided they exist). If we
27+
need to lock two inodes of different type (dir vs non-dir), we lock directory
28+
first. If we need to lock two inodes of the same type, lock them in inode
29+
pointer order. Then call the method. All locks are exclusive.
3130
NB: we might get away with locking the source (and target in exchange
3231
case) shared.
3332

@@ -44,15 +43,17 @@ All locks are exclusive.
4443
rules:
4544

4645
* lock the filesystem
47-
* lock parents in "ancestors first" order.
46+
* lock parents in "ancestors first" order. If one is not ancestor of
47+
the other, lock them in inode pointer order.
4848
* find source and target.
4949
* if old parent is equal to or is a descendent of target
5050
fail with -ENOTEMPTY
5151
* if new parent is equal to or is a descendent of source
5252
fail with -ELOOP
53-
* If it's an exchange, lock both the source and the target.
54-
* If the target exists, lock it. If the source is a non-directory,
55-
lock it. If we need to lock both, do so in inode pointer order.
53+
* Lock both the source and the target provided they exist. If we
54+
need to lock two inodes of different type (dir vs non-dir), we lock
55+
the directory first. If we need to lock two inodes of the same type,
56+
lock them in inode pointer order.
5657
* call the method.
5758

5859
All ->i_rwsem are taken exclusive. Again, we might get away with locking
@@ -66,8 +67,9 @@ If no directory is its own ancestor, the scheme above is deadlock-free.
6667

6768
Proof:
6869

69-
First of all, at any moment we have a partial ordering of the
70-
objects - A < B iff A is an ancestor of B.
70+
First of all, at any moment we have a linear ordering of the
71+
objects - A < B iff (A is an ancestor of B) or (B is not an ancestor
72+
of A and ptr(A) < ptr(B)).
7173

7274
That ordering can change. However, the following is true:
7375

fs/namei.c

Lines changed: 14 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -4731,7 +4731,7 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
47314731
* sb->s_vfs_rename_mutex. We might be more accurate, but that's another
47324732
* story.
47334733
* c) we have to lock _four_ objects - parents and victim (if it exists),
4734-
* and source (if it is not a directory).
4734+
* and source.
47354735
* And that - after we got ->i_mutex on parents (until then we don't know
47364736
* whether the target exists). Solution: try to be smart with locking
47374737
* order for inodes. We rely on the fact that tree topology may change
@@ -4815,10 +4815,16 @@ int vfs_rename(struct renamedata *rd)
48154815

48164816
take_dentry_name_snapshot(&old_name, old_dentry);
48174817
dget(new_dentry);
4818-
if (!is_dir || (flags & RENAME_EXCHANGE))
4819-
lock_two_nondirectories(source, target);
4820-
else if (target)
4821-
inode_lock(target);
4818+
/*
4819+
* Lock all moved children. Moved directories may need to change parent
4820+
* pointer so they need the lock to prevent against concurrent
4821+
* directory changes moving parent pointer. For regular files we've
4822+
* historically always done this. The lockdep locking subclasses are
4823+
* somewhat arbitrary but RENAME_EXCHANGE in particular can swap
4824+
* regular files and directories so it's difficult to tell which
4825+
* subclasses to use.
4826+
*/
4827+
lock_two_inodes(source, target, I_MUTEX_NORMAL, I_MUTEX_NONDIR2);
48224828

48234829
error = -EPERM;
48244830
if (IS_SWAPFILE(source) || (target && IS_SWAPFILE(target)))
@@ -4866,9 +4872,9 @@ int vfs_rename(struct renamedata *rd)
48664872
d_exchange(old_dentry, new_dentry);
48674873
}
48684874
out:
4869-
if (!is_dir || (flags & RENAME_EXCHANGE))
4870-
unlock_two_nondirectories(source, target);
4871-
else if (target)
4875+
if (source)
4876+
inode_unlock(source);
4877+
if (target)
48724878
inode_unlock(target);
48734879
dput(new_dentry);
48744880
if (!error) {

0 commit comments

Comments
 (0)