Skip to content

Commit 2eedfa9

Browse files
committed
Merge tag 'v6.5/vfs.rename.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs rename locking updates from Christian Brauner: "This contains the work from Jan to fix problems with cross-directory renames originally reported in [1]. To quickly sum it up some filesystems (so far we know at least about ext4, udf, f2fs, ocfs2, likely also reiserfs, gfs2 and others) need to lock the directory when it is being renamed into another directory. This is because we need to update the parent pointer in the directory in that case and if that races with other operations on the directory, in particular a conversion from one directory format into another, bad things can happen. So far we've done the locking in the filesystem code but recently Darrick pointed out in [2] that the RENAME_EXCHANGE case was missing. That one is particularly nasty because RENAME_EXCHANGE can arbitrarily mix regular files and directories and proper lock ordering is not achievable in the filesystems alone. This patch set adds locking into vfs_rename() so that not only parent directories but also moved inodes, regardless of whether they are directories or not, are locked when calling into the filesystem. This means establishing a locking order for unrelated directories. New helpers are added for this purpose and our documentation is updated to cover this in detail. The locking is now actually easier to follow as we now always lock source and target. We've always locked the target independent of whether it was a directory or file and we've always locked source if it was a regular file. The exact details for why this came about can be found in [3] and [4]" Link: https://lore.kernel.org/all/20230117123735.un7wbamlbdihninm@quack3 [1] Link: https://lore.kernel.org/all/20230517045836.GA11594@frogsfrogsfrogs [2] Link: https://lore.kernel.org/all/20230526-schrebergarten-vortag-9cd89694517e@brauner [3] Link: https://lore.kernel.org/all/20230530-seenotrettung-allrad-44f4b00139d4@brauner [4] * tag 'v6.5/vfs.rename.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: Restrict lock_two_nondirectories() to non-directory inodes fs: Lock moved directories fs: Establish locking order for unrelated directories Revert "f2fs: fix potential corruption when moving a directory" Revert "udf: Protect rename against modification of moved directory" ext4: Remove ext4 locking of moved directory
2 parents 64bf6ae + 2454ad8 commit 2eedfa9

File tree

7 files changed

+89
-74
lines changed

7 files changed

+89
-74
lines changed

Documentation/filesystems/directory-locking.rst

Lines changed: 14 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -22,12 +22,11 @@ exclusive.
2222
3) object removal. Locking rules: caller locks parent, finds victim,
2323
locks victim and calls the method. Locks are exclusive.
2424

25-
4) rename() that is _not_ cross-directory. Locking rules: caller locks
26-
the parent and finds source and target. In case of exchange (with
27-
RENAME_EXCHANGE in flags argument) lock both. In any case,
28-
if the target already exists, lock it. If the source is a non-directory,
29-
lock it. If we need to lock both, lock them in inode pointer order.
30-
Then call the method. All locks are exclusive.
25+
4) rename() that is _not_ cross-directory. Locking rules: caller locks the
26+
parent and finds source and target. We lock both (provided they exist). If we
27+
need to lock two inodes of different type (dir vs non-dir), we lock directory
28+
first. If we need to lock two inodes of the same type, lock them in inode
29+
pointer order. Then call the method. All locks are exclusive.
3130
NB: we might get away with locking the source (and target in exchange
3231
case) shared.
3332

@@ -44,15 +43,17 @@ All locks are exclusive.
4443
rules:
4544

4645
* lock the filesystem
47-
* lock parents in "ancestors first" order.
46+
* lock parents in "ancestors first" order. If one is not ancestor of
47+
the other, lock them in inode pointer order.
4848
* find source and target.
4949
* if old parent is equal to or is a descendent of target
5050
fail with -ENOTEMPTY
5151
* if new parent is equal to or is a descendent of source
5252
fail with -ELOOP
53-
* If it's an exchange, lock both the source and the target.
54-
* If the target exists, lock it. If the source is a non-directory,
55-
lock it. If we need to lock both, do so in inode pointer order.
53+
* Lock both the source and the target provided they exist. If we
54+
need to lock two inodes of different type (dir vs non-dir), we lock
55+
the directory first. If we need to lock two inodes of the same type,
56+
lock them in inode pointer order.
5657
* call the method.
5758

5859
All ->i_rwsem are taken exclusive. Again, we might get away with locking
@@ -66,8 +67,9 @@ If no directory is its own ancestor, the scheme above is deadlock-free.
6667

6768
Proof:
6869

69-
First of all, at any moment we have a partial ordering of the
70-
objects - A < B iff A is an ancestor of B.
70+
First of all, at any moment we have a linear ordering of the
71+
objects - A < B iff (A is an ancestor of B) or (B is not an ancestor
72+
of A and ptr(A) < ptr(B)).
7173

7274
That ordering can change. However, the following is true:
7375

fs/ext4/namei.c

Lines changed: 2 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -3834,19 +3834,10 @@ static int ext4_rename(struct mnt_idmap *idmap, struct inode *old_dir,
38343834
return retval;
38353835
}
38363836

3837-
/*
3838-
* We need to protect against old.inode directory getting converted
3839-
* from inline directory format into a normal one.
3840-
*/
3841-
if (S_ISDIR(old.inode->i_mode))
3842-
inode_lock_nested(old.inode, I_MUTEX_NONDIR2);
3843-
38443837
old.bh = ext4_find_entry(old.dir, &old.dentry->d_name, &old.de,
38453838
&old.inlined);
3846-
if (IS_ERR(old.bh)) {
3847-
retval = PTR_ERR(old.bh);
3848-
goto unlock_moved_dir;
3849-
}
3839+
if (IS_ERR(old.bh))
3840+
return PTR_ERR(old.bh);
38503841

38513842
/*
38523843
* Check for inode number is _not_ due to possible IO errors.
@@ -4043,10 +4034,6 @@ static int ext4_rename(struct mnt_idmap *idmap, struct inode *old_dir,
40434034
brelse(old.bh);
40444035
brelse(new.bh);
40454036

4046-
unlock_moved_dir:
4047-
if (S_ISDIR(old.inode->i_mode))
4048-
inode_unlock(old.inode);
4049-
40504037
return retval;
40514038
}
40524039

fs/f2fs/namei.c

Lines changed: 1 addition & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -995,20 +995,12 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
995995
goto out;
996996
}
997997

998-
/*
999-
* Copied from ext4_rename: we need to protect against old.inode
1000-
* directory getting converted from inline directory format into
1001-
* a normal one.
1002-
*/
1003-
if (S_ISDIR(old_inode->i_mode))
1004-
inode_lock_nested(old_inode, I_MUTEX_NONDIR2);
1005-
1006998
err = -ENOENT;
1007999
old_entry = f2fs_find_entry(old_dir, &old_dentry->d_name, &old_page);
10081000
if (!old_entry) {
10091001
if (IS_ERR(old_page))
10101002
err = PTR_ERR(old_page);
1011-
goto out_unlock_old;
1003+
goto out;
10121004
}
10131005

10141006
if (S_ISDIR(old_inode->i_mode)) {
@@ -1116,9 +1108,6 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
11161108

11171109
f2fs_unlock_op(sbi);
11181110

1119-
if (S_ISDIR(old_inode->i_mode))
1120-
inode_unlock(old_inode);
1121-
11221111
if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir))
11231112
f2fs_sync_fs(sbi->sb, 1);
11241113

@@ -1133,9 +1122,6 @@ static int f2fs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
11331122
f2fs_put_page(old_dir_page, 0);
11341123
out_old:
11351124
f2fs_put_page(old_page, 0);
1136-
out_unlock_old:
1137-
if (S_ISDIR(old_inode->i_mode))
1138-
inode_unlock(old_inode);
11391125
out:
11401126
iput(whiteout);
11411127
return err;

fs/inode.c

Lines changed: 52 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -1103,24 +1103,62 @@ void discard_new_inode(struct inode *inode)
11031103
}
11041104
EXPORT_SYMBOL(discard_new_inode);
11051105

1106+
/**
1107+
* lock_two_inodes - lock two inodes (may be regular files but also dirs)
1108+
*
1109+
* Lock any non-NULL argument. The caller must make sure that if he is passing
1110+
* in two directories, one is not ancestor of the other. Zero, one or two
1111+
* objects may be locked by this function.
1112+
*
1113+
* @inode1: first inode to lock
1114+
* @inode2: second inode to lock
1115+
* @subclass1: inode lock subclass for the first lock obtained
1116+
* @subclass2: inode lock subclass for the second lock obtained
1117+
*/
1118+
void lock_two_inodes(struct inode *inode1, struct inode *inode2,
1119+
unsigned subclass1, unsigned subclass2)
1120+
{
1121+
if (!inode1 || !inode2) {
1122+
/*
1123+
* Make sure @subclass1 will be used for the acquired lock.
1124+
* This is not strictly necessary (no current caller cares) but
1125+
* let's keep things consistent.
1126+
*/
1127+
if (!inode1)
1128+
swap(inode1, inode2);
1129+
goto lock;
1130+
}
1131+
1132+
/*
1133+
* If one object is directory and the other is not, we must make sure
1134+
* to lock directory first as the other object may be its child.
1135+
*/
1136+
if (S_ISDIR(inode2->i_mode) == S_ISDIR(inode1->i_mode)) {
1137+
if (inode1 > inode2)
1138+
swap(inode1, inode2);
1139+
} else if (!S_ISDIR(inode1->i_mode))
1140+
swap(inode1, inode2);
1141+
lock:
1142+
if (inode1)
1143+
inode_lock_nested(inode1, subclass1);
1144+
if (inode2 && inode2 != inode1)
1145+
inode_lock_nested(inode2, subclass2);
1146+
}
1147+
11061148
/**
11071149
* lock_two_nondirectories - take two i_mutexes on non-directory objects
11081150
*
1109-
* Lock any non-NULL argument that is not a directory.
1151+
* Lock any non-NULL argument. Passed objects must not be directories.
11101152
* Zero, one or two objects may be locked by this function.
11111153
*
11121154
* @inode1: first inode to lock
11131155
* @inode2: second inode to lock
11141156
*/
11151157
void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)
11161158
{
1117-
if (inode1 > inode2)
1118-
swap(inode1, inode2);
1119-
1120-
if (inode1 && !S_ISDIR(inode1->i_mode))
1121-
inode_lock(inode1);
1122-
if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
1123-
inode_lock_nested(inode2, I_MUTEX_NONDIR2);
1159+
WARN_ON_ONCE(S_ISDIR(inode1->i_mode));
1160+
WARN_ON_ONCE(S_ISDIR(inode2->i_mode));
1161+
lock_two_inodes(inode1, inode2, I_MUTEX_NORMAL, I_MUTEX_NONDIR2);
11241162
}
11251163
EXPORT_SYMBOL(lock_two_nondirectories);
11261164

@@ -1131,10 +1169,14 @@ EXPORT_SYMBOL(lock_two_nondirectories);
11311169
*/
11321170
void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)
11331171
{
1134-
if (inode1 && !S_ISDIR(inode1->i_mode))
1172+
if (inode1) {
1173+
WARN_ON_ONCE(S_ISDIR(inode1->i_mode));
11351174
inode_unlock(inode1);
1136-
if (inode2 && !S_ISDIR(inode2->i_mode) && inode2 != inode1)
1175+
}
1176+
if (inode2 && inode2 != inode1) {
1177+
WARN_ON_ONCE(S_ISDIR(inode2->i_mode));
11371178
inode_unlock(inode2);
1179+
}
11381180
}
11391181
EXPORT_SYMBOL(unlock_two_nondirectories);
11401182

fs/internal.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -193,6 +193,8 @@ extern long prune_icache_sb(struct super_block *sb, struct shrink_control *sc);
193193
int dentry_needs_remove_privs(struct mnt_idmap *, struct dentry *dentry);
194194
bool in_group_or_capable(struct mnt_idmap *idmap,
195195
const struct inode *inode, vfsgid_t vfsgid);
196+
void lock_two_inodes(struct inode *inode1, struct inode *inode2,
197+
unsigned subclass1, unsigned subclass2);
196198

197199
/*
198200
* fs-writeback.c

fs/namei.c

Lines changed: 16 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -3028,8 +3028,8 @@ static struct dentry *lock_two_directories(struct dentry *p1, struct dentry *p2)
30283028
return p;
30293029
}
30303030

3031-
inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
3032-
inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2);
3031+
lock_two_inodes(p1->d_inode, p2->d_inode,
3032+
I_MUTEX_PARENT, I_MUTEX_PARENT2);
30333033
return NULL;
30343034
}
30353035

@@ -4731,7 +4731,7 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
47314731
* sb->s_vfs_rename_mutex. We might be more accurate, but that's another
47324732
* story.
47334733
* c) we have to lock _four_ objects - parents and victim (if it exists),
4734-
* and source (if it is not a directory).
4734+
* and source.
47354735
* And that - after we got ->i_mutex on parents (until then we don't know
47364736
* whether the target exists). Solution: try to be smart with locking
47374737
* order for inodes. We rely on the fact that tree topology may change
@@ -4815,10 +4815,16 @@ int vfs_rename(struct renamedata *rd)
48154815

48164816
take_dentry_name_snapshot(&old_name, old_dentry);
48174817
dget(new_dentry);
4818-
if (!is_dir || (flags & RENAME_EXCHANGE))
4819-
lock_two_nondirectories(source, target);
4820-
else if (target)
4821-
inode_lock(target);
4818+
/*
4819+
* Lock all moved children. Moved directories may need to change parent
4820+
* pointer so they need the lock to prevent against concurrent
4821+
* directory changes moving parent pointer. For regular files we've
4822+
* historically always done this. The lockdep locking subclasses are
4823+
* somewhat arbitrary but RENAME_EXCHANGE in particular can swap
4824+
* regular files and directories so it's difficult to tell which
4825+
* subclasses to use.
4826+
*/
4827+
lock_two_inodes(source, target, I_MUTEX_NORMAL, I_MUTEX_NONDIR2);
48224828

48234829
error = -EPERM;
48244830
if (IS_SWAPFILE(source) || (target && IS_SWAPFILE(target)))
@@ -4866,9 +4872,9 @@ int vfs_rename(struct renamedata *rd)
48664872
d_exchange(old_dentry, new_dentry);
48674873
}
48684874
out:
4869-
if (!is_dir || (flags & RENAME_EXCHANGE))
4870-
unlock_two_nondirectories(source, target);
4871-
else if (target)
4875+
if (source)
4876+
inode_unlock(source);
4877+
if (target)
48724878
inode_unlock(target);
48734879
dput(new_dentry);
48744880
if (!error) {

fs/udf/namei.c

Lines changed: 2 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -793,11 +793,6 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
793793
if (!empty_dir(new_inode))
794794
goto out_oiter;
795795
}
796-
/*
797-
* We need to protect against old_inode getting converted from
798-
* ICB to normal directory.
799-
*/
800-
inode_lock_nested(old_inode, I_MUTEX_NONDIR2);
801796
retval = udf_fiiter_find_entry(old_inode, &dotdot_name,
802797
&diriter);
803798
if (retval == -ENOENT) {
@@ -806,10 +801,8 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
806801
old_inode->i_ino);
807802
retval = -EFSCORRUPTED;
808803
}
809-
if (retval) {
810-
inode_unlock(old_inode);
804+
if (retval)
811805
goto out_oiter;
812-
}
813806
has_diriter = true;
814807
tloc = lelb_to_cpu(diriter.fi.icb.extLocation);
815808
if (udf_get_lb_pblock(old_inode->i_sb, &tloc, 0) !=
@@ -889,7 +882,6 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
889882
udf_dir_entry_len(&diriter.fi));
890883
udf_fiiter_write_fi(&diriter, NULL);
891884
udf_fiiter_release(&diriter);
892-
inode_unlock(old_inode);
893885

894886
inode_dec_link_count(old_dir);
895887
if (new_inode)
@@ -901,10 +893,8 @@ static int udf_rename(struct mnt_idmap *idmap, struct inode *old_dir,
901893
}
902894
return 0;
903895
out_oiter:
904-
if (has_diriter) {
896+
if (has_diriter)
905897
udf_fiiter_release(&diriter);
906-
inode_unlock(old_inode);
907-
}
908898
udf_fiiter_release(&oiter);
909899

910900
return retval;

0 commit comments

Comments
 (0)