Skip to content

Commit 7ee85f5

Browse files
fdmananakdave
authored andcommitted
btrfs: fix race setting file private on concurrent lseek using same fd
When doing concurrent lseek(2) system calls against the same file descriptor, using multiple threads belonging to the same process, we have a short time window where a race happens and can result in a memory leak. The race happens like this: 1) A program opens a file descriptor for a file and then spawns two threads (with the pthreads library for example), lets call them task A and task B; 2) Task A calls lseek with SEEK_DATA or SEEK_HOLE and ends up at file.c:find_desired_extent() while holding a read lock on the inode; 3) At the start of find_desired_extent(), it extracts the file's private_data pointer into a local variable named 'private', which has a value of NULL; 4) Task B also calls lseek with SEEK_DATA or SEEK_HOLE, locks the inode in shared mode and enters file.c:find_desired_extent(), where it also extracts file->private_data into its local variable 'private', which has a NULL value; 5) Because it saw a NULL file private, task A allocates a private structure and assigns to the file structure; 6) Task B also saw a NULL file private so it also allocates its own file private and then assigns it to the same file structure, since both tasks are using the same file descriptor. At this point we leak the private structure allocated by task A. Besides the memory leak, there's also the detail that both tasks end up using the same cached state record in the private structure (struct btrfs_file_private::llseek_cached_state), which can result in a use-after-free problem since one task can free it while the other is still using it (only one task took a reference count on it). Also, sharing the cached state is not a good idea since it could result in incorrect results in the future - right now it should not be a problem because it end ups being used only in extent-io-tree.c:count_range_bits() where we do range validation before using the cached state. Fix this by protecting the private assignment and check of a file while holding the inode's spinlock and keep track of the task that allocated the private, so that it's used only by that task in order to prevent user-after-free issues with the cached state record as well as potentially using it incorrectly in the future. Fixes: 3c32c72 ("btrfs: use cached state when looking for delalloc ranges with lseek") CC: [email protected] # 6.6+ Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
1 parent bd610c0 commit 7ee85f5

File tree

3 files changed

+34
-3
lines changed

3 files changed

+34
-3
lines changed

fs/btrfs/btrfs_inode.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -152,6 +152,7 @@ struct btrfs_inode {
152152
* logged_trans), to access/update delalloc_bytes, new_delalloc_bytes,
153153
* defrag_bytes, disk_i_size, outstanding_extents, csum_bytes and to
154154
* update the VFS' inode number of bytes used.
155+
* Also protects setting struct file::private_data.
155156
*/
156157
spinlock_t lock;
157158

fs/btrfs/ctree.h

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -463,6 +463,8 @@ struct btrfs_file_private {
463463
void *filldir_buf;
464464
u64 last_index;
465465
struct extent_state *llseek_cached_state;
466+
/* Task that allocated this structure. */
467+
struct task_struct *owner_task;
466468
};
467469

468470
static inline u32 BTRFS_LEAF_DATA_SIZE(const struct btrfs_fs_info *info)

fs/btrfs/file.c

Lines changed: 31 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -3485,7 +3485,7 @@ static bool find_desired_extent_in_hole(struct btrfs_inode *inode, int whence,
34853485
static loff_t find_desired_extent(struct file *file, loff_t offset, int whence)
34863486
{
34873487
struct btrfs_inode *inode = BTRFS_I(file->f_mapping->host);
3488-
struct btrfs_file_private *private = file->private_data;
3488+
struct btrfs_file_private *private;
34893489
struct btrfs_fs_info *fs_info = inode->root->fs_info;
34903490
struct extent_state *cached_state = NULL;
34913491
struct extent_state **delalloc_cached_state;
@@ -3513,15 +3513,43 @@ static loff_t find_desired_extent(struct file *file, loff_t offset, int whence)
35133513
inode_get_bytes(&inode->vfs_inode) == i_size)
35143514
return i_size;
35153515

3516-
if (!private) {
3516+
spin_lock(&inode->lock);
3517+
private = file->private_data;
3518+
spin_unlock(&inode->lock);
3519+
3520+
if (private && private->owner_task != current) {
3521+
/*
3522+
* Not allocated by us, don't use it as its cached state is used
3523+
* by the task that allocated it and we don't want neither to
3524+
* mess with it nor get incorrect results because it reflects an
3525+
* invalid state for the current task.
3526+
*/
3527+
private = NULL;
3528+
} else if (!private) {
35173529
private = kzalloc(sizeof(*private), GFP_KERNEL);
35183530
/*
35193531
* No worries if memory allocation failed.
35203532
* The private structure is used only for speeding up multiple
35213533
* lseek SEEK_HOLE/DATA calls to a file when there's delalloc,
35223534
* so everything will still be correct.
35233535
*/
3524-
file->private_data = private;
3536+
if (private) {
3537+
bool free = false;
3538+
3539+
private->owner_task = current;
3540+
3541+
spin_lock(&inode->lock);
3542+
if (file->private_data)
3543+
free = true;
3544+
else
3545+
file->private_data = private;
3546+
spin_unlock(&inode->lock);
3547+
3548+
if (free) {
3549+
kfree(private);
3550+
private = NULL;
3551+
}
3552+
}
35253553
}
35263554

35273555
if (private)

0 commit comments

Comments
 (0)