Skip to content

Commit 9f2dc5f

Browse files
author
Kent Overstreet
committed
bcachefs: Fix oops in btree_node_seq_matches()
btree_update_nodes_written() needs to wait on in-flight writes to old nodes before marking them as freed. But it has no reason to pin those old nodes in memory, so some trickyness ensues. The update we're completing deleted references to those nodes from the btree, so we know if they've been evicted they can't be pulled back in. We just have to check if the nodes we have pointers to are still those old nodes, and haven't been reused. To do that we check the node's "sequence number" (actually a random 64 bit cookie), but that lives in the node's data buffer. 'struct btree' can't be freed until filesystem shutdown (as they're quite small), but the data buffers can be freed or swapped around. Commit 1f88c35, which was fixing a kmsan warning, assumed that we could safely do this locklessly with just a READ_ONCE() - if we've got a non-null ptr it would be safe to read from. But that's not true if the data buffer is a vmalloc allocation, so we need to restore the locking that commit deleted (or alternatively RCU free those data buffers, but there's no other reason for that). Fixes: 1f88c35 ("bcachefs: Fix a KMSAN splat in btree_update_nodes_written()") Signed-off-by: Kent Overstreet <[email protected]>
1 parent 2bf380c commit 9f2dc5f

File tree

1 file changed

+21
-2
lines changed

1 file changed

+21
-2
lines changed

fs/bcachefs/btree_update_interior.c

Lines changed: 21 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -684,12 +684,31 @@ static void btree_update_nodes_written(struct btree_update *as)
684684

685685
/*
686686
* Wait for any in flight writes to finish before we free the old nodes
687-
* on disk:
687+
* on disk. But we haven't pinned those old nodes in the btree cache,
688+
* they might have already been evicted.
689+
*
690+
* The update we're completing deleted references to those nodes from the
691+
* btree, so we know if they've been evicted they can't be pulled back in.
692+
* We just have to check if the nodes we have pointers to are still those
693+
* old nodes, and haven't been reused.
694+
*
695+
* This can't be done locklessly because the data buffer might have been
696+
* vmalloc allocated, and they're not RCU freed. We also need the
697+
* __no_kmsan_checks annotation because even with the btree node read
698+
* lock, nothing tells us that the data buffer has been initialized (if
699+
* the btree node has been reused for a different node, and the data
700+
* buffer swapped for a new data buffer).
688701
*/
689702
for (i = 0; i < as->nr_old_nodes; i++) {
690703
b = as->old_nodes[i];
691704

692-
if (btree_node_seq_matches(b, as->old_nodes_seq[i]))
705+
bch2_trans_begin(trans);
706+
btree_node_lock_nopath_nofail(trans, &b->c, SIX_LOCK_read);
707+
bool seq_matches = btree_node_seq_matches(b, as->old_nodes_seq[i]);
708+
six_unlock_read(&b->c.lock);
709+
bch2_trans_unlock_long(trans);
710+
711+
if (seq_matches)
693712
wait_on_bit_io(&b->flags, BTREE_NODE_write_in_flight_inner,
694713
TASK_UNINTERRUPTIBLE);
695714
}

0 commit comments

Comments
 (0)