Skip to content

Commit 75e68e5

Browse files
jsitnickiAlexei Starovoitov
authored andcommitted
bpf, sockhash: Synchronize delete from bucket list on map free
We can end up modifying the sockhash bucket list from two CPUs when a sockhash is being destroyed (sock_hash_free) on one CPU, while a socket that is in the sockhash is unlinking itself from it on another CPU it (sock_hash_delete_from_link). This results in accessing a list element that is in an undefined state as reported by KASAN: | ================================================================== | BUG: KASAN: wild-memory-access in sock_hash_free+0x13c/0x280 | Write of size 8 at addr dead000000000122 by task kworker/2:1/95 | | CPU: 2 PID: 95 Comm: kworker/2:1 Not tainted 5.7.0-rc7-02961-ge22c35ab0038-dirty #691 | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 | Workqueue: events bpf_map_free_deferred | Call Trace: | dump_stack+0x97/0xe0 | ? sock_hash_free+0x13c/0x280 | __kasan_report.cold+0x5/0x40 | ? mark_lock+0xbc1/0xc00 | ? sock_hash_free+0x13c/0x280 | kasan_report+0x38/0x50 | ? sock_hash_free+0x152/0x280 | sock_hash_free+0x13c/0x280 | bpf_map_free_deferred+0xb2/0xd0 | ? bpf_map_charge_finish+0x50/0x50 | ? rcu_read_lock_sched_held+0x81/0xb0 | ? rcu_read_lock_bh_held+0x90/0x90 | process_one_work+0x59a/0xac0 | ? lock_release+0x3b0/0x3b0 | ? pwq_dec_nr_in_flight+0x110/0x110 | ? rwlock_bug.part.0+0x60/0x60 | worker_thread+0x7a/0x680 | ? _raw_spin_unlock_irqrestore+0x4c/0x60 | kthread+0x1cc/0x220 | ? process_one_work+0xac0/0xac0 | ? kthread_create_on_node+0xa0/0xa0 | ret_from_fork+0x24/0x30 | ================================================================== Fix it by reintroducing spin-lock protected critical section around the code that removes the elements from the bucket on sockhash free. To do that we also need to defer processing of removed elements, until out of atomic context so that we can unlink the socket from the map when holding the sock lock. Fixes: 90db6d7 ("bpf, sockmap: Remove bucket->lock from sock_{hash|map}_free") Reported-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Sitnicki <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
1 parent 33a7c83 commit 75e68e5

File tree

1 file changed

+21
-2
lines changed

1 file changed

+21
-2
lines changed

net/core/sock_map.c

Lines changed: 21 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1013,6 +1013,7 @@ static void sock_hash_free(struct bpf_map *map)
10131013
{
10141014
struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
10151015
struct bpf_htab_bucket *bucket;
1016+
struct hlist_head unlink_list;
10161017
struct bpf_htab_elem *elem;
10171018
struct hlist_node *node;
10181019
int i;
@@ -1024,13 +1025,31 @@ static void sock_hash_free(struct bpf_map *map)
10241025
synchronize_rcu();
10251026
for (i = 0; i < htab->buckets_num; i++) {
10261027
bucket = sock_hash_select_bucket(htab, i);
1027-
hlist_for_each_entry_safe(elem, node, &bucket->head, node) {
1028-
hlist_del_rcu(&elem->node);
1028+
1029+
/* We are racing with sock_hash_delete_from_link to
1030+
* enter the spin-lock critical section. Every socket on
1031+
* the list is still linked to sockhash. Since link
1032+
* exists, psock exists and holds a ref to socket. That
1033+
* lets us to grab a socket ref too.
1034+
*/
1035+
raw_spin_lock_bh(&bucket->lock);
1036+
hlist_for_each_entry(elem, &bucket->head, node)
1037+
sock_hold(elem->sk);
1038+
hlist_move_list(&bucket->head, &unlink_list);
1039+
raw_spin_unlock_bh(&bucket->lock);
1040+
1041+
/* Process removed entries out of atomic context to
1042+
* block for socket lock before deleting the psock's
1043+
* link to sockhash.
1044+
*/
1045+
hlist_for_each_entry_safe(elem, node, &unlink_list, node) {
1046+
hlist_del(&elem->node);
10291047
lock_sock(elem->sk);
10301048
rcu_read_lock();
10311049
sock_map_unref(elem->sk, elem);
10321050
rcu_read_unlock();
10331051
release_sock(elem->sk);
1052+
sock_put(elem->sk);
10341053
sock_hash_free_elem(htab, elem);
10351054
}
10361055
}

0 commit comments

Comments
 (0)