Commit 0c23734

Author: Alexei Starovoitov (committed)
Merge branch 'fixes-for-bpf-timer-lockup-and-uaf'
Kumar Kartikeya Dwivedi says:

====================
Fixes for BPF timer lockup and UAF

The following patches contain fixes for timer lockups and a use-after-free scenario.

This set proposes to fix the following lockup situation for BPF timers.

CPU 1                                  CPU 2

bpf_timer_cb                           bpf_timer_cb
  timer_cb1                              timer_cb2
    bpf_timer_cancel(timer_cb2)            bpf_timer_cancel(timer_cb1)
      hrtimer_cancel                         hrtimer_cancel

In this case, both callbacks will continue waiting for each other to finish synchronously, causing a lockup.

The proposed fix adds support for tracking in-flight cancellations *begun by other timer callbacks* for a particular BPF timer. Whenever preparing to call hrtimer_cancel, a callback will increment the target timer's counter, then inspect its own in-flight cancellations, and if non-zero, return -EDEADLK to avoid situations where the target timer's callback is waiting for its completion.

This does mean that in cases where a callback is fired and cancelled, it will be unable to cancel any timers in that execution. This can be alleviated by maintaining a list of waiting callbacks in bpf_hrtimer and searching through it to avoid interdependencies, but that may introduce additional delays in bpf_timer_cancel, in addition to requiring extra state at runtime which may need to be allocated or reused from bpf_hrtimer storage. Moreover, extra synchronization is needed to delete these elements from the list of waiting callbacks once hrtimer_cancel has finished.

The second patch addresses a similar deadlock in bpf_timer_cancel_and_free, as well as a UAF scenario that can occur if the timer is armed before entering it and the hrtimer_running check causes the hrtimer_cancel call to be skipped. As seen above, a synchronous hrtimer_cancel would lead to deadlock (if the same callback tries to free its timer, or if two timers free each other), so we queue work onto the global workqueue to ensure outstanding timers are cancelled before bpf_hrtimer state is freed.

Further details are in the patches.
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
2 parents af253ae + a6fcd19 commit 0c23734
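For illustration, a minimal BPF program sketch of the lockup this set addresses: two timers whose callbacks cancel each other. This is not part of the commit; the map layout, section names, the CLOCK_MONOTONIC define, and the SEC("syscall") arming program are assumptions made for the sketch (timer helpers are not available to every program type). The shape of the bug is as described above: before this series both callbacks could block in hrtimer_cancel() waiting on each other; after it, one side returns -EDEADLK and backs off.

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

#define CLOCK_MONOTONIC 1	/* uapi clockid value, defined here for self-containment */

char _license[] SEC("license") = "GPL";

struct elem {
	struct bpf_timer t;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 2);
	__type(key, int);
	__type(value, struct elem);
} timers SEC(".maps");

/* Callback of timer 0: cancels timer 1. */
static int timer_cb0(void *map, int *key, struct bpf_timer *timer)
{
	int other = 1;
	struct elem *e = bpf_map_lookup_elem(&timers, &other);

	if (e)
		bpf_timer_cancel(&e->t);	/* can now fail with -EDEADLK instead of locking up */
	return 0;
}

/* Callback of timer 1: cancels timer 0, mirroring timer_cb0. */
static int timer_cb1(void *map, int *key, struct bpf_timer *timer)
{
	int other = 0;
	struct elem *e = bpf_map_lookup_elem(&timers, &other);

	if (e)
		bpf_timer_cancel(&e->t);
	return 0;
}

SEC("syscall")
int arm_both(void *ctx)
{
	int k0 = 0, k1 = 1;
	struct elem *e0 = bpf_map_lookup_elem(&timers, &k0);
	struct elem *e1 = bpf_map_lookup_elem(&timers, &k1);

	if (!e0 || !e1)
		return 0;
	bpf_timer_init(&e0->t, &timers, CLOCK_MONOTONIC);
	bpf_timer_init(&e1->t, &timers, CLOCK_MONOTONIC);
	bpf_timer_set_callback(&e0->t, timer_cb0);
	bpf_timer_set_callback(&e1->t, timer_cb1);
	/* Fire both timers almost immediately so the callbacks can race. */
	bpf_timer_start(&e0->t, 1, 0);
	bpf_timer_start(&e1->t, 1, 0);
	return 0;
}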

1 file changed
kernel/bpf/helpers.c

Lines changed: 82 additions & 17 deletions
@@ -1084,7 +1084,10 @@ struct bpf_async_cb {
 	struct bpf_prog *prog;
 	void __rcu *callback_fn;
 	void *value;
-	struct rcu_head rcu;
+	union {
+		struct rcu_head rcu;
+		struct work_struct delete_work;
+	};
 	u64 flags;
 };
 
@@ -1107,6 +1110,7 @@ struct bpf_async_cb {
 struct bpf_hrtimer {
 	struct bpf_async_cb cb;
 	struct hrtimer timer;
+	atomic_t cancelling;
 };
 
 struct bpf_work {
@@ -1219,6 +1223,21 @@ static void bpf_wq_delete_work(struct work_struct *work)
 	kfree_rcu(w, cb.rcu);
 }
 
+static void bpf_timer_delete_work(struct work_struct *work)
+{
+	struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, cb.delete_work);
+
+	/* Cancel the timer and wait for callback to complete if it was running.
+	 * If hrtimer_cancel() can be safely called it's safe to call
+	 * kfree_rcu(t) right after for both preallocated and non-preallocated
+	 * maps. The async->cb = NULL was already done and no code path can see
+	 * address 't' anymore. Timer if armed for existing bpf_hrtimer before
+	 * bpf_timer_cancel_and_free will have been cancelled.
+	 */
+	hrtimer_cancel(&t->timer);
+	kfree_rcu(t, cb.rcu);
+}
+
 static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u64 flags,
 			    enum bpf_async_type type)
 {
@@ -1262,6 +1281,8 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
 		clockid = flags & (MAX_CLOCKS - 1);
 		t = (struct bpf_hrtimer *)cb;
 
+		atomic_set(&t->cancelling, 0);
+		INIT_WORK(&t->cb.delete_work, bpf_timer_delete_work);
 		hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
 		t->timer.function = bpf_timer_cb;
 		cb->value = (void *)async - map->record->timer_off;
@@ -1440,7 +1461,8 @@ static void drop_prog_refcnt(struct bpf_async_cb *async)
 
 BPF_CALL_1(bpf_timer_cancel, struct bpf_async_kern *, timer)
 {
-	struct bpf_hrtimer *t;
+	struct bpf_hrtimer *t, *cur_t;
+	bool inc = false;
 	int ret = 0;
 
 	if (in_nmi())
@@ -1452,21 +1474,50 @@ BPF_CALL_1(bpf_timer_cancel, struct bpf_async_kern *, timer)
 		ret = -EINVAL;
 		goto out;
 	}
-	if (this_cpu_read(hrtimer_running) == t) {
+
+	cur_t = this_cpu_read(hrtimer_running);
+	if (cur_t == t) {
 		/* If bpf callback_fn is trying to bpf_timer_cancel()
 		 * its own timer the hrtimer_cancel() will deadlock
-		 * since it waits for callback_fn to finish
+		 * since it waits for callback_fn to finish.
 		 */
 		ret = -EDEADLK;
 		goto out;
 	}
+
+	/* Only account in-flight cancellations when invoked from a timer
+	 * callback, since we want to avoid waiting only if other _callbacks_
+	 * are waiting on us, to avoid introducing lockups. Non-callback paths
+	 * are ok, since nobody would synchronously wait for their completion.
+	 */
+	if (!cur_t)
+		goto drop;
+	atomic_inc(&t->cancelling);
+	/* Need full barrier after relaxed atomic_inc */
+	smp_mb__after_atomic();
+	inc = true;
+	if (atomic_read(&cur_t->cancelling)) {
+		/* We're cancelling timer t, while some other timer callback is
+		 * attempting to cancel us. In such a case, it might be possible
+		 * that timer t belongs to the other callback, or some other
+		 * callback waiting upon it (creating transitive dependencies
+		 * upon us), and we will enter a deadlock if we continue
+		 * cancelling and waiting for it synchronously, since it might
+		 * do the same. Bail!
+		 */
+		ret = -EDEADLK;
+		goto out;
+	}
+drop:
 	drop_prog_refcnt(&t->cb);
 out:
 	__bpf_spin_unlock_irqrestore(&timer->lock);
 	/* Cancel the timer and wait for associated callback to finish
 	 * if it was running.
 	 */
 	ret = ret ?: hrtimer_cancel(&t->timer);
+	if (inc)
+		atomic_dec(&t->cancelling);
 	rcu_read_unlock();
 	return ret;
 }
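The increment / full barrier / read sequence added above is the classic store-buffering pattern: each racing callback first makes its own cancellation visible and only then checks whether anyone is cancelling it, so when two callbacks try to cancel each other at least one of them is guaranteed to observe the other's increment and bail out with -EDEADLK rather than block in hrtimer_cancel(). Below is a minimal user-space sketch of the same handshake using C11 atomics in place of the kernel's atomic_inc()/smp_mb__after_atomic(); the try_cancel() wrapper and the names are illustrative assumptions, not kernel API.

#include <stdatomic.h>
#include <stdbool.h>

/* One counter per timer, mirroring bpf_hrtimer::cancelling. */
static bool try_cancel(atomic_int *target_cancelling, atomic_int *my_cancelling)
{
	/* "I am cancelling the target": publish the increment first ... */
	atomic_fetch_add_explicit(target_cancelling, 1, memory_order_relaxed);
	/* ... then a full barrier, as smp_mb__after_atomic() provides ... */
	atomic_thread_fence(memory_order_seq_cst);
	/* ... and only then check whether someone is cancelling *us*. */
	if (atomic_load_explicit(my_cancelling, memory_order_relaxed)) {
		atomic_fetch_sub_explicit(target_cancelling, 1, memory_order_relaxed);
		return false;	/* the kernel code reports -EDEADLK here */
	}
	/* Safe to wait synchronously (hrtimer_cancel()) at this point. */
	atomic_fetch_sub_explicit(target_cancelling, 1, memory_order_relaxed);
	return true;
}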
@@ -1512,25 +1563,39 @@ void bpf_timer_cancel_and_free(void *val)
 
 	if (!t)
 		return;
-	/* Cancel the timer and wait for callback to complete if it was running.
-	 * If hrtimer_cancel() can be safely called it's safe to call kfree(t)
-	 * right after for both preallocated and non-preallocated maps.
-	 * The async->cb = NULL was already done and no code path can
-	 * see address 't' anymore.
-	 *
-	 * Check that bpf_map_delete/update_elem() wasn't called from timer
-	 * callback_fn. In such case don't call hrtimer_cancel() (since it will
-	 * deadlock) and don't call hrtimer_try_to_cancel() (since it will just
-	 * return -1). Though callback_fn is still running on this cpu it's
+	/* We check that bpf_map_delete/update_elem() was called from timer
+	 * callback_fn. In such case we don't call hrtimer_cancel() (since it
+	 * will deadlock) and don't call hrtimer_try_to_cancel() (since it will
+	 * just return -1). Though callback_fn is still running on this cpu it's
 	 * safe to do kfree(t) because bpf_timer_cb() read everything it needed
 	 * from 't'. The bpf subprog callback_fn won't be able to access 't',
 	 * since async->cb = NULL was already done. The timer will be
 	 * effectively cancelled because bpf_timer_cb() will return
 	 * HRTIMER_NORESTART.
+	 *
+	 * However, it is possible the timer callback_fn calling us armed the
+	 * timer _before_ calling us, such that failing to cancel it here will
+	 * cause it to possibly use struct hrtimer after freeing bpf_hrtimer.
+	 * Therefore, we _need_ to cancel any outstanding timers before we do
+	 * kfree_rcu, even though no more timers can be armed.
+	 *
+	 * Moreover, we need to schedule work even if timer does not belong to
+	 * the calling callback_fn, as on two different CPUs, we can end up in a
+	 * situation where both sides run in parallel, try to cancel one
+	 * another, and we end up waiting on both sides in hrtimer_cancel
+	 * without making forward progress, since timer1 depends on timer2
+	 * callback to finish, and vice versa.
+	 *
+	 *  CPU 1 (timer1_cb)			CPU 2 (timer2_cb)
+	 *  bpf_timer_cancel_and_free(timer2)	bpf_timer_cancel_and_free(timer1)
+	 *
+	 * To avoid these issues, punt to workqueue context when we are in a
+	 * timer callback.
 	 */
-	if (this_cpu_read(hrtimer_running) != t)
-		hrtimer_cancel(&t->timer);
-	kfree_rcu(t, cb.rcu);
+	if (this_cpu_read(hrtimer_running))
+		queue_work(system_unbound_wq, &t->cb.delete_work);
+	else
+		bpf_timer_delete_work(&t->cb.delete_work);
 }
 
 /* This function is called by map_delete/update_elem for individual element and
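A user-visible consequence of the bpf_timer_cancel() change worth noting: when invoked from inside a timer callback, it can now return -EDEADLK if another callback is concurrently cancelling the caller's timer, so BPF programs that cancel peer timers from their callbacks should treat that value as "not cancelled" and retry later rather than assume success. A hypothetical, self-contained fragment follows; the map, callback name, and re-arm timeout are assumptions for illustration.

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

#ifndef EDEADLK
#define EDEADLK 35	/* matches include/uapi/asm-generic/errno.h */
#endif

char _license[] SEC("license") = "GPL";

struct elem {
	struct bpf_timer t;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 2);
	__type(key, int);
	__type(value, struct elem);
} timers SEC(".maps");

/* A timer callback that cancels a peer timer and tolerates -EDEADLK. */
static int cancel_peer_cb(void *map, int *key, struct bpf_timer *timer)
{
	int other = 1;
	struct elem *e = bpf_map_lookup_elem(&timers, &other);
	long err;

	if (!e)
		return 0;
	err = bpf_timer_cancel(&e->t);
	if (err == -EDEADLK) {
		/* The peer's callback is cancelling us right now; don't wait.
		 * Re-arm our own timer and retry on the next expiry.
		 */
		bpf_timer_start(timer, 1000000 /* 1 ms, arbitrary */, 0);
	}
	return 0;
}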
