
Commit a6fcd19

kkdwivedi authored and Alexei Starovoitov committed
bpf: Defer work in bpf_timer_cancel_and_free
Currently, the same case as the previous patch (two timer callbacks trying to cancel each other) can be invoked through bpf_map_update_elem as well, or more precisely, when freeing map elements containing timers. Since this relies on hrtimer_cancel as well, it is prone to the same deadlock situation as the previous patch.

It would be sufficient to use hrtimer_try_to_cancel to fix this problem, as the timer cannot be enqueued after async_cancel_and_free. Once async_cancel_and_free has been done, the timer must be reinitialized before it can be armed again. A callback running in parallel and trying to arm the timer will fail, so freeing bpf_hrtimer without waiting is sufficient (given kfree_rcu), and bpf_timer_cb will return HRTIMER_NORESTART, preventing the timer from being rearmed.

However, there exists a UAF scenario where the callback arms the timer before entering this function, and cancellation then fails (because a timer callback is invoking this routine, or the target timer callback is running concurrently). In such a case, if the timer expiration is significantly far in the future, the RCU grace period may expire before it, freeing the bpf_hrtimer state and along with it the struct hrtimer that is still enqueued.

Hence, it is clear that cancellation needs to occur after async_cancel_and_free, and yet it cannot be done inline due to the deadlock issues. We thus modify bpf_timer_cancel_and_free to defer the work to the global workqueue, adding a work_struct alongside rcu_head (both used at _different_ points in time, so they can share space).

Update existing code comments to reflect the new state of affairs.

Fixes: b00628b ("bpf: Introduce bpf timers.")
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
1 parent d452383

File tree

1 file changed (+47, −14)


kernel/bpf/helpers.c

Lines changed: 47 additions & 14 deletions
@@ -1084,7 +1084,10 @@ struct bpf_async_cb {
 	struct bpf_prog *prog;
 	void __rcu *callback_fn;
 	void *value;
-	struct rcu_head rcu;
+	union {
+		struct rcu_head rcu;
+		struct work_struct delete_work;
+	};
 	u64 flags;
 };
 

@@ -1220,6 +1223,21 @@ static void bpf_wq_delete_work(struct work_struct *work)
 	kfree_rcu(w, cb.rcu);
 }
 
+static void bpf_timer_delete_work(struct work_struct *work)
+{
+	struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, cb.delete_work);
+
+	/* Cancel the timer and wait for callback to complete if it was running.
+	 * If hrtimer_cancel() can be safely called it's safe to call
+	 * kfree_rcu(t) right after for both preallocated and non-preallocated
+	 * maps. The async->cb = NULL was already done and no code path can see
+	 * address 't' anymore. Timer if armed for existing bpf_hrtimer before
+	 * bpf_timer_cancel_and_free will have been cancelled.
+	 */
+	hrtimer_cancel(&t->timer);
+	kfree_rcu(t, cb.rcu);
+}
+
 static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u64 flags,
 			    enum bpf_async_type type)
 {
@@ -1264,6 +1282,7 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
 	t = (struct bpf_hrtimer *)cb;
 
 	atomic_set(&t->cancelling, 0);
+	INIT_WORK(&t->cb.delete_work, bpf_timer_delete_work);
 	hrtimer_init(&t->timer, clockid, HRTIMER_MODE_REL_SOFT);
 	t->timer.function = bpf_timer_cb;
 	cb->value = (void *)async - map->record->timer_off;
@@ -1544,25 +1563,39 @@ void bpf_timer_cancel_and_free(void *val)
 
 	if (!t)
 		return;
-	/* Cancel the timer and wait for callback to complete if it was running.
-	 * If hrtimer_cancel() can be safely called it's safe to call kfree(t)
-	 * right after for both preallocated and non-preallocated maps.
-	 * The async->cb = NULL was already done and no code path can
-	 * see address 't' anymore.
-	 *
-	 * Check that bpf_map_delete/update_elem() wasn't called from timer
-	 * callback_fn. In such case don't call hrtimer_cancel() (since it will
-	 * deadlock) and don't call hrtimer_try_to_cancel() (since it will just
-	 * return -1). Though callback_fn is still running on this cpu it's
+	/* We check that bpf_map_delete/update_elem() was called from timer
+	 * callback_fn. In such case we don't call hrtimer_cancel() (since it
+	 * will deadlock) and don't call hrtimer_try_to_cancel() (since it will
+	 * just return -1). Though callback_fn is still running on this cpu it's
 	 * safe to do kfree(t) because bpf_timer_cb() read everything it needed
 	 * from 't'. The bpf subprog callback_fn won't be able to access 't',
 	 * since async->cb = NULL was already done. The timer will be
 	 * effectively cancelled because bpf_timer_cb() will return
 	 * HRTIMER_NORESTART.
+	 *
+	 * However, it is possible the timer callback_fn calling us armed the
+	 * timer _before_ calling us, such that failing to cancel it here will
+	 * cause it to possibly use struct hrtimer after freeing bpf_hrtimer.
+	 * Therefore, we _need_ to cancel any outstanding timers before we do
+	 * kfree_rcu, even though no more timers can be armed.
+	 *
+	 * Moreover, we need to schedule work even if timer does not belong to
+	 * the calling callback_fn, as on two different CPUs, we can end up in a
+	 * situation where both sides run in parallel, try to cancel one
+	 * another, and we end up waiting on both sides in hrtimer_cancel
+	 * without making forward progress, since timer1 depends on timer2
+	 * callback to finish, and vice versa.
+	 *
+	 * CPU 1 (timer1_cb)			CPU 2 (timer2_cb)
+	 * bpf_timer_cancel_and_free(timer2)	bpf_timer_cancel_and_free(timer1)
+	 *
+	 * To avoid these issues, punt to workqueue context when we are in a
+	 * timer callback.
 	 */
-	if (this_cpu_read(hrtimer_running) != t)
-		hrtimer_cancel(&t->timer);
-	kfree_rcu(t, cb.rcu);
+	if (this_cpu_read(hrtimer_running))
+		queue_work(system_unbound_wq, &t->cb.delete_work);
+	else
+		bpf_timer_delete_work(&t->cb.delete_work);
 }
 
 /* This function is called by map_delete/update_elem for individual element and
