Commit 0453aad

isilence authored and axboe committed
io_uring/io-wq: limit retrying worker initialisation
If io-wq worker creation fails, we retry it by queueing up a task_work.
The task_work is needed because the retry should be done from the user
process context. The problem is that retries are not limited, and if
queueing a task_work is itself the reason for the failure, we might get
into an infinite loop.

It doesn't seem to happen now, but it would with the following patch,
which executes task_work in the freezer's loop. For now, arbitrarily
limit the number of attempts to create a worker.

Cc: [email protected]
Fixes: 3146cba ("io-wq: make worker creation resilient against signals")
Reported-by: Julian Orth <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/8280436925db88448c7c85c6656edee1a43029ea.1720634146.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
1 parent f7c696a commit 0453aad

1 file changed: +7 -3 lines changed
io_uring/io-wq.c

Lines changed: 7 additions & 3 deletions
@@ -23,6 +23,7 @@
 #include "io_uring.h"
 
 #define WORKER_IDLE_TIMEOUT (5 * HZ)
+#define WORKER_INIT_LIMIT 3
 
 enum {
 	IO_WORKER_F_UP = 0, /* up and active */
@@ -58,6 +59,7 @@ struct io_worker {
 
 	unsigned long create_state;
 	struct callback_head create_work;
+	int init_retries;
 
 	union {
 		struct rcu_head rcu;
@@ -745,14 +747,16 @@ static bool io_wq_work_match_all(struct io_wq_work *work, void *data)
 	return true;
 }
 
-static inline bool io_should_retry_thread(long err)
+static inline bool io_should_retry_thread(struct io_worker *worker, long err)
 {
 	/*
 	 * Prevent perpetual task_work retry, if the task (or its group) is
 	 * exiting.
 	 */
 	if (fatal_signal_pending(current))
 		return false;
+	if (worker->init_retries++ >= WORKER_INIT_LIMIT)
+		return false;
 
 	switch (err) {
 	case -EAGAIN:
@@ -779,7 +783,7 @@ static void create_worker_cont(struct callback_head *cb)
 		io_init_new_worker(wq, worker, tsk);
 		io_worker_release(worker);
 		return;
-	} else if (!io_should_retry_thread(PTR_ERR(tsk))) {
+	} else if (!io_should_retry_thread(worker, PTR_ERR(tsk))) {
 		struct io_wq_acct *acct = io_wq_get_acct(worker);
 
 		atomic_dec(&acct->nr_running);
@@ -846,7 +850,7 @@ static bool create_io_worker(struct io_wq *wq, int index)
 	tsk = create_io_thread(io_wq_worker, worker, NUMA_NO_NODE);
 	if (!IS_ERR(tsk)) {
 		io_init_new_worker(wq, worker, tsk);
-	} else if (!io_should_retry_thread(PTR_ERR(tsk))) {
+	} else if (!io_should_retry_thread(worker, PTR_ERR(tsk))) {
 		kfree(worker);
 		goto fail;
 	} else {
