Skip to content

Conversation

@blktests-ci
Copy link

@blktests-ci blktests-ci bot commented Jul 10, 2025

Pull request for series with
subject: ublk: speed up ublk server exit handling
version: 2
url: https://patchwork.kernel.org/project/linux-block/list/?series=978954

@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 10, 2025

Upstream branch: 8c2e52e
series: https://patchwork.kernel.org/project/linux-block/list/?series=978954
version: 2

@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 10, 2025

Upstream branch: bc9ff19
series: https://patchwork.kernel.org/project/linux-block/list/?series=978954
version: 2

@blktests-ci blktests-ci bot force-pushed the series/976489=>linus-master branch from 7b53827 to 84eba76 Compare July 10, 2025 17:05
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from 9a69d4f to e311dd9 Compare July 11, 2025 01:13
@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 11, 2025

Upstream branch: bc9ff19
series: https://patchwork.kernel.org/project/linux-block/list/?series=978954
version: 2

@blktests-ci blktests-ci bot force-pushed the series/976489=>linus-master branch from 84eba76 to 24bce39 Compare July 11, 2025 01:15
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from e311dd9 to b6b569e Compare July 11, 2025 07:05
@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 11, 2025

Upstream branch: bc9ff19
series: https://patchwork.kernel.org/project/linux-block/list/?series=978954
version: 2

@blktests-ci blktests-ci bot force-pushed the series/976489=>linus-master branch from 24bce39 to 5608095 Compare July 11, 2025 07:07
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from b6b569e to ef2c9cd Compare July 11, 2025 17:45
@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 11, 2025

Upstream branch: 40f92e7
series: https://patchwork.kernel.org/project/linux-block/list/?series=978954
version: 2

@blktests-ci blktests-ci bot force-pushed the series/976489=>linus-master branch from 5608095 to 31e9cf9 Compare July 11, 2025 17:46
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from ef2c9cd to 198825c Compare July 11, 2025 23:25
@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 11, 2025

Upstream branch: 40f92e7
series: https://patchwork.kernel.org/project/linux-block/list/?series=978954
version: 2

@blktests-ci blktests-ci bot force-pushed the series/976489=>linus-master branch from 31e9cf9 to 40f0fa7 Compare July 11, 2025 23:27
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from 198825c to 341e7ed Compare July 14, 2025 02:27
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from 341e7ed to 81f31a4 Compare July 23, 2025 02:05
@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 23, 2025

Upstream branch: 89be9a8
series: https://patchwork.kernel.org/project/linux-block/list/?series=978954
version: 2

@blktests-ci blktests-ci bot force-pushed the series/976489=>linus-master branch from 40f0fa7 to 4307ece Compare July 23, 2025 02:17
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from 81f31a4 to 87bbbbc Compare July 24, 2025 05:40
@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 24, 2025

Upstream branch: 25fae0b
series: https://patchwork.kernel.org/project/linux-block/list/?series=978954
version: 2

Recently, we've observed a few cases where a ublk server is able to
complete restart more quickly than the driver can process the exit of
the previous ublk server. The new ublk server comes up, attempts
recovery of the preexisting ublk devices, and observes them still in
state UBLK_S_DEV_LIVE. While this is possible due to the asynchronous
nature of io_uring cleanup and should therefore be handled properly in
the ublk server, it is still preferable to make ublk server exit
handling faster if possible, as we should strive for it to not be a
limiting factor in how fast a ublk server can restart and provide
service again.

Analysis of the issue showed that the vast majority of the time spent in
handling the ublk server exit was in calls to blk_mq_quiesce_queue,
which is essentially just a (relatively expensive) call to
synchronize_rcu. The ublk server exit path currently issues an
unnecessarily large number of calls to blk_mq_quiesce_queue, for two
reasons:

1. It tries to call blk_mq_quiesce_queue once per ublk_queue. However,
   blk_mq_quiesce_queue targets the request_queue of the underlying ublk
   device, of which there is only one. So the number of calls is larger
   than necessary by a factor of nr_hw_queues.
2. In practice, it calls blk_mq_quiesce_queue _more_ than once per
   ublk_queue. This is because of a data race where we read
   ubq->canceling without any locking when deciding if we should call
   ublk_start_cancel. It is thus possible for two calls to
   ublk_uring_cmd_cancel_fn against the same ublk_queue to both call
   ublk_start_cancel against the same ublk_queue.

Fix this by making the "canceling" flag a per-device state. This
actually matches the existing code better, as there are several places
where the flag is set or cleared for all queues simultaneously, and
there is the general expectation that cancellation corresponds with ublk
server exit. This per-device canceling flag is then checked under a
(new) lock (addressing the data race (2) above), and the queue is only
quiesced if it is cleared (addressing (1) above). The result is just one
call to blk_mq_quiesce_queue per ublk device.

To minimize the number of cache lines that are accessed in the hot path,
the per-queue canceling flag is kept. The values of the per-device
canceling flag and all per-queue canceling flags should always match.

In our setup, where one ublk server handles I/O for 128 ublk devices,
each having 24 hardware queues of depth 4096, here are the results
before and after this patch, where teardown time is measured from the
first call to io_ring_ctx_wait_and_kill to the return from the last
ublk_ch_release:

						before		after
number of calls to blk_mq_quiesce_queue:	6469		256
teardown time:					11.14s		2.44s

There are still some potential optimizations here, but this takes care
of a big chunk of the ublk server exit handling delay.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Caleb Sander Mateos <csander@purestorage.com>
For performance reasons (minimizing the number of cache lines accessed
in the hot path), we store the "canceling" state redundantly - there is
one flag in the device, which can be considered the source of truth, and
per-queue copies of that flag. This redundancy can cause confusion, and
opens the door to bugs where the state is set inconsistently. Try to
guard against these bugs by introducing a ublk_set_canceling helper
which is the sole mutator of both the per-device and per-queue canceling
state. This helper always sets the state consistently. Use the helper in
all places where we need to modify the canceling state.

No functional changes are expected.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
@blktests-ci blktests-ci bot force-pushed the series/976489=>linus-master branch from 4307ece to 14b01e2 Compare July 24, 2025 11:33
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from 87bbbbc to 6637119 Compare July 30, 2025 01:47
@blktests-ci blktests-ci bot force-pushed the linus-master_base branch from 6637119 to f092a9b Compare July 31, 2025 04:25
@blktests-ci
Copy link
Author

blktests-ci bot commented Jul 31, 2025

Upstream branch: 260f6f4
series: https://patchwork.kernel.org/project/linux-block/list/?series=978954
version: 2

Pull request is NOT updated. Failed to apply https://patchwork.kernel.org/project/linux-block/list/?series=978954
error message:

Cmd('git') failed due to: exit code(128)
  cmdline: git am --3way
  stdout: 'Applying: ublk: speed up ublk server exit handling
Using index info to reconstruct a base tree...
M	drivers/block/ublk_drv.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/block/ublk_drv.c
CONFLICT (content): Merge conflict in drivers/block/ublk_drv.c
Patch failed at 0001 ublk: speed up ublk server exit handling'
  stderr: 'error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"'

conflict:

diff --cc drivers/block/ublk_drv.c
index 6561d2a561fa,870d57a853a4..000000000000
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@@ -238,7 -237,6 +238,10 @@@ struct ublk_device 
  	unsigned int		nr_privileged_daemon;
  	struct mutex cancel_mutex;
  	bool canceling;
++<<<<<<< HEAD
 +	pid_t 	ublksrv_tgid;
++=======
++>>>>>>> ublk: speed up ublk server exit handling
  };
  
  /* header of ublk_params */
@@@ -1622,11 -1591,13 +1625,21 @@@ static int ublk_ch_release(struct inod
  	 * All requests may be inflight, so ->canceling may not be set, set
  	 * it now.
  	 */
++<<<<<<< HEAD
 +	mutex_lock(&ub->cancel_mutex);
 +	ublk_set_canceling(ub, true);
 +	for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
 +		ublk_abort_queue(ub, ublk_get_queue(ub, i));
 +	mutex_unlock(&ub->cancel_mutex);
++=======
+ 	ub->canceling = true;
+ 	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+ 		struct ublk_queue *ubq = ublk_get_queue(ub, i);
+ 
+ 		ubq->canceling = true;
+ 		ublk_abort_queue(ub, ubq);
+ 	}
++>>>>>>> ublk: speed up ublk server exit handling
  	blk_mq_kick_requeue_list(disk->queue);
  
  	/*
@@@ -1768,7 -1740,9 +1782,13 @@@ static void ublk_start_cancel(struct ub
  	 * touch completed uring_cmd
  	 */
  	blk_mq_quiesce_queue(disk->queue);
++<<<<<<< HEAD
 +	ublk_set_canceling(ub, true);
++=======
+ 	ub->canceling = true;
+ 	for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+ 		ublk_get_queue(ub, i)->canceling = true;
++>>>>>>> ublk: speed up ublk server exit handling
  	blk_mq_unquiesce_queue(disk->queue);
  out:
  	mutex_unlock(&ub->cancel_mutex);
@@@ -1968,11 -1942,10 +1988,15 @@@ static void ublk_reset_io_flags(struct 
  		for (j = 0; j < ubq->q_depth; j++)
  			ubq->ios[j].flags &= ~UBLK_IO_FLAG_CANCELED;
  		spin_unlock(&ubq->cancel_lock);
 -		ubq->canceling = false;
  		ubq->fail_io = false;
  	}
++<<<<<<< HEAD
 +	mutex_lock(&ub->cancel_mutex);
 +	ublk_set_canceling(ub, false);
 +	mutex_unlock(&ub->cancel_mutex);
++=======
+ 	ub->canceling = false;
++>>>>>>> ublk: speed up ublk server exit handling
  }
  
  /* device can only be started after all IOs are ready */
@@@ -3494,11 -3435,14 +3518,21 @@@ static int ublk_ctrl_quiesce_dev(struc
  		goto put_disk;
  
  	/* Mark the device as canceling */
++<<<<<<< HEAD
 +	mutex_lock(&ub->cancel_mutex);
 +	blk_mq_quiesce_queue(disk->queue);
 +	ublk_set_canceling(ub, true);
++=======
+ 	blk_mq_quiesce_queue(disk->queue);
+ 	ub->canceling = true;
+ 	for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+ 		struct ublk_queue *ubq = ublk_get_queue(ub, i);
+ 
+ 		ubq->canceling = true;
+ 	}
++>>>>>>> ublk: speed up ublk server exit handling
  	blk_mq_unquiesce_queue(disk->queue);
 +	mutex_unlock(&ub->cancel_mutex);
  
  	if (!timeout_ms)
  		timeout_ms = UINT_MAX;

@blktests-ci blktests-ci bot force-pushed the linus-master_base branch 9 times, most recently from 3893da1 to aeddbbb Compare August 1, 2025 12:14
@kawasaki kawasaki closed this Aug 1, 2025
blktests-ci bot pushed a commit that referenced this pull request Aug 2, 2025
Without the change `perf `hangs up on charaster devices. On my system
it's enough to run system-wide sampler for a few seconds to get the
hangup:

    $ perf record -a -g --call-graph=dwarf
    $ perf report
    # hung

`strace` shows that hangup happens on reading on a character device
`/dev/dri/renderD128`

    $ strace -y -f -p 2780484
    strace: Process 2780484 attached
    pread64(101</dev/dri/renderD128>, strace: Process 2780484 detached

It's call trace descends into `elfutils`:

    $ gdb -p 2780484
    (gdb) bt
    #0  0x00007f5e508f04b7 in __libc_pread64 (fd=101, buf=0x7fff9df7edb0, count=0, offset=0)
        at ../sysdeps/unix/sysv/linux/pread64.c:25
    #1  0x00007f5e52b79515 in read_file () from /<<NIX>>/elfutils-0.192/lib/libelf.so.1
    #2  0x00007f5e52b25666 in libdw_open_elf () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #3  0x00007f5e52b25907 in __libdw_open_file () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #4  0x00007f5e52b120a9 in dwfl_report_elf@@ELFUTILS_0.156 ()
       from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #5  0x000000000068bf20 in __report_module (al=al@entry=0x7fff9df80010, ip=ip@entry=139803237033216, ui=ui@entry=0x5369b5e0)
        at util/dso.h:537
    #6  0x000000000068c3d1 in report_module (ip=139803237033216, ui=0x5369b5e0) at util/unwind-libdw.c:114
    #7  frame_callback (state=0x535aef10, arg=0x5369b5e0) at util/unwind-libdw.c:242
    #8  0x00007f5e52b261d3 in dwfl_thread_getframes () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #9  0x00007f5e52b25bdb in get_one_thread_cb () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #10 0x00007f5e52b25faa in dwfl_getthreads () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #11 0x00007f5e52b26514 in dwfl_getthread_frames () from /<<NIX>>/elfutils-0.192/lib/libdw.so.1
    #12 0x000000000068c6ce in unwind__get_entries (cb=cb@entry=0x5d4620 <unwind_entry>, arg=arg@entry=0x10cd5fa0,
        thread=thread@entry=0x1076a290, data=data@entry=0x7fff9df80540, max_stack=max_stack@entry=127,
        best_effort=best_effort@entry=false) at util/thread.h:152
    #13 0x00000000005dae95 in thread__resolve_callchain_unwind (evsel=0x106006d0, thread=0x1076a290, cursor=0x10cd5fa0,
        sample=0x7fff9df80540, max_stack=127, symbols=true) at util/machine.c:2939
    #14 thread__resolve_callchain_unwind (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, sample=0x7fff9df80540,
        max_stack=127, symbols=true) at util/machine.c:2920
    #15 __thread__resolve_callchain (thread=0x1076a290, cursor=0x10cd5fa0, evsel=0x106006d0, evsel@entry=0x7fff9df80440,
        sample=0x7fff9df80540, parent=parent@entry=0x7fff9df804a0, root_al=root_al@entry=0x7fff9df80440, max_stack=127, symbols=true)
        at util/machine.c:2970
    #16 0x00000000005d0cb2 in thread__resolve_callchain (thread=<optimized out>, cursor=<optimized out>, evsel=0x7fff9df80440,
        sample=<optimized out>, parent=0x7fff9df804a0, root_al=0x7fff9df80440, max_stack=127) at util/machine.h:198
    #17 sample__resolve_callchain (sample=<optimized out>, cursor=<optimized out>, parent=parent@entry=0x7fff9df804a0,
        evsel=evsel@entry=0x106006d0, al=al@entry=0x7fff9df80440, max_stack=max_stack@entry=127) at util/callchain.c:1127
    #18 0x0000000000617e08 in hist_entry_iter__add (iter=iter@entry=0x7fff9df80480, al=al@entry=0x7fff9df80440, max_stack_depth=127,
        arg=arg@entry=0x7fff9df81ae0) at util/hist.c:1255
    #19 0x000000000045d2d0 in process_sample_event (tool=0x7fff9df81ae0, event=<optimized out>, sample=0x7fff9df80540,
        evsel=0x106006d0, machine=<optimized out>) at builtin-report.c:334
    #20 0x00000000005e3bb1 in perf_session__deliver_event (session=0x105ff2c0, event=0x7f5c7d735ca0, tool=0x7fff9df81ae0,
        file_offset=2914716832, file_path=0x105ffbf0 "perf.data") at util/session.c:1367
    #21 0x00000000005e8d93 in do_flush (oe=0x105ffa50, show_progress=false) at util/ordered-events.c:245
    #22 __ordered_events__flush (oe=0x105ffa50, how=OE_FLUSH__ROUND, timestamp=<optimized out>) at util/ordered-events.c:324
    #23 0x00000000005e1f64 in perf_session__process_user_event (session=0x105ff2c0, event=0x7f5c7d752b18, file_offset=2914835224,
        file_path=0x105ffbf0 "perf.data") at util/session.c:1419
    #24 0x00000000005e47c7 in reader__read_event (rd=rd@entry=0x7fff9df81260, session=session@entry=0x105ff2c0,
    --Type <RET> for more, q to quit, c to continue without paging--
    quit
        prog=prog@entry=0x7fff9df81220) at util/session.c:2132
    #25 0x00000000005e4b37 in reader__process_events (rd=0x7fff9df81260, session=0x105ff2c0, prog=0x7fff9df81220)
        at util/session.c:2181
    #26 __perf_session__process_events (session=0x105ff2c0) at util/session.c:2226
    #27 perf_session__process_events (session=session@entry=0x105ff2c0) at util/session.c:2390
    #28 0x0000000000460add in __cmd_report (rep=0x7fff9df81ae0) at builtin-report.c:1076
    #29 cmd_report (argc=<optimized out>, argv=<optimized out>) at builtin-report.c:1827
    #30 0x00000000004c5a40 in run_builtin (p=p@entry=0xd8f7f8 <commands+312>, argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0)
        at perf.c:351
    #31 0x00000000004c5d63 in handle_internal_command (argc=argc@entry=1, argv=argv@entry=0x7fff9df844b0) at perf.c:404
    #32 0x0000000000442de3 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:448
    #33 main (argc=<optimized out>, argv=0x7fff9df844b0) at perf.c:556

The hangup happens because nothing in` perf` or `elfutils` checks if a
mapped file is easily readable.

The change conservatively skips all non-regular files.

Signed-off-by: Sergei Trofimovich <slyich@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20250505174419.2814857-1-slyich@gmail.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
blktests-ci bot pushed a commit that referenced this pull request Aug 2, 2025
Symbolize stack traces by creating a live machine. Add this
functionality to dump_stack and switch dump_stack users to use
it. Switch TUI to use it. Add stack traces to the child test function
which can be useful to diagnose blocked code.

Example output:
```
$ perf test -vv PERF_RECORD_
...
  7: PERF_RECORD_* events & perf_sample fields:
  7: PERF_RECORD_* events & perf_sample fields                       : Running (1 active)
^C
Signal (2) while running tests.
Terminating tests with the same signal
Internal test harness failure. Completing any started tests:
:  7: PERF_RECORD_* events & perf_sample fields:

---- unexpected signal (2) ----
    #0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
    #1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #2 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
    #3 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
    #4 0x7fc12fef1393 in __nanosleep nanosleep.c:26
    #5 0x7fc12ff02d68 in __sleep sleep.c:55
    #6 0x55788c63196b in test__PERF_RECORD perf-record.c:0
    #7 0x55788c620fb0 in run_test_child builtin-test.c:0
    #8 0x55788c5bd18d in start_command run-command.c:127
    #9 0x55788c621ef3 in __cmd_test builtin-test.c:0
    #10 0x55788c6225bf in cmd_test ??:0
    #11 0x55788c5afbd0 in run_builtin perf.c:0
    #12 0x55788c5afeeb in handle_internal_command perf.c:0
    #13 0x55788c52b383 in main ??:0
    #14 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
    #15 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #16 0x55788c52b9d1 in _start ??:0

---- unexpected signal (2) ----
    #0 0x55788c6210a3 in child_test_sig_handler builtin-test.c:0
    #1 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #2 0x7fc12fea3a14 in pthread_sigmask@GLIBC_2.2.5 pthread_sigmask.c:45
    #3 0x7fc12fe49fd9 in __GI___sigprocmask sigprocmask.c:26
    #4 0x7fc12ff2601b in __longjmp_chk longjmp.c:36
    #5 0x55788c6210c0 in print_test_result.isra.0 builtin-test.c:0
    #6 0x7fc12fe49df0 in __restore_rt libc_sigaction.c:0
    #7 0x7fc12fe99687 in __internal_syscall_cancel cancellation.c:64
    #8 0x7fc12fee5f7a in clock_nanosleep@GLIBC_2.2.5 clock_nanosleep.c:72
    #9 0x7fc12fef1393 in __nanosleep nanosleep.c:26
    #10 0x7fc12ff02d68 in __sleep sleep.c:55
    #11 0x55788c63196b in test__PERF_RECORD perf-record.c:0
    #12 0x55788c620fb0 in run_test_child builtin-test.c:0
    #13 0x55788c5bd18d in start_command run-command.c:127
    #14 0x55788c621ef3 in __cmd_test builtin-test.c:0
    #15 0x55788c6225bf in cmd_test ??:0
    #16 0x55788c5afbd0 in run_builtin perf.c:0
    #17 0x55788c5afeeb in handle_internal_command perf.c:0
    #18 0x55788c52b383 in main ??:0
    #19 0x7fc12fe33ca8 in __libc_start_call_main libc_start_call_main.h:74
    #20 0x7fc12fe33d65 in __libc_start_main@@GLIBC_2.34 libc-start.c:128
    #21 0x55788c52b9d1 in _start ??:0
  7: PERF_RECORD_* events & perf_sample fields                       : Skip (permissions)
```

Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20250624210500.2121303-1-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
@blktests-ci blktests-ci bot deleted the series/976489=>linus-master branch August 8, 2025 00:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants