Commit 5b0ed59

Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:

 - NVMe updates via Christoph:
     - Small improvements to the logging functionality (Amit Engel)
     - Authentication cleanups (Hannes Reinecke)
     - Cleanup and optimize the DMA mapping code in the PCIe driver
       (Keith Busch)
     - Work around the command effects for Format NVM (Keith Busch)
     - Misc cleanups (Keith Busch, Christoph Hellwig)
     - Fix and cleanup freeing single sgl (Keith Busch)

 - MD updates via Song:
     - Fix a rare crash during the takeover process
     - Don't update recovery_cp when curr_resync is ACTIVE
     - Free writes_pending in md_stop
     - Change active_io to percpu

 - Updates to drbd, inching us closer to unifying the out-of-tree
   driver with the in-tree one (Andreas, Christoph, Lars, Robert)

 - BFQ update adding support for multi-actuator drives (Paolo,
   Federico, Davide)

 - Make brd compliant with REQ_NOWAIT (me)

 - Fix for IOPOLL and queue entering, fixing stalled IO waiting on
   timeouts (me)

 - Fix for REQ_NOWAIT with multiple bios (me)

 - Fix memory leak in blktrace cleanup (Greg)

 - Clean up sbitmap and fix a potential hang (Kemeng)

 - Clean up some bits in BFQ, and fix a bug in the request injection
   (Kemeng)

 - Clean up the request allocation and issue code, and fix some bugs
   related to that (Kemeng)

 - ublk updates and fixes:
     - Add support for unprivileged ublk (Ming)
     - Improve device deletion handling (Ming)
     - Misc (Liu, Ziyang)

 - s390 dasd fixes (Alexander, Qiheng)

 - Improve utility of request caching and fixes (Anuj, Xiao)

 - zoned cleanups (Pankaj)

 - More constification for kobjs (Thomas)

 - blk-iocost cleanups (Yu)

 - Remove bio splitting from drivers that don't need it (Christoph)

 - Switch blk-cgroups to use struct gendisk. Some of this is now
   incomplete as select late reverts were done. (Christoph)

 - Add bvec initialization helpers, and convert callers to use that
   rather than open-coding it (Christoph)

 - Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin,
   Matthew, Ulf, Zhong)

* tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits)
  brd: use radix_tree_maybe_preload instead of radix_tree_preload
  block: use proper return value from bio_failfast()
  block: bio-integrity: Copy flags when bio_integrity_payload is cloned
  block: Fix io statistics for cgroup in throttle path
  brd: mark as nowait compatible
  brd: check for REQ_NOWAIT and set correct page allocation mask
  brd: return 0/-error from brd_insert_page()
  block: sync mixed merged request's failfast with 1st bio's
  Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
  Revert "blk-cgroup: pass a gendisk to blkg_lookup"
  Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
  Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
  Revert "blk-cgroup: move the cgroup information to struct gendisk"
  nvme-pci: remove iod use_sgls
  nvme-pci: fix freeing single sgl
  block: ublk: check IO buffer based on flag need_get_data
  s390/dasd: Fix potential memleak in dasd_eckd_init()
  s390/dasd: sort out physical vs virtual pointers usage
  block: Remove the ALLOC_CACHE_SLACK constant
  block: make kobj_type structures constant
  ...
2 parents 553637f + 0aa2988 commit 5b0ed59

109 files changed, +2224 -1599 lines changed

Documentation/ABI/stable/sysfs-block

Lines changed: 2 additions & 1 deletion
@@ -432,7 +432,8 @@ Contact: [email protected]
 Description:
 		[RW] This is the maximum number of kilobytes that the block
 		layer will allow for a filesystem request. Must be smaller than
-		or equal to the maximum size allowed by the hardware.
+		or equal to the maximum size allowed by the hardware. Write 0
+		to use default kernel settings.


 What: /sys/block/<disk>/queue/max_segment_size
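
The documented "write 0" behaviour can be exercised from userspace with an ordinary file write. The following is a minimal sketch, not part of this commit: the helper name `reset_max_sectors_kb` is hypothetical, and the caller is assumed to pass the full attribute path (for example `/sys/block/<disk>/queue/max_sectors_kb`), which normally requires root to write.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the kernel tree): write "0" to a
 * max_sectors_kb attribute so the block layer goes back to its default
 * limit, as the updated ABI text describes.
 * Returns 0 on success, -1 on failure.
 */
int reset_max_sectors_kb(const char *attr_path)
{
	FILE *f = fopen(attr_path, "w");

	if (!f)
		return -1;
	if (fputs("0\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	/* Buffered data is flushed on close, so a rejected write shows up here. */
	return fclose(f) == 0 ? 0 : -1;
}
```

Checking the result of `fclose()` matters because stdio buffers the write; an error from the attribute's store handler is only visible once the buffer is flushed.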

Documentation/block/capability.rst

Lines changed: 0 additions & 10 deletions
This file was deleted.

Documentation/block/index.rst

Lines changed: 0 additions & 1 deletion
@@ -10,7 +10,6 @@ Block
    bfq-iosched
    biovecs
    blk-mq
-   capability
    cmdline-partition
    data-integrity
    deadline-iosched

Documentation/block/ublk.rst

Lines changed: 46 additions & 9 deletions
@@ -144,6 +144,43 @@ managing and controlling ublk devices with help of several control commands:
   For retrieving device info via ``ublksrv_ctrl_dev_info``. It is the server's
   responsibility to save IO target specific info in userspace.

+- ``UBLK_CMD_GET_DEV_INFO2``
+  Same purpose with ``UBLK_CMD_GET_DEV_INFO``, but ublk server has to
+  provide path of the char device of ``/dev/ublkc*`` for kernel to run
+  permission check, and this command is added for supporting unprivileged
+  ublk device, and introduced with ``UBLK_F_UNPRIVILEGED_DEV`` together.
+  Only the user owning the requested device can retrieve the device info.
+
+  How to deal with userspace/kernel compatibility:
+
+  1) if kernel is capable of handling ``UBLK_F_UNPRIVILEGED_DEV``
+
+    If ublk server supports ``UBLK_F_UNPRIVILEGED_DEV``:
+
+    ublk server should send ``UBLK_CMD_GET_DEV_INFO2``, given anytime
+    unprivileged application needs to query devices the current user owns,
+    when the application has no idea if ``UBLK_F_UNPRIVILEGED_DEV`` is set
+    given the capability info is stateless, and application should always
+    retrieve it via ``UBLK_CMD_GET_DEV_INFO2``
+
+    If ublk server doesn't support ``UBLK_F_UNPRIVILEGED_DEV``:
+
+    ``UBLK_CMD_GET_DEV_INFO`` is always sent to kernel, and the feature of
+    UBLK_F_UNPRIVILEGED_DEV isn't available for user
+
+  2) if kernel isn't capable of handling ``UBLK_F_UNPRIVILEGED_DEV``
+
+    If ublk server supports ``UBLK_F_UNPRIVILEGED_DEV``:
+
+    ``UBLK_CMD_GET_DEV_INFO2`` is tried first, and will be failed, then
+    ``UBLK_CMD_GET_DEV_INFO`` needs to be retried given
+    ``UBLK_F_UNPRIVILEGED_DEV`` can't be set
+
+    If ublk server doesn't support ``UBLK_F_UNPRIVILEGED_DEV``:
+
+    ``UBLK_CMD_GET_DEV_INFO`` is always sent to kernel, and the feature of
+    ``UBLK_F_UNPRIVILEGED_DEV`` isn't available for user
+
 - ``UBLK_CMD_START_USER_RECOVERY``

   This command is valid if ``UBLK_F_USER_RECOVERY`` feature is enabled. This
@@ -180,6 +217,15 @@ managing and controlling ublk devices with help of several control commands:
   double-write since the driver may issue the same I/O request twice. It
   might be useful to a read-only FS or a VM backend.

+Unprivileged ublk device is supported by passing ``UBLK_F_UNPRIVILEGED_DEV``.
+Once the flag is set, all control commands can be sent by unprivileged
+user. Except for command of ``UBLK_CMD_ADD_DEV``, permission check on
+the specified char device(``/dev/ublkc*``) is done for all other control
+commands by ublk driver, for doing that, path of the char device has to
+be provided in these commands' payload from ublk server. With this way,
+ublk device becomes container-ware, and device created in one container
+can be controlled/accessed just inside this container.
+
 Data plane
 ----------

@@ -254,15 +300,6 @@ with specified IO tag in the command data:
 Future development
 ==================

-Container-aware ublk deivice
-----------------------------
-
-ublk driver doesn't handle any IO logic. Its function is well defined
-for now and very limited userspace interfaces are needed, which is also
-well defined too. It is possible to make ublk devices container-aware block
-devices in future as Stefan Hajnoczi suggested [#stefan]_, by removing
-ADMIN privilege.
-
 Zero copy
 ---------
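
The compatibility rules added to ublk.rst amount to a "try the new command, fall back to the old one" policy on the server side. The sketch below models that policy in plain C under loud assumptions: every `demo_*` name, the opcode enum, and the `-EOPNOTSUPP` error are illustrative stand-ins, not the ublk UAPI or the real control path, and the two fake senders only simulate a newer and an older kernel.

```c
#include <errno.h>
#include <stddef.h>

/* Placeholder opcodes; the real values are defined by the ublk UAPI header. */
enum demo_cmd { DEMO_GET_DEV_INFO, DEMO_GET_DEV_INFO2 };

/* Hypothetical transport callback: 0 on success, negative errno on failure. */
typedef int (*demo_send_fn)(enum demo_cmd cmd, const char *char_dev_path);

/* Simulated kernel that understands both commands. */
int demo_new_kernel(enum demo_cmd cmd, const char *char_dev_path)
{
	/* GET_DEV_INFO2 must carry the char device path for the permission check. */
	if (cmd == DEMO_GET_DEV_INFO2 && !char_dev_path)
		return -EINVAL;
	return 0;
}

/* Simulated kernel without unprivileged support (error code is an assumption). */
int demo_old_kernel(enum demo_cmd cmd, const char *char_dev_path)
{
	(void)char_dev_path;
	return cmd == DEMO_GET_DEV_INFO2 ? -EOPNOTSUPP : 0;
}

/*
 * A server without unprivileged support always sends GET_DEV_INFO.
 * Otherwise GET_DEV_INFO2 is tried first and GET_DEV_INFO is retried
 * when it fails. The command that succeeded is stored in *used.
 */
int demo_get_dev_info(demo_send_fn send, int server_supports_unpriv,
		      const char *char_dev_path, enum demo_cmd *used)
{
	int ret;

	if (server_supports_unpriv) {
		ret = send(DEMO_GET_DEV_INFO2, char_dev_path);
		if (ret == 0) {
			*used = DEMO_GET_DEV_INFO2;
			return 0;
		}
		/* Fall through: the kernel may predate GET_DEV_INFO2. */
	}

	ret = send(DEMO_GET_DEV_INFO, NULL);
	if (ret == 0)
		*used = DEMO_GET_DEV_INFO;
	return ret;
}
```

This sketch retries on any failure of the new command; a real server may prefer to retry only for specific error codes so that a genuine permission failure is reported rather than masked.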

MAINTAINERS

Lines changed: 1 addition & 0 deletions
@@ -6425,6 +6425,7 @@ T: git git://git.linbit.com/linux-drbd.git
 T: git git://git.linbit.com/drbd-8.4.git
 F: Documentation/admin-guide/blockdev/
 F: drivers/block/drbd/
+F: include/linux/drbd*
 F: lib/lru_cache.c

 DRIVER COMPONENT FRAMEWORK

block/Kconfig.iosched

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@ config IOSCHED_BFQ
3030
config BFQ_GROUP_IOSCHED
3131
bool "BFQ hierarchical scheduling support"
3232
depends on IOSCHED_BFQ && BLK_CGROUP
33+
default y
3334
select BLK_CGROUP_RWSTAT
3435
help
3536

block/bfq-cgroup.c

Lines changed: 55 additions & 50 deletions
@@ -513,12 +513,12 @@ static void bfq_cpd_free(struct blkcg_policy_data *cpd)
 	kfree(cpd_to_bfqgd(cpd));
 }

-static struct blkg_policy_data *bfq_pd_alloc(gfp_t gfp, struct request_queue *q,
-					     struct blkcg *blkcg)
+static struct blkg_policy_data *bfq_pd_alloc(struct gendisk *disk,
+		struct blkcg *blkcg, gfp_t gfp)
 {
 	struct bfq_group *bfqg;

-	bfqg = kzalloc_node(sizeof(*bfqg), gfp, q->node);
+	bfqg = kzalloc_node(sizeof(*bfqg), gfp, disk->node_id);
 	if (!bfqg)
 		return NULL;

@@ -551,7 +551,6 @@ static void bfq_pd_init(struct blkg_policy_data *pd)
 	bfqg->bfqd = bfqd;
 	bfqg->active_entities = 0;
 	bfqg->num_queues_with_pending_reqs = 0;
-	bfqg->online = true;
 	bfqg->rq_pos_tree = RB_ROOT;
 }

@@ -614,7 +613,7 @@ struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
 			continue;
 		}
 		bfqg = blkg_to_bfqg(blkg);
-		if (bfqg->online) {
+		if (bfqg->pd.online) {
 			bio_associate_blkg_from_css(bio, &blkg->blkcg->css);
 			return bfqg;
 		}
@@ -706,12 +705,52 @@ void bfq_bfqq_move(struct bfq_data *bfqd, struct bfq_queue *bfqq,
 		bfq_activate_bfqq(bfqd, bfqq);
 	}

-	if (!bfqd->in_service_queue && !bfqd->rq_in_driver)
+	if (!bfqd->in_service_queue && !bfqd->tot_rq_in_driver)
 		bfq_schedule_dispatch(bfqd);
 	/* release extra ref taken above, bfqq may happen to be freed now */
 	bfq_put_queue(bfqq);
 }

+static void bfq_sync_bfqq_move(struct bfq_data *bfqd,
+			       struct bfq_queue *sync_bfqq,
+			       struct bfq_io_cq *bic,
+			       struct bfq_group *bfqg,
+			       unsigned int act_idx)
+{
+	struct bfq_queue *bfqq;
+
+	if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
+		/* We are the only user of this bfqq, just move it */
+		if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
+			bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
+		return;
+	}
+
+	/*
+	 * The queue was merged to a different queue. Check
+	 * that the merge chain still belongs to the same
+	 * cgroup.
+	 */
+	for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
+		if (bfqq->entity.sched_data != &bfqg->sched_data)
+			break;
+	if (bfqq) {
+		/*
+		 * Some queue changed cgroup so the merge is not valid
+		 * anymore. We cannot easily just cancel the merge (by
+		 * clearing new_bfqq) as there may be other processes
+		 * using this queue and holding refs to all queues
+		 * below sync_bfqq->new_bfqq. Similarly if the merge
+		 * already happened, we need to detach from bfqq now
+		 * so that we cannot merge bio to a request from the
+		 * old cgroup.
+		 */
+		bfq_put_cooperator(sync_bfqq);
+		bic_set_bfqq(bic, NULL, true, act_idx);
+		bfq_release_process_ref(bfqd, sync_bfqq);
+	}
+}
+
 /**
  * __bfq_bic_change_cgroup - move @bic to @bfqg.
  * @bfqd: the queue descriptor.
@@ -726,53 +765,20 @@ static void __bfq_bic_change_cgroup(struct bfq_data *bfqd,
 				     struct bfq_io_cq *bic,
 				     struct bfq_group *bfqg)
 {
-	struct bfq_queue *async_bfqq = bic_to_bfqq(bic, false);
-	struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, true);
-	struct bfq_entity *entity;
+	unsigned int act_idx;

-	if (async_bfqq) {
-		entity = &async_bfqq->entity;
+	for (act_idx = 0; act_idx < bfqd->num_actuators; act_idx++) {
+		struct bfq_queue *async_bfqq = bic_to_bfqq(bic, false, act_idx);
+		struct bfq_queue *sync_bfqq = bic_to_bfqq(bic, true, act_idx);

-		if (entity->sched_data != &bfqg->sched_data) {
-			bic_set_bfqq(bic, NULL, false);
+		if (async_bfqq &&
+		    async_bfqq->entity.sched_data != &bfqg->sched_data) {
+			bic_set_bfqq(bic, NULL, false, act_idx);
 			bfq_release_process_ref(bfqd, async_bfqq);
 		}
-	}

-	if (sync_bfqq) {
-		if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
-			/* We are the only user of this bfqq, just move it */
-			if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
-				bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
-		} else {
-			struct bfq_queue *bfqq;
-
-			/*
-			 * The queue was merged to a different queue. Check
-			 * that the merge chain still belongs to the same
-			 * cgroup.
-			 */
-			for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
-				if (bfqq->entity.sched_data !=
-				    &bfqg->sched_data)
-					break;
-			if (bfqq) {
-				/*
-				 * Some queue changed cgroup so the merge is
-				 * not valid anymore. We cannot easily just
-				 * cancel the merge (by clearing new_bfqq) as
-				 * there may be other processes using this
-				 * queue and holding refs to all queues below
-				 * sync_bfqq->new_bfqq. Similarly if the merge
-				 * already happened, we need to detach from
-				 * bfqq now so that we cannot merge bio to a
-				 * request from the old cgroup.
-				 */
-				bfq_put_cooperator(sync_bfqq);
-				bic_set_bfqq(bic, NULL, true);
-				bfq_release_process_ref(bfqd, sync_bfqq);
-			}
-		}
+		if (sync_bfqq)
+			bfq_sync_bfqq_move(bfqd, sync_bfqq, bic, bfqg, act_idx);
 	}
 }

@@ -978,7 +984,6 @@ static void bfq_pd_offline(struct blkg_policy_data *pd)

 put_async_queues:
 	bfq_put_async_queues(bfqd, bfqg);
-	bfqg->online = false;

 	spin_unlock_irqrestore(&bfqd->lock, flags);
 	/*
@@ -1284,7 +1289,7 @@ struct bfq_group *bfq_create_group_hierarchy(struct bfq_data *bfqd, int node)
 {
 	int ret;

-	ret = blkcg_activate_policy(bfqd->queue, &blkcg_policy_bfq);
+	ret = blkcg_activate_policy(bfqd->queue->disk, &blkcg_policy_bfq);
 	if (ret)
 		return NULL;
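
The reworked `__bfq_bic_change_cgroup()` visits one queue slot per actuator and, for merged sync queues, walks the `new_bfqq` chain to see whether every queue still belongs to the target group. The userspace model below is only a sketch of that control flow: all `demo_*` types and names are hypothetical, the `bfq_bfqq_coop()` test, reference counting and locking are left out, and `DEMO_MAX_ACTUATORS` is an illustrative bound rather than a kernel constant.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_ACTUATORS 4	/* illustrative bound, not a kernel value */

struct demo_group { int id; };

/* Stand-in for struct bfq_queue: owning group plus the merge-chain link. */
struct demo_queue {
	const struct demo_group *group;
	struct demo_queue *new_queue;	/* next queue in the merge chain, or NULL */
};

/* Stand-in for struct bfq_io_cq: one sync queue slot per actuator. */
struct demo_ioc {
	struct demo_queue *sync_q[DEMO_MAX_ACTUATORS];
};

/* True when every queue reachable from @q already belongs to @grp. */
bool demo_chain_in_group(const struct demo_queue *q, const struct demo_group *grp)
{
	for (; q; q = q->new_queue)
		if (q->group != grp)
			return false;
	return true;
}

/*
 * Per-actuator pass: an unmerged queue is simply moved to @grp, while a
 * merged queue whose chain strays outside @grp is detached from the I/O
 * context. The caller must keep num_actuators <= DEMO_MAX_ACTUATORS.
 * Returns the number of detached slots.
 */
unsigned int demo_change_group(struct demo_ioc *ioc, unsigned int num_actuators,
			       const struct demo_group *grp)
{
	unsigned int act_idx, detached = 0;

	for (act_idx = 0; act_idx < num_actuators; act_idx++) {
		struct demo_queue *q = ioc->sync_q[act_idx];

		if (!q)
			continue;
		if (!q->new_queue) {
			q->group = grp;			/* sole user: just move it */
		} else if (!demo_chain_in_group(q, grp)) {
			ioc->sync_q[act_idx] = NULL;	/* merge no longer valid */
			detached++;
		}
	}
	return detached;
}
```

Detaching rather than unlinking the chain follows the reasoning in the kernel comment: other processes may still hold references to queues further down the chain, so the sketch clears only the slot in the I/O context and leaves the chain itself untouched.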
