Commit df668a5

Merge tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:

 - disk events cleanup (Christoph)
 - gendisk and request queue allocation simplifications (Christoph)
 - bdev_disk_changed cleanups (Christoph)
 - IO priority improvements (Bart)
 - Chained bio completion trace fix (Edward)
 - blk-wbt fixes (Jan)
 - blk-wbt enable/disable fix (Zhang)
 - Scheduler dispatch improvements (Jan, Ming)
 - Shared tagset scheduler improvements (John)
 - BFQ updates (Paolo, Luca, Pietro)
 - BFQ lock inversion fix (Jan)
 - Documentation improvements (Kir)
 - CLONE_IO block cgroup fix (Tejun)
 - Remove of ancient and deprecated block dump feature (zhangyi)
 - Discard merge fix (Ming)
 - Misc fixes or followup fixes (Colin, Damien, Dan, Long, Max, Thomas, Yang)

* tag 'for-5.14/block-2021-06-29' of git://git.kernel.dk/linux-block: (129 commits)
  block: fix discard request merge
  block/mq-deadline: Remove a WARN_ON_ONCE() call
  blk-mq: update hctx->dispatch_busy in case of real scheduler
  blk: Fix lock inversion between ioc lock and bfqd lock
  bfq: Remove merged request already in bfq_requests_merged()
  block: pass a gendisk to bdev_disk_changed
  block: move bdev_disk_changed
  block: add the events* attributes to disk_attrs
  block: move the disk events code to a separate file
  block: fix trace completion for chained bio
  block/partitions/msdos: Fix typo inidicator -> indicator
  block, bfq: reset waker pointer with shared queues
  block, bfq: check waker only for queues with no in-flight I/O
  block, bfq: avoid delayed merge of async queues
  block, bfq: boost throughput by extending queue-merging times
  block, bfq: consider also creation time in delayed stable merge
  block, bfq: fix delayed stable merge check
  block, bfq: let also stably merged queues enjoy weight raising
  blk-wbt: make sure throttle is enabled properly
  blk-wbt: introduce a new disable state to prevent false positive by rwb_enabled()
  ...
2 parents df04fbe + 2705dfb commit df668a5

111 files changed, 3776 insertions(+), 3069 deletions(-)

Documentation/admin-guide/cgroup-v1/blkio-controller.rst

Lines changed: 80 additions & 75 deletions
@@ -17,36 +17,37 @@ level logical devices like device mapper.
 
 HOWTO
 =====
+
 Throttling/Upper Limit policy
 -----------------------------
-- Enable Block IO controller::
+Enable Block IO controller::
 
 CONFIG_BLK_CGROUP=y
 
-- Enable throttling in block layer::
+Enable throttling in block layer::
 
 CONFIG_BLK_DEV_THROTTLING=y
 
-- Mount blkio controller (see cgroups.txt, Why are cgroups needed?)::
+Mount blkio controller (see cgroups.txt, Why are cgroups needed?)::
 
 mount -t cgroup -o blkio none /sys/fs/cgroup/blkio
 
-- Specify a bandwidth rate on particular device for root group. The format
-for policy is "<major>:<minor> <bytes_per_second>"::
+Specify a bandwidth rate on particular device for root group. The format
+for policy is "<major>:<minor> <bytes_per_second>"::
 
 echo "8:16 1048576" > /sys/fs/cgroup/blkio/blkio.throttle.read_bps_device
 
-Above will put a limit of 1MB/second on reads happening for root group
-on device having major/minor number 8:16.
+This will put a limit of 1MB/second on reads happening for root group
+on device having major/minor number 8:16.
 
-- Run dd to read a file and see if rate is throttled to 1MB/s or not::
+Run dd to read a file and see if rate is throttled to 1MB/s or not::
 
 # dd iflag=direct if=/mnt/common/zerofile of=/dev/null bs=4K count=1024
 1024+0 records in
 1024+0 records out
 4194304 bytes (4.2 MB) copied, 4.0001 s, 1.0 MB/s
 
-Limits for writes can be put using blkio.throttle.write_bps_device file.
+Limits for writes can be put using blkio.throttle.write_bps_device file.
 
 Hierarchical Cgroups
 ====================
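
The dd run in the hunk above can be checked by hand: bs=4K with count=1024 is 4194304 bytes, and at the configured limit of 1048576 bytes per second that takes about four seconds, which matches the reported 4.0001 s. A minimal sketch of that arithmetic follows; the function name is hypothetical and not part of any kernel or cgroup interface.

```python
def expected_seconds(total_bytes: int, limit_bytes_per_second: int) -> float:
    """Lower bound on transfer time under a bytes-per-second throttle."""
    if limit_bytes_per_second <= 0:
        raise ValueError("limit must be positive")
    return total_bytes / limit_bytes_per_second


# dd example from the hunk: bs=4K count=1024, throttled to 1048576 bytes/s.
total = 4096 * 1024
print(total, expected_seconds(total, 1048576))
```

A measured time well below this bound would suggest the throttle rule was not applied to the device being read.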
@@ -79,85 +80,89 @@ following::
 
 Various user visible config options
 ===================================
-CONFIG_BLK_CGROUP
-- Block IO controller.
 
-CONFIG_BFQ_CGROUP_DEBUG
-- Debug help. Right now some additional stats file show up in cgroup
+CONFIG_BLK_CGROUP
+Block IO controller.
+
+CONFIG_BFQ_CGROUP_DEBUG
+Debug help. Right now some additional stats file show up in cgroup
 if this option is enabled.
 
-CONFIG_BLK_DEV_THROTTLING
-- Enable block device throttling support in block layer.
+CONFIG_BLK_DEV_THROTTLING
+Enable block device throttling support in block layer.
 
 Details of cgroup files
 =======================
+
 Proportional weight policy files
 --------------------------------
-- blkio.weight
-- Specifies per cgroup weight. This is default weight of the group
-on all the devices until and unless overridden by per device rule.
-(See blkio.weight_device).
-Currently allowed range of weights is from 10 to 1000.
 
-- blkio.weight_device
-- One can specify per cgroup per device rules using this interface.
-These rules override the default value of group weight as specified
-by blkio.weight.
+blkio.bfq.weight
+Specifies per cgroup weight. This is default weight of the group
+on all the devices until and unless overridden by per device rule
+(see `blkio.bfq.weight_device` below).
+
+Currently allowed range of weights is from 1 to 1000. For more details,
+see Documentation/block/bfq-iosched.rst.
+
+blkio.bfq.weight_device
+Specifies per cgroup per device weights, overriding the default group
+weight. For more details, see Documentation/block/bfq-iosched.rst.
 
 Following is the format::
 
-# echo dev_maj:dev_minor weight > blkio.weight_device
+# echo dev_maj:dev_minor weight > blkio.bfq.weight_device
 
 Configure weight=300 on /dev/sdb (8:16) in this cgroup::
 
-# echo 8:16 300 > blkio.weight_device
-# cat blkio.weight_device
+# echo 8:16 300 > blkio.bfq.weight_device
+# cat blkio.bfq.weight_device
 dev weight
 8:16 300
 
 Configure weight=500 on /dev/sda (8:0) in this cgroup::
 
-# echo 8:0 500 > blkio.weight_device
-# cat blkio.weight_device
+# echo 8:0 500 > blkio.bfq.weight_device
+# cat blkio.bfq.weight_device
 dev weight
 8:0 500
 8:16 300
 
 Remove specific weight for /dev/sda in this cgroup::
 
-# echo 8:0 0 > blkio.weight_device
-# cat blkio.weight_device
+# echo 8:0 0 > blkio.bfq.weight_device
+# cat blkio.bfq.weight_device
 dev weight
 8:16 300
 
-- blkio.time
-- disk time allocated to cgroup per device in milliseconds. First
+blkio.time
+Disk time allocated to cgroup per device in milliseconds. First
 two fields specify the major and minor number of the device and
 third field specifies the disk time allocated to group in
 milliseconds.
 
-- blkio.sectors
-- number of sectors transferred to/from disk by the group. First
+blkio.sectors
+Number of sectors transferred to/from disk by the group. First
 two fields specify the major and minor number of the device and
 third field specifies the number of sectors transferred by the
 group to/from the device.
 
-- blkio.io_service_bytes
-- Number of bytes transferred to/from the disk by the group. These
+blkio.io_service_bytes
+Number of bytes transferred to/from the disk by the group. These
 are further divided by the type of operation - read or write, sync
 or async. First two fields specify the major and minor number of the
 device, third field specifies the operation type and the fourth field
 specifies the number of bytes.
 
-- blkio.io_serviced
-- Number of IOs (bio) issued to the disk by the group. These
+blkio.io_serviced
+Number of IOs (bio) issued to the disk by the group. These
 are further divided by the type of operation - read or write, sync
 or async. First two fields specify the major and minor number of the
 device, third field specifies the operation type and the fourth field
 specifies the number of IOs.
 
-- blkio.io_service_time
-- Total amount of time between request dispatch and request completion
+blkio.io_service_time
+Total amount of time between request dispatch and request completion
 for the IOs done by this cgroup. This is in nanoseconds to make it
 meaningful for flash devices too. For devices with queue depth of 1,
 this time represents the actual service time. When queue_depth > 1,
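
The per-device weight rules in the hunk above use the form "dev_maj:dev_minor weight", with weights from 1 to 1000 and 0 used in the example to remove a rule. The sketch below builds such a rule string and parses the listing printed by cat. All three function names are hypothetical helpers, and the header line "dev weight" is assumed to appear exactly as shown in the hunk. Nothing here writes to a cgroup file.

```python
import os
from typing import Dict, Tuple


def device_numbers(path: str) -> Tuple[int, int]:
    """Return (major, minor) for the block device node at `path`."""
    st = os.stat(path)
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def weight_rule(major: int, minor: int, weight: int) -> str:
    """Build a 'dev_maj:dev_minor weight' rule string.

    The hunk documents weights from 1 to 1000 and uses 0 to remove a rule.
    """
    if weight != 0 and not 1 <= weight <= 1000:
        raise ValueError("weight must be 0 (remove) or within 1..1000")
    return f"{major}:{minor} {weight}"


def parse_weight_device(text: str) -> Dict[Tuple[int, int], int]:
    """Parse a weight_device listing into {(major, minor): weight}."""
    rules: Dict[Tuple[int, int], int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the 'dev weight' header and anything that is not a rule line.
        if len(fields) != 2 or ":" not in fields[0]:
            continue
        major, minor = fields[0].split(":")
        rules[(int(major), int(minor))] = int(fields[1])
    return rules


print(weight_rule(8, 16, 300))
print(parse_weight_device("dev weight\n8:0 500\n8:16 300\n"))
```

On a real system, `device_numbers("/dev/sdb")` would supply the major and minor numbers instead of hard-coding 8:16; that call is left out of the demonstration because it depends on the devices present.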
@@ -170,8 +175,8 @@ Proportional weight policy files
 specifies the operation type and the fourth field specifies the
 io_service_time in ns.
 
-- blkio.io_wait_time
-- Total amount of time the IOs for this cgroup spent waiting in the
+blkio.io_wait_time
+Total amount of time the IOs for this cgroup spent waiting in the
 scheduler queues for service. This can be greater than the total time
 elapsed since it is cumulative io_wait_time for all IOs. It is not a
 measure of total time the cgroup spent waiting but rather a measure of
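
Several of the stat files described in these hunks (io_service_bytes, io_serviced, io_service_time, io_wait_time) share one layout: device major and minor number, then the operation type, then a value. A minimal parsing sketch follows. It assumes the device is printed as "major:minor", as in the weight_device listing, and that each record sits on its own line. The function name, the operation labels and the sample numbers are illustrative, not captured from a real system.

```python
from typing import Dict, Tuple


def parse_op_stat(text: str) -> Dict[Tuple[int, int], Dict[str, int]]:
    """Parse '<major>:<minor> <operation> <value>' lines per device."""
    stats: Dict[Tuple[int, int], Dict[str, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Ignore lines that do not match the three-column per-device layout.
        if len(fields) != 3 or ":" not in fields[0]:
            continue
        major, minor = (int(part) for part in fields[0].split(":"))
        stats.setdefault((major, minor), {})[fields[1]] = int(fields[2])
    return stats


# Made-up sample in the assumed layout; values are in ns for io_service_time.
SAMPLE = "8:16 Read 1500000\n8:16 Write 250000\n"
print(parse_op_stat(SAMPLE))
```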
@@ -185,24 +190,24 @@ Proportional weight policy files
 minor number of the device, third field specifies the operation type
 and the fourth field specifies the io_wait_time in ns.
 
-- blkio.io_merged
-- Total number of bios/requests merged into requests belonging to this
+blkio.io_merged
+Total number of bios/requests merged into requests belonging to this
 cgroup. This is further divided by the type of operation - read or
 write, sync or async.
 
-- blkio.io_queued
-- Total number of requests queued up at any given instant for this
+blkio.io_queued
+Total number of requests queued up at any given instant for this
 cgroup. This is further divided by the type of operation - read or
 write, sync or async.
 
-- blkio.avg_queue_size
-- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+blkio.avg_queue_size
+Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 The average queue size for this cgroup over the entire time of this
 cgroup's existence. Queue size samples are taken each time one of the
 queues of this cgroup gets a timeslice.
 
-- blkio.group_wait_time
-- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+blkio.group_wait_time
+Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 This is the amount of time the cgroup had to wait since it became busy
 (i.e., went from 0 to 1 request queued) to get a timeslice for one of
 its queues. This is different from the io_wait_time which is the
@@ -212,85 +217,85 @@ Proportional weight policy files
 will only report the group_wait_time accumulated till the last time it
 got a timeslice and will not include the current delta.
 
-- blkio.empty_time
-- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+blkio.empty_time
+Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 This is the amount of time a cgroup spends without any pending
 requests when not being served, i.e., it does not include any time
 spent idling for one of the queues of the cgroup. This is in
 nanoseconds. If this is read when the cgroup is in an empty state,
 the stat will only report the empty_time accumulated till the last
 time it had a pending request and will not include the current delta.
 
-- blkio.idle_time
-- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
+blkio.idle_time
+Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y.
 This is the amount of time spent by the IO scheduler idling for a
 given cgroup in anticipation of a better request than the existing ones
 from other queues/cgroups. This is in nanoseconds. If this is read
 when the cgroup is in an idling state, the stat will only report the
 idle_time accumulated till the last idle period and will not include
 the current delta.
 
-- blkio.dequeue
-- Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This
+blkio.dequeue
+Debugging aid only enabled if CONFIG_BFQ_CGROUP_DEBUG=y. This
 gives the statistics about how many a times a group was dequeued
 from service tree of the device. First two fields specify the major
 and minor number of the device and third field specifies the number
 of times a group was dequeued from a particular device.
 
-- blkio.*_recursive
-- Recursive version of various stats. These files show the
+blkio.*_recursive
+Recursive version of various stats. These files show the
 same information as their non-recursive counterparts but
 include stats from all the descendant cgroups.
 
 Throttling/Upper limit policy files
 -----------------------------------
-- blkio.throttle.read_bps_device
-- Specifies upper limit on READ rate from the device. IO rate is
+blkio.throttle.read_bps_device
+Specifies upper limit on READ rate from the device. IO rate is
 specified in bytes per second. Rules are per device. Following is
 the format::
 
 echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.read_bps_device
 
-- blkio.throttle.write_bps_device
-- Specifies upper limit on WRITE rate to the device. IO rate is
+blkio.throttle.write_bps_device
+Specifies upper limit on WRITE rate to the device. IO rate is
 specified in bytes per second. Rules are per device. Following is
 the format::
 
 echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.write_bps_device
 
-- blkio.throttle.read_iops_device
-- Specifies upper limit on READ rate from the device. IO rate is
+blkio.throttle.read_iops_device
+Specifies upper limit on READ rate from the device. IO rate is
 specified in IO per second. Rules are per device. Following is
 the format::
 
 echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.read_iops_device
 
-- blkio.throttle.write_iops_device
-- Specifies upper limit on WRITE rate to the device. IO rate is
+blkio.throttle.write_iops_device
+Specifies upper limit on WRITE rate to the device. IO rate is
 specified in io per second. Rules are per device. Following is
 the format::
 
 echo "<major>:<minor> <rate_io_per_second>" > /cgrp/blkio.throttle.write_iops_device
 
-Note: If both BW and IOPS rules are specified for a device, then IO is
-subjected to both the constraints.
+Note: If both BW and IOPS rules are specified for a device, then IO is
+subjected to both the constraints.
 
-- blkio.throttle.io_serviced
-- Number of IOs (bio) issued to the disk by the group. These
+blkio.throttle.io_serviced
+Number of IOs (bio) issued to the disk by the group. These
 are further divided by the type of operation - read or write, sync
 or async. First two fields specify the major and minor number of the
 device, third field specifies the operation type and the fourth field
 specifies the number of IOs.
 
-- blkio.throttle.io_service_bytes
-- Number of bytes transferred to/from the disk by the group. These
+blkio.throttle.io_service_bytes
+Number of bytes transferred to/from the disk by the group. These
 are further divided by the type of operation - read or write, sync
 or async. First two fields specify the major and minor number of the
 device, third field specifies the operation type and the fourth field
 specifies the number of bytes.
 
 Common files among various policies
 -----------------------------------
-- blkio.reset_stats
-- Writing an int to this file will result in resetting all the stats
+blkio.reset_stats
+Writing an int to this file will result in resetting all the stats
 for that cgroup.
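
The note in the hunk above says that IO is subject to both the bandwidth and the IOPS rule when both are set for a device. For a workload that issues IOs of one fixed size, this means the achievable throughput is bounded by whichever rule is tighter. The sketch below computes that bound. The function name is hypothetical and the fixed IO size is a simplifying assumption, not something the documentation states.

```python
def effective_bytes_per_second(bps_limit: int, iops_limit: int,
                               io_size_bytes: int) -> int:
    """Upper bound on throughput when a bps rule and an iops rule both apply.

    Assumes every IO has the same size; real workloads mix sizes.
    """
    if min(bps_limit, iops_limit, io_size_bytes) <= 0:
        raise ValueError("limits and IO size must be positive")
    return min(bps_limit, iops_limit * io_size_bytes)


# 1048576 bytes/s and 100 IO/s with 4096-byte IOs: the IOPS rule is tighter.
print(effective_bytes_per_second(1048576, 100, 4096))
# The same limits with 65536-byte IOs: the bandwidth rule is tighter.
print(effective_bytes_per_second(1048576, 100, 65536))
```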

Documentation/admin-guide/cgroup-v2.rst

Lines changed: 55 additions & 0 deletions
@@ -56,6 +56,7 @@ v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgrou
 5-3-3. IO Latency
 5-3-3-1. How IO Latency Throttling Works
 5-3-3-2. IO Latency Interface Files
+5-3-4. IO Priority
 5-4. PID
 5-4-1. PID Interface Files
 5-5. Cpuset
@@ -1866,6 +1867,60 @@ IO Latency Interface Files
 duration of time between evaluation events. Windows only elapse
 with IO activity. Idle periods extend the most recent window.
 
+IO Priority
+~~~~~~~~~~~
+
+A single attribute controls the behavior of the I/O priority cgroup policy,
+namely the blkio.prio.class attribute. The following values are accepted for
+that attribute:
+
+no-change
+Do not modify the I/O priority class.
+
+none-to-rt
+For requests that do not have an I/O priority class (NONE),
+change the I/O priority class into RT. Do not modify
+the I/O priority class of other requests.
+
+restrict-to-be
+For requests that do not have an I/O priority class or that have I/O
+priority class RT, change it into BE. Do not modify the I/O priority
+class of requests that have priority class IDLE.
+
+idle
+Change the I/O priority class of all requests into IDLE, the lowest
+I/O priority class.
+
+The following numerical values are associated with the I/O priority policies:
+
++----------------+---+
+| no-change      | 0 |
++----------------+---+
+| none-to-rt     | 1 |
++----------------+---+
+| restrict-to-be | 2 |
++----------------+---+
+| idle           | 3 |
++----------------+---+
+
+The numerical value that corresponds to each I/O priority class is as follows:
+
++-------------------------------+---+
+| IOPRIO_CLASS_NONE             | 0 |
++-------------------------------+---+
+| IOPRIO_CLASS_RT (real-time)   | 1 |
++-------------------------------+---+
+| IOPRIO_CLASS_BE (best effort) | 2 |
++-------------------------------+---+
+| IOPRIO_CLASS_IDLE             | 3 |
++-------------------------------+---+
+
+The algorithm to set the I/O priority class for a request is as follows:
+
+- Translate the I/O priority class policy into a number.
+- Change the request I/O priority class into the maximum of the I/O priority
+class policy number and the numerical I/O priority class.
+
 PID
 ---
 
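
The two-step algorithm in the hunk above reduces to a single max() over two small integers. The sketch below is an illustration of the documented rule, not the kernel code. The policy names are the accepted attribute values listed in the hunk, the numbers come from its two tables, and the dictionary and function names are hypothetical.

```python
# Policy numbers, keyed by the accepted attribute values.
POLICY_NUMBER = {"no-change": 0, "none-to-rt": 1, "restrict-to-be": 2, "idle": 3}

# I/O priority class numbers from the second table.
IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE = 0, 1, 2, 3


def effective_class(policy: str, request_class: int) -> int:
    """Apply the documented rule: max(policy number, request class number)."""
    return max(POLICY_NUMBER[policy], request_class)


# Walk every policy/class pair to see which requests each policy changes.
for policy in POLICY_NUMBER:
    row = [effective_class(policy, cls) for cls in range(4)]
    print(f"{policy:15s} NONE,RT,BE,IDLE -> {row}")
```

Because a higher number means a lower priority, taking the maximum can only leave a request's class alone or demote it, with one exception: none-to-rt turns NONE (0) into RT (1). That matches the per-policy descriptions in the hunk.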