Commit 058a0ee

Merge pull request ceph#46270 from anthonyeleven/anthonyeleven/clarify-min-alloc-size
2 parents cd0cd7c + 7a2a565 commit 058a0ee

1 file changed: +31 -10 lines changed

doc/rados/configuration/bluestore-config-ref.rst

@@ -395,16 +395,13 @@ is created on an HDD, BlueStore will be initialized with the current value
 of :confval:`bluestore_min_alloc_size_hdd`, and SSD OSDs (including NVMe devices)
 with the value of :confval:`bluestore_min_alloc_size_ssd`.
 
-Note that this BlueStore attribute takes effect *only* at OSD creation; if
-changed later, a given OSD's behavior will not change unless / until it is
-destroyed and redeployed.
-
 Through the Mimic release, the default values were 64KB and 16KB for rotational
-(HDD) and non-rotational (SSD) media respectively. Octopus and later releases
-default to a value of 4KB for all media types.
+(HDD) and non-rotational (SSD) media respectively. Octopus changed the default
+for SSD (non-rotational) media to 4KB, and Pacific changed the default for HDD
+(rotational) media to 4KB as well.
 
-This change was driven by the space amplification experienced by Ceph RADOS
-GateWay (RGW) deployments that host large numbers of relatively small files
+These changes were driven by space amplification experienced by Ceph RADOS
+GateWay (RGW) deployments that host large numbers of small files
 (S3/Swift objects).
 
 For example, when an RGW client stores a 1KB S3 object, it is written to a
@@ -446,12 +443,36 @@ the :confval:`bluestore_use_optimal_io_size_for_min_alloc_size`
 option that enables automatic discovery of the appropriate value as each OSD is
 created. Note that the use of ``bcache``, ``OpenCAS``, ``dmcrypt``,
 ``ATA over Ethernet``, `iSCSI`, or other device layering / abstraction
-technologies may confound the determination of appropriate values. We suggest
-inspecting such OSDs at startup via logs and admin sockets to ensure that
+technologies may confound the determination of appropriate values. OSD devices
+deployed on top of VMware VSAN virtual volumes have been reported to also
+sometimes report a ``rotational`` attribute that does not match the underlying
+hardware.
+
+We suggest inspecting such OSDs at startup via logs and admin sockets to ensure that
 behavior is appropriate. Note that this also may not work as desired with
 older kernels. You can check for this by examining the presence and value
 of ``/sys/block/<drive>/queue/optimal_io_size``.
 
+You may also inspect a given OSD:
+
+.. prompt:: bash #
+
+   ceph osd metadata osd.1701 | grep rotational
+
+This space amplification may manifest as an unusually high ratio of raw to
+stored data reported by ``ceph df``. ``ceph osd df`` may also report
+anomalously high ``%USE`` / ``VAR`` values when
+compared to other, ostensibly identical OSDs. A pool using OSDs with
+mismatched ``min_alloc_size`` values may experience unexpected balancer
+behavior as well.
+
+Note that this BlueStore attribute takes effect *only* at OSD creation; if
+changed later, a given OSD's behavior will not change unless / until it is
+destroyed and redeployed with the appropriate option value(s). Upgrading
+to a later Ceph release will *not* change the value used by OSDs deployed
+under older releases or with other settings.
+
+
 .. confval:: bluestore_min_alloc_size
 .. confval:: bluestore_min_alloc_size_hdd
 .. confval:: bluestore_min_alloc_size_ssd
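
The per-OSD ``rotational`` check that this change documents could be extended to compare many OSDs at once. The following is a minimal sketch, not part of the commit. It assumes that the JSON emitted by ``ceph osd metadata`` (with no OSD argument) is a list of objects that each carry an ``id`` and a string-valued ``rotational`` field. The function name ``group_osds_by_rotational`` and the sample data are hypothetical and used only for illustration.

```python
import json
from collections import defaultdict


def group_osds_by_rotational(metadata_json):
    """Group OSD ids by the ``rotational`` value each OSD reports.

    ``metadata_json`` is assumed to be the JSON text printed by
    ``ceph osd metadata``: a list of per-OSD objects.
    """
    groups = defaultdict(list)
    for osd in json.loads(metadata_json):
        # Assumption: "rotational" is the string "0" or "1".
        # OSDs that lack the field are collected under "unknown".
        key = str(osd.get("rotational", "unknown"))
        groups[key].append(osd.get("id"))
    return dict(groups)


# Hypothetical sample input; real output contains many more fields.
SAMPLE = (
    '[{"id": 0, "rotational": "1"},'
    ' {"id": 1, "rotational": "0"},'
    ' {"id": 2, "rotational": "1"}]'
)

if __name__ == "__main__":
    for value, osd_ids in sorted(group_osds_by_rotational(SAMPLE).items()):
        print(f"rotational={value}: OSDs {osd_ids}")
```

In practice the input could be captured by redirecting the command's output to a file. An OSD that lands in an unexpected group (for example, an SSD-backed OSD reporting ``rotational`` as ``1``) is a candidate for the closer inspection via logs and admin sockets suggested in the text.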
