Commit ce3d603

Merge pull request ceph#58904 from zdover23/wip-doc-2024-07-28-cephfs-multimds

doc/cephfs: edit "Dynamic Subtree Partitioning"

Reviewed-by: Anthony D'Atri <[email protected]>

2 parents 2fa0e43 + d14119e

File tree

1 file changed: +26 -23 lines changed


doc/cephfs/multimds.rst

Lines changed: 26 additions & 23 deletions
@@ -248,42 +248,45 @@ Dynamic Subtree Partitioning
 
 CephFS has long had a dynamic metadata balancer (sometimes called the "default
 balancer") which can split or merge subtrees while placing them on "colder" MDS
-ranks. Moving the metadata around can improve overall file system throughput
+ranks. Moving the metadata in this way improves overall file system throughput
 and cache size.
 
-However, the balancer has suffered from problem with efficiency and performance
-so it is by default turned off. This is to avoid an administrator "turning on
-multimds" by increasing the ``max_mds`` setting and then finding the balancer
-has made a mess of the cluster performance (reverting is straightforward but
-can take time).
+However, the balancer is sometimes inefficient or slow, so by default it is
+turned off. This is to avoid an administrator "turning on multimds" by
+increasing the ``max_mds`` setting only to find that the balancer has made a
+mess of the cluster performance (reverting from this messy state of affairs is
+straightforward but can take time).
 
-The setting to turn on the balancer is:
+To turn on the balancer, run a command of the following form:
 
 .. prompt:: bash #
 
    ceph fs set <fs_name> balance_automate true
 
-Turning on the balancer should only be done with appropriate configuration,
-such as with the ``bal_rank_mask`` setting (described below). Careful
-monitoring of the file system performance and MDS is advised.
+Turn on the balancer only with an appropriate configuration, such as a
+configuration that includes the ``bal_rank_mask`` setting (described
+:ref:`below <bal-rank-mask>`).
+
+Careful monitoring of the file system performance and MDS is advised.
 
 
 Dynamic subtree partitioning with Balancer on specific ranks
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The CephFS file system provides the ``bal_rank_mask`` option to enable the balancer
-to dynamically rebalance subtrees within particular active MDS ranks. This
-allows administrators to employ both the dynamic subtree partitioning and
-static pining schemes in different active MDS ranks so that metadata loads
-are optimized based on user demand. For instance, in realistic cloud
-storage environments, where a lot of subvolumes are allotted to multiple
-computing nodes (e.g., VMs and containers), some subvolumes that require
-high performance are managed by static partitioning, whereas most subvolumes
-that experience a moderate workload are managed by the balancer. As the balancer
-evenly spreads the metadata workload to all active MDS ranks, performance of
-static pinned subvolumes inevitably may be affected or degraded. If this option
-is enabled, subtrees managed by the balancer are not affected by
-static pinned subtrees.
+.. _bal-rank-mask:
+
+The CephFS file system provides the ``bal_rank_mask`` option to enable the
+balancer to dynamically rebalance subtrees within particular active MDS ranks.
+This allows administrators to employ both the dynamic subtree partitioning and
+static pinning schemes in different active MDS ranks so that metadata loads are
+optimized based on user demand. For instance, in realistic cloud storage
+environments, where a lot of subvolumes are allotted to multiple computing
+nodes (e.g., VMs and containers), some subvolumes that require high performance
+are managed by static partitioning, whereas most subvolumes that experience a
+moderate workload are managed by the balancer. As the balancer evenly spreads
+the metadata workload to all active MDS ranks, performance of static pinned
+subvolumes inevitably may be affected or degraded. If this option is enabled,
+subtrees managed by the balancer are not affected by static pinned subtrees.
 
 This option can be configured with the ``ceph fs set`` command. For example:
 
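For readers applying the new guidance, here is a minimal shell sketch of the workflow the revised text describes. The filesystem name ``cephfs`` and the rank count are illustrative assumptions, not part of the commit:

    # Raise the number of active MDS ranks ("turning on multimds");
    # the value 2 is an example.
    ceph fs set cephfs max_mds 2

    # Enable the dynamic metadata balancer, which is off by default.
    ceph fs set cephfs balance_automate true

    # Watch file system and MDS state afterwards, as the text advises.
    ceph fs status cephfs

The balancer only has work to do when more than one rank is active, which is why ``max_mds`` comes first here.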
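The concrete ``bal_rank_mask`` example introduced at the end of the hunk falls outside the diff context. A plausible sketch of the combined scheme the paragraph describes, assuming the mask is a hexadecimal bitmask of the ranks the balancer may use; the filesystem name, mask value, and pinned path are illustrative:

    # Confine the dynamic balancer to ranks 0 and 1 (bits 0 and 1 -> 0x3);
    # the mask value is an assumption for illustration.
    ceph fs set cephfs bal_rank_mask 0x3

    # Statically pin a performance-critical subtree to rank 2, outside the
    # mask, so the balancer leaves it alone (path is hypothetical).
    setfattr -n ceph.dir.pin -v 2 /mnt/cephfs/volumes/critical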