Commit e4d515d

Author: Prashant D
pybind/mgr/pg_autoscaler: Introduce dynamic threshold to improve scaling sensitivity
The scaling threshold is now dynamically adjusted within _get_pool_pg_targets() based on the calculated ideal PG count (final_pg_target). This change allows the autoscaler to be more aggressive when adjusting smaller pools, and less aggressive when adjusting very large pools. Also improves logging to clarify why scaling decisions are made or skipped.

Fixes: https://tracker.ceph.com/issues/73272
Signed-off-by: Prashant D <[email protected]>
1 parent 857a462 commit e4d515d

2 files changed: +41 -9 lines changed

doc/rados/operations/placement-groups.rst

Lines changed: 9 additions & 3 deletions
@@ -153,7 +153,11 @@ The output will resemble the following::
 - **NEW PG_NUM** (if present) is the value that the system recommends that the
   ``pg_num`` of the pool should be. It is always a power of two, and it
   is present only if the recommended value varies from the current value by
-  more than the default factor of ``3``.
+  more than the scaling threshold. This threshold defaults to the configured
+  factor of ``3``. While scaling down uses only the configured factor, the
+  threshold is dynamically reduced when scaling up: it is set to 1.0 if the
+  recommended NEW PG_NUM is 512 or 1024, and to 2.0 if the recommended
+  NEW PG_NUM is 2048.
   To adjust this multiple (in the following example, it is changed
   to ``2``), run a command of the following form:

@@ -207,8 +211,10 @@ automatically scale each pool's ``pg_num`` in accordance with usage. Ceph consid
 total available storage, the target number of PG replicas for each OSD,
 and how much data is stored in each pool, then apportions PGs accordingly.
 The system is conservative with its approach, making changes to a pool only
-when the current number of PGs (``pg_num``) varies by more than a factor of 3
-from the recommended number.
+when the current number of PGs (``pg_num``) varies by more than the scaling threshold
+from the recommended number. When scaling down, only this configured factor is used.
+However, when scaling up, the threshold is dynamically reduced: it's automatically
+set to 1.0 when the recommended NEW PG_NUM is 512 or 1024, and to 2.0 when it is 2048.

 The target number of PGs per OSD is determined by the ``mon_target_pg_per_osd``
 parameter (default: 100), which can be adjusted by running the following
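
As a rough illustration of the rule described in the documentation change above, the following Python sketch computes which threshold applies and whether a NEW PG_NUM would be recommended. The function names `scaling_threshold` and `would_recommend` are hypothetical and are not part of Ceph. The strict inequalities are an assumption taken from the comparison in the module change in this commit, and 3.0 is the configured default named in the documentation.

```python
def scaling_threshold(current_pg_num: int, new_pg_num: int,
                      configured: float = 3.0) -> float:
    """Return the threshold the documentation describes (hypothetical helper)."""
    if new_pg_num > current_pg_num:      # scaling up: threshold may be reduced
        if new_pg_num in (512, 1024):
            return 1.0
        if new_pg_num == 2048:
            return 2.0
    return configured                    # scaling down, or any other target


def would_recommend(current_pg_num: int, new_pg_num: int,
                    configured: float = 3.0) -> bool:
    """True if the change exceeds the applicable threshold (hypothetical helper)."""
    threshold = scaling_threshold(current_pg_num, new_pg_num, configured)
    if new_pg_num > current_pg_num:
        return new_pg_num > current_pg_num * threshold
    return new_pg_num < current_pg_num / threshold


if __name__ == "__main__":
    # Illustrative (current, recommended) pairs; these are not measured data.
    for current, new in [(256, 512), (1024, 2048), (512, 2048),
                         (2048, 4096), (1024, 512), (2048, 512)]:
        print(current, new,
              scaling_threshold(current, new),
              would_recommend(current, new))
```

Under these assumptions, a change from 256 to 512 is recommended because the threshold drops to 1.0, while a change from 1024 to 2048 is not, because 2048 is not strictly greater than 1024 × 2.0.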

src/pybind/mgr/pg_autoscaler/module.py

Lines changed: 32 additions & 6 deletions
@@ -527,6 +527,17 @@ def _calc_final_pg_target(
             ))
         return final_ratio, pool_pg_target, final_pg_target

+    def get_dynamic_threshold(
+            self,
+            final_pg_num: int,
+            default_threshold: float,
+    ) -> float:
+        if final_pg_num in (512, 1024):
+            return 1.0
+        elif final_pg_num == 2048:
+            return 2.0
+        return default_threshold
+
     def _get_pool_pg_targets(
             self,
             osdmap: OSDMap,
@@ -615,12 +626,27 @@ def _get_pool_pg_targets(
                 continue

             adjust = False
-            if (final_pg_target > p['pg_num_target'] * threshold or
-                    final_pg_target < p['pg_num_target'] / threshold) and \
-                    final_ratio >= 0.0 and \
-                    final_ratio <= 1.0 and \
-                    p['pg_autoscale_mode'] == 'on':
-                adjust = True
+
+            # Dynamic threshold only applies to scaling UP, otherwise use the default threshold.
+            if final_pg_target is not None and \
+                    final_pg_target > p['pg_num_target']:
+                dynamic_threshold = self.get_dynamic_threshold(final_pg_target, threshold)
+                adjust = final_pg_target > p['pg_num_target'] * dynamic_threshold
+            else:
+                adjust = final_pg_target < p['pg_num_target'] / threshold
+
+            if adjust and \
+                    final_ratio >= 0.0 and \
+                    final_ratio <= 1.0 and \
+                    p['pg_autoscale_mode'] == 'on':
+                adjust = True
+            else:
+                if final_pg_target != p['pg_num_target']:
+                    self.log.warning("pool %s won't scale because recommended PG_NUM target"
+                                     " value varies from current PG_NUM value by"
+                                     " more than '%f' scaling threshold",
+                                     pool_name,
+                                     dynamic_threshold if final_pg_target > p['pg_num_target'] else threshold)

             assert pool_pg_target is not None
             ret.append({
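
To make the decision flow in the hunk above easier to exercise outside the manager module, here is a minimal standalone sketch. The function `decide`, its parameters, and the reason strings are hypothetical and do not exist in Ceph. In the hunk as shown, `adjust` keeps the value from the threshold comparison when one of the later guards fails, and the comparison in the `else` branch is evaluated even when `final_pg_target` is `None`. The sketch deliberately differs on both points, which is an assumption about the intended behaviour rather than a description of the module.

```python
from typing import Optional, Tuple


def get_dynamic_threshold(final_pg_num: int, default_threshold: float) -> float:
    """Same mapping as the method added in this commit, without `self`."""
    if final_pg_num in (512, 1024):
        return 1.0
    if final_pg_num == 2048:
        return 2.0
    return default_threshold


def decide(final_pg_target: Optional[int], pg_num_target: int,
           final_ratio: float, mode: str,
           threshold: float = 3.0) -> Tuple[bool, str]:
    """Return (adjust, reason). Hypothetical helper, not the Ceph API."""
    if final_pg_target is None:            # assumption: treat None as "no change"
        return False, "no target computed"
    if final_pg_target == pg_num_target:
        return False, "already at target"

    if final_pg_target > pg_num_target:    # scaling up: dynamic threshold
        applied = get_dynamic_threshold(final_pg_target, threshold)
        exceeds = final_pg_target > pg_num_target * applied
    else:                                  # scaling down: configured threshold
        applied = threshold
        exceeds = final_pg_target < pg_num_target / applied

    if not exceeds:
        return False, f"change is within the {applied:.1f}x scaling threshold"
    if not 0.0 <= final_ratio <= 1.0:
        return False, f"final_ratio {final_ratio} is outside [0.0, 1.0]"
    if mode != "on":
        return False, f"pg_autoscale_mode is '{mode}'"
    return True, f"change exceeds the {applied:.1f}x scaling threshold"


if __name__ == "__main__":
    print(decide(512, 256, 0.5, "on"))
    print(decide(2048, 1024, 0.5, "on"))
    print(decide(512, 256, 0.5, "warn"))
```

Returning a reason alongside the boolean keeps each skipped case distinguishable, so a log line built from it can say whether the threshold, the ratio, or the autoscale mode prevented the change.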
