pybind/mgr/pg_autoscaler: Introduce dynamic threshold to improve scaling sensitivity

Prashant D · Prashant D · commit e4d515d30043 · 2025-10-06T10:52:12.000-04:00
The scaling threshold is now dynamically adjusted within
_get_pool_pg_targets() based on the calculated ideal PG count
(final_pg_target). This change allows the autoscaler to be more
aggressive when adjusting smaller pools, and less aggressive when
adjusting very large pools.

Also improves logging to clarify why scaling decisions are made or
skipped.

Fixes: https://tracker.ceph.com/issues/73272
Signed-off-by: Prashant D <pdhange@redhat.com>
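
As a rough illustration of the threshold rule described above, the following standalone sketch mirrors get_dynamic_threshold() from this change as a free function. The exceeds_threshold() helper and the default factor of 3.0 passed to it are assumptions made for illustration only; the sketch leaves out the final_ratio and pg_autoscale_mode checks that the module also applies.

```python
def get_dynamic_threshold(final_pg_num: int, default_threshold: float) -> float:
    """Free-function copy of the method added in this commit."""
    if final_pg_num in (512, 1024):
        return 1.0
    elif final_pg_num == 2048:
        return 2.0
    return default_threshold


def exceeds_threshold(current: int, target: int, threshold: float = 3.0) -> bool:
    """Hypothetical helper: does the target differ enough from current?"""
    if target > current:
        # Scaling up: the threshold may be reduced for specific targets.
        return target > current * get_dynamic_threshold(target, threshold)
    # Scaling down (or no change): only the configured factor applies.
    return target < current / threshold


print(exceeds_threshold(256, 512))    # True:  512 > 256 * 1.0
print(exceeds_threshold(128, 256))    # False: 256 > 128 * 3.0 does not hold
print(exceeds_threshold(1024, 2048))  # False: 2048 > 1024 * 2.0 does not hold
print(exceeds_threshold(512, 2048))   # True:  2048 > 512 * 2.0
print(exceeds_threshold(2048, 512))   # True:  512 < 2048 / 3.0
```

Because the comparison is strict, a pool at 1024 PGs with a recommended target of 2048 still does not qualify under the 2.0 threshold, while any increase to a target of 512 or 1024 does.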
diff --git a/doc/rados/operations/placement-groups.rst b/doc/rados/operations/placement-groups.rst
@@ -153,7 +153,11 @@ The output will resemble the following::
 - **NEW PG_NUM** (if present) is the value that the system recommends that the
   ``pg_num`` of the pool should be. It is always a power of two, and it
   is present only if the recommended value varies from the current value by
-  more than the default factor of ``3``.
+  more than the scaling threshold, which defaults to a factor of ``3``.
+  Scaling down always uses the configured factor, but the threshold is
+  reduced dynamically when scaling up: it is set to 1.0 if the
+  recommended NEW PG_NUM is 512 or 1024, and to 2.0 if the recommended
+  NEW PG_NUM is 2048.
   To adjust this multiple (in the following example, it is changed
   to ``2``), run a command of the following form:
 
@@ -207,8 +211,10 @@ automatically scale each pool's ``pg_num`` in accordance with usage. Ceph consid
 total available storage, the target number of PG replicas for each OSD,
 and how much data is stored in each pool, then apportions PGs accordingly.
 The system is conservative with its approach, making changes to a pool only
-when the current number of PGs (``pg_num``) varies by more than a factor of 3
-from the recommended number.
+when the current number of PGs (``pg_num``) varies by more than the scaling threshold
+(a factor of 3 by default) from the recommended number. When scaling down, only this
+configured factor is used. However, when scaling up, the threshold is dynamically reduced:
+it is set to 1.0 when the recommended NEW PG_NUM is 512 or 1024, and to 2.0 when it is 2048.
 
 The target number of PGs per OSD is determined by the ``mon_target_pg_per_osd``
 parameter (default: 100), which can be adjusted by running the following
diff --git a/src/pybind/mgr/pg_autoscaler/module.py b/src/pybind/mgr/pg_autoscaler/module.py
@@ -527,6 +527,17 @@ def _calc_final_pg_target(
         ))
         return final_ratio, pool_pg_target, final_pg_target
 
+    def get_dynamic_threshold(
+            self,
+            final_pg_num: int,
+            default_threshold: float,
+    ) -> float:
+        if final_pg_num in (512, 1024):
+            return 1.0
+        elif final_pg_num == 2048:
+            return 2.0
+        return default_threshold
+
     def _get_pool_pg_targets(
             self,
             osdmap: OSDMap,
@@ -615,12 +626,27 @@ def _get_pool_pg_targets(
                 continue
 
             adjust = False
-            if (final_pg_target > p['pg_num_target'] * threshold or
-                    final_pg_target < p['pg_num_target'] / threshold) and \
-                    final_ratio >= 0.0 and \
-                    final_ratio <= 1.0 and \
-                    p['pg_autoscale_mode'] == 'on':
-                adjust = True
+
+            # Dynamic threshold only applies to scaling UP, otherwise use the default threshold.
+            if final_pg_target is not None and \
+               final_pg_target > p['pg_num_target']:
+                dynamic_threshold = self.get_dynamic_threshold(final_pg_target, threshold)
+                adjust = final_pg_target > p['pg_num_target'] * dynamic_threshold
+            else:
+                adjust = final_pg_target < p['pg_num_target'] / threshold
+
+            # Scale only if the threshold is exceeded, the ratio is valid,
+            # and autoscaling is 'on'; otherwise reset adjust to False.
+            adjust = adjust and \
+                0.0 <= final_ratio <= 1.0 and \
+                p['pg_autoscale_mode'] == 'on'
+            if not adjust and final_pg_target != p['pg_num_target']:
+                # Log why a differing recommendation is not being applied.
+                self.log.warning("pool %s won't scale because recommended PG_NUM target"
+                                 " value varies from current PG_NUM value by"
+                                 " more than '%f' scaling threshold",
+                                 pool_name,
+                                 dynamic_threshold if final_pg_target > p['pg_num_target'] else threshold)
 
             assert pool_pg_target is not None
             ret.append({