
Commit 5409bc9

update docs (#20378)

1 parent 39e1e89 · commit 5409bc9

2 files changed: 2 additions & 2 deletions

docs/source-pytorch/advanced/model_parallel/deepspeed.rst

Lines changed: 1 addition & 1 deletion
@@ -408,7 +408,7 @@ Here is some helpful information when setting up DeepSpeed ZeRO Stage 3 with Lig
 * Treat your GPU/CPU memory as one large pool. In some cases, you may not want to offload certain things (like activations) to provide even more space to offload model parameters
 * When offloading to the CPU, make sure to bump up the batch size as GPU memory will be freed
 * We also support sharded checkpointing. By passing ``save_full_weights=False`` to the ``DeepSpeedStrategy``, we'll save shards of the model, which allows you to save extremely large models. However, to load the model and run test/validation/predict you must use the Trainer object.
-* DeepSpeed provides `MiCS support <https://deepspeed.readthedocs.io/en/latest/zero3.html#deepspeed.runtime.zero.config.DeepSpeedZeroConfig.mics_shard_size>`_ which allows you to control how model parameters are sharded across GPUs. This can be useful if you have a large cluster of GPUs and want to avoid communication overhead.
+* DeepSpeed provides `MiCS support <https://deepspeed.readthedocs.io/en/latest/zero3.html#deepspeed.runtime.zero.config.DeepSpeedZeroConfig.mics_shard_size>`_ which allows you to control how model parameters are sharded across GPUs. For example, with 16 GPUs, ZeRO-3 will shard the model into 16 pieces by default. Instead, with ``mics_shard_size=8``, every 8 GPUs keep a full copy of the model weights, reducing the communication overhead. You can set ``"zero_optimization": {"stage": 3, "mics_shard_size": (shards num), ...}`` in a DeepSpeed config file to take advantage of this feature.

 .. _deepspeed-zero-stage-3-single-file:

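For context, here is a minimal sketch (not part of the commit) of how the new documentation translates into code. ``DeepSpeedStrategy`` accepts a DeepSpeed config dict via its ``config`` argument; the shard size (8) and device count (16) below are illustrative values taken from the doc's example, not required settings.

    # Hedged sketch: enable MiCS sharding via the DeepSpeed config passed to the strategy.
    import lightning.pytorch as pl
    from lightning.pytorch.strategies import DeepSpeedStrategy

    ds_config = {
        "zero_optimization": {
            "stage": 3,
            "mics_shard_size": 8,  # every group of 8 GPUs keeps a full replica of the weights
        },
    }

    trainer = pl.Trainer(
        accelerator="gpu",
        devices=16,
        strategy=DeepSpeedStrategy(config=ds_config),
    )

With 16 devices and ``mics_shard_size=8``, parameters are sharded within each group of 8 GPUs rather than across all 16, trading memory for less cross-group communication.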
src/lightning/pytorch/strategies/deepspeed.py

Lines changed: 1 addition & 1 deletion
@@ -522,7 +522,7 @@ def model_sharded_context(self) -> Generator[None, None, None]:

         self._init_config_if_needed()
         assert self.config is not None
-        # If detect 'mics_shard_size'>0 in config['zero_optimization'], alter to use deepspeed.zero.MiCS_Init()
+        # If we detect `'mics_shard_size' > 0` in `config['zero_optimization']`, use `deepspeed.zero.MiCS_Init(...)` instead of `deepspeed.zero.Init(...)`
         # https://deepspeed.readthedocs.io/en/latest/zero3.html#mics-configurations
         #! default deepspeed 0.9.0 is not compatible
         if (
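To make the updated comment concrete, here is a hedged sketch of the dispatch it describes; this is not the strategy's verbatim code, and ``MyLightningModule`` is a hypothetical module used for illustration. Both ``deepspeed.zero.Init`` and ``deepspeed.zero.MiCS_Init`` accept the DeepSpeed config via ``config_dict_or_path``.

    # Hedged sketch (not the library's exact code): choose MiCS_Init over Init
    # when the DeepSpeed config requests MiCS sharding.
    import deepspeed

    # ``config`` stands in for the strategy's resolved DeepSpeed config dict.
    config = {"zero_optimization": {"stage": 3, "mics_shard_size": 8}}

    zero_cfg = config.get("zero_optimization", {})
    if zero_cfg.get("mics_shard_size", 0) > 0:
        # MiCS partitions parameters within groups of ``mics_shard_size`` GPUs
        init_ctx = deepspeed.zero.MiCS_Init(config_dict_or_path=config)
    else:
        init_ctx = deepspeed.zero.Init(config_dict_or_path=config)

    with init_ctx:
        model = MyLightningModule()  # hypothetical module; parameters are sharded as they are created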
