
Commit 5409bc9

update docs (#20378)

1 parent 39e1e89 · commit 5409bc9

2 files changed: 2 additions & 2 deletions

docs/source-pytorch/advanced/model_parallel/deepspeed.rst

Lines changed: 1 addition & 1 deletion
@@ -408,7 +408,7 @@ Here is some helpful information when setting up DeepSpeed ZeRO Stage 3 with Lig
 * Treat your GPU/CPU memory as one large pool. In some cases, you may not want to offload certain things (like activations) to provide even more space to offload model parameters
 * When offloading to the CPU, make sure to bump up the batch size as GPU memory will be freed
 * We also support sharded checkpointing. By passing ``save_full_weights=False`` to the ``DeepSpeedStrategy``, we'll save shards of the model, which allows you to save extremely large models. However, to load the model and run test/validation/predict you must use the Trainer object.
-* DeepSpeed provides `MiCS support <https://deepspeed.readthedocs.io/en/latest/zero3.html#deepspeed.runtime.zero.config.DeepSpeedZeroConfig.mics_shard_size>`_ which allows you to control how model parameters are sharded across GPUs. This can be useful if you have a large cluster of GPUs and want to avoid communication overhead.
+* DeepSpeed provides `MiCS support <https://deepspeed.readthedocs.io/en/latest/zero3.html#deepspeed.runtime.zero.config.DeepSpeedZeroConfig.mics_shard_size>`_ which allows you to control how model parameters are sharded across GPUs. For example, with 16 GPUs, ZeRO-3 will shard the model into 16 pieces by default. Instead, with ``mics_shard_size=8``, every 8 GPUs keep a full copy of the model weights, reducing the communication overhead. You can set ``"zero_optimization": {"stage": 3, "mics_shard_size": (shards num), ...}`` in a DeepSpeed config file to take advantage of this feature.

 .. _deepspeed-zero-stage-3-single-file:

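For context, here is a minimal sketch (not part of the commit) of how the new documentation translates into code. ``DeepSpeedStrategy`` accepts a DeepSpeed config dict via its ``config`` argument; the shard size (8) and device count (16) below are illustrative values taken from the doc's example, not required settings.

    # Hedged sketch: enable MiCS sharding via the DeepSpeed config passed to the strategy.
    import lightning.pytorch as pl
    from lightning.pytorch.strategies import DeepSpeedStrategy

    ds_config = {
        "zero_optimization": {
            "stage": 3,
            "mics_shard_size": 8,  # every group of 8 GPUs keeps a full replica of the weights
        },
    }

    trainer = pl.Trainer(
        accelerator="gpu",
        devices=16,
        strategy=DeepSpeedStrategy(config=ds_config),
    )

With 16 devices and ``mics_shard_size=8``, parameters are sharded within each group of 8 GPUs rather than across all 16, trading memory for less cross-group communication.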
src/lightning/pytorch/strategies/deepspeed.py

Lines changed: 1 addition & 1 deletion
@@ -522,7 +522,7 @@ def model_sharded_context(self) -> Generator[None, None, None]:

         self._init_config_if_needed()
         assert self.config is not None
-        # If detect 'mics_shard_size'>0 in config['zero_optimization'], alter to use deepspeed.zero.MiCS_Init()
+        # If we detect `'mics_shard_size' > 0` in `config['zero_optimization']`, use `deepspeed.zero.MiCS_Init(...)` instead of `deepspeed.zero.Init(...)`
         # https://deepspeed.readthedocs.io/en/latest/zero3.html#mics-configurations
         #! default deepspeed 0.9.0 is not compatible
         if (
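To make the updated comment concrete, here is a hedged sketch of the dispatch it describes; this is not the strategy's verbatim code, and ``MyLightningModule`` is a hypothetical module used for illustration. Both ``deepspeed.zero.Init`` and ``deepspeed.zero.MiCS_Init`` accept the DeepSpeed config via ``config_dict_or_path``.

    # Hedged sketch (not the library's exact code): choose MiCS_Init over Init
    # when the DeepSpeed config requests MiCS sharding.
    import deepspeed

    # ``config`` stands in for the strategy's resolved DeepSpeed config dict.
    config = {"zero_optimization": {"stage": 3, "mics_shard_size": 8}}

    zero_cfg = config.get("zero_optimization", {})
    if zero_cfg.get("mics_shard_size", 0) > 0:
        # MiCS partitions parameters within groups of ``mics_shard_size`` GPUs
        init_ctx = deepspeed.zero.MiCS_Init(config_dict_or_path=config)
    else:
        init_ctx = deepspeed.zero.Init(config_dict_or_path=config)

    with init_ctx:
        model = MyLightningModule()  # hypothetical module; parameters are sharded as they are created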
