
Commit ff1efa0 (parent 689d61c)

Add documentation for Deepspeed Zero 3 MiCS support (#20378)

File tree

2 files changed: +1 −0 lines changed


docs/source-pytorch/advanced/model_parallel/deepspeed.rst

Lines changed: 1 addition & 0 deletions
@@ -408,6 +408,7 @@ Here is some helpful information when setting up DeepSpeed ZeRO Stage 3 with Lig
 * Treat your GPU/CPU memory as one large pool. In some cases, you may not want to offload certain things (like activations) to provide even more space to offload model parameters
 * When offloading to the CPU, make sure to bump up the batch size as GPU memory will be freed
 * We also support sharded checkpointing. By passing ``save_full_weights=False`` to the ``DeepSpeedStrategy``, we'll save shards of the model which allows you to save extremely large models. However to load the model and run test/validation/predict you must use the Trainer object.
+* DeepSpeed provides `MiCS support <https://deepspeed.readthedocs.io/en/latest/zero3.html#deepspeed.runtime.zero.config.DeepSpeedZeroConfig.mics_shard_size>`_ which allows you to control how model parameters are sharded across GPUs. This can be useful if you have a large cluster of GPUs and want to avoid communication overhead.

 .. _deepspeed-zero-stage-3-single-file:
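The new bullet points at DeepSpeed's ``mics_shard_size`` ZeRO-3 option. Below is a minimal sketch of how such a config might be passed to Lightning; the group size of 4 and the device counts are illustrative assumptions, and the ``Trainer`` call is left as a comment since it requires GPUs and the ``deepspeed`` package to be installed.

```python
# Illustrative DeepSpeed ZeRO Stage 3 config enabling MiCS-style sharding.
# Assumption: "mics_shard_size" of 4 shards parameters within groups of 4
# GPUs instead of across the whole cluster, cutting cross-node traffic.
config = {
    "zero_optimization": {
        "stage": 3,
        "mics_shard_size": 4,
    },
}

# Hypothetical usage on a multi-GPU machine (not executed here):
# from lightning.pytorch import Trainer
# from lightning.pytorch.strategies import DeepSpeedStrategy
# trainer = Trainer(accelerator="gpu", devices=8,
#                   strategy=DeepSpeedStrategy(config=config))
print(config["zero_optimization"]["mics_shard_size"])  # → 4
```

Keeping the shard group smaller than the full cluster trades some memory savings for reduced communication overhead, which is the scenario the added documentation bullet describes.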

tests/tests_pytorch/strategies/test_deepspeed.py

File mode changed (100755 → 100644).

0 commit comments