Commit 3f28662

documentation for the checkpointing.
1 parent 0c058ce commit 3f28662

1 file changed

+14
-1
lines changed

docs/source/config.rst

Lines changed: 14 additions & 1 deletion
@@ -300,7 +300,20 @@ checkpoint
   - performing one checkpointing per certain number of steps specified
 * - model_size
   - 10240
-  - the size of the model in bytes
+  - the size of the model parameters per GPU in bytes
+* - optimization_groups
+  - []
+  - List of optimization group tensors. Use Array notation for yaml.
+* - num_layers
+  - 1
+  - Number of layers to checkpoint. Each layer would be checkpointed separately.
+* - layer_parameters
+  - []
+  - List of parameters per layer. This is used to perform I/O per layer.
+* - type
+  - rank_zero
+  - Which rank performs this checkpoint. All ranks (all_ranks) or Rank 0 (rank_zero).
+
 
 .. note::
 
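As a rough illustration of how the rows added in this hunk might appear in a workload file, here is a minimal YAML sketch. The key names and the two `type` options come from the table rows in the diff. Nesting them under a `checkpoint:` section is an assumption based on the hunk header. Every list entry and the `num_layers` value are hypothetical placeholders, not recommended settings.

```yaml
# Sketch only: key names are from the documented table; the values marked
# "placeholder" are hypothetical and chosen purely for illustration.
checkpoint:                          # section name assumed from the hunk header
  model_size: 10240                  # size of the model parameters per GPU, in bytes (documented default)
  optimization_groups: [1024, 512]   # placeholder entries in array notation; documented default is []
  num_layers: 2                      # placeholder; each layer is checkpointed separately (default is 1)
  layer_parameters: [256, 128]       # placeholder entries, used for per-layer I/O; documented default is []
  type: all_ranks                    # or rank_zero, the documented default
```

The inline `[a, b]` form is what the table calls array notation for YAML. An empty list `[]` leaves the corresponding option at its documented default.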
