1 file changed, 14 insertions(+), 1 deletion(-)
@@ -300,7 +300,20 @@ checkpoint
      - performing one checkpointing per certain number of steps specified
    * - model_size
      - 10240
-     - the size of the model in bytes
+     - the size of the model parameters per GPU in bytes
+   * - optimization_groups
+     - []
+     - List of optimization group tensors. Use YAML array notation.
+   * - num_layers
+     - 1
+     - Number of layers to checkpoint. Each layer is checkpointed separately.
+   * - layer_parameters
+     - []
+     - List of parameters per layer, used to perform I/O separately for each layer.
+   * - type
+     - rank_zero
+     - Which ranks perform the checkpoint: all ranks (all_ranks) or only rank 0 (rank_zero).
+
 
 .. note ::
 