paddlenlp/trainer/training_args.py: 3 additions & 0 deletions
@@ -243,6 +243,7 @@ class TrainingArguments:
             enable_delay_scale_loss, accumulate gradients until the optimizer step; all gradients are divided by the inner pipeline accumulation steps, instead of dividing the loss by the accumulation steps directly.
             enable_dp_comm_overlap, fuse data parallel gradient communication.
+            enable_release_grads, reduce peak memory usage by releasing gradients after each iteration. The creation of gradients will be postponed until backward propagation of the next iteration.
         sharding_parallel_config (`str`, *optional*):
             Some additional configs that highly affect the usage of sharding parallel; we provide some options to configure it.
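The options documented in this hunk are passed to `TrainingArguments` as space-separated flag strings. Below is a minimal sketch, assuming the usual PaddleNLP constructor fields (`output_dir`, `pipeline_parallel_degree`, `pipeline_parallel_config`, `sharding_parallel_config`); the exact set of accepted flags depends on the installed PaddleNLP/Paddle version, and the options only take effect when the corresponding parallelism is enabled under a distributed launch.

```python
# Minimal sketch (not part of the diff): enabling the newly documented
# enable_release_grads flag alongside the other pipeline-parallel options.
# Flag names are taken from the docstring above; field names and defaults
# are assumptions about the current PaddleNLP TrainingArguments API.
from paddlenlp.trainer import TrainingArguments

training_args = TrainingArguments(
    output_dir="./checkpoints",
    # Assumed: pipeline-parallel options only matter when pipeline parallelism
    # is active, which in practice means launching with paddle.distributed.launch
    # across multiple devices.
    pipeline_parallel_degree=2,
    # Space-separated option string for pipeline parallelism.
    pipeline_parallel_config="enable_delay_scale_loss enable_dp_comm_overlap enable_release_grads",
    # sharding_parallel_config takes a similar space-separated option string.
    sharding_parallel_config="",
)
```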