2 files changed: +1, -4 lines changed
docs/sphinx_doc/source_zh/tutorial
@@ -497,7 +497,7 @@ trainer:
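 - `use_dynamic_bsz`: Whether to use dynamic batch sizing.
 - `max_token_len_per_gpu`: Maximum number of tokens per GPU during training; takes effect only when `use_dynamic_bsz=true`.
 - `ulysses_sequence_parallel_size`: Degree of sequence parallelism, i.e. the number of GPUs a single sequence is split across.
-- `max_checkpoints_to_keep`: Maximum number of checkpoints to keep. Once this number is exceeded, the oldest checkpoints are deleted.
+- `max_checkpoints_to_keep`: Maximum number of checkpoints to keep. Once this number is exceeded, the oldest checkpoints are deleted. If not specified, all checkpoints are kept.
 - `trainer_config`: The trainer configuration, provided inline.
 
 ---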
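The retention rule documented for `max_checkpoints_to_keep` can be illustrated with a minimal sketch. This is not the project's implementation: the helper name `prune_checkpoints` is hypothetical, and the sketch assumes checkpoints live in directories named `global_step_<N>` (the naming visible in the second hunk of this diff).

```python
import os
import re
import shutil
from typing import List, Optional

# Assumption: each checkpoint is a directory named global_step_<N>.
_STEP_DIR = re.compile(r"^global_step_(\d+)$")


def prune_checkpoints(
    checkpoint_root: str, max_checkpoints_to_keep: Optional[int] = None
) -> List[str]:
    """Delete the oldest checkpoints beyond the limit and return the removed paths."""
    if max_checkpoints_to_keep is None:
        return []  # Not specified: keep all checkpoints.
    if max_checkpoints_to_keep < 0:
        raise ValueError("max_checkpoints_to_keep must be non-negative")

    # Collect the step numbers of all checkpoint directories.
    steps = []
    for name in os.listdir(checkpoint_root):
        match = _STEP_DIR.match(name)
        if match and os.path.isdir(os.path.join(checkpoint_root, name)):
            steps.append(int(match.group(1)))
    steps.sort()  # Numeric sort, so step 10 is newer than step 9.

    # Everything except the newest max_checkpoints_to_keep entries is removed.
    excess = max(len(steps) - max_checkpoints_to_keep, 0)
    removed = []
    for step in steps[:excess]:
        path = os.path.join(checkpoint_root, f"global_step_{step}")
        shutil.rmtree(path)
        removed.append(path)
    return removed
```

Sorting by the parsed step number rather than by directory name avoids treating `global_step_10` as older than `global_step_2`.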
@@ -126,9 +126,6 @@ async def _find_verl_latest_state_dict(self) -> None:
         await asyncio.sleep(1)
 
     async def _remove_previous_state_dict(self, previous_model_version: int) -> None:
-        self.logger.info(
-            f"Synchronizer is removing previous checkpoint for sync at step {previous_model_version}."
-        )
         previous_state_dict_dir = os.path.join(
             self.config.checkpoint_job_dir, f"global_step_{previous_model_version}"
         )
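The hunk ends before the rest of `_remove_previous_state_dict` is shown, so the actual deletion logic is not visible here. As a rough, standalone sketch of how such a step could look, the hypothetical function below builds the same `global_step_<N>` path and removes the directory off the event loop; the function name, return value, and use of `asyncio.to_thread` are assumptions, not the project's code.

```python
import asyncio
import os
import shutil


async def remove_previous_state_dict(
    checkpoint_job_dir: str, previous_model_version: int
) -> bool:
    """Remove the global_step_<N> directory if present; return True if it was removed."""
    previous_state_dict_dir = os.path.join(
        checkpoint_job_dir, f"global_step_{previous_model_version}"
    )
    if not os.path.isdir(previous_state_dict_dir):
        return False  # Nothing to clean up for this version.
    # rmtree blocks, so run it in a worker thread (asyncio.to_thread needs Python 3.9+).
    await asyncio.to_thread(shutil.rmtree, previous_state_dict_dir)
    return True
```

Running the blocking delete in a thread keeps a polling loop such as the `await asyncio.sleep(1)` above responsive while a large checkpoint is removed.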