Commit b5f735d

jingyushen authored and StephenRi committed

fix md display in config guide

1 parent 94637ff commit b5f735d

2 files changed: +6 -6 lines changed

docs_roll/docs/English/QuickStart/config_guide.md

Lines changed: 3 additions & 3 deletions

@@ -155,8 +155,8 @@ Used for configuring training parameters such as `learning_rate`, `weight_decay`
 - `training_args.per_device_train_batch_size`: The batch size to use when training.
 - `training_args.gradient_accumulation_steps`: The number of gradient accumulation steps.
 
-In deepspeed training the global train batch size is `per_device_train_batch_size` * `gradient_accumulation_steps` * world_size (a.k.a length of `device_mapping` for `actor_train`/`critic`).
+In deepspeed training the global train batch size is `per_device_train_batch_size` \* `gradient_accumulation_steps` \* world_size (a.k.a length of `device_mapping` for `actor_train`/`critic`).
 
-In megatron training the global train batch size is `per_device_train_batch_size` * `gradient_accumulation_steps` * world_size / `tensor_model_parallel_size` / `pipeline_model_parallel_size` / `context_parallel_size` (don't need to divide `expert_model_parallel_size`).
+In megatron training the global train batch size is `per_device_train_batch_size` \* `gradient_accumulation_steps` \* world_size / `tensor_model_parallel_size` / `pipeline_model_parallel_size` / `context_parallel_size` (don't need to divide `expert_model_parallel_size`).
 
-If you want to perform one optimization step in each rollout, set `gradient_accumulation_steps` to `rollout_batch_size` * `num_return_sequences_in_group` * `tensor_model_parallel_size` * `pipeline_model_parallel_size` * `context_parallel_size`/ `per_device_train_batch_size` / world_size.
+If you want to perform one optimization step in each rollout, set `gradient_accumulation_steps` to `rollout_batch_size` \* `num_return_sequences_in_group` \* `tensor_model_parallel_size` \* `pipeline_model_parallel_size` \* `context_parallel_size`/ `per_device_train_batch_size` / world_size.
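The batch-size arithmetic in the changed lines can be sketched as a small helper. This is an illustrative sketch, not code from the repo: the function names and example values are my own, and the parameter names mirror the config keys quoted in the guide.

```python
def global_batch_deepspeed(per_device_train_batch_size,
                           gradient_accumulation_steps,
                           world_size):
    # DeepSpeed: global batch is the plain product of the three factors;
    # world_size is the length of `device_mapping` for actor_train/critic.
    return per_device_train_batch_size * gradient_accumulation_steps * world_size


def global_batch_megatron(per_device_train_batch_size,
                          gradient_accumulation_steps,
                          world_size,
                          tensor_model_parallel_size,
                          pipeline_model_parallel_size,
                          context_parallel_size):
    # Megatron: divide out the model-parallel degrees (TP, PP, CP);
    # expert parallelism is deliberately NOT divided out, per the guide.
    return (per_device_train_batch_size * gradient_accumulation_steps * world_size
            // (tensor_model_parallel_size
                * pipeline_model_parallel_size
                * context_parallel_size))


# Hypothetical setup: 8 GPUs, TP=2, PP=2, CP=1.
print(global_batch_deepspeed(4, 2, 8))          # 64
print(global_batch_megatron(4, 2, 8, 2, 2, 1))  # 16
```

With the same per-device settings, the Megatron global batch shrinks by the product of the parallel degrees, which is why the guide's `gradient_accumulation_steps` formula multiplies those degrees back in.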

docs_roll/docs/简体中文/快速开始/config_guide_cn.md

Lines changed: 3 additions & 3 deletions

@@ -286,8 +286,8 @@ actor_train:
 - `training_args.per_device_train_batch_size`: 在每个设备上进行训练时使用的批次大小。
 - `training_args.gradient_accumulation_steps`: 梯度累积的步数。
 
-在 DeepSpeed 训练中,全局训练批次大小是`per_device_train_batch_size` * `gradient_accumulation_steps` * world_size (即`actor_train`/`critic`的`device_mapping`长度)。
+在 DeepSpeed 训练中,全局训练批次大小是`per_device_train_batch_size` \* `gradient_accumulation_steps` \* world_size (即`actor_train`/`critic`的`device_mapping`长度)。
 
-在 Megatron 训练中,全局训练批次大小是`per_device_train_batch_size` * `gradient_accumulation_steps` * world_size / `tensor_model_parallel_size` / `pipeline_model_parallel_size` / `context_parallel_size` (不需要除以`expert_model_parallel_size`).
+在 Megatron 训练中,全局训练批次大小是`per_device_train_batch_size` \* `gradient_accumulation_steps` \* world_size / `tensor_model_parallel_size` / `pipeline_model_parallel_size` / `context_parallel_size` (不需要除以`expert_model_parallel_size`).
 
-如果你想在每次 Rollout 中执行一次优化步骤,则应设置`gradient_accumulation_steps`为 `rollout_batch_size` * `num_return_sequences_in_group` * `tensor_model_parallel_size` * `pipeline_model_parallel_size` * `context_parallel_size`/ `per_device_train_batch_size` / world_size.
+如果你想在每次 Rollout 中执行一次优化步骤,则应设置`gradient_accumulation_steps`为 `rollout_batch_size` \* `num_return_sequences_in_group` \* `tensor_model_parallel_size` \* `pipeline_model_parallel_size` \* `context_parallel_size`/ `per_device_train_batch_size` / world_size.
