docs/source_en/Instruction/GRPO.md: 41 additions & 8 deletions

@@ -22,14 +22,36 @@ pip install -U trl
-In SWIFT's GRPO training, the training model preferentially uses the front portion of the available GPUs, while the rollout process utilizes the rear portion of the available GPUs. This means:
+The GRPO training framework supports the integration of high-performance inference engines (such as vLLM) to accelerate the sampling process, offering the following two deployment modes:
-**If both `NPROC_PER_NODE` and `num_infer_workers` in the command are equal to the number of available GPUs**, training and inference are assigned to the same GPUs. In this case, you need to configure `sleep_level`.
-**If the sum of `NPROC_PER_NODE` and `num_infer_workers` equals the total number of available GPUs**, training will use the front GPUs and rollout will use the rear GPUs. In this scenario, you can configure `async_generate`.
+### 1. Internal Integration Mode
-> Note: `async_generate` uses the policy model and responses from step current_step-1, so the `clip` method is effectively ignored.
-> If you encounter instability during training, turn off this argument.
-> In our experiments, instability rarely occurred when `async_generate` was true.
+- Launch the inference service directly within the Trainer.
+- Provides two resource allocation strategies:
+  - **Colocate Mode**: Training and inference share GPU resources.
+  - **Async Mode**: Training and inference use separate GPU resources.
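
For reference, here is a minimal launch sketch of the GPU split described in the removed lines above. It is an illustration, not the documented invocation: `NPROC_PER_NODE`, `num_infer_workers`, and `async_generate` come from the diff, while the `swift rlhf` entry point, the model name, and the GPU counts are assumptions.

```bash
# Async split on a hypothetical 8-GPU node:
# NPROC_PER_NODE (4) + num_infer_workers (4) = 8 available GPUs,
# so training occupies the front 4 GPUs and rollout the rear 4.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=4 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B-Instruct \
    --num_infer_workers 4 \
    --async_generate true
```

Because `async_generate` samples with the previous step's policy, the `clip` term has no effect, which is why the removed notes recommend disabling it if training becomes unstable.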
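Correspondingly, a sketch of the colocate strategy named in the added lines, where training and inference share all GPUs; `sleep_level` is the setting the removed lines tie to this layout, and the same caveats about assumed names apply.

```bash
# Colocate mode: NPROC_PER_NODE = num_infer_workers = 8 (all GPUs),
# so training and rollout share devices; sleep_level lets the
# inference engine release GPU memory while weights are being updated.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
swift rlhf \
    --rlhf_type grpo \
    --model Qwen/Qwen2.5-7B-Instruct \
    --num_infer_workers 8 \
    --sleep_level 1
```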