# FAQ

## Part 1: Configurations

**Q:** How do I configure the parameters?

**A:** You can use the config manager by running `trinity studio --port 8080`, which provides a convenient way to set the parameters.

Advanced users can also edit the config file directly.
Trinity-RFT uses [veRL](https://github.com/volcengine/verl) as the training backend, which exposes a large number of parameters; see the [veRL documentation](https://verl.readthedocs.io/en/latest/examples/config.html) for details. You may specify these parameters in two ways: (1) directly in the `trainer.trainer_config` dictionary; or (2) in an auxiliary YAML file whose name starts with `train_` (e.g., `train_gsm8k.yaml`), passing its path via `trainer.trainer_config_path`. These two ways are mutually exclusive.

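As a sketch, the two mutually exclusive options might look like the following (the veRL key and the file path shown here are illustrative examples, not required values):

```yaml
# Option 1: inline veRL parameters in the Trinity config
# under `trainer.trainer_config`.
trainer:
  trainer_config:
    actor_rollout_ref:
      actor:
        ppo_micro_batch_size_per_gpu: 4  # example veRL parameter
---
# Option 2: keep veRL parameters in an auxiliary `train_*.yaml`
# file and reference it instead. Do not combine with Option 1.
trainer:
  trainer_config_path: examples/grpo_gsm8k/train_gsm8k.yaml
```
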
---

**Q:** What's the relationship between `buffer.batch_size`, `buffer.train_batch_size`, `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu`, and other batch sizes?

**A:** The following parameters are closely related:

- `buffer.batch_size`: The number of tasks in a batch, effective for the explorer.
- `buffer.train_batch_size`: The number of experiences in a mini-batch, effective for the trainer. If not specified, it defaults to `buffer.batch_size` * `algorithm.repeat_times`.
- `actor_rollout_ref.actor.ppo_mini_batch_size`: The number of experiences in a mini-batch; it is overridden by `buffer.train_batch_size`. Note that inside the `update_policy` function, its value becomes the number of experiences in a mini-batch per GPU, i.e., `buffer.train_batch_size / ngpus_trainer`. The division by `ngpus_trainer` comes from the implicit allocation of data across GPUs, but it does not affect the result after gradient accumulation.
- `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu`: The number of experiences in a micro-batch per GPU.
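
As a concrete illustration of how these values relate, consider a hypothetical setup (all numbers below are made up for the example; assume 4 trainer GPUs):

```yaml
buffer:
  batch_size: 32         # 32 tasks per explorer batch
  train_batch_size: 256  # defaults to batch_size * algorithm.repeat_times = 32 * 8
algorithm:
  repeat_times: 8        # experiences generated per task
# Inside update_policy, each of the 4 trainer GPUs then sees a mini-batch of
# 256 / 4 = 64 experiences. With ppo_micro_batch_size_per_gpu = 8, that
# mini-batch is processed as 64 / 8 = 8 gradient-accumulation micro-batches.
```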