Commit 0b5df71 ("add faq"), 1 parent 8fa6f76

3 files changed: +167 −1 lines changed
docs/sphinx_doc/source/index.rst

Lines changed: 6 additions & 0 deletions

```diff
@@ -33,6 +33,12 @@ Welcome to Trinity-RFT's documentation!
    tutorial/trinity_configs.md
    tutorial/example_mix_algo.md
 
+.. toctree::
+   :maxdepth: 2
+   :caption: FAQ
+
+   tutorial/faq.md
+
 .. toctree::
    :maxdepth: 1
    :glob:
```
docs/sphinx_doc/source/tutorial/faq.md (new file)

Lines changed: 160 additions & 0 deletions
# FAQ

## Part 1: Configurations

**Q:** Why do most examples have two configuration YAML files, e.g., `gsm8k.yaml` and `train_gsm8k.yaml` in the `examples/grpo_gsm8k` directory?

**A:** Trinity-RFT uses [veRL](https://github.com/volcengine/verl) as the training backend, and the auxiliary YAML file whose name starts with `train_` configures veRL; see the [veRL documentation](https://github.com/volcengine/verl/blob/v0.4.0/docs/examples/config.rst).
If you specify the path to `train_gsm8k.yaml` in `trainer.trainer_config_path`, Trinity-RFT will automatically pass the parameters to veRL.

We also provide an alternative way to configure the veRL trainer: specify the parameters directly in the `trainer.trainer_config` dictionary. This approach is mutually exclusive with using `trainer.trainer_config_path`.

Note that some parameters are not listed in the auxiliary configuration file (e.g., `train_gsm8k.yaml`), as they are overridden by the parameters in the Trinity configuration file (e.g., `gsm8k.yaml`). Please refer to `./trinity_configs.md` for more details.
Future versions will gradually reduce the parameters exposed through `trainer.trainer_config` and `trainer.trainer_config_path` until both are fully deprecated.

---
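For concreteness, the two mutually exclusive styles might be sketched as below. The keys under `trainer_config` mirror the veRL configuration; the file name and values here are illustrative only, not prescriptive:

```yaml
# Style 1: point to an auxiliary veRL config file
trainer:
  trainer_config_path: examples/grpo_gsm8k/train_gsm8k.yaml
---
# Style 2: inline the veRL parameters instead (do not combine with style 1)
trainer:
  trainer_config:
    actor_rollout_ref:
      actor:
        ppo_mini_batch_size: 32
```

Pick one style per experiment; setting both is a configuration error.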
**Q:** What's the relationship between `buffer.batch_size`, `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu`, and other batch sizes?

**A:** The following parameters are closely related:

- `buffer.batch_size`: the number of tasks in a batch, effective for both the explorer and the trainer.
- `actor_rollout_ref.actor.ppo_mini_batch_size`: in the configuration file, this value represents the number of tasks in a mini-batch and is overridden by `buffer.batch_size`; inside the `update_policy` function, however, its value becomes the number of experiences in a mini-batch per GPU, i.e., `buffer.batch_size * algorithm.repeat_times / ngpus_trainer`.
- `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu`: the number of experiences in a micro-batch per GPU.

A minimal example showing their usage is as follows:

```python
def update_policy(batch):
    dataloader = batch.split(ppo_mini_batch_size)
    for _ in range(ppo_epochs):
        for batch_idx, data in enumerate(dataloader):
            # Split the mini-batch into micro-batches
            mini_batch = data
            if actor_rollout_ref.actor.use_dynamic_bsz:
                micro_batches, _ = rearrange_micro_batches(
                    batch=mini_batch, max_token_len=max_token_len
                )
            else:
                micro_batches = mini_batch.split(
                    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu
                )

            # Compute gradients
            for data in micro_batches:
                entropy, log_prob = self._forward_micro_batch(
                    micro_batch=data, ...
                )
                pg_loss, pg_clipfrac, ppo_kl, pg_clipfrac_lower = compute_policy_loss(
                    log_prob=log_prob, **data
                )
                policy_loss = pg_loss + ...
                loss = policy_loss / self.gradient_accumulation
                loss.backward()

            # Optimizer step
            grad_norm = self._optimizer_step()
            self.actor_optimizer.zero_grad()
```

Please refer to `trinity/trainer/verl/dp_actor.py` for the detailed implementation. veRL also provides an explanation in its [FAQ](https://verl.readthedocs.io/en/latest/faq/faq.html#what-is-the-meaning-of-train-batch-size-mini-batch-size-and-micro-batch-size).
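As a quick sanity check of the conversion described above, here is the arithmetic with made-up numbers (all values below are hypothetical, chosen only for illustration):

```python
# Hypothetical settings, for illustration only.
buffer_batch_size = 32          # buffer.batch_size (tasks per batch)
repeat_times = 8                # algorithm.repeat_times (experiences per task)
ngpus_trainer = 4               # number of trainer GPUs
micro_batch_size_per_gpu = 16   # actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu

# Inside update_policy, the mini-batch size becomes the number of
# experiences per GPU: buffer.batch_size * repeat_times / ngpus_trainer.
mini_batch_size_per_gpu = buffer_batch_size * repeat_times // ngpus_trainer
print(mini_batch_size_per_gpu)  # 64

# Each mini-batch is split into micro-batches; the number of micro-batches
# is the gradient-accumulation factor used when scaling the loss.
gradient_accumulation = mini_batch_size_per_gpu // micro_batch_size_per_gpu
print(gradient_accumulation)  # 4
```

So in this example each GPU processes one mini-batch of 64 experiences as 4 micro-batches of 16, accumulating gradients across them before the optimizer step.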
## Part 2: Common Errors

**Error:**
```bash
File ".../flash_attn/flash_attn_interface.py", line 15, in <module>
    import flash_attn_2_cuda as flash_attn_gpu
ImportError: ...
```

**A:** The `flash-attn` package is not properly installed. Try fixing it by running `MAX_JOBS=128 pip install flash-attn`.

---

**Error:**
```bash
UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key]) ...
```

**A:** Log in to WandB before running the experiment. One way to do this is to run `export WANDB_API_KEY=[your_api_key]`.

---

**Error:**
```bash
ValueError: Failed to look up actor with name 'explorer' ...
```

**A:** Try restarting Ray before running the experiment:

```bash
ray stop
ray start --head
```

---

**Error:** Out-of-Memory (OOM) error

**A:** The following parameters may be helpful:

- For the trainer, adjust `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu` when `actor_rollout_ref.actor.use_dynamic_bsz=false`; adjust `actor_rollout_ref.actor.ppo_max_token_len_per_gpu` and `actor_rollout_ref.actor.ulysses_sequence_parallel_size` when `actor_rollout_ref.actor.use_dynamic_bsz=true`.
- For the explorer, adjust `explorer.rollout_model.tensor_parallel_size`.
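As a rough sketch of where these knobs live in the configuration (the nesting follows the parameter names above, but the values are illustrative and should be tuned for your hardware):

```yaml
# Illustrative values only.
actor_rollout_ref:
  actor:
    use_dynamic_bsz: false
    ppo_micro_batch_size_per_gpu: 8   # lower this first when use_dynamic_bsz=false
    # When use_dynamic_bsz=true, tune these instead:
    # ppo_max_token_len_per_gpu: 16384
    # ulysses_sequence_parallel_size: 2

explorer:
  rollout_model:
    tensor_parallel_size: 2           # shard the rollout model across more GPUs
```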
## Part 3: Debugging Methods [Coming Soon]

To see the full logs of all processes and save them to `debug.log`:

```bash
export RAY_DEDUP_LOGS=0
trinity run --config grpo_gsm8k/gsm8k.yaml 2>&1 | tee debug.log
```


## Part 4: Other Questions

**Q:** What's the purpose of `buffer.trainer_input.experience_buffer.path`?

**A:** This field specifies the path to the SQLite database that stores the generated experiences. You may comment out this line if you don't want to use the SQLite database.

To inspect the experiences in the database, you can use the following Python script:

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from trinity.common.schema import ExperienceModel

# Use the value configured in `buffer.trainer_input.experience_buffer.path`,
# e.g., "sqlite:///path/to/experience_buffer.db"
db_path = "sqlite:///path/to/experience_buffer.db"
engine = create_engine(db_path)
Session = sessionmaker(bind=engine)
sess = Session()

MAX_EXPERIENCES = 4
experiences = (
    sess.query(ExperienceModel)
    .with_for_update()
    .limit(MAX_EXPERIENCES)
    .all()
)

exp_list = []
for exp in experiences:
    exp_list.append(ExperienceModel.to_experience(exp))

# Print the experiences
for exp in exp_list:
    print(f"{exp.prompt_text=}", f"{exp.response_text=}")
```

---

**Q:** How to load the checkpoints outside of the Trinity-RFT framework?

**A:** You need the values of `model.model_path` and `checkpoint_root_dir` from your configuration. The following code snippet gives an example with `transformers`.

```python
import os

from transformers import AutoModelForCausalLM, AutoTokenizer

from trinity.common.models.utils import load_state_dict_from_verl_checkpoint

model_path = "..."           # the value of `model.model_path`
checkpoint_root_dir = "..."  # the value of `checkpoint_root_dir`

model = AutoModelForCausalLM.from_pretrained(model_path)
# Assume we need the checkpoint at step 780
ckpt_path = os.path.join(checkpoint_root_dir, "global_step_780", "actor")
model.load_state_dict(load_state_dict_from_verl_checkpoint(ckpt_path))
```

docs/sphinx_doc/source/tutorial/trinity_configs.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -399,7 +399,7 @@ data_processor:
 
 For advanced users working with the `verl` trainer backend. This includes fine-grained settings for actor/critic models, optimizer parameters, and training loops.
 
-> For full parameter meanings, refer to the [veRL documentation](https://github.com/volcengine/verl/blob/v0.3.0.post1/docs/examples/config.rst).
+> For full parameter meanings, refer to the [veRL documentation](https://verl.readthedocs.io/en/latest/examples/config.html).
 
 
 ```yaml
````