# FAQ

## Part 1: Configurations
**Q:** Why do most examples have two configuration YAML files, e.g., `gsm8k.yaml` and `train_gsm8k.yaml` in the `examples/grpo_gsm8k` directory?

**A:** Trinity-RFT uses [veRL](https://github.com/volcengine/verl) as the training backend, and the auxiliary YAML file starting with `train_` configures veRL; see the [veRL documentation](https://verl.readthedocs.io/en/latest/examples/config.html) for the available options.
If you specify the path to `train_gsm8k.yaml` in `trainer.trainer_config_path`, Trinity-RFT will automatically pass its parameters to veRL.

We also provide an alternative way to configure the veRL trainer: you may specify the parameters directly in the `trainer.trainer_config` dictionary. This approach is mutually exclusive with `trainer.trainer_config_path`.

Note that some parameters are not listed in the auxiliary configuration file (e.g., `train_gsm8k.yaml`), as they are overridden by the parameters in the Trinity configuration file (e.g., `gsm8k.yaml`). Please refer to `./trinity_configs.md` for more details.
For users' convenience, future versions will gradually reduce the parameters in `trainer.trainer_config` and `trainer.trainer_config_path` until they are fully deprecated.
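
As a rough illustration, the two (mutually exclusive) ways of wiring in the veRL settings look roughly like the sketch below; the exact nesting and values are illustrative placeholders, not copied from the shipped examples:

```yaml
# Option 1: point Trinity-RFT at the auxiliary veRL config file
trainer:
  trainer_config_path: examples/grpo_gsm8k/train_gsm8k.yaml
---
# Option 2 (alternative, do not combine with Option 1): inline the veRL parameters
trainer:
  trainer_config:
    actor_rollout_ref:
      actor:
        ppo_micro_batch_size_per_gpu: 4
```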

---

**Q:** What's the relationship between `buffer.batch_size`, `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu`, and other batch sizes?

**A:** The following parameters are closely related:

- `buffer.batch_size`: The number of tasks in a batch, effective for both the explorer and the trainer.
- `actor_rollout_ref.actor.ppo_mini_batch_size`: In the configuration, this value represents the number of tasks in a mini-batch and is overridden by `buffer.batch_size`; inside the `update_policy` function, however, it becomes the number of experiences in a mini-batch per GPU, i.e., `buffer.batch_size * algorithm.repeat_times (/ ngpus_trainer)`. The division by `ngpus_trainer` comes from the implicit allocation of data across GPUs, but it does not affect the result after gradient accumulation.
- `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu`: The number of experiences in a micro-batch per GPU.

A minimal example showing their usage is as follows:

```python
def update_policy(batch_exps):
    dataloader = batch_exps.split(ppo_mini_batch_size)  # here `ppo_mini_batch_size` is in terms of experiences
    for _ in range(ppo_epochs):
        for batch_idx, data in enumerate(dataloader):
            # Split data
            mini_batch = data
            if actor_rollout_ref.actor.use_dynamic_bsz:
                micro_batches, _ = rearrange_micro_batches(
                    batch=mini_batch, max_token_len=max_token_len
                )
            else:
                micro_batches = mini_batch.split(actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu)

            # Computing gradient
            for data in micro_batches:
                entropy, log_prob = self._forward_micro_batch(
                    micro_batch=data, ...
                )
                pg_loss, pg_clipfrac, ppo_kl, pg_clipfrac_lower = compute_policy_loss(
                    log_prob=log_prob, **data
                )
                policy_loss = pg_loss + ...
                loss = policy_loss / self.gradient_accumulation
                loss.backward()

            # Optimizer step
            grad_norm = self._optimizer_step()
            self.actor_optimizer.zero_grad()
```
Please refer to `trinity/trainer/verl/dp_actor.py` for the detailed implementation. veRL also provides an explanation in its [FAQ](https://verl.readthedocs.io/en/latest/faq/faq.html#what-is-the-meaning-of-train-batch-size-mini-batch-size-and-micro-batch-size).
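
For concreteness, here is a small back-of-the-envelope sketch of how the quantities relate; all numbers are made up for illustration and are not the defaults of any example:

```python
# Hypothetical numbers, for illustration only
buffer_batch_size = 32   # buffer.batch_size: tasks per batch
repeat_times = 8         # algorithm.repeat_times: rollouts (experiences) per task
ngpus_trainer = 4        # number of GPUs used by the trainer

# Experiences produced for one batch of tasks
experiences_per_batch = buffer_batch_size * repeat_times      # 256

# Inside update_policy, ppo_mini_batch_size is re-expressed as
# the number of experiences per mini-batch per GPU
ppo_mini_batch_size = experiences_per_batch // ngpus_trainer  # 64

# Experiences processed per forward/backward pass (set in the config)
ppo_micro_batch_size_per_gpu = 8

# Gradient-accumulation steps per mini-batch
gradient_accumulation = ppo_mini_batch_size // ppo_micro_batch_size_per_gpu  # 8
```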


## Part 2: Common Errors

**Error:**
```bash
File ".../flash_attn/flash_attn_interface.py", line 15, in <module>
  import flash_attn_2_cuda as flash_attn_gpu
ImportError: ...
```

**A:** The `flash-attn` module is not properly installed. Try to fix it by running `pip install flash-attn` or `pip install flash-attn -v --no-build-isolation`.

---

**Error:**
```bash
UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key]) ...
```

**A:** Log in to WandB before starting Ray and running the experiment. One way to do this is to run the command `export WANDB_API_KEY=[your_api_key]`.
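
A minimal sketch of one way to do this (the bracketed key is a placeholder; restarting Ray is only needed if it was already running without the key):

```bash
# Export the key before starting Ray so the Ray processes inherit it
export WANDB_API_KEY=[your_api_key]
# Or log in interactively instead
wandb login

ray stop
ray start --head
```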

---

**Error:**
```bash
ValueError: Failed to look up actor with name 'explorer' ...
```

**A:** Make sure Ray is started before running the experiment. If Ray is already running, you can restart it with the following commands:

```bash
ray stop
ray start --head
```

---

**Error:** Out-of-Memory (OOM) error

**A:** The following parameters may be helpful:

- For the trainer, adjust `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu` when `actor_rollout_ref.actor.use_dynamic_bsz=false`; adjust `actor_rollout_ref.actor.ppo_max_token_len_per_gpu` and `actor_rollout_ref.actor.ulysses_sequence_parallel_size` when `actor_rollout_ref.actor.use_dynamic_bsz=true`.
- For the explorer, adjust `explorer.rollout_model.tensor_parallel_size`. A sketch of where these options live is shown below.
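
The values below are arbitrary examples, not recommendations; the `actor_rollout_ref.*` keys belong to the veRL configuration described in Part 1, while `explorer.*` belongs to the Trinity configuration file:

```yaml
actor_rollout_ref:
  actor:
    use_dynamic_bsz: false
    ppo_micro_batch_size_per_gpu: 2   # lower this first when use_dynamic_bsz is false
    # When use_dynamic_bsz is true, tune these instead:
    # ppo_max_token_len_per_gpu: 16384
    # ulysses_sequence_parallel_size: 2

explorer:
  rollout_model:
    tensor_parallel_size: 2           # shard the rollout model across more GPUs
```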


## Part 3: Debugging Methods [Coming Soon]
To see the full logs of all processes and save them to `debug.log`:
```bash
export RAY_DEDUP_LOGS=0
trinity run --config grpo_gsm8k/gsm8k.yaml 2>&1 | tee debug.log
```


## Part 4: Other Questions
**Q:** What's the purpose of `buffer.trainer_input.experience_buffer.path`?

**A:** It specifies the path to the SQLite database that stores the generated experiences. You may comment out this line if you don't want to use the SQLite database.
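
For reference, a minimal sketch of the corresponding config entry; the nesting simply follows the dotted key above, and the SQLite URL is an illustrative placeholder:

```yaml
buffer:
  trainer_input:
    experience_buffer:
      path: sqlite:///path/to/experiences.db   # placeholder path
```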

To see the experiences in the database, you can use the following Python script:

```python
from sqlalchemy import create_engine
from sqlalchemy.exc import OperationalError
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import NullPool
from trinity.common.schema import ExperienceModel

# Use the value of `buffer.trainer_input.experience_buffer.path` from your config here
engine = create_engine("sqlite:///path/to/experiences.db")  # placeholder URL
session = sessionmaker(bind=engine)
sess = session()

MAX_EXPERIENCES = 4
experiences = (
    sess.query(ExperienceModel)
    .with_for_update()
    .limit(MAX_EXPERIENCES)
    .all()
)

exp_list = []
for exp in experiences:
    exp_list.append(ExperienceModel.to_experience(exp))

# Print the experiences
for exp in exp_list:
    print(f"{exp.prompt_text=}", f"{exp.response_text=}")
```

---

**Q:** How can I load the checkpoints outside of the Trinity-RFT framework?

**A:** You need to specify the model path and the checkpoint path. The following code snippet gives an example using Hugging Face `transformers`.

```python
import os
from transformers import AutoTokenizer, AutoModelForCausalLM
from trinity.common.models.utils import load_state_dict_from_verl_checkpoint

# Assume we need the checkpoint at step 780;
# model_path, checkpoint_root_dir, project, and name are already defined
model = AutoModelForCausalLM.from_pretrained(model_path)
ckp_path = os.path.join(checkpoint_root_dir, project, name, "global_step_780", "actor")
model.load_state_dict(load_state_dict_from_verl_checkpoint(ckp_path))
```
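
If you want to reuse the result as a standalone Hugging Face model, one option is to save the merged weights together with the tokenizer; the output directory below is an arbitrary example:

```python
# Save the model with the loaded checkpoint weights, plus the tokenizer,
# so it can later be reloaded with from_pretrained() like any Hugging Face model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.save_pretrained("./gsm8k_step_780_hf")
tokenizer.save_pretrained("./gsm8k_step_780_hf")
```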