Skip to content

Commit 441725b

Browse files
authored
Merge branch 'main' into fix-retokenization-tool-loop
2 parents 6c9db28 + 6c0fccd commit 441725b

26 files changed

+1433
-236
lines changed

README.md

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,10 @@
1-
# TRL - Transformer Reinforcement Learning
1+
# TRL - Transformers Reinforcement Learning
22

33
<div style="text-align: center">
4-
<img src="https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trl_banner_dark.png" alt="TRL Banner">
4+
<picture>
5+
<source media="(prefers-color-scheme: light)" srcset="https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/TRL%20banner%20light.png">
6+
<img src="https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trl_banner_dark.png" alt="TRL Banner">
7+
</picture>
58
</div>
69

710
<hr> <br>

docs/source/example_overview.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -37,6 +37,7 @@ These notebooks are easier to run and are designed for quick experimentation wit
3737
| [`grpo_ministral3_vl.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/grpo_ministral3_vl.ipynb) | GRPO Ministral 3 with QLoRA using TRL on free Colab | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_ministral3_vl.ipynb) |
3838
| [`openenv_sudoku_grpo.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/openenv_sudoku_grpo.ipynb) | GRPO to play Sudoku on an OpenEnv environment | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/openenv_sudoku_grpo.ipynb) |
3939
| [`openenv_wordle_grpo.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/openenv_wordle_grpo.ipynb) | GRPO to play Worldle on an OpenEnv environment | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/openenv_wordle_grpo.ipynb) |
40+
| [`sft_nemotron_3.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_nemotron_3.ipynb) | SFT with LoRA on NVIDIA Nemotron 3 models | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_nemotron_3.ipynb) |
4041
| [`sft_trl_lora_qlora.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_trl_lora_qlora.ipynb) | Supervised Fine-Tuning (SFT) using QLoRA on free Colab | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_trl_lora_qlora.ipynb) |
4142
| [`sft_qwen_vl.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_qwen_vl.ipynb) | Supervised Fine-Tuning (SFT) Qwen3-VL with QLoRA using TRL on free Colab | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_qwen_vl.ipynb) |
4243
| [`sft_tool_calling.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_tool_calling.ipynb) | Teaching tool calling to a model without native tool-calling support using SFT with QLoRA | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb) |
@@ -80,6 +81,7 @@ Scripts are maintained in the [`trl/scripts`](https://github.com/huggingface/trl
8081
| [`examples/scripts/rloo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/rloo.py) | This script shows how to use the [`RLOOTrainer`] to fine-tune a model to improve its ability to solve math questions. |
8182
| [`examples/scripts/sft.py`](https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a model. |
8283
| [`examples/scripts/sft_gemma3.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_gemma3.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a Gemma 3 model. |
84+
| [`examples/scripts/sft_nemotron_3.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_nemotron_3.py) | This script shows how to use the [`SFTTrainer`] to fine-tune an NVIDIA Nemotron 3 model. |
8385
| [`examples/scripts/sft_tiny_aya_tool_calling.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_tiny_aya_tool_calling.py) | This script shows how to use the [`SFTTrainer`] to teach tool calling to a model without native tool-calling support using the [bebechien/SimpleToolCalling](https://huggingface.co/datasets/bebechien/SimpleToolCalling) dataset. |
8486
| [`examples/scripts/sft_video_llm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_video_llm.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a Video Language Model. |
8587
| [`examples/scripts/sft_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_vlm.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a Vision Language Model in a chat setting. The script has only been tested with [LLaVA 1.5](https://huggingface.co/llava-hf/llava-1.5-7b-hf), [LLaVA 1.6](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf), and [Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) models, so users may see unexpected behaviour in other model architectures. |

docs/source/index.md

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,11 @@
11
<div style="text-align: center">
2-
<img src="https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trl_banner_dark.png">
2+
<picture>
3+
<source media="(prefers-color-scheme: light)" srcset="https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trl_banner_light.png">
4+
<img src="https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/trl_banner_dark.png">
5+
</picture>
36
</div>
47

5-
# TRL - Transformer Reinforcement Learning
8+
# TRL - Transformers Reinforcement Learning
69

710
TRL is a full stack library where we provide a set of tools to train transformer language models with methods like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more.
811
The library is integrated with 🤗 [transformers](https://github.com/huggingface/transformers).

docs/source/paper_index.md

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1140,6 +1140,22 @@ SFTConfig(
11401140
)
11411141
```
11421142

1143+
### Fewer Truncations Improve Language Modeling
1144+
1145+
**📜 Paper**: https://huggingface.co/papers/2404.10830
1146+
1147+
The paper shows that the standard concatenate-then-split preprocessing (`packing_strategy="wrapped"`) used for LLM training causes many documents to be arbitrarily truncated, which harms learning. It proposes packing document chunks into context windows using a Best-Fit Decreasing bin-packing algorithm, greatly reducing truncation while keeping high token utilization and improving model performance. TRL implements this as the `"bfd_split"` packing strategy in [`SFTConfig`]. For more details on packing, see the [SFT documentation](sft_trainer#packing).
1148+
1149+
```python
1150+
from trl import SFTConfig
1151+
1152+
training_args = SFTConfig(
1153+
packing=True,
1154+
packing_strategy="bfd_split",
1155+
max_length=4096,
1156+
)
1157+
```
1158+
11431159
### Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
11441160

11451161
**📜 Paper**: https://huggingface.co/papers/1910.10683

docs/source/reducing_memory_usage.md

Lines changed: 20 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -67,22 +67,35 @@ To help you choose an appropriate value, we provide a utility to visualize the s
6767
6868
[Truncation](#truncation) has several drawbacks:
6969

70-
1. **Loss of information**: Key data at the end of a sequence may be discarded.
71-
2. **Choosing truncation length**: Too short loses data; too long undermines efficiency.
70+
1. **Loss of information**: Important tokens at the end of sequences may be discarded.
71+
2. **Choosing truncation length**: Too short loses data; too long reduces efficiency.
7272

73-
Packing, introduced in [Raffel et al., 2020](https://huggingface.co/papers/1910.10683), addresses these issues by grouping sequences instead of truncating. It concatenates and splits dataset sequences into the desired lengths.
73+
Packing mitigates these issues by grouping multiple sequences into the same training row, filling each row up to `max_length`.
7474

7575
![Packing](https://huggingface.co/datasets/trl-lib/documentation-images/resolve/main/packing_3.png)
7676

77-
Packing reduces padding by merging several sequences in one row when possible. We use an advanced method to be near-optimal in the way we pack the dataset. To enable packing, use `packing=True` in the [`SFTConfig`].
77+
TRL implements packing using **Best-Fit Decreasing (BFD)** bin packing, which groups sequences efficiently while minimizing padding. When a sequence exceeds `max_length`, different strategies determine how the overflow tokens are handled.
7878

79-
> [!TIP]
80-
> In TRL 0.18 and earlier, packing used a more aggressive method that reduced padding to almost nothing, but had the downside of breaking sequence continuity for a large fraction of the dataset. To revert to this strategy, use `packing_strategy="wrapped"` in [`SFTConfig`].
79+
TRL supports three strategies:
80+
81+
* `"bfd"` (default): Uses **Best-Fit Decreasing packing**. If a sequence exceeds `max_length`, the overflow tokens are discarded.
82+
83+
* `"bfd_split"`: Uses **Best-Fit Decreasing packing**, but long sequences are split into chunks ≤ `max_length` before packing. This preserves all tokens and follows the approach proposed in [Fewer Truncations Improve Language Modeling](https://huggingface.co/papers/2404.10830).
84+
85+
* `"wrapped"`: All tokens are concatenated into a stream and split into fixed-length blocks. This minimizes padding but may mix unrelated examples. This strategy corresponds to the *concatenate-then-split* preprocessing described in the literature (e.g., [Fewer Truncations Improve Language Modeling](https://huggingface.co/papers/2404.10830)). It has the downside of breaking sequence continuity for a large fraction of the dataset, which hurts performance, as discussed in the [Qwen3-Coder-Next Technical Report](https://huggingface.co/papers/2603.00729).
86+
87+
> [!NOTE]
88+
> If all sequences are shorter than `max_length`, **`bfd` and `bfd_split` behave identically**, since no truncation or splitting is required.
8189
8290
```python
8391
from trl import SFTConfig
8492

85-
training_args = SFTConfig(..., packing=True, max_length=512)
93+
training_args = SFTConfig(
94+
...,
95+
packing=True,
96+
packing_strategy="bfd",
97+
max_length=512,
98+
)
8699
```
87100

88101
## PEFT for parameter-efficient fine-tuning

examples/notebooks/README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,7 @@ This directory contains a collection of Jupyter notebooks that demonstrate how t
1212
| [`grpo_ministral3_vl.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/grpo_ministral3_vl.ipynb) | GRPO Ministral 3 with QLoRA using TRL on free Colab | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_ministral3_vl.ipynb) |
1313
| [`openenv_sudoku_grpo.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/openenv_sudoku_grpo.ipynb) | GRPO to play Sudoku on an OpenEnv environment | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/openenv_sudoku_grpo.ipynb) |
1414
| [`openenv_wordle_grpo.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/openenv_wordle_grpo.ipynb) | GRPO to play Worldle on an OpenEnv environment | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/openenv_wordle_grpo.ipynb) |
15+
| [`sft_nemotron_3.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_nemotron_3.ipynb) | SFT with LoRA on NVIDIA Nemotron 3 models | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_nemotron_3.ipynb) |
1516
| [`sft_trl_lora_qlora.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_trl_lora_qlora.ipynb) | Supervised Fine-Tuning (SFT) using QLoRA on free Colab | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_trl_lora_qlora.ipynb) |
1617
| [`sft_qwen_vl.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_qwen_vl.ipynb) | Supervised Fine-Tuning (SFT) Qwen3-VL with QLoRA using TRL on free Colab | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_qwen_vl.ipynb) |
1718
| [`sft_tool_calling.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_tool_calling.ipynb) | Teaching tool calling to a model without native tool-calling support using SFT with QLoRA | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb) |

0 commit comments

Comments
 (0)