### docs/source/example_overview.md
| [`grpo_ministral3_vl.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/grpo_ministral3_vl.ipynb) | GRPO Ministral 3 with QLoRA using TRL on free Colab | [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_ministral3_vl.ipynb) |
| [`openenv_sudoku_grpo.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/openenv_sudoku_grpo.ipynb) | GRPO to play Sudoku on an OpenEnv environment | [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/openenv_sudoku_grpo.ipynb) |
| [`openenv_wordle_grpo.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/openenv_wordle_grpo.ipynb) | GRPO to play Wordle on an OpenEnv environment | [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/openenv_wordle_grpo.ipynb) |
| [`sft_nemotron_3.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_nemotron_3.ipynb) | SFT with LoRA on NVIDIA Nemotron 3 models | [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_nemotron_3.ipynb) |
| [`sft_trl_lora_qlora.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_trl_lora_qlora.ipynb) | Supervised Fine-Tuning (SFT) using QLoRA on free Colab | [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_trl_lora_qlora.ipynb) |
| [`sft_qwen_vl.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_qwen_vl.ipynb) | Supervised Fine-Tuning (SFT) of Qwen3-VL with QLoRA using TRL on free Colab | [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_qwen_vl.ipynb) |
| [`sft_tool_calling.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_tool_calling.ipynb) | Teaching tool calling to a model without native tool-calling support using SFT with QLoRA | [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb) |
| [`examples/scripts/rloo.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/rloo.py) | This script shows how to use the [`RLOOTrainer`] to fine-tune a model to improve its ability to solve math questions. |
| [`examples/scripts/sft.py`](https://github.com/huggingface/trl/blob/main/trl/scripts/sft.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a model. |
| [`examples/scripts/sft_gemma3.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_gemma3.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a Gemma 3 model. |
| [`examples/scripts/sft_nemotron_3.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_nemotron_3.py) | This script shows how to use the [`SFTTrainer`] to fine-tune an NVIDIA Nemotron 3 model. |
| [`examples/scripts/sft_tiny_aya_tool_calling.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_tiny_aya_tool_calling.py) | This script shows how to use the [`SFTTrainer`] to teach tool calling to a model without native tool-calling support using the [bebechien/SimpleToolCalling](https://huggingface.co/datasets/bebechien/SimpleToolCalling) dataset. |
| [`examples/scripts/sft_video_llm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_video_llm.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a Video Language Model. |
| [`examples/scripts/sft_vlm.py`](https://github.com/huggingface/trl/blob/main/examples/scripts/sft_vlm.py) | This script shows how to use the [`SFTTrainer`] to fine-tune a Vision Language Model in a chat setting. The script has only been tested with [LLaVA 1.5](https://huggingface.co/llava-hf/llava-1.5-7b-hf), [LLaVA 1.6](https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf), and [Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct) models, so users may see unexpected behaviour with other model architectures. |
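Scripts like these are launched directly with Python. As a hypothetical invocation sketch (the model and dataset names are placeholders, and the argument names assume TRL's standard script arguments rather than anything stated above):

```shell
# Hypothetical example: run the SFT script on a small model.
# --model_name_or_path, --dataset_name, and --output_dir are assumed to be
# the usual TRL script arguments; substitute your own model and dataset.
python examples/scripts/sft.py \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name trl-lib/Capybara \
    --output_dir ./sft-output
```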
TRL is a full-stack library that provides a set of tools to train transformer language models with methods like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more.

The library is integrated with 🤗 [transformers](https://github.com/huggingface/transformers).
The paper shows that the standard concatenate-then-split preprocessing (`packing_strategy="wrapped"`) used for LLM training causes many documents to be arbitrarily truncated, which harms learning. It proposes packing document chunks into context windows using a Best-Fit Decreasing bin-packing algorithm, greatly reducing truncation while keeping high token utilization and improving model performance. TRL implements this as the `"bfd_split"` packing strategy in [`SFTConfig`]. For more details on packing, see the [SFT documentation](sft_trainer#packing).
```python
from trl import SFTConfig

training_args = SFTConfig(
    packing=True,
    packing_strategy="bfd_split",
    max_length=4096,
)
```
### Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
### docs/source/reducing_memory_usage.md
[Truncation](#truncation) has several drawbacks:

1. **Loss of information**: Important tokens at the end of sequences may be discarded.
2. **Choosing truncation length**: Too short loses data; too long reduces efficiency.

Packing mitigates these issues by grouping multiple sequences into the same training row, filling each row up to `max_length`. To enable packing, use `packing=True` in [`SFTConfig`].

TRL implements packing using **Best-Fit Decreasing (BFD)** bin packing, which groups sequences efficiently while minimizing padding. When a sequence exceeds `max_length`, different strategies determine how the overflow tokens are handled. TRL supports three strategies:

* `"bfd"` (default): Uses **Best-Fit Decreasing packing**. If a sequence exceeds `max_length`, the overflow tokens are discarded.

* `"bfd_split"`: Uses **Best-Fit Decreasing packing**, but long sequences are split into chunks ≤ `max_length` before packing. This preserves all tokens and follows the approach proposed in [Fewer Truncations Improve Language Modeling](https://huggingface.co/papers/2404.10830).

* `"wrapped"`: All tokens are concatenated into a stream and split into fixed-length blocks. This minimizes padding but may mix unrelated examples. This strategy corresponds to the *concatenate-then-split* preprocessing described in the literature (e.g., [Fewer Truncations Improve Language Modeling](https://huggingface.co/papers/2404.10830)). It has the downside of breaking sequence continuity for a large fraction of the dataset, which hurts performance, as discussed in the [Qwen3-Coder-Next Technical Report](https://huggingface.co/papers/2603.00729).

> [!NOTE]
> If all sequences are shorter than `max_length`, **`bfd` and `bfd_split` behave identically**, since no truncation or splitting is required.
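To make the Best-Fit Decreasing idea concrete, here is a toy sketch of the bin-packing step applied to sequence lengths. This is illustrative only, not TRL's implementation; `pack_bfd` and its signature are invented for this example:

```python
def pack_bfd(lengths, max_length):
    """Toy Best-Fit Decreasing packing: assign sequence lengths to rows of
    capacity `max_length`. Each sequence goes into the row whose remaining
    space is smallest but still large enough; otherwise a new row is opened.
    Assumes every length is <= max_length (the "bfd" strategy would truncate,
    and "bfd_split" would chunk, any longer sequences beforehand)."""
    remaining = []  # free space left in each row
    rows = []       # sequence lengths placed in each row
    for length in sorted(lengths, reverse=True):  # the "Decreasing" step
        # the "Best-Fit" step: pick the tightest row that still fits
        best = None
        for i, free in enumerate(remaining):
            if length <= free and (best is None or free < remaining[best]):
                best = i
        if best is None:
            remaining.append(max_length - length)
            rows.append([length])
        else:
            remaining[best] -= length
            rows[best].append(length)
    return rows


# Five sequences packed into rows of capacity 6 with zero padding
print(pack_bfd([5, 4, 3, 2, 1], max_length=6))  # → [[5, 1], [4, 2], [3]]
```

The real implementation then concatenates the token sequences assigned to each row; this sketch only shows how rows are chosen.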
### examples/notebooks/README.md
| [`grpo_ministral3_vl.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/grpo_ministral3_vl.ipynb) | GRPO Ministral 3 with QLoRA using TRL on free Colab | [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/grpo_ministral3_vl.ipynb) |
| [`openenv_sudoku_grpo.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/openenv_sudoku_grpo.ipynb) | GRPO to play Sudoku on an OpenEnv environment | [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/openenv_sudoku_grpo.ipynb) |
| [`openenv_wordle_grpo.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/openenv_wordle_grpo.ipynb) | GRPO to play Wordle on an OpenEnv environment | [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/openenv_wordle_grpo.ipynb) |
| [`sft_nemotron_3.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_nemotron_3.ipynb) | SFT with LoRA on NVIDIA Nemotron 3 models | [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_nemotron_3.ipynb) |
| [`sft_trl_lora_qlora.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_trl_lora_qlora.ipynb) | Supervised Fine-Tuning (SFT) using QLoRA on free Colab | [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_trl_lora_qlora.ipynb) |
| [`sft_qwen_vl.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_qwen_vl.ipynb) | Supervised Fine-Tuning (SFT) of Qwen3-VL with QLoRA using TRL on free Colab | [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_qwen_vl.ipynb) |
| [`sft_tool_calling.ipynb`](https://github.com/huggingface/trl/tree/main/examples/notebooks/sft_tool_calling.ipynb) | Teaching tool calling to a model without native tool-calling support using SFT with QLoRA | [](https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_tool_calling.ipynb) |