Merged
18 changes: 0 additions & 18 deletions .github/workflows/build_documentation.yml

This file was deleted.

19 changes: 0 additions & 19 deletions .github/workflows/build_pr_documentation.yml

This file was deleted.

2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
Original file line number Diff line number Diff line change
@@ -26,6 +26,6 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install ".[dev, torch]"
python -m pip install ".[dev]"
- name: Run unit tests
run: HF_TOKEN=$HF_TOKEN pytest -sv tests/
16 changes: 0 additions & 16 deletions .github/workflows/upload_pr_documentation.yml

This file was deleted.

2 changes: 1 addition & 1 deletion CITATION.cff
@@ -26,4 +26,4 @@ authors:
family-names: Wolf
repository-code: 'https://github.com/huggingface/alignment-handbook'
license: Apache-2.0
version: 0.3.0.dev0
version: 0.4.0.dev0
31 changes: 17 additions & 14 deletions README.md
@@ -19,6 +19,7 @@ However, we know from the [InstructGPT](https://huggingface.co/papers/2203.02155
The Alignment Handbook aims to fill that gap by providing the community with a series of robust training recipes that span the whole pipeline.

## News 🗞️
* **July 24, 2025**: We release the full [post-training recipe](recipes/smollm2/README.md) behind SmolLM3-3B: a state-of-the-art hybrid reasoning model 💭
* **November 21, 2024**: We release the [recipe](recipes/smollm2/README.md) for fine-tuning SmolLM2-Instruct.
* **August 18, 2024**: We release SmolLM-Instruct v0.2, along with the [recipe](recipes/smollm/README.md) to fine-tune small LLMs 💻
* **April 12, 2024**: We release Zephyr 141B (A35B), in collaboration with Argilla and Kaist AI, along with the recipe to fine-tune Mixtral 8x22B with ORPO 🪁
@@ -60,32 +61,35 @@ The initial release of the handbook will focus on the following techniques:

## Installation instructions

To run the code in this project, first, create a Python virtual environment using e.g. Conda:
To run the code in this project, first create a Python virtual environment, e.g. with `uv`:

```shell
conda create -n handbook python=3.10 && conda activate handbook
uv venv handbook --python 3.11 && source handbook/bin/activate && uv pip install --upgrade pip
```

Next, install PyTorch `v2.1.2` - the precise version is important for reproducibility! Since this is hardware-dependent, we
direct you to the [PyTorch Installation Page](https://pytorch.org/get-started/locally/).
> [!TIP]
> To install `uv`, follow the [UV Installation Guide](https://docs.astral.sh/uv/getting-started/installation/).

Next, install PyTorch `v2.6.0`:

```shell
uv pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126
```

Note that the precise version is important for reproducibility! Since this is hardware-dependent, we also direct you to the [PyTorch Installation Page](https://pytorch.org/get-started/locally/).

You can then install the remaining package dependencies as follows:

```shell
git clone https://github.com/huggingface/alignment-handbook.git
cd ./alignment-handbook/
python -m pip install .
uv pip install .
```

You will also need Flash Attention 2 installed, which can be done by running:

```shell
python -m pip install flash-attn --no-build-isolation
uv pip install "flash-attn==2.7.4.post1" --no-build-isolation
```

> **Note**
> If your machine has less than 96GB of RAM and many CPU cores, reduce the `MAX_JOBS` arguments, e.g. `MAX_JOBS=4 pip install flash-attn --no-build-isolation`

Next, log into your Hugging Face account as follows:

```shell
@@ -106,7 +110,6 @@ You can now check out the `scripts` and `recipes` directories for instructions o
├── LICENSE
├── Makefile <- Makefile with commands like `make style`
├── README.md <- The top-level README for developers using this project
├── chapters <- Educational content to render on hf.co/learn
├── recipes <- Recipe configs, accelerate configs, slurm scripts
├── scripts <- Scripts to train and evaluate chat models
├── setup.cfg <- Installation config (mostly used for configuring code quality & tests)
@@ -121,10 +124,10 @@ If you find the content of this repo useful in your work, please cite it as follows:

```bibtex
@software{Tunstall_The_Alignment_Handbook,
author = {Tunstall, Lewis and Beeching, Edward and Lambert, Nathan and Rajani, Nazneen and Huang, Shengyi and Rasul, Kashif and Bartolome, Alvaro and M. Rush, Alexander and Wolf, Thomas},
author = {Tunstall, Lewis and Beeching, Edward and Lambert, Nathan and Rajani, Nazneen and Huang, Shengyi and Rasul, Kashif and Bartolome, Alvaro and Patiño, M. Carlos and M. Rush, Alexander and Wolf, Thomas},
license = {Apache-2.0},
title = {{The Alignment Handbook}},
url = {https://github.com/huggingface/alignment-handbook},
version = {0.3.0.dev0}
version = {0.4.0.dev0}
}
```
4 changes: 0 additions & 4 deletions chapters/en/_toctree.yml

This file was deleted.

3 changes: 0 additions & 3 deletions chapters/en/chapter0/introduction.mdx

This file was deleted.

25 changes: 0 additions & 25 deletions recipes/accelerate_configs/fsdp_qlora.yaml

This file was deleted.

@@ -19,4 +19,4 @@ same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
use_cpu: false
4 changes: 2 additions & 2 deletions recipes/constitutional-ai/README.md
@@ -11,10 +11,10 @@ This repo includes the recipe for training the following models:
You will require 8 GPUs (80GB of VRAM) to train the full model.
```shell
# Step 1 - SFT
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/constitutional-ai/sft/config_{grok,anthropic}.yaml
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml scripts/sft.py --config recipes/constitutional-ai/sft/config_{grok,anthropic}.yaml

# Step 2 - DPO
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/constitutional-ai/dpo/config_anthropic.yaml
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml scripts/dpo.py --config recipes/constitutional-ai/dpo/config_anthropic.yaml
# Note that we did not include the DPO recipe for grok, as that model seems overtrained and too snarky.
```

40 changes: 33 additions & 7 deletions recipes/constitutional-ai/dpo/config_anthropic.yaml
@@ -4,13 +4,39 @@ torch_dtype: null

# Data training arguments
# For definitions, see: src/h4/training/config.py
dataset_mixer:
HuggingFaceH4/ultrafeedback_binarized: 1.0
HuggingFaceH4/cai-conversation-harmless: 1.0
dataset_splits:
- train_prefs
- test_prefs
preprocessing_num_workers: 12
dataset_mixture:
datasets:
- id: HuggingFaceH4/ultrafeedback_binarized
config: default
split: train_prefs
columns:
- chosen
- rejected
weight: 1.0
- id: HuggingFaceH4/ultrafeedback_binarized
config: default
split: test_prefs
columns:
- chosen
- rejected
weight: 1.0
- id: HuggingFaceH4/cai-conversation-harmless
config: default
split: train_prefs
columns:
- chosen
- rejected
weight: 1.0
- id: HuggingFaceH4/cai-conversation-harmless
config: default
split: test_prefs
columns:
- chosen
- rejected
weight: 1.0
test_split_size: 3000
seed: 0
dataset_num_proc: 12

# DPOTrainer arguments
bf16: true
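The new `dataset_mixture` block replaces the flat `dataset_mixer` mapping with explicit per-dataset entries. As a rough sketch of how such `weight` values could be turned into sampling probabilities (a hypothetical helper for illustration, not the handbook's actual loader):

```python
def mixture_probabilities(datasets):
    """Normalize per-dataset weights into sampling probabilities.

    `datasets` is a list of dicts shaped like the entries under
    `dataset_mixture.datasets` in the YAML above; only the `id`,
    `split`, and `weight` keys are used here.
    """
    total = sum(d["weight"] for d in datasets)
    if total <= 0:
        raise ValueError("mixture weights must sum to a positive value")
    # Key by (id, split) since the same dataset id can appear for
    # both its train and test splits, as in the config above.
    return {(d["id"], d["split"]): d["weight"] / total for d in datasets}


probs = mixture_probabilities(
    [
        {"id": "HuggingFaceH4/ultrafeedback_binarized", "split": "train_prefs", "weight": 1.0},
        {"id": "HuggingFaceH4/cai-conversation-harmless", "split": "train_prefs", "weight": 3.0},
    ]
)
# each weight is divided by the total (4.0) -> 0.25 and 0.75
```

With equal weights of 1.0, as in this recipe, every dataset is sampled with the same probability.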
24 changes: 17 additions & 7 deletions recipes/constitutional-ai/sft/config_anthropic.yaml
@@ -6,13 +6,23 @@ attn_implementation: flash_attention_2

# Data training arguments
chat_template: "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"
dataset_mixer:
HuggingFaceH4/cai-conversation-harmless: 1.0
HuggingFaceH4/ultrachat_200k: 1.0
dataset_splits:
- train_sft
- test_sft
preprocessing_num_workers: 12
dataset_mixture:
datasets:
- id: HuggingFaceH4/cai-conversation-harmless
config: default
split: train_sft
columns:
- messages
weight: 1.0
- id: HuggingFaceH4/ultrachat_200k
config: default
split: test_sft
columns:
- messages
weight: 1.0
test_split_size: 1000
seed: 0
dataset_num_proc: 12

# SFT trainer config
bf16: true
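The `chat_template` in this config is a Jinja string applied by the tokenizer. A plain-Python sketch of the formatting it produces (a hypothetical helper shown only to illustrate the output; exact whitespace may differ slightly from Jinja's rendering):

```python
def render_chat(messages, eos_token="</s>", add_generation_prompt=False):
    """Mirror the role-tag chat template from the config in plain Python.

    Each message is rendered as '<|role|>\n' + content + eos_token; an
    optional trailing '<|assistant|>' cues the model to respond.
    """
    role_tags = {"user": "<|user|>", "system": "<|system|>", "assistant": "<|assistant|>"}
    parts = []
    for message in messages:
        tag = role_tags[message["role"]]
        parts.append(f"{tag}\n{message['content']}{eos_token}")
    if add_generation_prompt:
        parts.append("<|assistant|>")
    return "\n".join(parts)


prompt = render_chat([{"role": "user", "content": "Hi"}], add_generation_prompt=True)
# "<|user|>\nHi</s>\n<|assistant|>"
```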
24 changes: 17 additions & 7 deletions recipes/constitutional-ai/sft/config_grok.yaml
@@ -6,13 +6,23 @@ attn_implementation: flash_attention_2

# Data training arguments
chat_template: "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"
dataset_mixer:
HuggingFaceH4/grok-conversation-harmless: 0.15
HuggingFaceH4/ultrachat_200k: 1.0
dataset_splits:
- train_sft
- test_sft
preprocessing_num_workers: 12
dataset_mixture:
datasets:
- id: HuggingFaceH4/grok-conversation-harmless
config: default
split: train_sft
columns:
- messages
weight: 1.0
- id: HuggingFaceH4/ultrachat_200k
config: default
split: test_sft
columns:
- messages
weight: 1.0
test_split_size: 1000
seed: 0
dataset_num_proc: 12

# SFT trainer config
bf16: true
43 changes: 0 additions & 43 deletions recipes/gpt2-nl/README.md

This file was deleted.
