Merged
18 changes: 0 additions & 18 deletions .github/workflows/build_documentation.yml

This file was deleted.

19 changes: 0 additions & 19 deletions .github/workflows/build_pr_documentation.yml

This file was deleted.

2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
Original file line number Diff line number Diff line change
@@ -26,6 +26,6 @@ jobs:
- name: Install dependencies
run: |
python -m pip install --upgrade pip
python -m pip install ".[dev, torch]"
python -m pip install ".[dev]"
- name: Run unit tests
run: HF_TOKEN=$HF_TOKEN pytest -sv tests/
16 changes: 0 additions & 16 deletions .github/workflows/upload_pr_documentation.yml

This file was deleted.

2 changes: 1 addition & 1 deletion CITATION.cff
@@ -26,4 +26,4 @@ authors:
family-names: Wolf
repository-code: 'https://github.com/huggingface/alignment-handbook'
license: Apache-2.0
version: 0.3.0.dev0
version: 0.4.0.dev0
31 changes: 17 additions & 14 deletions README.md
@@ -19,6 +19,7 @@ However, we know from the [InstructGPT](https://huggingface.co/papers/2203.02155
The Alignment Handbook aims to fill that gap by providing the community with a series of robust training recipes that span the whole pipeline.

## News 🗞️
* **July 24, 2025**: We release the full [post-training recipe](recipes/smollm2/README.md) behind SmolLM3-3B: a state-of-the-art hybrid reasoning model 💭
* **November 21, 2024**: We release the [recipe](recipes/smollm2/README.md) for fine-tuning SmolLM2-Instruct.
* **August 18, 2024**: We release SmolLM-Instruct v0.2, along with the [recipe](recipes/smollm/README.md) to fine-tune small LLMs 💻
* **April 12, 2024**: We release Zephyr 141B (A35B), in collaboration with Argilla and Kaist AI, along with the recipe to fine-tune Mixtral 8x22B with ORPO 🪁
@@ -60,32 +61,35 @@ The initial release of the handbook will focus on the following techniques:

## Installation instructions

To run the code in this project, first, create a Python virtual environment using e.g. Conda:
To run the code in this project, first create a Python virtual environment, e.g. with `uv`:

```shell
conda create -n handbook python=3.10 && conda activate handbook
uv venv handbook --python 3.11 && source handbook/bin/activate && uv pip install --upgrade pip
```

Next, install PyTorch `v2.1.2` - the precise version is important for reproducibility! Since this is hardware-dependent, we
direct you to the [PyTorch Installation Page](https://pytorch.org/get-started/locally/).
> [!TIP]
> To install `uv`, follow the [UV Installation Guide](https://docs.astral.sh/uv/getting-started/installation/).

Next, install PyTorch `v2.6.0`:

```shell
uv pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu126
```

Note that the precise version is important for reproducibility! Since this is hardware-dependent, we also direct you to the [PyTorch Installation Page](https://pytorch.org/get-started/locally/).

You can then install the remaining package dependencies as follows:

```shell
git clone https://github.com/huggingface/alignment-handbook.git
cd ./alignment-handbook/
python -m pip install .
uv pip install .
```

You will also need Flash Attention 2 installed, which can be done by running:

```shell
python -m pip install flash-attn --no-build-isolation
uv pip install "flash-attn==2.7.4.post1" --no-build-isolation
```

> **Note**
> If your machine has less than 96GB of RAM and many CPU cores, reduce the `MAX_JOBS` arguments, e.g. `MAX_JOBS=4 pip install flash-attn --no-build-isolation`

Next, log into your Hugging Face account as follows:

```shell
@@ -106,7 +110,6 @@ You can now check out the `scripts` and `recipes` directories for instructions o
├── LICENSE
├── Makefile <- Makefile with commands like `make style`
├── README.md <- The top-level README for developers using this project
├── chapters <- Educational content to render on hf.co/learn
├── recipes <- Recipe configs, accelerate configs, slurm scripts
├── scripts <- Scripts to train and evaluate chat models
├── setup.cfg <- Installation config (mostly used for configuring code quality & tests)
@@ -121,10 +124,10 @@ If you find the content of this repo useful in your work, please cite it as follows:

```bibtex
@software{Tunstall_The_Alignment_Handbook,
author = {Tunstall, Lewis and Beeching, Edward and Lambert, Nathan and Rajani, Nazneen and Huang, Shengyi and Rasul, Kashif and Bartolome, Alvaro and M. Rush, Alexander and Wolf, Thomas},
author = {Tunstall, Lewis and Beeching, Edward and Lambert, Nathan and Rajani, Nazneen and Huang, Shengyi and Rasul, Kashif and Bartolome, Alvaro and Patiño, M. Carlos and M. Rush, Alexander and Wolf, Thomas},
license = {Apache-2.0},
title = {{The Alignment Handbook}},
url = {https://github.com/huggingface/alignment-handbook},
version = {0.3.0.dev0}
version = {0.4.0.dev0}
}
```
4 changes: 0 additions & 4 deletions chapters/en/_toctree.yml

This file was deleted.

3 changes: 0 additions & 3 deletions chapters/en/chapter0/introduction.mdx

This file was deleted.

25 changes: 0 additions & 25 deletions recipes/accelerate_configs/fsdp_qlora.yaml

This file was deleted.

@@ -19,4 +19,4 @@ same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
use_cpu: false
4 changes: 2 additions & 2 deletions recipes/constitutional-ai/README.md
@@ -11,10 +11,10 @@ This repo includes the recipe for training the following models:
You will require 8 GPUs (80GB of VRAM) to train the full model.
```shell
# Step 1 - SFT
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/constitutional-ai/sft/config_{grok,anthropic}.yaml
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml scripts/sft.py --config recipes/constitutional-ai/sft/config_{grok,anthropic}.yaml

# Step 2 - DPO
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/constitutional-ai/dpo/config_anthropic.yaml
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml scripts/dpo.py --config recipes/constitutional-ai/dpo/config_anthropic.yaml
# Note that we did not include the DPO recipe for grok, as that model seems overtrained and too snarky.
```

40 changes: 33 additions & 7 deletions recipes/constitutional-ai/dpo/config_anthropic.yaml
@@ -4,13 +4,39 @@ torch_dtype: null

# Data training arguments
# For definitions, see: src/h4/training/config.py
dataset_mixer:
HuggingFaceH4/ultrafeedback_binarized: 1.0
HuggingFaceH4/cai-conversation-harmless: 1.0
dataset_splits:
- train_prefs
- test_prefs
preprocessing_num_workers: 12
dataset_mixture:
datasets:
- id: HuggingFaceH4/ultrafeedback_binarized
config: default
split: train_prefs
columns:
- chosen
- rejected
weight: 1.0
- id: HuggingFaceH4/ultrafeedback_binarized
config: default
split: test_prefs
columns:
- chosen
- rejected
weight: 1.0
- id: HuggingFaceH4/cai-conversation-harmless
config: default
split: train_prefs
columns:
- chosen
- rejected
weight: 1.0
- id: HuggingFaceH4/cai-conversation-harmless
config: default
split: test_prefs
columns:
- chosen
- rejected
weight: 1.0
test_split_size: 3000
seed: 0
dataset_num_proc: 12

# DPOTrainer arguments
bf16: true
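The new `dataset_mixture` block replaces the flat `dataset_mixer` mapping with explicit per-dataset entries. As a rough sketch of how such `weight` values could be turned into sampling probabilities (a hypothetical helper for illustration, not the handbook's actual loader):

```python
def mixture_probabilities(datasets):
    """Normalize per-dataset weights into sampling probabilities.

    `datasets` is a list of dicts shaped like the entries under
    `dataset_mixture.datasets` in the YAML above; only the `id`,
    `split`, and `weight` keys are used here.
    """
    total = sum(d["weight"] for d in datasets)
    if total <= 0:
        raise ValueError("mixture weights must sum to a positive value")
    # Key by (id, split) since the same dataset id can appear for
    # both its train and test splits, as in the config above.
    return {(d["id"], d["split"]): d["weight"] / total for d in datasets}


probs = mixture_probabilities(
    [
        {"id": "HuggingFaceH4/ultrafeedback_binarized", "split": "train_prefs", "weight": 1.0},
        {"id": "HuggingFaceH4/cai-conversation-harmless", "split": "train_prefs", "weight": 3.0},
    ]
)
# each weight is divided by the total (4.0) -> 0.25 and 0.75
```

With equal weights of 1.0, as in this recipe, every dataset is sampled with the same probability.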
24 changes: 17 additions & 7 deletions recipes/constitutional-ai/sft/config_anthropic.yaml
@@ -6,13 +6,23 @@ attn_implementation: flash_attention_2

# Data training arguments
chat_template: "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"
dataset_mixer:
HuggingFaceH4/cai-conversation-harmless: 1.0
HuggingFaceH4/ultrachat_200k: 1.0
dataset_splits:
- train_sft
- test_sft
preprocessing_num_workers: 12
dataset_mixture:
datasets:
- id: HuggingFaceH4/cai-conversation-harmless
config: default
split: train_sft
columns:
- messages
weight: 1.0
- id: HuggingFaceH4/ultrachat_200k
config: default
split: test_sft
columns:
- messages
weight: 1.0
test_split_size: 1000
seed: 0
dataset_num_proc: 12

# SFT trainer config
bf16: true
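The `chat_template` in this config is a Jinja string applied by the tokenizer. A plain-Python sketch of the formatting it produces (a hypothetical helper shown only to illustrate the output; exact whitespace may differ slightly from Jinja's rendering):

```python
def render_chat(messages, eos_token="</s>", add_generation_prompt=False):
    """Mirror the role-tag chat template from the config in plain Python.

    Each message is rendered as '<|role|>\n' + content + eos_token; an
    optional trailing '<|assistant|>' cues the model to respond.
    """
    role_tags = {"user": "<|user|>", "system": "<|system|>", "assistant": "<|assistant|>"}
    parts = []
    for message in messages:
        tag = role_tags[message["role"]]
        parts.append(f"{tag}\n{message['content']}{eos_token}")
    if add_generation_prompt:
        parts.append("<|assistant|>")
    return "\n".join(parts)


prompt = render_chat([{"role": "user", "content": "Hi"}], add_generation_prompt=True)
# "<|user|>\nHi</s>\n<|assistant|>"
```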
24 changes: 17 additions & 7 deletions recipes/constitutional-ai/sft/config_grok.yaml
@@ -6,13 +6,23 @@ attn_implementation: flash_attention_2

# Data training arguments
chat_template: "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"
dataset_mixer:
HuggingFaceH4/grok-conversation-harmless: 0.15
HuggingFaceH4/ultrachat_200k: 1.0
dataset_splits:
- train_sft
- test_sft
preprocessing_num_workers: 12
dataset_mixture:
datasets:
- id: HuggingFaceH4/grok-conversation-harmless
config: default
split: train_sft
columns:
- messages
weight: 1.0
- id: HuggingFaceH4/ultrachat_200k
config: default
split: test_sft
columns:
- messages
weight: 1.0
test_split_size: 1000
seed: 0
dataset_num_proc: 12

# SFT trainer config
bf16: true
43 changes: 0 additions & 43 deletions recipes/gpt2-nl/README.md

This file was deleted.
