Commit 925acdf

Add instructions for the post-training steps (#222)
* Add instructions for the post-training steps
* Minor grammar and spaces corrections
* Specify GAS parameter to have the correct EBS on 1 node
Parent: 9fb76af

File tree

2 files changed: +16, -6 lines

recipes/smollm3/README.md

Lines changed: 15 additions & 5 deletions
````diff
@@ -1,10 +1,20 @@
-
 # Instructions to train SmolLM3-3B
 
-We are open-sourcing all the artifacts to train [SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B). You can find the configuration files for the three post-training stages in the [`sft`](https://github.com/huggingface/alignment-handbook/tree/main/recipes/smollm3/sft) and [`dpo`](https://github.com/huggingface/alignment-handbook/tree/main/recipes/smollm3/dpo) directories.
-
-We are currently working on the code release, so this README will contain the instructions to run training after we release the code on the week of July 14, 2025.
+We are open-sourcing all the artifacts to train [SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B). You can find the configuration files for the three post-training stages (mid-training, SFT, and DPO) in the [`sft`](https://github.com/huggingface/alignment-handbook/tree/main/recipes/smollm3/sft) and [`dpo`](https://github.com/huggingface/alignment-handbook/tree/main/recipes/smollm3/dpo) directories.
 
 ## Setup
 
-[WIP]
+Make sure you followed the installation instructions in the [README.md](README.md) file. We tested the training setup with 8 GPUs (80GB of VRAM) to train the full model.
+
+## Full training examples
+
+```shell
+# Step 1 - Mid-Training
+ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml scripts/sft.py --config recipes/smollm3/sft/mid.yaml --gradient_accumulation_steps 16
+
+# Step 2 - SFT
+ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml scripts/sft.py --config recipes/smollm3/sft/sft.yaml --gradient_accumulation_steps 16
+
+# Step 3 - DPO
+ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero3.yaml scripts/dpo.py --config recipes/smollm3/dpo/apo.yaml --gradient_accumulation_steps 4
+```
````

recipes/smollm3/sft/sft.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
-# Config for 8 nodes
+# Config for 8 nodes with GBS 128
 # Model arguments
 model_name_or_path: HuggingFaceTB/SmolLM3-3B-checkpoints
 model_revision: it-mid-training
```
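The commit message says the GAS (gradient accumulation steps) parameter was chosen so that a single node reproduces the effective batch size (EBS) of the 8-node, GBS-128 configuration. A minimal sketch of that arithmetic, assuming a per-device micro-batch size of 1 (an assumption for illustration; the actual value lives in the recipe YAML files):

```python
# Sketch of effective/global batch size arithmetic.
# per_device_batch_size=1 is an assumption, not taken from the recipes.
def effective_batch_size(num_gpus: int, grad_accum_steps: int,
                         per_device_batch_size: int = 1) -> int:
    """EBS = number of GPUs x gradient accumulation steps x per-device micro-batch."""
    return num_gpus * grad_accum_steps * per_device_batch_size

# 1 node with 8 GPUs and --gradient_accumulation_steps 16:
print(effective_batch_size(num_gpus=8, grad_accum_steps=16))  # prints 128
```

Under this assumption, 8 GPUs x 16 accumulation steps gives 128, which lines up with the "GBS 128" comment added to `sft.yaml` above.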
