Commit 2cef428

refactored llama3 doc for github pages (sgl-project#247)

Authored by FrankLeeeee and root.

* refactored llama3 doc for github pages
* polish

Co-authored-by: root <root@g0574.prod.pyl1.voltagepark.net>

1 parent 94fdfa4 · commit 2cef428

File tree: 6 files changed (+27 lines, -15 lines)

docs/basic_usage/gpt-oss.md: whitespace-only changes
docs/basic_usage/llama3.md: whitespace-only changes
docs/basic_usage/llama4.md: whitespace-only changes
docs/basic_usage/qwen3.md: whitespace-only changes
Lines changed: 21 additions & 14 deletions

@@ -1,28 +1,34 @@
-# Preproducing the draft model in the EAGLE3 paper
+# Eagle3 for Llama3
 
-This documents shows how to reproduce the training process of EAGLE3 paper. The script is in `examples/run_llama3_eagle3_sgl_online.sh`. This documents is a walk through of the script and explains all the middle points.
 
-## Step0. Prepare environment
+## Introduction
 
-We suggest to use virtual environment to make sure all the dependency can be correctly installed. If you want to use `python>=3.12`, please set `export SETUPTOOLS_USE_DISTUTILS=local`.
+This document provides a step-by-step guide to reproducing the training process described in the EAGLE3 paper, using the script `examples/run_llama3_eagle3_sgl_online.sh`. We will walk through the script and explain each key step along the way.
 
-```
+## Workflow
+
+### Step 1. Prepare environment
+
+We suggest using a virtual environment to make sure that all the dependencies can be installed correctly. If you want to use `python>=3.12`, please set `export SETUPTOOLS_USE_DISTUTILS=local`.
+
+```shell
 uv venv --python 3.11
 source .venv/bin/activate
 cd PATH-TO-SpecForge
 uv pip install -r requirements.txt
 uv pip install -v .
 ```
 
-After completing these steps, open a Python shell and run:
-```python
-import specforge
+After completing these steps, you can verify the installation by running the following command. If the installation succeeded, you should see no errors.
+
+```shell
+python -c "import specforge"
 ```
-If the import succeeds without errors, Step 0 is complete.
 
-## Step1. Prepare Model & Dataset
+### Step 2. Prepare Model & Dataset
+
+Next, we can start preparing the model and dataset. First, use these commands to download the model and the dataset.
 
-First, use these command to download the model and the dataset.
 ```shell
 hf download meta-llama/Llama-3.1-8B-Instruct
 hf download Aeala/ShareGPT_Vicuna_unfiltered --repo-type dataset
@@ -60,6 +66,7 @@ python scripts/generate_data_by_target.py \
 ```
 
 After completing these steps, you can review the error entries in `error.jsonl`. Most of them will likely be `request timeout`. You can then decide whether you want to regenerate those samples. In my case, I chose not to, so I simply deleted `error.jsonl` before uploading to Hugging Face. The following commands are used:
+
 ```shell
 hf repo create zhuyksir/Ultrachat-Sharegpt-Llama3.1-8B --type dataset
 hf upload /YOUR/PATH/Llama-3.1-8B-Instruct/generated-dataset/ultrachat-llama-3.1-8b-instruct --commit-message "generated dataset by Llama3.1-8B"
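As a side note on reviewing `error.jsonl` (mentioned in the hunk above): a small script can tally the failure reasons before you decide whether to regenerate. This is only a sketch with an assumed file layout; it presumes each line is a JSON object with an `error` field, which may not match the actual format SpecForge emits.

```python
import json
from collections import Counter

def tally_errors(path):
    """Count failure reasons in a JSON-lines error log.

    Assumes (hypothetically) that each line is a JSON object with an
    "error" field describing why generation failed; records without
    that field are counted as "unknown".
    """
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            counts[record.get("error", "unknown")] += 1
    return counts

# Usage (hypothetical file layout):
# tally_errors("error.jsonl").most_common()
```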
@@ -72,7 +79,6 @@ ds.to_json("merged.jsonl", orient="records", lines=True)
 ds = ds.train_test_split(test_size=0.05)
 train_ds = ds["train"]
 test_ds = ds["test"]
-
 ```
 
 Alternatively, for `meta-llama/Llama-3.1-8B-Instruct`, you can use the dataset we generated: [zhuyksir/Ultrachat-Sharegpt-Llama3.1-8B](https://huggingface.co/datasets/zhuyksir/Ultrachat-Sharegpt-Llama3.1-8B).
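For readers unfamiliar with what `ds.train_test_split(test_size=0.05)` in the hunk above does, the sketch below reproduces the same shuffled 95/5 split in plain Python. It is only an illustration of the mechanics, not the `datasets` library's implementation, which also handles seeding conventions, index mapping, and Arrow-backed data.

```python
import random

def split_train_test(records, test_size=0.05, seed=0):
    """Shuffle and split records into train/test subsets, mirroring
    the high-level behaviour of Dataset.train_test_split."""
    rng = random.Random(seed)
    shuffled = list(records)  # copy so the caller's list stays untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return {"train": shuffled[n_test:], "test": shuffled[:n_test]}

# 1000 samples split 95/5, like ds.train_test_split(test_size=0.05)
ds = split_train_test([{"id": i} for i in range(1000)])
```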
@@ -99,6 +105,7 @@ Second, we need to pre-build the cache for training.
 - Red text indicates tokens where `loss_mask == 0` (typically user input and the system prompt). Since the goal is to train the draft model only on the target model’s output, user text must be masked out. In other words, only tokens generated by the target model should contribute to the loss.
 
 - You might see this warning: `WARNING: No assistant response spans found in the conversation text.` This occurs when, during data generation, an error causes a sample to contain only user inputs without any assistant responses. You can safely ignore this warning; the loss mask for such samples is set entirely to zero.
+
 ```shell
 python scripts/build_eagle3_dataset_cache.py \
 --target-model-path $MODEL_PATH \
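To make the loss-mask rule in the hunk above concrete, here is a hypothetical sketch of how a per-token mask could be assigned: 1 for tokens inside assistant responses, 0 elsewhere (user input and system prompt). The message format and whitespace tokenizer are illustrative assumptions, not SpecForge's actual data layout or tokenization.

```python
def build_loss_mask(messages, tokenize=str.split):
    """Return (tokens, mask) where mask is 1 only for assistant
    tokens, so the loss is computed only on target-model output."""
    tokens, mask = [], []
    for msg in messages:
        toks = tokenize(msg["content"])
        tokens.extend(toks)
        mask.extend([1 if msg["role"] == "assistant" else 0] * len(toks))
    return tokens, mask

conversation = [
    {"role": "system", "content": "You are a helpful assistant"},  # masked out
    {"role": "user", "content": "hello there"},                    # masked out
    {"role": "assistant", "content": "hi how can I help"},         # contributes to loss
]
tokens, mask = build_loss_mask(conversation)
```

A sample containing no assistant turns gets an all-zero mask, which is exactly the situation behind the "No assistant response spans found" warning.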
@@ -111,7 +118,7 @@ python scripts/build_eagle3_dataset_cache.py \
 --view-train-data 1 2
 ```
 
-## Step2. Start Training
+### Step 3. Start Training
 
 Use the following script to train.
 
@@ -146,7 +153,7 @@ CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun \
 --report-to wandb
 ```
 
-## Step3. benchmark
+### Step 4. Benchmark
 
 For `Llama3.1-8B`, we add a system prompt to all training data, following the approach used in the official repository. Consequently, when benchmarking, we should also include this system prompt to obtain the full accept length. Please uncomment the corresponding line and add the system prompt.
 
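The system-prompt requirement described above amounts to prepending a system message to every benchmark conversation before it is sent to the model. A minimal sketch, with a placeholder prompt string (the actual prompt from the official repository is not reproduced here):

```python
SYSTEM_PROMPT = "..."  # placeholder: reuse the exact system prompt from training

def with_system_prompt(messages, system_prompt=SYSTEM_PROMPT):
    """Prepend a system message unless one is already present."""
    if messages and messages[0].get("role") == "system":
        return messages
    return [{"role": "system", "content": system_prompt}] + messages

chat = with_system_prompt([{"role": "user", "content": "hello"}])
```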
docs/index.rst

Lines changed: 6 additions & 1 deletion

@@ -18,7 +18,12 @@ SpecForge is an ecosystem project developed by the SGLang team. It is a framewor
    basic_usage/data_preparation.md
    basic_usage/training.md
    basic_usage/benchmarking.md
-   basic_usage/llama3.md
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Examples
+
+   examples/llama3-eagle3.md
 
 .. toctree::
    :maxdepth: 1
