Commit 2cef428

refactored llama3 doc for github pages (sgl-project#247)

Authored by FrankLeeeee and root.

* refactored llama3 doc for github pages
* polish

Co-authored-by: root <root@g0574.prod.pyl1.voltagepark.net>

1 parent 94fdfa4 · commit 2cef428

File tree: 6 files changed (+27 lines, -15 lines)

docs/basic_usage/gpt-oss.md: whitespace-only changes
docs/basic_usage/llama3.md: whitespace-only changes
docs/basic_usage/llama4.md: whitespace-only changes
docs/basic_usage/qwen3.md: whitespace-only changes
Lines changed: 21 additions & 14 deletions

@@ -1,28 +1,34 @@
-# Preproducing the draft model in the EAGLE3 paper
+# Eagle3 for Llama3
 
-This documents shows how to reproduce the training process of EAGLE3 paper. The script is in `examples/run_llama3_eagle3_sgl_online.sh`. This documents is a walk through of the script and explains all the middle points.
 
-## Step0. Prepare environment
+## Introduction
 
-We suggest to use virtual environment to make sure all the dependency can be correctly installed. If you want to use `python>=3.12`, please set `export SETUPTOOLS_USE_DISTUTILS=local`.
+This document provides a step-by-step guide to reproducing the training process described in the EAGLE3 paper, using the script `examples/run_llama3_eagle3_sgl_online.sh`. We will walk through the script and explain each key step along the way.
 
-```
+## Workflow
+
+### Step 1. Prepare environment
+
+We suggest using a virtual environment to make sure that all the dependencies can be installed correctly. If you want to use `python>=3.12`, please set `export SETUPTOOLS_USE_DISTUTILS=local`.
+
+```shell
 uv venv --python 3.11
 source .venv/bin/activate
 cd PATH-TO-SpecForge
 uv pip install -r requirements.txt
 uv pip install -v .
 ```
 
-After completing these steps, open a Python shell and run:
-```python
-import specforge
+After completing these steps, you can verify the installation by running the following command. If the installation succeeded, you should see no errors.
+
+```shell
+python -c "import specforge"
 ```
-If the import succeeds without errors, Step 0 is complete.
 
-## Step1. Prepare Model & Dataset
+### Step 2. Prepare Model & Dataset
+
+Next, we can start preparing the model and dataset. First, use these commands to download the model and the dataset.
 
-First, use these command to download the model and the dataset.
 ```shell
 hf download meta-llama/Llama-3.1-8B-Instruct
 hf download Aeala/ShareGPT_Vicuna_unfiltered --repo-type dataset
@@ -60,6 +66,7 @@ python scripts/generate_data_by_target.py \
 ```
 
 After completing these steps, you can review the error entries in `error.jsonl`. Most of them will likely be `request timeout`. You can then decide whether you want to regenerate those samples. In my case, I chose not to, so I simply deleted `error.jsonl` before uploading to Hugging Face. The following commands are used:
+
 ```shell
 hf repo create zhuyksir/Ultrachat-Sharegpt-Llama3.1-8B --type dataset
 hf upload /YOUR/PATH/Llama-3.1-8B-Instruct/generated-dataset/ultrachat-llama-3.1-8b-instruct --commit-message "generated dataset by Llama3.1-8B"
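As a side note on reviewing `error.jsonl` (mentioned in the hunk above): a small script can tally the failure reasons before you decide whether to regenerate. This is only a sketch with an assumed file layout; it presumes each line is a JSON object with an `error` field, which may not match the actual format SpecForge emits.

```python
import json
from collections import Counter

def tally_errors(path):
    """Count failure reasons in a JSON-lines error log.

    Assumes (hypothetically) that each line is a JSON object with an
    "error" field describing why generation failed; records without
    that field are counted as "unknown".
    """
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            counts[record.get("error", "unknown")] += 1
    return counts

# Usage (hypothetical file layout):
# tally_errors("error.jsonl").most_common()
```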
@@ -72,7 +79,6 @@ ds.to_json("merged.jsonl", orient="records", lines=True)
 ds = ds.train_test_split(test_size=0.05)
 train_ds = ds["train"]
 test_ds = ds["test"]
-
 ```
 
 Alternatively, for `meta-llama/Llama-3.1-8B-Instruct`, you can use the dataset we generated: [zhuyksir/Ultrachat-Sharegpt-Llama3.1-8B](https://huggingface.co/datasets/zhuyksir/Ultrachat-Sharegpt-Llama3.1-8B).
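For readers unfamiliar with what `ds.train_test_split(test_size=0.05)` in the hunk above does, the sketch below reproduces the same shuffled 95/5 split in plain Python. It is only an illustration of the mechanics, not the `datasets` library's implementation, which also handles seeding conventions, index mapping, and Arrow-backed data.

```python
import random

def split_train_test(records, test_size=0.05, seed=0):
    """Shuffle and split records into train/test subsets, mirroring
    the high-level behaviour of Dataset.train_test_split."""
    rng = random.Random(seed)
    shuffled = list(records)  # copy so the caller's list stays untouched
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return {"train": shuffled[n_test:], "test": shuffled[:n_test]}

# 1000 samples split 95/5, like ds.train_test_split(test_size=0.05)
ds = split_train_test([{"id": i} for i in range(1000)])
```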
@@ -99,6 +105,7 @@ Second, we need to pre-build the cache for training.
 - Red text indicates tokens where `loss_mask == 0` (typically user input and the system prompt). Since the goal is to train the draft model only on the target model’s output, user text must be masked out. In other words, only tokens generated by the target model should contribute to the loss.
 
 - You might see this warning: `WARNING: No assistant response spans found in the conversation text.` This occurs when, during data generation, an error causes a sample to contain only user inputs without any assistant responses. You can safely ignore this warning; the loss mask for such samples is set entirely to zero.
+
 ```shell
 python scripts/build_eagle3_dataset_cache.py \
 --target-model-path $MODEL_PATH \
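To make the loss-mask rule in the hunk above concrete, here is a hypothetical sketch of how a per-token mask could be assigned: 1 for tokens inside assistant responses, 0 elsewhere (user input and system prompt). The message format and whitespace tokenizer are illustrative assumptions, not SpecForge's actual data layout or tokenization.

```python
def build_loss_mask(messages, tokenize=str.split):
    """Return (tokens, mask) where mask is 1 only for assistant
    tokens, so the loss is computed only on target-model output."""
    tokens, mask = [], []
    for msg in messages:
        toks = tokenize(msg["content"])
        tokens.extend(toks)
        mask.extend([1 if msg["role"] == "assistant" else 0] * len(toks))
    return tokens, mask

conversation = [
    {"role": "system", "content": "You are a helpful assistant"},  # masked out
    {"role": "user", "content": "hello there"},                    # masked out
    {"role": "assistant", "content": "hi how can I help"},         # contributes to loss
]
tokens, mask = build_loss_mask(conversation)
```

A sample containing no assistant turns gets an all-zero mask, which is exactly the situation behind the "No assistant response spans found" warning.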
@@ -111,7 +118,7 @@ python scripts/build_eagle3_dataset_cache.py \
 --view-train-data 1 2
 ```
 
-## Step2. Start Training
+### Step 3. Start Training
 
 Use the following script to train.
 
@@ -146,7 +153,7 @@ CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun \
 --report-to wandb
 ```
 
-## Step3. benchmark
+### Step 4. Benchmark
 
 For `Llama3.1-8B`, we add a system prompt to all training data, following the approach used in the official repository. Consequently, when benchmarking, we should also include this system prompt to obtain the full accept length. Please uncomment the corresponding line and add the system prompt.
 
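The system-prompt requirement described above amounts to prepending a system message to every benchmark conversation before it is sent to the model. A minimal sketch, with a placeholder prompt string (the actual prompt from the official repository is not reproduced here):

```python
SYSTEM_PROMPT = "..."  # placeholder: reuse the exact system prompt from training

def with_system_prompt(messages, system_prompt=SYSTEM_PROMPT):
    """Prepend a system message unless one is already present."""
    if messages and messages[0].get("role") == "system":
        return messages
    return [{"role": "system", "content": system_prompt}] + messages

chat = with_system_prompt([{"role": "user", "content": "hello"}])
```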
docs/index.rst

Lines changed: 6 additions & 1 deletion

@@ -18,7 +18,12 @@ SpecForge is an ecosystem project developed by the SGLang team. It is a framewor
    basic_usage/data_preparation.md
    basic_usage/training.md
    basic_usage/benchmarking.md
-   basic_usage/llama3.md
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Examples
+
+   examples/llama3-eagle3.md
 
 .. toctree::
    :maxdepth: 1
