
Commit f87009c

Merge branch 'main' into train_by_deepspeed
2 parents: c5dc8fc + 6cbc1b7


52 files changed: +4621 −415 lines

.github/workflows/test.yaml

Lines changed: 33 additions & 0 deletions

@@ -0,0 +1,33 @@
+name: PR Test
+
+on:
+  pull_request:
+    branches: [ main ]
+  workflow_dispatch:
+
+concurrency:
+  group: pr-test-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  unit-test:
+    if: (github.repository == 'sgl-project/SpecForge' || github.event_name == 'pull_request') &&
+      github.event.pull_request.draft == false
+    runs-on: [self-hosted]
+    container:
+      image: lmsysorg/sglang:dev
+      options: --gpus all --shm-size=2g --rm -v /dev/shm
+    env:
+      CUDA_VISIBLE_DEVICES: 6,7
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Install dependencies
+        run: |
+          pip install -e .
+
+      - name: Run test
+        timeout-minutes: 10
+        run: |
+          python -m unittest discover -s ./tests -p "test_*.py"

.gitignore

Lines changed: 1 addition & 2 deletions

@@ -195,8 +195,7 @@ cache/
 outputs/
 wandb/
 .idea
+.vscode/
 
 # macOS
 .DS_Store
-
-.vscode/

README.md

Lines changed: 30 additions & 2 deletions

@@ -120,7 +120,11 @@ You need to specify the following arguments:
 
 ### 🤩 Prepare your own dataset
 
-Besides the provided ShareGPT/Ultrachat datasets, you can also prepare your own dataset. You should prepare the dataset in jsonl format and the schema should look like this:
+Besides the provided ShareGPT/Ultrachat datasets, you can also prepare your own dataset. We support two formats:
+
+#### Option 1: Conversation Format
+
+You should prepare the dataset in jsonl format and the schema should look like this:
 
 ```json
 {
@@ -134,6 +138,30 @@ Besides the provided ShareGPT/Ultrachat datasets, you can also prepare your own
 }
 ```
 
+#### Option 2: Pre-formatted Text Format
+
+If you already have conversations formatted with a specific chat template, you can use the pre-formatted text directly:
+
+```json
+{
+  "id": "xxxx",
+  "text": "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\nHi there!<|im_end|>\n"
+}
+```
+
+This format is useful when you have pre-formatted prompts that were used during training of the target model, together with raw generations from the target model.
+
+To use pre-formatted datasets, add the `--is-preformatted` flag to your training command. Note that the `--chat-template` parameter is still required and should match the template used in your pre-formatted text, as it is used to identify the user/assistant tokens that delimit assistant spans and to generate the corresponding loss mask.
+
+```bash
+torchrun --standalone --nproc_per_node 8 \
+    scripts/train_eagle3_online.py \
+    --is-preformatted \
+    --chat-template qwen \
+    --train-data-path ./your_preformatted_dataset.jsonl \
+    # ... other arguments
+```
+
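SpecForge's actual loss-mask logic operates on token IDs; as a character-level sketch of the idea only (the ChatML-style markers are an assumption taken from the example above), the assistant spans, and hence a loss mask, could be located like this:

```python
# Character-level sketch only: the real pipeline masks token positions, and
# these ChatML-style markers are an assumption taken from the example above.
import re

ASSISTANT_RE = re.compile(r"<\|im_start\|>assistant\n(.*?)<\|im_end\|>", re.DOTALL)

def assistant_spans(text):
    """Return (start, end) character spans of assistant replies."""
    return [m.span(1) for m in ASSISTANT_RE.finditer(text)]

def loss_mask(text):
    """1 where the character belongs to an assistant reply, else 0."""
    mask = [0] * len(text)
    for start, end in assistant_spans(text):
        for i in range(start, end):
            mask[i] = 1
    return mask

text = (
    "<|im_start|>user\nHello<|im_end|>\n"
    "<|im_start|>assistant\nHi there!<|im_end|>\n"
)
start, end = assistant_spans(text)[0]
print(text[start:end])  # prints "Hi there!"
```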
 Once you have the `jsonl` file ready, you can go straight to online training or to hidden-states generation for offline training.
 
 If you have multiple datasets, you can just merge them into one jsonl file. For example, you can do something like this
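The README's merge example itself falls outside this diff hunk. Since each jsonl line is a self-contained JSON record, merging amounts to concatenating lines; a sketch of that idea (the helper name and paths are hypothetical, not SpecForge's API):

```python
# Hypothetical sketch only: merge several jsonl datasets into one file by
# validating and concatenating their lines.
import json

def merge_jsonl(in_paths, out_path):
    """Append every record from in_paths to out_path; return the record count."""
    count = 0
    with open(out_path, "w") as out:
        for path in in_paths:
            with open(path) as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue          # skip blank lines
                    json.loads(line)      # fail fast on malformed records
                    out.write(line + "\n")
                    count += 1
    return count
```

A call such as `merge_jsonl(["sharegpt.jsonl", "ultrachat.jsonl"], "merged.jsonl")` (filenames illustrative) would then yield a single file for `--train-data-path`.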
@@ -256,7 +284,7 @@ When `tp_size` is greater than 1, the script will automatically load the distrib
 
 #### Customize Draft Model
 
-If you want to change the draft model configuration, you can write your own configuration file and pass its path to the `--draft-model-config` argument. If you wish to serve your customized draft model with SGLang, make sure you implement the draft model in SGLang as well and the architecture name must match. To implement your own draft model, you can create a new class and inherit it from the `Eagle3DraftModel` class in the `specforge.modeling.draft.base.py` file.
+If you want to change the draft model configuration, you can write your own configuration file and pass its path to the `--draft-model-config` argument. If you do not provide the `--draft-model-config` argument, the script will automatically generate the draft model configuration from the target model configuration. If you wish to serve your customized draft model with SGLang, make sure you implement the draft model in SGLang as well; the architecture name must match. To implement your own draft model, create a new class that inherits from the `Eagle3DraftModel` class in the `specforge.modeling.draft.base.py` file.
