Commit 4f7ae50

Enhance Qwen3 VL Best Practice (#695)

* Support Qwen3-Next-80B-A3B
* Update qwen3-vl README

1 parent 8d5b79c commit 4f7ae50

File tree: 4 files changed, +132 −40 lines
* examples/images/qwen3_vl_demo.jpeg (485 KB)
* examples/images/qwen3_vl_loss.png (241 KB)
examples/qwen3_vl/README.md

Lines changed: 76 additions & 40 deletions
* [Installation](#安装)
* [Dataset & Model Download](#数据集和模型下载)
* [Megatron-Core Training Pipeline](#Megatron-Core模型训练流程)
  * [Checkpoint Format Conversion](#模型格式转换)
  * [Model Fine-Tuning](#Qwen3-VL-30B-A3B微调示例)
  * [Model Evaluation](#Qwen3-VL-30B-A3B评测示例)

## Installation

On the Alibaba Cloud PAI (Platform for AI) console, specify the dedicated image address: `dsw-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pai-megatron-patch:25.01`

Then install modelscope and upgrade `transformers` and `multi-storage-client`:
```
pip install modelscope==1.30.0
pip install transformers==4.57.1
pip install -U multi-storage-client
```
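Because a mismatched `transformers` version is a common source of Qwen3-VL loading failures, it can be worth verifying the pins above before training. A minimal sketch in Python (the `check_pins`/`installed_versions` helpers are illustrative, not part of Pai-Megatron-Patch):

```python
# Sketch: compare installed package versions against the README's pinned versions.
# `check_pins` and `installed_versions` are hypothetical helpers for illustration.
from importlib.metadata import version, PackageNotFoundError

REQUIRED = {"modelscope": "1.30.0", "transformers": "4.57.1"}

def installed_versions(names):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

def check_pins(required, installed):
    """Return (name, expected, found) tuples for every mismatched pin."""
    return [(name, expected, installed.get(name))
            for name, expected in required.items()
            if installed.get(name) != expected]

if __name__ == "__main__":
    for name, expected, found in check_pins(REQUIRED, installed_versions(REQUIRED)):
        print(f"{name}: expected {expected}, found {found}")
```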
```bash
git clone --recurse-submodules https://github.com/alibaba/Pai-Megatron-Patch.git
```

```bash
cd /mnt
mkdir qwen3-vl-ckpts
cd qwen3-vl-ckpts
modelscope download --model Qwen/Qwen3-VL-30B-A3B-Instruct --local_dir Qwen3-VL-30B-A3B-Instruct
cd ..

mkdir llava-datasets
```
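After the download completes, it can help to sanity-check that the expected Hugging Face files landed under `/mnt/qwen3-vl-ckpts/Qwen3-VL-30B-A3B-Instruct` before converting. A minimal sketch (the helper names and the required-file list are assumptions; safetensors shard names vary by model):

```python
# Sketch: verify a downloaded HF checkpoint directory looks complete.
# Hypothetical helper; the required-file list is an assumption.
from pathlib import Path

def missing_files(present_names, required=("config.json", "tokenizer_config.json")):
    """Return the required file names absent from `present_names`."""
    present = set(present_names)
    return [name for name in required if name not in present]

def check_checkpoint_dir(path):
    """List missing top-level files in a local checkpoint directory."""
    root = Path(path)
    names = [p.name for p in root.iterdir()] if root.is_dir() else []
    return missing_files(names)
```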
```bash
tar -zxf wds.tgz
```

## Megatron-Core Training Pipeline

### Checkpoint Format Conversion

Qwen3-VL training has been upgraded to `torch_dist`-format checkpoints. The conversion script takes the parameters below:

```
MODEL_SIZE=$1  # model size; options: 4B, 8B, A3B, A22B
LOAD_DIR=$2    # source checkpoint path
SAVE_DIR=$3    # target checkpoint path
MG2HF=$4       # conversion direction; options: true, false
...
HF_DIR=$7      # HF checkpoint path (required for mcore2hf)
```
```bash
cd /workspace/Pai-Megatron-Patch/toolkits/distributed_checkpoints_convertor
bash scripts/qwen3_vl/run_8xH20.sh \
A3B \
/mnt/qwen3-vl-ckpts/Qwen3-VL-30B-A3B-Instruct \
/mnt/qwen3-vl-ckpts/Qwen3-VL-30B-A3B-Instruct-to-mcore \
false \
true \
bf16
```
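The positional interface of `run_8xH20.sh` is easy to mis-order, so the rules the parameter list encodes can be sketched as a small validator (a hypothetical helper, not part of the repo; parameters $5 and $6 are not named in this excerpt, so they are passed through with the values shown in the example, "true" and "bf16"):

```python
# Sketch: assemble and validate the positional arguments for
# scripts/qwen3_vl/run_8xH20.sh. Hypothetical helper, not part of the repo;
# `arg5` stands in for the unnamed fifth parameter in this excerpt.
VALID_SIZES = {"4B", "8B", "A3B", "A22B"}

def build_convert_args(model_size, load_dir, save_dir, mg2hf,
                       arg5="true", precision="bf16", hf_dir=None):
    """Return the argument list, enforcing the documented constraints."""
    if model_size not in VALID_SIZES:
        raise ValueError(f"unknown MODEL_SIZE: {model_size}")
    if mg2hf not in ("true", "false"):
        raise ValueError("MG2HF must be 'true' or 'false'")
    args = [model_size, load_dir, save_dir, mg2hf, arg5, precision]
    if mg2hf == "true":
        # HF_DIR ($7) is required when converting mcore back to HF.
        if hf_dir is None:
            raise ValueError("HF_DIR ($7) is required when MG2HF=true")
        args.append(hf_dir)
    return args
```

For the mcore-to-HF direction used after fine-tuning, `MG2HF` is "true" and the original HF checkpoint path is appended as the seventh argument.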

### Qwen3-VL-30B-A3B Fine-Tuning Example

#### Fine-Tuning Command Parameters

The training script takes the following parameters:

```bash
ENV=$1                 # environment switch: dsw for single-node training, dlc for multi-node training
...
LR_WARMUP_ITERS=${23}  # number of warmup iterations
OUTPUT_BASEPATH=${24}  # output path for training logs
```

#### Fine-Tuning Example

Use the following command to launch fine-tuning of Qwen3-VL-30B-A3B:

```bash
cd /workspace/Pai-Megatron-Patch/examples/qwen3_vl
bash run_mcore_qwen.sh \
dsw \
A3B \
1 \
128 \
1e-5 \
1e-6 \
16384 \
16384 \
bf16 \
4 \
4 \
1 \
1 \
4 \
true \
true \
true \
...
false \
100000 \
/mnt/llava-datasets/LLaVA-Pretrain/wds \
/mnt/llava-datasets/LLaVA-Pretrain/wds \
/mnt/qwen3-vl-ckpts/Qwen3-VL-30B-A3B-Instruct-to-mcore \
500 \
50 \
/mnt/qwen3-vl-ckpts/sft_output_mcore_Qwen3-VL-30B-A3B
```
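Under the usual Megatron convention, the `1` and `128` near the top of the command correspond to micro and global batch size (an assumption here, since the positional parameters between $1 and ${23} are elided in this excerpt). The number of gradient-accumulation micro-batches per optimizer step then follows from the standard batch arithmetic:

```python
# Sketch of the standard Megatron batch arithmetic:
#   global_batch = micro_batch * data_parallel_size * grad_accum_steps
# The mapping of this script's positional args onto these values is an assumption.
def grad_accum_steps(global_batch, micro_batch, dp_size):
    """Gradient-accumulation steps implied by the batch configuration."""
    denom = micro_batch * dp_size
    if global_batch % denom != 0:
        raise ValueError("global batch must be divisible by micro_batch * dp_size")
    return global_batch // denom
```

For example, with global batch 128, micro batch 1, and a data-parallel size of 8, each optimizer step accumulates 16 micro-batches.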
After fine-tuning completes, the loss curve looks as follows:

<p align="center">
<picture>
<img alt="patch" src="../images/qwen3_vl_loss.png" width=60%>
</picture>
</p>

### Qwen3-VL-30B-A3B Evaluation Example

We evaluate the multimodal model on an image captioning task. The original image is shown below:

<p align="center">
<picture>
<img alt="patch" src="../images/qwen3_vl_demo.jpeg" width=60%>
</picture>
</p>

First, use the following command to inspect the output of the Qwen3-VL-30B-A3B model before fine-tuning:
```bash
cd /workspace/Pai-Megatron-Patch/examples/qwen3_vl
python inference.py --model-path /mnt/qwen3-vl-ckpts/Qwen3-VL-30B-A3B-Instruct
```
The output is as follows:
```
['Of course. Here is a detailed description of the image.\n\nThis is a heartwarming and serene photograph capturing a tender moment between a woman and her dog on a beach at sunset.\n\n- **Main Subjects and Interaction**: The central focus is a woman and a large, light-colored dog, likely a yellow Labrador Retriever, sitting on the sand. The dog is sitting upright, and its right front paw is raised to meet the woman\'s hand in a "high-five" gesture. The woman is sitting cross-legged, smiling warmly at her dog, her face illuminated by the golden sunlight. This interaction conveys a strong bond of companionship, trust, and affection.\n\n- **Setting and Atmosphere**: The scene is set on a wide, sandy beach. In the background, the ocean stretches to the horizon, with a gentle wave cresting and breaking. The time of day is clearly sunset, as evidenced by the low, warm, golden light that bathes the entire scene. This light creates a soft, glowing effect, particularly on the woman\'s hair and the edges of the dog, and casts a warm hue over the sand. The sky is a bright, hazy white, indicating the sun is just below the horizon.\n\n- **Details and Composition**: The woman is wearing a black and white plaid flannel shirt over dark pants. She has a white watch on her left wrist. The dog is wearing a blue harness decorated with small, colorful paw prints. A red leash lies on the sand near them. The composition places the subjects slightly off-center, allowing the vastness of the beach and ocean to create a sense of peace and openness. The overall mood is one of tranquility, joy, and the simple pleasure of a shared moment with a beloved pet.']
```
Next, we inspect the output of the fine-tuned Qwen3-VL-30B-A3B model.
First, convert the fine-tuned checkpoint back to HF format:
```bash
cd /workspace/Pai-Megatron-Patch/toolkits/distributed_checkpoints_convertor
bash scripts/qwen3_vl/run_8xH20.sh \
A3B \
/mnt/qwen3-vl-ckpts/sft_output_mcore_Qwen3-VL-30B-A3B \
/mnt/qwen3-vl-ckpts/sft_output_hf_Qwen3-VL-30B-A3B \
true \
true \
bf16 \
/mnt/qwen3-vl-ckpts/Qwen3-VL-30B-A3B-Instruct
```
Then run the same inference against the fine-tuned weights:
```bash
cd /workspace/Pai-Megatron-Patch/examples/qwen3_vl
python inference.py --model-path /mnt/qwen3-vl-ckpts/sft_output_hf_Qwen3-VL-30B-A3B
```

A reference output:
```
['a woman is sitting on the beach with her dog, petting it']
```

examples/qwen3_vl/inference.py

Lines changed: 56 additions & 0 deletions
```python
import argparse

from transformers import (
    AutoConfig,
    AutoProcessor,
    Qwen3VLForConditionalGeneration,
    Qwen3VLMoeForConditionalGeneration,
)


def inference(hf_path):
    hf_transformer_config = AutoConfig.from_pretrained(hf_path)

    # Dense and MoE Qwen3-VL checkpoints use different model classes.
    architecture = hf_transformer_config.architectures[0]
    if architecture == "Qwen3VLForConditionalGeneration":
        model = Qwen3VLForConditionalGeneration.from_pretrained(
            hf_path, dtype="auto", device_map="auto"
        )
    elif architecture == "Qwen3VLMoeForConditionalGeneration":
        model = Qwen3VLMoeForConditionalGeneration.from_pretrained(
            hf_path, dtype="auto", device_map="auto"
        )
    else:
        raise ValueError(f"Unsupported architecture: {architecture}")

    processor = AutoProcessor.from_pretrained(hf_path)

    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
                },
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ]

    # Preparation for inference
    inputs = processor.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    )
    inputs = inputs.to(model.device)

    # Inference: generation of the output
    generated_ids = model.generate(**inputs, max_new_tokens=1024)
    generated_ids_trimmed = [
        out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
    ]
    output_text = processor.batch_decode(
        generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )
    print(output_text)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-path", required=True)
    args = parser.parse_args()

    inference(args.model_path)
```
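The prompt-stripping step in `inference.py` (the `generated_ids_trimmed` list comprehension) deserves a note: `model.generate` returns each prompt's tokens followed by the newly generated tokens, so decoding only the completion requires slicing off the input length per sequence. That logic, isolated as a standalone sketch:

```python
# Sketch: drop the prompt prefix from each generated sequence, mirroring the
# `generated_ids_trimmed` comprehension in inference.py above.
def trim_prompts(input_ids, generated_ids):
    """For each (prompt, full_output) pair, keep only the newly generated tail."""
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]
```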
