
Commit fd4ea9b (parent: c232caa)

[template] Support qwen3 omni mixed data (#6196)

File tree: 4 files changed, +303 −90 lines

New file: 40 additions & 0 deletions

# 2 * 60GiB
# mcore shell: https://github.com/modelscope/ms-swift/blob/main/examples/megatron/multimodal/omni/moe.sh
MAX_PIXELS=1003520 \
NPROC_PER_NODE=2 \
VIDEO_MAX_PIXELS=50176 \
FPS_MAX_FRAMES=12 \
CUDA_VISIBLE_DEVICES=0,1 \
swift sft \
    --model Qwen/Qwen3-Omni-30B-A3B-Instruct \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#10000' \
              'AI-ModelScope/LaTeX_OCR:human_handwrite#5000' \
              'swift/VideoChatGPT:Generic#2000' \
              'speech_asr/speech_asr_aishell1_trainsets:validation#5000' \
    --split_dataset_ratio 0.01 \
    --load_from_cache_file true \
    --train_type lora \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --attn_impl flash_attn \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --freeze_vit true \
    --freeze_aligner true \
    --padding_free true \
    --gradient_accumulation_steps 1 \
    --gradient_checkpointing true \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 4096 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataset_num_proc 4 \
    --deepspeed zero3 \
    --dataloader_num_workers 4
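The `--dataset` arguments above mix text, OCR, video, and ASR corpora, each spec carrying an optional `:subset` selector and a trailing `#N` sample count. A minimal sketch of how such a spec string could be split into its parts; the `parse_dataset_spec` helper below is an illustrative assumption, not ms-swift's actual parser:

```python
def parse_dataset_spec(spec: str) -> dict:
    """Split a 'path[:subset][#count]' dataset spec into its parts.

    Hypothetical helper: ms-swift's real parsing may differ; this only
    mirrors the convention visible in the command line above.
    """
    count = None
    if "#" in spec:
        # Trailing '#N' requests N samples from the dataset.
        spec, count_str = spec.rsplit("#", 1)
        count = int(count_str)
    subset = None
    if ":" in spec:
        # 'path:subset' selects a named subset/split of the dataset.
        spec, subset = spec.split(":", 1)
    return {"path": spec, "subset": subset, "count": count}


if __name__ == "__main__":
    for s in [
        "AI-ModelScope/alpaca-gpt4-data-zh#10000",
        "AI-ModelScope/LaTeX_OCR:human_handwrite#5000",
        "swift/VideoChatGPT:Generic#2000",
        "speech_asr/speech_asr_aishell1_trainsets:validation#5000",
    ]:
        print(parse_dataset_spec(s))
```

Mixing sampled slices this way keeps the multimodal batches balanced without pre-processing the datasets offline.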
