
qwen3-vl DPO training: loss drops to 0, but the trained model's inference output is strange #9757

@Xinyu-lab

Description


Reminder

  • I have read the above rules and searched the existing issues.

System Info

I trained for 3 epochs; toward the end of training the loss drops all the way to 0, yet SFT fine-tuning on the same data works fine.
The training config file is as follows:

```yaml
### model
model_name_or_path:
image_max_pixels: 1894400  # ours: 18503232
video_max_pixels: 16384
trust_remote_code: true

### method
stage: dpo
do_train: true
do_predict: false
freeze_vision_tower: false
freeze_multi_modal_projector: false
finetuning_type: lora  # lora, freeze, full
lora_rank: 32
lora_alpha: 64
lora_target: all
enable_liger_kernel: true

### dataset
dataset: good_with_super_severe_dpo
template: qwen3_vl
cutoff_len: 10000
max_samples: 10000000
overwrite_cache: true
preprocessing_num_workers: 64
dataloader_num_workers: 16

### output
output_dir:
logging_steps: 10  # 10
save_steps: 2000  # 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false

### train
per_device_train_batch_size: 4
gradient_accumulation_steps: 16
num_train_epochs: 3.0
lr_scheduler_type: cosine
learning_rate: 1.0e-4
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
```
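For context on why the loss can reach exactly 0: standard sigmoid DPO minimizes -log σ(β · margin), where the margin is the policy's log-probability advantage of the chosen over the rejected response relative to the reference model. Once that margin grows large (easy with `learning_rate: 1.0e-4` on a LoRA over all modules), the loss saturates at numerically 0 even while generation quality degrades. A minimal sketch with hypothetical log-prob values (assuming the common default β = 0.1; not LLaMA-Factory's actual implementation):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss: -log sigmoid(beta * margin)."""
    # Margin: how much more the policy prefers chosen over rejected,
    # measured relative to the frozen reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) \
           - (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(x)) == log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))

# A modest margin gives a healthy, nonzero loss:
print(round(dpo_loss(-10.0, -12.0, -11.0, -11.0), 4))  # ≈ 0.5981

# A huge margin (the policy has pushed rejected responses to near-zero
# probability, typical of an over-aggressive learning rate) drives the
# loss to ~0 while the model may no longer generate well:
print(round(dpo_loss(-5.0, -60.0, -11.0, -11.0), 4))   # ≈ 0.0041
```

So a loss of 0 here signals margin saturation (overfitting the preference pairs), not convergence to a good generator; lowering the learning rate or β is the usual first remedy.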

A sample of the training data:
```json
"conversations": [
  {
    "from": "human",
    "value": "text"
  }
],
"chosen": {
  "from": "gpt",
  "value": "{\"passed\": \"是\", \"level\": 0, \"class\": 5}"
},
"rejected": {
  "from": "gpt",
  "value": "{\"passed\": \"否\", \"level\": 3, \"class\": 5}"
},
"images": [
  "path"
]
```

But inference after training produces results like the following, where `pred` is the model's output; the text is not even fully generated:

```
"pred": "\n\n\n\n{\"passed\": \"是\", 0",
"label": "\n\n\n\n{\"passed\": \"是\", \"level\": 0, \"class\": 4}\n"
```

Does anyone know how to fix this?

Reproduction

Put your message here.

Others

No response

Metadata

Labels: duplicate (this issue or pull request already exists)