Description
Reminder
- I have read the above rules and searched the existing issues.
System Info
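I trained for 3 epochs, and late in training the loss dropped all the way to 0. With the same data, SFT fine-tuning works without problems.
The training config file is as follows: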
model
model_name_or_path:
image_max_pixels: 1894400 # our:18503232
video_max_pixels: 16384
trust_remote_code: true
method
stage: dpo
do_train: true
do_predict: false
freeze_vision_tower: False
freeze_multi_modal_projector: False
finetuning_type: lora # lora,freeze,full
lora_rank: 32
lora_alpha: 64
lora_target: all
enable_liger_kernel: True
dataset
dataset: good_with_super_severe_dpo
template: qwen3_vl
cutoff_len: 10000
max_samples: 10000000
overwrite_cache: True
preprocessing_num_workers: 64
dataloader_num_workers: 16
output
output_dir:
logging_steps: 10 # 10
save_steps: 2000 # 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
train
per_device_train_batch_size: 4
gradient_accumulation_steps: 16
num_train_epochs: 3.0
lr_scheduler_type: cosine
learning_rate: 1.0e-4
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
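For readers wondering how a DPO loss can reach 0 while generation gets worse, here is a minimal sketch of the standard sigmoid DPO loss for a single preference pair. It is not taken from the training framework's code: the function name `dpo_sigmoid_loss`, the example log-probabilities, and `beta=0.1` are illustrative assumptions. The loss depends only on the margin between the chosen and rejected log-ratios, so it can approach 0 even when the log-probability of the chosen answer falls, as long as the rejected answer falls faster.

```python
import math


def dpo_sigmoid_loss(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one pair: -log(sigmoid(beta * margin))."""
    # Margin between the chosen and rejected log-ratios (policy vs. reference).
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) == softplus(-z), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# At initialization the policy equals the reference, so the margin is 0
# and the loss is log(2), about 0.693.
print(dpo_sigmoid_loss(-10.0, -10.0, -10.0, -10.0))

# Hypothetical late-training state: the chosen answer became LESS likely
# (-10 -> -50), but the rejected answer dropped much further (-10 -> -200).
# The margin is large, so the loss is close to 0 anyway.
print(dpo_sigmoid_loss(-50.0, -200.0, -10.0, -10.0))
```

Whether this is what happened in the run above cannot be determined from the loss value alone; comparing the chosen and rejected log-probabilities over training, if they are logged, would show it.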
A sample of the training data:
"conversations": [
{
"from": "human",
"value": "text"
}
],
"chosen": {
"from": "gpt",
"value": "{\"passed\": \"是\", \"level\": 0, \"class\": 5}"
},
"rejected": {
"from": "gpt",
"value": "{\"passed\": \"否\", \"level\": 3, \"class\": 5}"
},
},
"images": [
"path"
]
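In the sample above, the chosen and rejected answers differ in only a couple of short fields. A minimal sketch for measuring that overlap across a dataset is shown here; `pair_report` is a hypothetical helper, and the two strings are the `value` fields from the sample with the JSON escapes decoded.

```python
import json
from difflib import SequenceMatcher


def pair_report(chosen: str, rejected: str) -> dict:
    """Summarize how much a chosen/rejected pair actually differs."""
    # Character-level similarity in [0, 1]; 1.0 means identical strings.
    similarity = SequenceMatcher(None, chosen, rejected).ratio()
    c, r = json.loads(chosen), json.loads(rejected)
    # Fields whose values differ between the two answers.
    differing = {
        key: (c.get(key), r.get(key))
        for key in sorted(set(c) | set(r))
        if c.get(key) != r.get(key)
    }
    return {"similarity": similarity, "differing_fields": differing}


chosen = '{"passed": "是", "level": 0, "class": 5}'
rejected = '{"passed": "否", "level": 3, "class": 5}'

report = pair_report(chosen, rejected)
print(report["similarity"])
print(report["differing_fields"])
```

Pairs with very high similarity give the preference loss only a few tokens to separate, which may be worth checking when diagnosing behavior like this.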
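After training, however, inference produces the result below. pred is the model's output, and the text is not even generated in full:
"pred": "\n\n\n\n{\"passed\": \"是\", 0",
"label": "\n\n\n\n{\"passed\": \"是\", \"level\": 0, \"class\": 4}\n"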
Does anyone know how to solve this problem?
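To count how many predictions are cut off like the pred above, one option is to check whether each output parses as JSON. This is only a sketch: `parse_or_none` is a hypothetical helper, and the two strings are the pred and label from the example with the escapes decoded.

```python
import json


def parse_or_none(text: str):
    """Return the parsed JSON object, or None if the text is not valid JSON."""
    try:
        return json.loads(text.strip())
    except json.JSONDecodeError:
        return None


pred = '\n\n\n\n{"passed": "是", 0'
label = '\n\n\n\n{"passed": "是", "level": 0, "class": 4}\n'

# The truncated prediction fails to parse; the label parses normally.
print(parse_or_none(pred))
print(parse_or_none(label))
```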
Reproduction
Put your message here.
Others
No response