Labels: bug (Something isn't working), pending (This problem is yet to be addressed)
This discussion was converted from issue #9757 on January 12, 2026 11:01.
Reminder
System Info
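I trained for 3 epochs, and toward the end of training the loss dropped all the way to 0. With the same data, SFT fine-tuning works without problems.
The training config file is as follows: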
model
model_name_or_path:
image_max_pixels: 1894400 # our:18503232
video_max_pixels: 16384
trust_remote_code: true
method
stage: dpo
do_train: true
do_predict: false
freeze_vision_tower: False
freeze_multi_modal_projector: False
finetuning_type: lora # lora,freeze,full
lora_rank: 32
lora_alpha: 64
lora_target: all
enable_liger_kernel: True
dataset
dataset: good_with_super_severe_dpo
template: qwen3_vl
cutoff_len: 10000
max_samples: 10000000
overwrite_cache: True
preprocessing_num_workers: 64
dataloader_num_workers: 16
output
output_dir:
logging_steps: 10 # 10
save_steps: 2000 # 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false
#train
per_device_train_batch_size: 4
gradient_accumulation_steps: 16
num_train_epochs: 3.0
lr_scheduler_type: cosine
learning_rate: 1.0e-4
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
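For context on the reported loss of 0, here is a minimal sketch of the standard DPO sigmoid loss as a function of the log-ratio gap between the chosen and rejected answers. The function name `dpo_sigmoid_loss` is hypothetical, and `beta=0.1` is an assumed value, since the config above does not set it. The loss starts at ln 2 (about 0.693) when the policy equals the reference model and approaches 0 as the gap grows, so a logged loss of 0 means the model separates the pairs by a very large margin.

```python
import math


def dpo_sigmoid_loss(chosen_logratio: float, rejected_logratio: float,
                     beta: float = 0.1) -> float:
    """Return -log(sigmoid(beta * (chosen_logratio - rejected_logratio))).

    Each logratio is log pi_policy(y | x) - log pi_reference(y | x).
    beta=0.1 is an assumption, not a value taken from the config above.
    """
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals softplus(-m); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Loss for increasingly large gaps between chosen and rejected log-ratios.
for gap in (0.0, 10.0, 50.0, 200.0):
    print(f"gap={gap:6.1f}  loss={dpo_sigmoid_loss(gap, 0.0):.3e}")
```

With the values rounded by `logging_steps` output, a loss this small would print as 0 even though it is not exactly zero.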
A sample of the training data:
text"
"conversations": [
{
"from": "human",
"value": "
}
],
"chosen": {
"from": "gpt",
"value": "{\"passed\": \"是\", \"level\": 0, \"class\": 5}"
},
"rejected": {
"from": "gpt",
"value": "{\"passed\": \"否\", \"level\": 3, \"class\": 5}"
},
"images": [
"path"
]
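In the sample above, the chosen and rejected answers are short JSON strings that share most of their text. The following is a small sketch for listing which fields actually differ within a pair, which may help when checking how much signal each pair carries. `differing_fields` is a hypothetical helper, not part of LLaMA-Factory, and it assumes both answers parse as JSON objects.

```python
import json


def differing_fields(chosen_value: str, rejected_value: str) -> dict:
    """Map each key whose value differs to a (chosen, rejected) tuple."""
    chosen = json.loads(chosen_value)
    rejected = json.loads(rejected_value)
    keys = sorted(set(chosen) | set(rejected))
    return {
        key: (chosen.get(key), rejected.get(key))
        for key in keys
        if chosen.get(key) != rejected.get(key)
    }


# The two answer strings from the sample record above.
chosen_value = '{"passed": "是", "level": 0, "class": 5}'
rejected_value = '{"passed": "否", "level": 3, "class": 5}'
print(differing_fields(chosen_value, rejected_value))
```

For this record only `passed` and `level` differ, while `class` is identical in both answers.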
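After training, however, inference gives the result below. pred is the model's output, and the text is not even generated in full:
"pred": "\n\n\n\n{\"passed\": \"是\", 0",
"label": "\n\n\n\n{\"passed\": \"是\", \"level\": 0, \"class\": 4}\n"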
Does anyone know how to fix this?
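To measure how often predictions come back truncated like the pred shown above, a minimal sketch is to test whether each prediction still parses as JSON after stripping whitespace. `is_complete_json` is a hypothetical helper, and the two strings are copied from the example in this post.

```python
import json


def is_complete_json(text: str) -> bool:
    """Return True if the text, ignoring surrounding whitespace, parses as JSON."""
    try:
        json.loads(text.strip())
    except json.JSONDecodeError:
        return False
    return True


# pred and label from the inference result quoted above.
pred = '\n\n\n\n{"passed": "是", 0'
label = '\n\n\n\n{"passed": "是", "level": 0, "class": 4}\n'

print("pred complete:", is_complete_json(pred))
print("label complete:", is_complete_json(label))
```

Running the same check over every line of a predictions file would show whether the truncation affects all samples or only some of them.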
Reproduction
Others
No response