Hi, authors, thanks for your awesome work!
I'm attempting to train Qwen/Qwen2.5-VL-3B-Instruct using the provided training script, but I've encountered several issues that I'd like to clarify:
Training Script
```bash
#!/bin/bash
setting='dozen_vsr_qwen_add_grounded_reasoning_single_turn_think_rethink'
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export WANDB_PROJECT=$setting

# Load config variables
source scripts/train_base_config.sh

# Run the training script with DeepSpeed
python -m accelerate.commands.launch \
    --config_file ./accelerate_configs/deepspeed_zero2.yaml \
    --main_process_port 20092 \
    grpo-gr/GRPO_GR.py \
    --train_data_path ./GRIT_data/tallyqa_train_10.jsonl,./GRIT_data/vsr_cot_train_10.jsonl \
    --train_image_folder_path ./GRIT_data/tallyqa,./GRIT_data/vsr \
    --eval_data_path ./GRIT_data/vsr_val.jsonl,./GRIT_data/mme_val.jsonl,./GRIT_data/tallyqa_val.jsonl,./GRIT_data/gqa_val.jsonl,./GRIT_data/mathvista_mini_val.jsonl,./GRIT_data/ovd_position_val.jsonl,./GRIT_data/ovd_relationship_val.jsonl,./GRIT_data/ovd_negation_val.jsonl \
    --eval_image_folder_path ./GRIT_data/vsr,./GRIT_data/mme,./GRIT_data/tallyqa,./GRIT_data/gqa,./GRIT_data/mathvista_mini,./GRIT_data/ovd_position,./GRIT_data/ovd_relationship,./GRIT_data/ovd_negation \
    --setting $setting \
    --max_turns 1 \
    --output_dir output/$setting \
    --hub_model_id $setting \
    $COMMON_ARGS \
    --eval_steps 50 \
    --save_steps 50 \
    --num_train_epochs 500 \
    --lr_scheduler_type cosine \
    --per_device_eval_batch_size 8
```

1. Dataset Issues
MME Dataset
Most datasets can be downloaded normally, but for the MME dataset, when I try to download from the repository path specified in the paper (link), I find that the image names in the downloaded files don't match the names listed in mme_val.jsonl.
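For what it's worth, here is the quick check I used to confirm the mismatch. It's only a sketch: the `image` key is my guess at the mme_val.jsonl schema, so it may need adjusting.

```python
import json
import os

# Compare image filenames referenced in mme_val.jsonl against the files
# actually present in the downloaded image folder.
jsonl_path = "./GRIT_data/mme_val.jsonl"
image_dir = "./GRIT_data/mme"

referenced = set()
with open(jsonl_path) as f:
    for line in f:
        record = json.loads(line)
        # "image" is an assumed field name; adjust to the real schema.
        referenced.add(os.path.basename(record["image"]))

present = set(os.listdir(image_dir))
missing = referenced - present
print(f"{len(missing)} of {len(referenced)} referenced images not found on disk")
```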
Missing Label Files
The following label files are missing:
- ./GRIT_data/ovd_relationship_val.jsonl
- ./GRIT_data/ovd_negation_val.jsonl
Could you please provide these files or clarify how to obtain them?
2. Flash Attention Issues
During initial training, I encounter the following warning/error:
```
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
...
```

The specific error indicates that float16 is not supported. I worked around it by manually specifying `torch_dtype=torch.bfloat16` during model initialization (see the sketch after the excerpt below). Did you encounter this during your training, and what is the recommended way to handle it?
GRIT/grpo-gr/GRPO_GRTrainer.py, lines 233 to 234 at fd08d57:

```python
if "qwen" in model_id.lower():
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model, **model_init_kwargs)
```
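For reference, my workaround looks roughly like this. It's my local fix, not necessarily the one you intended, and the model id is simply the checkpoint I'm training:

```python
import torch
from transformers import Qwen2_5_VLForConditionalGeneration

# Pass an explicit bfloat16 dtype so Flash Attention 2 does not run
# with an unsupported default dtype (float16 raised an error for me).
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-3B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
```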
3. Training Hyperparameters
I'd like to confirm a few things about the training parameters:
- Epochs: Is `--num_train_epochs 500` an experimental parameter? It seems quite high; is that intentional?
- Batch Size & Memory: When training on 48GB VRAM, I can only set `per_device_train_batch_size` to 1, otherwise I get OOM errors. Is this normal? If the batch size can only be 1, should the learning rate be scaled accordingly, and what values would you recommend? (My back-of-envelope reasoning is sketched after this list.)
- Other Parameters: Are the other hyperparameters in the script reasonable for this model size and task?
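Here is the scaling reasoning I have in mind, a minimal sketch assuming the common linear LR scaling heuristic applies; `base_lr` and `base_effective_batch` are placeholder values, since I don't know the reference values intended in `train_base_config.sh`:

```python
# Back-of-envelope LR check under the linear scaling heuristic.
base_lr = 1e-6              # placeholder reference LR, not the repo's actual value
base_effective_batch = 8    # placeholder reference effective batch size

per_device_batch = 1        # forced by the 48GB OOM limit
num_gpus = 8
grad_accum = 1              # raising this would recover the effective batch instead

effective_batch = per_device_batch * num_gpus * grad_accum
scaled_lr = base_lr * effective_batch / base_effective_batch
print(f"effective batch = {effective_batch}, scaled lr = {scaled_lr:.2e}")
```

Alternatively, raising `gradient_accumulation_steps` rather than touching the LR seems like the other obvious knob; I'd appreciate knowing which you'd recommend.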
4. Demo Environment
Regarding the `gradio_qwen.py` demo mentioned on the GitHub page: where can I find this file? It doesn't seem to be included in the current repository.
Environment:
- Model: Qwen/Qwen2.5-VL-3B-Instruct
- GPU: 8x GPUs with 48GB VRAM each
- Framework: DeepSpeed ZeRO-2
5. Logs
Also, it seems odd that the reward scores are always zero throughout training.
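To narrow this down, I probed the reward logic with a toy stand-in. This sketch is only illustrative: the tag format and the `format_reward` function here are my guesses, not the repo's actual reward API.

```python
import re

# Toy stand-in for a format reward: 1.0 if the completion carries
# think/answer tags, else 0.0. Swap in the real reward function and a
# real completion string to see why training rewards stay at zero.
def format_reward(text: str) -> float:
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.search(pattern, text, re.DOTALL) else 0.0

sample_completion = "<think> the cat is left of the dog </think> <answer> yes </answer>"
print(format_reward(sample_completion))  # if real completions never match, rewards stay 0.0
```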
Any guidance on these issues would be greatly appreciated. Thank you again for your work on this project!
