
Issues Training GRIT with Qwen2.5-VL-3B-Instruct #2

@David-19940718

Description


Hi authors, thanks for your awesome work!

I'm attempting to train Qwen/Qwen2.5-VL-3B-Instruct using the provided training script, but I've encountered several issues that I'd like to clarify:

Training Script

#!/bin/bash

setting='dozen_vsr_qwen_add_grounded_reasoning_single_turn_think_rethink'
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export WANDB_PROJECT=$setting

# Load config variables
source scripts/train_base_config.sh

# Run the training script with DeepSpeed
python -m accelerate.commands.launch \
    --config_file ./accelerate_configs/deepspeed_zero2.yaml \
    --main_process_port 20092 \
    grpo-gr/GRPO_GR.py \
    --train_data_path ./GRIT_data/tallyqa_train_10.jsonl,./GRIT_data/vsr_cot_train_10.jsonl \
    --train_image_folder_path ./GRIT_data/tallyqa,./GRIT_data/vsr \
    --eval_data_path ./GRIT_data/vsr_val.jsonl,./GRIT_data/mme_val.jsonl,./GRIT_data/tallyqa_val.jsonl,./GRIT_data/gqa_val.jsonl,./GRIT_data/mathvista_mini_val.jsonl,./GRIT_data/ovd_position_val.jsonl,./GRIT_data/ovd_relationship_val.jsonl,./GRIT_data/ovd_negation_val.jsonl \
    --eval_image_folder_path ./GRIT_data/vsr,./GRIT_data/mme,./GRIT_data/tallyqa,./GRIT_data/gqa,./GRIT_data/mathvista_mini,./GRIT_data/ovd_position,./GRIT_data/ovd_relationship,./GRIT_data/ovd_negation \
    --setting $setting \
    --max_turns 1 \
    --output_dir output/$setting \
    --hub_model_id $setting \
    $COMMON_ARGS \
    --eval_steps 50 \
    --save_steps 50 \
    --num_train_epochs 500 \
    --lr_scheduler_type cosine \
    --per_device_eval_batch_size 8

1. Dataset Issues

MME Dataset

Most datasets can be downloaded normally, but for the MME dataset, when I try to download from the repository path specified in the paper (link), I find that the image names in the downloaded files don't match the names listed in mme_val.jsonl.

Missing Label Files

The following label files are missing:

  • ./GRIT_data/ovd_relationship_val.jsonl
  • ./GRIT_data/ovd_negation_val.jsonl

Could you please provide these files or clarify how to obtain them?

2. Flash Attention Issues

During initial training, I encounter the following warning/error:

You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
...

The specific error indicates that float16 is not supported. I resolved this by manually specifying torch_dtype=torch.bfloat16 during model initialization. Did you encounter this issue during your training? What's the recommended approach to handle this?

if "qwen" in model_id.lower():
    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(model, **model_init_kwargs)
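For reference, this is the workaround I applied before the call above. The exact contents of model_init_kwargs are my own sketch, not the repo's config; transformers accepts torch_dtype either as a torch dtype object or as the string "bfloat16":

```python
# Sketch of the workaround: force bf16 at init so Flash Attention 2 does
# not fall back to an unsupported fp16 path. Key names follow the
# transformers from_pretrained API; the values here are my assumptions.
model_init_kwargs = {
    "torch_dtype": "bfloat16",               # or torch.bfloat16
    "attn_implementation": "flash_attention_2",
}
```

Is explicitly pinning bf16 like this the approach you used, or is there a config flag I missed?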

3. Training Hyperparameters

I'd like to confirm a few things about the training parameters:

  1. Epochs: Is --num_train_epochs 500 an experimental parameter? This seems quite high; is it intentional?

  2. Batch Size & Memory: When training on 48GB VRAM, I can only set per_device_train_batch_size to 1, otherwise I get OOM errors. Is this normal? If the batch size can only be 1, should the learning rate be scaled accordingly? What would be the recommended values?

  3. Other Parameters: Are the other hyperparameters in the script reasonable for this model size and task?
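On question 2, my current thinking is to either hold the effective batch size constant via gradient accumulation, or scale the learning rate linearly with it. A minimal sketch of that arithmetic (the base values below are placeholders, not the repo's actual config):

```python
def scaled_lr(base_lr, base_effective_batch, new_effective_batch):
    """Linear scaling rule: lr scales proportionally with effective batch."""
    return base_lr * new_effective_batch / base_effective_batch

# Effective batch = GPUs x per-device batch x gradient accumulation steps.
# With per_device_train_batch_size forced to 1 on 48GB cards, accumulation
# can recover the original effective batch.
num_gpus, per_device_batch, grad_accum = 8, 1, 8
effective_batch = num_gpus * per_device_batch * grad_accum  # 64
```

Does this match what you would recommend, or did you tune the LR differently?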

4. Demo Environment

Regarding the gradio_qwen.py mentioned on the GitHub page, where can I find this file? It doesn't seem to be included in the current repository.


Environment:

  • Model: Qwen/Qwen2.5-VL-3B-Instruct
  • GPU: 8x GPUs with 48GB VRAM each
  • Framework: DeepSpeed ZeRO-2

5. Logs

Also, it's very strange that the reward scores are always zero.

[Screenshot: training logs showing all reward scores at zero]
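In case it helps with debugging: one common cause of all-zero rewards in GRPO-style training is a format reward whose pattern never matches the model's completions. A minimal sanity check along those lines; the tag names here are my assumptions for illustration, not necessarily the ones GRIT's reward function uses:

```python
import re

def format_reward(completion):
    """Hypothetical format reward: 1.0 only if the completion contains the
    expected tag structure, else 0.0. Tag names are assumptions."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.search(pattern, completion, re.DOTALL) else 0.0
```

If the base model never emits the expected tags early in training, every sample would score zero, which looks like what my logs show. Could that be what is happening here?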

Any guidance on these issues would be greatly appreciated. Thank you again for your work on this project!
