GRPO for Qwen3-VL

Hi authors,

I am using GRPO to fine-tune Qwen/Qwen3-VL-2B-Instruct and am encountering the following error:

```
[rank0]:   File "src/trainer/grpo_trainer.py", line 53, in _generate_and_score_completions
[rank0]:     prompts = [x["prompt"] for x in inputs]
[rank0]:   File "src/trainer/grpo_trainer.py", line 53, in <listcomp>
[rank0]:     prompts = [x["prompt"] for x in inputs]
[rank0]:                ~^^^^^^^^^^
[rank0]: TypeError: string indices must be integers, not 'str'
```

It looks like the Qwen3-VL `AutoProcessor` is turning the inputs into a single `BatchFeature` object instead of a list of examples. Inspecting the processor output `b`, `b.data.keys()` returns `dict_keys(['input_ids', 'attention_mask', 'pixel_values', 'image_grid_thw'])`.
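The traceback is consistent with that observation. Here is a minimal sketch of the mechanism, assuming the processor output behaves like a `collections.UserDict` (in `transformers`, `BatchFeature` subclasses `UserDict`). Iterating over a mapping yields its keys, so each `x` in the list comprehension is a string such as `'input_ids'`, and `'input_ids'["prompt"]` raises the `TypeError`. `FakeBatchFeature` is a hypothetical stand-in so the snippet runs without `transformers` or model weights.

```python
from collections import UserDict


class FakeBatchFeature(UserDict):
    """Hypothetical stand-in for transformers.BatchFeature (which also subclasses UserDict)."""


# What the processor returns: one mapping of batched fields, not a list of examples.
inputs = FakeBatchFeature(
    {
        "input_ids": [[1, 2, 3]],
        "attention_mask": [[1, 1, 1]],
        "pixel_values": [[0.0]],
        "image_grid_thw": [[1, 2, 2]],
    }
)

# Iterating over a mapping yields its keys, i.e. plain strings.
print([x for x in inputs])
# → ['input_ids', 'attention_mask', 'pixel_values', 'image_grid_thw']

# The trainer line then indexes a string with a string key, which raises
# the same TypeError as in the traceback (exact wording depends on the Python version).
try:
    prompts = [x["prompt"] for x in inputs]
except TypeError as err:
    print(type(err).__name__, err)
```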

Is there a quick fix for this? Thanks for the help!
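One possible quick fix, offered as a sketch rather than a confirmed fix for this repository: the failing line expects `inputs` to be a list of per-example dicts, each with a `"prompt"` key. This appears to mirror TRL's `GRPOTrainer`, which uses a collator that returns the features unchanged and only applies the processor later, during generation. If the dataloader here is built with a collator that calls the Qwen3-VL processor, a pass-through collator should keep the raw examples intact. The names `identity_collator` and `check_grpo_inputs` are hypothetical, and the example layout (`"prompt"`, `"image"`) is an assumption about the dataset format.

```python
from collections.abc import Mapping
from typing import Any


def identity_collator(features: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical pass-through collator: keep raw examples, do no processing here.

    The GRPO step reads x["prompt"] from each example, so the processor
    should run later, not in the collator.
    """
    return features


def check_grpo_inputs(inputs: Any) -> list[dict[str, Any]]:
    """Hypothetical guard that fails with a clearer message than the TypeError."""
    if isinstance(inputs, Mapping):
        # A BatchFeature (a UserDict, hence a Mapping) or any dict of batched fields lands here.
        raise ValueError(
            "Expected a list of examples with a 'prompt' key, got a mapping with keys "
            f"{list(inputs.keys())}; the processor was probably applied in the collator."
        )
    for i, example in enumerate(inputs):
        if "prompt" not in example:
            raise KeyError(f"Example {i} has no 'prompt' key: {sorted(example)}")
    return list(inputs)


# Assumed dataset format: one dict per example with the raw prompt and image.
batch = identity_collator(
    [
        {"prompt": "Describe the image.", "image": "img_0.png"},
        {"prompt": "How many cats are there?", "image": "img_1.png"},
    ]
)
prompts = [x["prompt"] for x in check_grpo_inputs(batch)]
print(prompts)  # → ['Describe the image.', 'How many cats are there?']
```

If the trainer subclasses `transformers.Trainer`, the collator would typically be passed through its `data_collator` argument. It may also be necessary to set `remove_unused_columns=False` in the training arguments, since otherwise the `Trainer` can drop columns such as `"prompt"` that the model's `forward` does not accept.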

GRPO for Qwen3-VL #232