Add multi-image input to GRPO trainer #3805

Closed

Andcircle started this conversation in General

Andcircle
Jul 30, 2025

The current solution seems only support single image input, also
the image processing logic here https://github.com/huggingface/trl/blob/main/trl/trainer/grpo_trainer.py#L1363
is not that flexible. Maybe should enable customized data_collator

In order to support multi-image input, I think VLLMCLient also need to be updated:
https://github.com/huggingface/trl/blob/main/trl/trainer/grpo_trainer.py#L1441
Is it possible just use vllm.LLM, each prompt is a conversation (including images)

Replies: 0 comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment