[Feature]: Performance issue when using the Qwen2.5-VL-32B-Instruct model for multi-image inference

### 🚀 The feature, motivation and pitch

1. Question
        With the Qwen2.5-VL-32B-Instruct model and nine 1080p images as input, the pixel_values tensor in mm_kwargs within processed_inputs has shape [96876, 1176] after input_preprocessor processing. Because the tensor is so large, serializing and deserializing it between processes is very slow: in this example, serialization and deserialization each take about 8 seconds (multi-GPU inter-process communication). Is there any way to optimize this?
        Multi-image preprocessing currently runs on the CPU and takes approximately 10 seconds. Is it possible to move the preprocessing to the GPU?
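
For reference, the reported shape can be reproduced with a little arithmetic. The sketch below assumes the default Qwen2.5-VL preprocessor settings (patch size 14, spatial merge size 2, temporal patch size 2, 3 channels) and ignores the min/max pixel clamps, which a 1080p image should not hit under the defaults. The function names are illustrative, and the element size passed to `payload_bytes` is an assumption, since the issue does not state the tensor's dtype.

```python
# Assumed default Qwen2.5-VL preprocessor settings (not stated in the issue).
PATCH_SIZE = 14        # ViT patch edge in pixels
MERGE_SIZE = 2         # spatial merge factor
TEMPORAL_PATCH = 2     # each still image is duplicated to fill one temporal patch
CHANNELS = 3


def patches_per_image(height, width):
    """Number of patches after rounding each side to a multiple of 28."""
    factor = PATCH_SIZE * MERGE_SIZE
    resized_h = round(height / factor) * factor
    resized_w = round(width / factor) * factor
    return (resized_h // PATCH_SIZE) * (resized_w // PATCH_SIZE)


def pixel_values_shape(num_images, height, width):
    """Shape of the flattened pixel_values tensor for same-sized images."""
    rows = num_images * patches_per_image(height, width)
    cols = CHANNELS * TEMPORAL_PATCH * PATCH_SIZE * PATCH_SIZE
    return rows, cols


def payload_bytes(shape, bytes_per_element):
    """Raw size of the tensor, before any serialization overhead."""
    rows, cols = shape
    return rows * cols * bytes_per_element


shape = pixel_values_shape(9, 1080, 1920)
print(shape)                              # (96876, 1176)
print(payload_bytes(shape, 4) / 1e6)      # size in MB if elements are 4 bytes
print(payload_bytes(shape, 2) / 1e6)      # size in MB if elements are 2 bytes
```

Under these assumptions the tensor holds about 114 million elements, so each request moves several hundred megabytes between processes, which is consistent with the multi-second serialization times described above.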

2. Reproduction environment
        1) vLLM version: 0.8.3
        2) Start command:   python3 -m vllm.entrypoints.openai.api_server --model /models/Qwen2.5-VL-32B-Instruct/ --limit-mm-per-prompt image=12 --tensor-parallel-size 4 --max-model-len 64000

3. Request:
  curl --location '127.0.0.1:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "/models/Qwen2.5-VL-32B-Instruct/",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the following images"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "http://127.0.0.1:8089/1.jpg"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "http://127.0.0.1:8089/2.jpg"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "http://127.0.0.1:8089/3.jpg"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "http://127.0.0.1:8089/4.jpg"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "http://127.0.0.1:8089/5.jpg"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "http://127.0.0.1:8089/6.jpg"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "http://127.0.0.1:8089/7.jpg"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "http://127.0.0.1:8089/8.jpg"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "http://127.0.0.1:8089/9.jpg"
                    }
                }
            ]
        }
    ]
}'
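
To see how latency scales with the number of images, the same request can be built programmatically. This is a minimal sketch using only the Python standard library; the endpoint, model path, and image URLs are copied from the curl command above, the prompt is an English rendering of the original, and `build_payload` and `timed_request` are hypothetical helper names.

```python
import json
import time
import urllib.request

API_URL = "http://127.0.0.1:8000/v1/chat/completions"
MODEL = "/models/Qwen2.5-VL-32B-Instruct/"
IMAGE_BASE = "http://127.0.0.1:8089"


def build_payload(num_images, prompt="Describe the following images"):
    """Build a chat request with one text part and num_images image parts."""
    content = [{"type": "text", "text": prompt}]
    for i in range(1, num_images + 1):
        content.append(
            {"type": "image_url", "image_url": {"url": f"{IMAGE_BASE}/{i}.jpg"}}
        )
    return {"model": MODEL, "messages": [{"role": "user", "content": content}]}


def timed_request(num_images):
    """Send the request and return the end-to-end latency in seconds."""
    body = json.dumps(build_payload(num_images)).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        response.read()
    return time.perf_counter() - start


# Example (requires the server and image host from this issue to be running):
#   for n in (1, 3, 9):
#       print(n, "images:", round(timed_request(n), 2), "s")
```

Comparing the timings for 1, 3, and 9 images would show whether the preprocessing and serialization cost grows roughly linearly with the image count.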




### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
