Description
🚀 The feature, motivation and pitch
1. Question
When running the Qwen2.5-VL-32B-Instruct model with nine 1080p images as input, the pixel_values tensor in mm_kwargs within processed_inputs has shape [96876, 1176] after input_preprocessor processing. A tensor this large makes serialization and deserialization between processes very slow: in this example, each takes about 8 seconds (multi-GPU process communication). Is there any way to optimize this?
Multi-image preprocessing currently runs on the CPU and takes approximately 10 seconds here. Would it be possible to move the preprocessing to the GPU?
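For reference, the reported shape can be reproduced with a little arithmetic. The sketch below assumes the usual Qwen2.5-VL image processor settings (patch size 14, spatial merge size 2, temporal patch size 2, 3 channels) and that each side is rounded to the nearest multiple of 28; it ignores the min/max pixel clamping, which should not trigger for a single 1080p frame. The function names are illustrative only, and the byte estimate assumes float32, which may not match the dtype actually sent between processes.

```python
# Assumed Qwen2.5-VL image processor constants (not taken from the issue itself).
PATCH_SIZE = 14
MERGE_SIZE = 2
TEMPORAL_PATCH_SIZE = 2
CHANNELS = 3


def resized_hw(height, width, factor=PATCH_SIZE * MERGE_SIZE):
    """Round each side to the nearest multiple of `factor` (pixel clamping ignored)."""
    return round(height / factor) * factor, round(width / factor) * factor


def pixel_values_shape(n_images, height, width):
    """Estimate the flattened pixel_values shape for n_images of equal size."""
    rh, rw = resized_hw(height, width)
    patches_per_image = (rh // PATCH_SIZE) * (rw // PATCH_SIZE)
    features_per_patch = CHANNELS * TEMPORAL_PATCH_SIZE * PATCH_SIZE * PATCH_SIZE
    return n_images * patches_per_image, features_per_patch


rows, cols = pixel_values_shape(n_images=9, height=1080, width=1920)
print(rows, cols)

# Rough payload size if the tensor were float32 (4 bytes per element; an assumption).
size_mib = rows * cols * 4 / 2**20
print(f"{size_mib:.1f} MiB")
```

Under these assumptions each 1080p image contributes 78 x 138 = 10764 patches, so nine images give 96876 rows, matching the shape above. The payload is on the order of hundreds of MiB, which is consistent with multi-second serialization times.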
2. Reproduction environment
1) vLLM version: 0.8.3
2) Start command: python3 -m vllm.entrypoints.openai.api_server --model /models/Qwen2.5-VL-32B-Instruct/ --limit-mm-per-prompt image=12 --tensor-parallel-size 4 --max-model-len 64000
3. Request:
curl --location '127.0.0.1:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"model": "/models/Qwen2.5-VL-32B-Instruct/",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Describe the following images"
},
{
"type": "image_url",
"image_url": {
"url": "http://127.0.0.1:8089/1.jpg"
}
},
{
"type": "image_url",
"image_url": {
"url": "http://127.0.0.1:8089/2.jpg"
}
},
{
"type": "image_url",
"image_url": {
"url": "http://127.0.0.1:8089/3.jpg"
}
},
{
"type": "image_url",
"image_url": {
"url": "http://127.0.0.1:8089/4.jpg"
}
},
{
"type": "image_url",
"image_url": {
"url": "http://127.0.0.1:8089/5.jpg"
}
},
{
"type": "image_url",
"image_url": {
"url": "http://127.0.0.1:8089/6.jpg"
}
},
{
"type": "image_url",
"image_url": {
"url": "http://127.0.0.1:8089/7.jpg"
}
},
{
"type": "image_url",
"image_url": {
"url": "http://127.0.0.1:8089/8.jpg"
}
},
{
"type": "image_url",
"image_url": {
"url": "http://127.0.0.1:8089/9.jpg"
}
}
]
}
]
}'
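The same request can also be built programmatically, which is convenient when varying the number of images during profiling. This is a minimal sketch using only the Python standard library; it assumes the server and image host from the curl command above, and the helper names `build_payload` and `send` are hypothetical, not part of vLLM.

```python
import json
import urllib.request

# Endpoint, model path, and image host copied from the curl command above.
API_URL = "http://127.0.0.1:8000/v1/chat/completions"
MODEL = "/models/Qwen2.5-VL-32B-Instruct/"
IMAGE_BASE = "http://127.0.0.1:8089"


def build_payload(n_images=9, prompt="描述下列图片"):
    """Build the chat request body: one text part followed by n_images image parts.

    The default prompt is the original one ("Describe the following images").
    """
    content = [{"type": "text", "text": prompt}]
    for i in range(1, n_images + 1):
        content.append(
            {"type": "image_url", "image_url": {"url": f"{IMAGE_BASE}/{i}.jpg"}}
        )
    return {"model": MODEL, "messages": [{"role": "user", "content": content}]}


def send(payload, timeout=600):
    """POST the payload to the server and return the decoded JSON response."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


payload = build_payload()
# response = send(payload)  # uncomment once the server from step 2 is running
```

Calling `build_payload(n_images=k)` for several values of `k` makes it easy to see how preprocessing and serialization time grow with the number of images.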
Alternatives
No response
Additional context
No response