Conversation

@dsocek commented on Sep 17, 2025

What does this PR do?

Implements the HPU hardware media pipeline for multi-modal inference in vLLM for the InternVL model.

This feature can be enabled with:

VLLM_USE_MEDIA_PIPELINE=1

and also requires

DECODER_MAX_RESIZE=5376

where 5376 (max image output resolution) = 448 (patch size) * 12 (max patches).
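For reference, a minimal offline-inference sketch (not part of this PR) showing how these variables might be set before constructing the engine; the model name, image path, and prompt template below are illustrative placeholders.

```python
# Sketch only: enable the HPU media pipeline via environment variables
# before the vLLM engine is created. Model, image, and prompt are placeholders.
import os

os.environ["VLLM_USE_MEDIA_PIPELINE"] = "1"
os.environ["DECODER_MAX_RESIZE"] = "5376"  # 448 (patch size) * 12 (max patches)

from vllm import LLM, SamplingParams
from PIL import Image

llm = LLM(model="OpenGVLab/InternVL2-8B", trust_remote_code=True)

image = Image.open("example.jpg")  # placeholder input image
outputs = llm.generate(
    {
        "prompt": "<image>\nDescribe this image.",
        "multi_modal_data": {"image": image},
    },
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```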

@dsocek force-pushed the add_mediapipe_for_internvl branch from 89c4524 to 79bd609 on September 18, 2025 00:16