Conversation

Contributor

@KrishnanPrash KrishnanPrash commented Nov 5, 2025

Overview

Enables vLLM backend workers to process base64 data URL and HTTP image inputs.

Backend workers now extract image URLs from PreprocessedRequest.multi_modal_data (added in #3733) and decode them using ImageLoader:

  • data:image/*;base64,<encoded> → Decoded to PIL.Image
  • http:// → Fetched and loaded as PIL.Image
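The two decode paths above can be sketched as follows. This is a minimal illustrative version using Pillow and the standard library; `load_image` is a stand-in name, not Dynamo's actual ImageLoader API:

```python
# Minimal sketch of the two decode paths; illustrative only, not the
# actual ImageLoader implementation.
import base64
import io
from urllib.request import urlopen

from PIL import Image


def load_image(url: str) -> Image.Image:
    """Turn a base64 data URL or an http(s) URL into a PIL.Image."""
    if url.startswith("data:image/"):
        # data:image/<fmt>;base64,<encoded> -> decode the payload after the comma
        _, encoded = url.split(",", 1)
        return Image.open(io.BytesIO(base64.b64decode(encoded)))
    if url.startswith(("http://", "https://")):
        # Fetched over the network; real code would also want retries
        with urlopen(url, timeout=30) as resp:
            return Image.open(io.BytesIO(resp.read()))
    raise ValueError(f"Unsupported image URL scheme: {url!r}")
```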

Related PRs:

Details

Modified: components/src/dynamo/vllm/handlers.py

  • Added _extract_multimodal_data() method

Scripts:

  • Created agg_multimodal.sh - Standard deployment using Rust preprocessor
  • Renamed agg_multimodal.sh → agg_multimodal_epd.sh - EPD architecture (preserved)

Tests:

  • Renamed existing tests with _epd suffix (e.g., multimodal_agg_qwen → multimodal_agg_qwen_epd)
  • Added new multimodal_agg_qwen test using standard deployment
  • Validates passthrough of both HTTP and base64 image URLs.
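For reference, the two request shapes the test exercises look roughly like this. The model name, prompt, and image URLs below are illustrative placeholders, not the test's exact fixtures:

```python
# Illustrative OpenAI-style chat payloads for the two image-input paths;
# model name, prompt, and URLs are placeholders, not the real test fixtures.
def image_chat_payload(image_url: str) -> dict:
    return {
        "model": "Qwen/Qwen2.5-VL-7B-Instruct",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is in this image?"},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 64,
    }

# HTTP URL passthrough
http_payload = image_chat_payload("http://example.com/bus.jpg")
# Base64 data URL passthrough (encoded payload truncated for illustration)
b64_payload = image_chat_payload("data:image/jpeg;base64,/9j/4AAQSkZJRg==")
```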

Where should the reviewer start?

  1. handlers.py:116-168 - Extraction logic
  2. test_vllm.py:166-195 - Test validation

Summary by CodeRabbit

  • New Features

    • Added multimodal image support for vLLM, enabling both URL-based and base64-encoded image inputs
    • Introduced Encode-Prefill-Decode (EPD) deployment architecture for multimodal backends
    • Added support for Qwen2.5-VL-7B-Instruct model with optimized GPU memory configuration
  • Tests

    • Added comprehensive multimodal test cases with image URL and base64 encoding validation

Signed-off-by: Krishnan Prashanth <[email protected]>
@KrishnanPrash KrishnanPrash requested review from a team as code owners November 5, 2025 00:54
@KrishnanPrash KrishnanPrash changed the title Add base64 and HTTP image URL support to vLLM workers feat: Add base64 and HTTP image URL support to vLLM workers Nov 5, 2025
@github-actions github-actions bot added the feat label Nov 5, 2025
Contributor

coderabbitai bot commented Nov 5, 2025

Walkthrough

This pull request adds multimodal image data support to the vLLM handler system. It introduces image loading and data extraction logic in the core handlers, updates deployment scripts to reflect a simplified or alternative EPD architecture for multimodal inference, and extends test configurations with image URLs and base64-encoded test data.

Changes

  • Core multimodal handler logic (components/src/dynamo/vllm/handlers.py):
    Added ImageLoader attribute and _extract_multimodal_data method to BaseWorkerHandler; updated DecodeWorkerHandler.generate and PrefillWorkerHandler.generate to inject extracted multimodal data into TokensPrompt; includes error handling and future-proofing for video_url.
  • Deployment script refactor (examples/backends/vllm/launch/agg_multimodal.sh):
    Simplified multimodal backend to single vLLM process with Dynamo frontend; replaced llava-hf/llava-1.5-7b-hf with Qwen/Qwen2.5-VL-7B-Instruct; removed prompt-template logic; added GPU memory optimization flags.
  • New EPD architecture script (examples/backends/vllm/launch/agg_multimodal_epd.sh):
    New Bash script implementing 3-component Encode-Prefill-Decode multimodal backend; includes CLI argument parsing, dynamic GPU memory optimization per model, and orchestration of preprocessor and worker processes.
  • Test configuration updates (tests/serve/test_vllm.py):
    Added module constants BUS_IMAGE_URL and BUS_IMAGE_B64; introduced stragglers field to VLLMConfig; renamed and reconfigured multimodal_agg_llava to multimodal_agg_llava_epd; extended multimodal_agg_qwen with base64 data URL test payload; added new multimodal_agg_qwen_epd config entry.
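The injection point described for handlers.py can be pictured as below. This assumes vLLM's TokensPrompt TypedDict shape (prompt_token_ids plus an optional multi_modal_data mapping); the helper name is illustrative:

```python
# Sketch of injecting extracted multimodal data into a TokensPrompt-shaped
# dict; assumes vLLM's TypedDict fields, helper name is illustrative.
from typing import Any, Dict, List, Optional


def build_tokens_prompt(
    token_ids: List[int],
    multi_modal_data: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    prompt: Dict[str, Any] = {"prompt_token_ids": token_ids}
    if multi_modal_data:
        # e.g. {"image": [<PIL.Image>, ...]} from _extract_multimodal_data;
        # omitted entirely for text-only requests
        prompt["multi_modal_data"] = multi_modal_data
    return prompt
```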

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • components/src/dynamo/vllm/handlers.py: Review image loading initialization, _extract_multimodal_data logic, and correct injection points in both DecodeWorkerHandler.generate and PrefillWorkerHandler.generate; verify error handling and type compatibility with vLLM's multimodal data format.
  • examples/backends/vllm/launch/agg_multimodal_epd.sh: Validate GPU binding, worker orchestration, and GPU memory argument construction for model-specific optimization.
  • tests/serve/test_vllm.py: Confirm image URL and base64 payload formats; verify test expectations align with new configuration structure.

Poem

🐰 Behold, images now hop through the pipeline flow,
From URLs and base64, the handlers make them glow,
Multimodal dreams extracted, injected with care,
With EPD workers dancing through the data air! 📸

Pre-merge checks

✅ Passed checks (2 passed)
  • Description check: ✅ Passed. The description covers all required template sections with sufficient detail: Overview explains the feature, Details specify modified files and scripts, and Where to start provides exact line numbers for reviewer guidance.
  • Title check: ✅ Passed. The title 'Add base64 and HTTP image URL support to vLLM workers' directly and accurately summarizes the main change: enabling vLLM workers to process base64 data URL images and HTTP image URLs through the new _extract_multimodal_data() method.


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 83a3fe4 and ea4a792.

⛔ Files ignored due to path filters (1)
  • lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • components/src/dynamo/vllm/handlers.py (5 hunks)
  • examples/backends/vllm/launch/agg_multimodal.sh (3 hunks)
  • examples/backends/vllm/launch/agg_multimodal_epd.sh (1 hunks)
  • tests/serve/test_vllm.py (5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.

Applied to files:

  • components/src/dynamo/vllm/handlers.py
📚 Learning: 2025-10-28T04:09:48.264Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.

Applied to files:

  • components/src/dynamo/vllm/handlers.py
🧬 Code graph analysis (1)
tests/serve/test_vllm.py (1)
tests/utils/payload_builder.py (1)
  • chat_payload (81-108)
🪛 Ruff (0.14.3)
components/src/dynamo/vllm/handlers.py

149-149: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

tests/serve/test_vllm.py

31-31: Probable use of requests call without timeout

(S113)
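Both Ruff findings have mechanical fixes, sketched below. The function names are illustrative, not the flagged Dynamo code:

```python
# Sketches of the two Ruff fixes; function names are illustrative.
import base64
import logging
from typing import Optional

import requests

logger = logging.getLogger(__name__)


# TRY400: inside an except handler, logger.exception records the traceback
# that logger.error would silently drop.
def decode_or_log(data_url: str) -> Optional[bytes]:
    try:
        _, encoded = data_url.split(",", 1)
        return base64.b64decode(encoded, validate=True)
    except ValueError:  # binascii.Error subclasses ValueError
        logger.exception("Failed to decode base64 image")
        return None


# S113: pass an explicit timeout so a stalled server cannot hang the tests.
def fetch_image(url: str) -> bytes:
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return resp.content
```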

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: Build and Test - dynamo

Contributor

@milesial milesial left a comment

LGTM

Signed-off-by: Krishnan Prashanth <[email protected]>
Contributor

@indrajit96 indrajit96 left a comment

Nice work with E2E PR!
Left a few questions and minor comments.
LGTM!

Contributor

@krishung5 krishung5 left a comment

lgtm!

Signed-off-by: Krishnan Prashanth <[email protected]>
Contributor

@rmccorm4 rmccorm4 left a comment

Please remove test_multimodal.sh from the root of the repo. It shouldn't be at top level. It should either be removed or put in some test utils type of folder.

@KrishnanPrash
Contributor Author

Please remove test_multimodal.sh from the root of the repo. It shouldn't be at top level. It should either be removed or put in some test utils type of folder.

Added for ease of cluster testing. Will clean up before merging.

Signed-off-by: Krishnan Prashanth <[email protected]>
@KrishnanPrash KrishnanPrash merged commit 25fc732 into main Nov 6, 2025
66 of 84 checks passed
@KrishnanPrash KrishnanPrash deleted the kprashanth/vllm-b64-img branch November 6, 2025 00:49
pull bot pushed a commit to saidrhs/dynamo that referenced this pull request Nov 6, 2025
@rmccorm4 rmccorm4 added multimodal backend::vllm Relates to the vllm backend labels Nov 6, 2025
fi

# Start processor (Python-based preprocessing, handles prompt templating)
python -m dynamo.vllm --multimodal-processor --model $MODEL_NAME --mm-prompt-template "$PROMPT_TEMPLATE" &
Contributor

Question for next week - if the worker/backend supports PreprocessedRequest.multi_modal_data now, do we need this multimodal-preprocessor that explicitly registers as expecting ModelInput.Text so it can do the processing itself?

ref:

ModelInput.Text, # Custom processor is used and this type bypasses SDK processor
