Skip to content

Conversation

@ayushag-nv
Copy link
Contributor

@ayushag-nv ayushag-nv commented Jan 5, 2026

Overview:

vLLM Supports Embedding Cache Connector that enables encoder disaggregation. The vLLM encoder encoded the image using mm_hash and use EC Connector to store in the Embedding Cache. In this way, encoder acts as a producer.

PD workers can act as consumer. Provided the same EC Connector config as encoder, they can read the embeddings from the cache and use for multi-modal inference.

In this PR, a vllm specific encoder handler is added which used EC Connector to perform the encoding of image payloads.

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features
    • Added support for vLLM-native encoder worker with ECConnector mode integration.
    • Introduced new configuration options for encoder worker setup including backend selection, storage path, and consumer mode settings.
    • Added encoder endpoint capability for multimodal processing with validation and configuration assembly.

✏️ Tip: You can customize this high-level summary in your review settings.

@ayushag-nv ayushag-nv requested review from a team as code owners January 5, 2026 06:16
@copy-pr-bot
Copy link

copy-pr-bot bot commented Jan 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@ayushag-nv ayushag-nv marked this pull request as draft January 5, 2026 06:16
@github-actions github-actions bot added the feat label Jan 5, 2026
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 5, 2026

Walkthrough

Introduces a new vLLM native encoder worker pathway with ECConnector mode. Adds configuration fields for encoder worker setup, utilities for ECTransferConfig creation and engine ID generation, a new initialization path in main orchestration, a handler class for encoder requests, and data models for encoder request/response serialization.

Changes

Cohort / File(s) Summary
Configuration & Utilities
components/src/dynamo/vllm/args.py, components/src/dynamo/vllm/ec_transfer_utils.py
Added five new Config fields (vllm_native_encoder_worker, ec_connector_backend, ec_storage_path, ec_extra_config, ec_consumer_mode) to args.py. Created ec_transfer_utils.py with functions to generate engine IDs (encoder, pd) and construct ECTransferConfig with JSON parsing and validation logic.
Core Orchestration
components/src/dynamo/vllm/main.py
Added async init_vllm_native_encoder method to set up encoder endpoint in ECConnector producer mode, routed startup based on vllm_native_encoder_worker flag, and integrated ECConnector consumer configuration into init_multimodal_worker when ec_consumer_mode is enabled. Imported new utilities and handler.
Handler & Protocol
components/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.py, components/src/dynamo/vllm/multimodal_utils/protocol.py, components/src/dynamo/vllm/multimodal_handlers/__init__.py
Introduced VLLMNativeEncoderWorkerHandler class with image loading, mm_hash computation, and encoder execution; added VLLMNativeEncoderRequest/VLLMNativeEncoderResponse data models; exported handler in module init.py.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A new encoder hops in with connector flair,
Images loaded, hashes computed with care,
Config fields sprouting, utilities align,
ECConnector producer-consumer design shines! 🌟

Pre-merge checks

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is partially complete with an Overview section but Details and Where should the reviewer start sections are empty placeholders. Complete the Details section with specific changes made, and fill in the Where should the reviewer start section with file recommendations. Update the Related Issues section with an actual issue number or remove the placeholder.
Title check ❓ Inconclusive The title is vague and does not clearly communicate the main purpose of the changes. While it references 'ec connector handler', it lacks specificity about what functionality is being added. Expand the title to clearly describe the main feature, such as 'Add vLLM native encoder worker with ECConnector support' to better convey the primary changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @components/src/dynamo/vllm/ec_transfer_utils.py:
- Around line 1-115: This file fails Black formatting; run the formatter (e.g.,
black --write components/src/dynamo/vllm/ec_transfer_utils.py or pre-commit run
--all-files) to reformat the file, then stage and commit the changes; check the
functions create_ec_transfer_config, get_encoder_engine_id, and get_pd_engine_id
to confirm formatting issues are resolved before pushing.
🧹 Nitpick comments (9)
components/src/dynamo/vllm/multimodal_utils/protocol.py (2)

153-169: Type inconsistency for modality field between request and response.

VLLMNativeEncoderRequest.modality uses Literal["image", "video", "audio"] while VLLMNativeEncoderResponse.modality uses str. Consider using the same Literal type in the response for consistency and stronger type checking:

 class VLLMNativeEncoderResponse(BaseModel):
     """Response from vLLM-native encoder worker (ECConnector mode)"""

     request_id: str
     mm_hash: str  # vLLM's multimodal hash identifier
-    modality: str  # "image", "video", "audio"
+    modality: Literal["image", "video", "audio"]  # "image", "video", "audio"
     embeddings_shape: Tuple[int, ...]  # Shape of encoded embeddings
     connector_metadata: dict[str, Any]  # ECConnector config info for PD workers

138-141: Consider adding audio_url to MultiModalInput for consistency with the modality Literal.

VLLMNativeEncoderRequest.modality includes "audio" but MultiModalInput only supports image_url and video_url. While the handler documents this as a TODO, adding the field now would make the model forward-compatible:

 class MultiModalInput(BaseModel):
     image_url: Optional[str] = None
     video_url: Optional[str] = None
+    audio_url: Optional[str] = None
components/src/dynamo/vllm/args.py (1)

340-342: Consider ordering vllm_native_encoder_worker before multimodal_encode_worker for clarity.

Both vllm_native_encoder_worker and multimodal_encode_worker use the same component/endpoint ("encoder"/"generate"), but they represent different encoder pathways. The current ordering is fine logically since the mutual exclusivity check prevents both from being active, but grouping similar configurations together could improve readability.

components/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.py (3)

125-127: Use logger.exception instead of logger.error when re-raising exceptions.

logger.exception automatically includes the stack trace, which is valuable for debugging. This addresses the static analysis hints TRY400.

🔎 Proposed fix
         except Exception as e:
-            logger.error(f"Failed to compute mm_hash: {e}")
+            logger.exception(f"Failed to compute mm_hash: {e}")
             raise

         # ... later ...

         except Exception as e:
-            logger.error(f"Encoder execution failed: {e}")
+            logger.exception(f"Encoder execution failed: {e}")
             raise

Also applies to: 150-152


164-167: Consider aligning connector_metadata keys with Config field names.

The keys "ec_connector" and "storage_path" differ from the Config field names ec_connector_backend and ec_storage_path. For consistency and to avoid confusion for PD workers consuming this metadata:

🔎 Suggested change
         connector_metadata={
-            "ec_connector": self.config.ec_connector_backend,
-            "storage_path": self.config.ec_storage_path
+            "ec_connector_backend": self.config.ec_connector_backend,
+            "ec_storage_path": self.config.ec_storage_path
         }

73-75: Keep context parameter for interface compatibility.

The context parameter is unused (static analysis hint ARG002), but this is intentional as it's part of the handler interface required by the endpoint serving framework. Adding an underscore prefix would suppress the warning while keeping the parameter:

     async def generate(
-        self, request, context
+        self, request, _context
     ) -> AsyncGenerator[str, None]:
components/src/dynamo/vllm/main.py (2)

764-769: Prefix unused variables with underscore.

vllm_config and default_sampling_params are unpacked but never used. Prefix with underscore to indicate intentional non-use and suppress static analysis warnings:

🔎 Proposed fix
     (
         engine_client,
-        vllm_config,
-        default_sampling_params,
+        _vllm_config,
+        _default_sampling_params,
         prometheus_temp_dir,
     ) = setup_vllm_engine(config)

729-729: Use logger.exception for error logging before re-raising.

Replace logger.error with logger.exception to include stack traces:

🔎 Proposed fix
     except Exception as e:
-        logger.error(f"Failed to serve encode worker endpoint: {e}")
+        logger.exception(f"Failed to serve encode worker endpoint: {e}")
         raise

     # ... in init_vllm_native_encoder ...

     except Exception as e:
-        logger.error(f"Failed to serve vLLM-native encoder endpoint: {e}")
+        logger.exception(f"Failed to serve vLLM-native encoder endpoint: {e}")
         raise

Also applies to: 792-793

components/src/dynamo/vllm/ec_transfer_utils.py (1)

52-53: Use exception chaining with raise ... from e.

This preserves the original exception context and improves debugging:

🔎 Proposed fix
         except json.JSONDecodeError as e:
-            raise ValueError(f"Invalid JSON in --ec-extra-config: {e}")
+            raise ValueError(f"Invalid JSON in --ec-extra-config: {e}") from e
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 007c5b6 and 488136d.

📒 Files selected for processing (6)
  • components/src/dynamo/vllm/args.py
  • components/src/dynamo/vllm/ec_transfer_utils.py
  • components/src/dynamo/vllm/main.py
  • components/src/dynamo/vllm/multimodal_handlers/__init__.py
  • components/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.py
  • components/src/dynamo/vllm/multimodal_utils/protocol.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/distributed/transfer.rs:6-6
Timestamp: 2025-09-18T21:47:44.143Z
Learning: For PR ai-dynamo/dynamo#2989, the ConnectorTransferBatcher architectural issues will be addressed in a follow-up PR by removing the duplicate batching logic and integrating distributed transfers with the existing TransferBatcher + LocalTransferManager pipeline, rather than adding bounded concurrency primitives like Semaphore.
📚 Learning: 2025-10-28T04:09:48.264Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.

Applied to files:

  • components/src/dynamo/vllm/multimodal_handlers/__init__.py
  • components/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.py
  • components/src/dynamo/vllm/main.py
📚 Learning: 2025-10-28T05:48:37.621Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_utils/model.py:39-42
Timestamp: 2025-10-28T05:48:37.621Z
Learning: In components/src/dynamo/vllm/multimodal_utils/model.py, the AutoModel.from_pretrained call with trust_remote_code=True in the load_vision_model function is intentional and expected for the vLLM multimodal implementation.

Applied to files:

  • components/src/dynamo/vllm/main.py
🧬 Code graph analysis (3)
components/src/dynamo/vllm/multimodal_handlers/__init__.py (1)
components/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.py (1)
  • VLLMNativeEncoderWorkerHandler (37-183)
components/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.py (1)
components/src/dynamo/vllm/multimodal_utils/protocol.py (2)
  • VLLMNativeEncoderRequest (153-159)
  • VLLMNativeEncoderResponse (162-169)
components/src/dynamo/vllm/multimodal_utils/protocol.py (1)
examples/multimodal/utils/protocol.py (1)
  • MultiModalInput (147-150)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/5162/merge) by ayushag-nv.
components/src/dynamo/vllm/ec_transfer_utils.py

[error] 1-1: Black formatting changed this file. Run 'pre-commit run --all-files' or 'black --write' to fix code style issues in this file.

components/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.py

[error] 1-1: Black formatting changed this file. Run 'pre-commit run --all-files' or 'black --write' to fix code style issues in this file.

🪛 Ruff (0.14.10)
components/src/dynamo/vllm/args.py

317-320: Avoid specifying long messages outside the exception class

(TRY003)


323-323: Avoid specifying long messages outside the exception class

(TRY003)


331-334: Avoid specifying long messages outside the exception class

(TRY003)

components/src/dynamo/vllm/ec_transfer_utils.py

53-53: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


53-53: Avoid specifying long messages outside the exception class

(TRY003)


61-64: Avoid specifying long messages outside the exception class

(TRY003)

components/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.py

74-74: Unused method argument: context

(ARG002)


113-115: Avoid specifying long messages outside the exception class

(TRY003)


126-126: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


151-151: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


182-182: Do not catch blind exception: Exception

(BLE001)

components/src/dynamo/vllm/main.py

729-729: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


766-766: Unpacked variable vllm_config is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


767-767: Unpacked variable default_sampling_params is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


793-793: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (13)
components/src/dynamo/vllm/multimodal_handlers/__init__.py (1)

6-8: LGTM!

The import and export of VLLMNativeEncoderWorkerHandler follows the existing pattern and correctly exposes the new handler through the package's public API.

Also applies to: 19-19

components/src/dynamo/vllm/args.py (3)

70-76: LGTM!

The new Config attributes for ECConnector mode are well-structured with sensible defaults.


202-229: LGTM!

CLI arguments are well-documented with clear help text explaining the purpose of each option. The ECConnector configuration flags provide good flexibility for different storage backends.


325-334: LGTM!

Validation logic correctly enforces that --ec-storage-path is required when using the default ECExampleConnector backend, preventing runtime failures due to missing configuration.

components/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.py (2)

154-156: Hardcoded embeddings shape may cause issues with non-Llama 3.2 Vision models.

The placeholder shape (1, 576, 4096) is specific to Llama 3.2 Vision. PD workers receiving this metadata may rely on it for memory allocation or validation. Consider adding a warning log or making this more prominent:

         # TODO: Get actual embeddings shape from vLLM instead of hardcoded value
         # For now, using typical Llama 3.2 Vision shape as placeholder
+        logger.warning(
+            "Using hardcoded embeddings_shape placeholder. "
+            "This may be incorrect for non-Llama 3.2 Vision models."
+        )
         embeddings_shape = (1, 576, 4096)

1-184: Fix Black formatting issue.

The pipeline indicates this file needs Black formatting. Run pre-commit run --all-files or black --write to fix formatting.

⛔ Skipped due to learnings
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.
components/src/dynamo/vllm/main.py (5)

32-36: LGTM!

Imports for ECTransferConfig utilities and the new handler are correctly added.

Also applies to: 42-42


94-96: LGTM!

The routing logic correctly prioritizes vllm_native_encoder_worker in the initialization flow.


735-796: LGTM on init_vllm_native_encoder structure.

The function follows established patterns from other init functions. The ECConnector producer configuration, vLLM engine setup, and endpoint serving are well-organized with appropriate logging.


817-835: LGTM on ECConnector consumer mode integration.

The consumer mode configuration in init_multimodal_worker is well-integrated and follows the same pattern as the producer configuration in init_vllm_native_encoder.


759-761: No action needed. ec_transfer_config is a valid and supported attribute on vLLM's AsyncEngineArgs. The code correctly instantiates ECTransferConfig (imported from vllm.config) and assigns it to engine_args following the same integration pattern as other vLLM configuration attributes like kv_transfer_config and kv_events_config.

Likely an incorrect or invalid review comment.

components/src/dynamo/vllm/ec_transfer_utils.py (2)

23-76: LGTM on create_ec_transfer_config implementation.

The function properly:

  • Parses optional JSON extra config
  • Validates required fields for ECExampleConnector
  • Provides clear logging of configuration
  • Returns a properly constructed ECTransferConfig

79-114: LGTM on engine ID generation functions.

The functions provide clear, predictable engine IDs with appropriate logging. The naming convention {namespace}.{component}.{role}.{instance_id} is intuitive for debugging and identification.

Signed-off-by: ayushag <[email protected]>
@pull-request-size pull-request-size bot added size/XL and removed size/L labels Jan 5, 2026
@ayushag-nv ayushag-nv marked this pull request as ready for review January 6, 2026 05:08
@ayushag-nv ayushag-nv requested a review from a team as a code owner January 6, 2026 05:08
@ayushag-nv ayushag-nv force-pushed the ayushag/ec-connector-example branch from e15ae2c to 9f6c506 Compare January 6, 2026 05:08
@ayushag-nv ayushag-nv requested a review from GuanLuo January 6, 2026 05:15
@ayushag-nv
Copy link
Contributor Author

/ok to test 7723a39

@ayushag-nv ayushag-nv enabled auto-merge (squash) January 7, 2026 18:11
@ayushag-nv ayushag-nv merged commit 85e0512 into main Jan 7, 2026
28 checks passed
@ayushag-nv ayushag-nv deleted the ayushag/ec-connector-example branch January 7, 2026 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants