-
Notifications
You must be signed in to change notification settings - Fork 771
feat: ec connector handler #5162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: ayushag <[email protected]>
WalkthroughIntroduces a new vLLM native encoder worker pathway with ECConnector mode. Adds configuration fields for encoder worker setup, utilities for ECTransferConfig creation and engine ID generation, a new initialization path in main orchestration, a handler class for encoder requests, and data models for encoder request/response serialization. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
Pre-merge checks❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @components/src/dynamo/vllm/ec_transfer_utils.py:
- Around line 1-115: This file fails Black formatting; run the formatter (e.g.,
black --write components/src/dynamo/vllm/ec_transfer_utils.py or pre-commit run
--all-files) to reformat the file, then stage and commit the changes; check the
functions create_ec_transfer_config, get_encoder_engine_id, and get_pd_engine_id
to confirm formatting issues are resolved before pushing.
🧹 Nitpick comments (9)
components/src/dynamo/vllm/multimodal_utils/protocol.py (2)
153-169: Type inconsistency formodalityfield between request and response.
VLLMNativeEncoderRequest.modalityusesLiteral["image", "video", "audio"]whileVLLMNativeEncoderResponse.modalityusesstr. Consider using the same Literal type in the response for consistency and stronger type checking:class VLLMNativeEncoderResponse(BaseModel): """Response from vLLM-native encoder worker (ECConnector mode)""" request_id: str mm_hash: str # vLLM's multimodal hash identifier - modality: str # "image", "video", "audio" + modality: Literal["image", "video", "audio"] # "image", "video", "audio" embeddings_shape: Tuple[int, ...] # Shape of encoded embeddings connector_metadata: dict[str, Any] # ECConnector config info for PD workers
138-141: Consider addingaudio_urltoMultiModalInputfor consistency with the modality Literal.
VLLMNativeEncoderRequest.modalityincludes"audio"butMultiModalInputonly supportsimage_urlandvideo_url. While the handler documents this as a TODO, adding the field now would make the model forward-compatible:class MultiModalInput(BaseModel): image_url: Optional[str] = None video_url: Optional[str] = None + audio_url: Optional[str] = Nonecomponents/src/dynamo/vllm/args.py (1)
340-342: Consider orderingvllm_native_encoder_workerbeforemultimodal_encode_workerfor clarity.Both
vllm_native_encoder_workerandmultimodal_encode_workeruse the same component/endpoint ("encoder"/"generate"), but they represent different encoder pathways. The current ordering is fine logically since the mutual exclusivity check prevents both from being active, but grouping similar configurations together could improve readability.components/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.py (3)
125-127: Uselogger.exceptioninstead oflogger.errorwhen re-raising exceptions.
logger.exceptionautomatically includes the stack trace, which is valuable for debugging. This addresses the static analysis hints TRY400.🔎 Proposed fix
except Exception as e: - logger.error(f"Failed to compute mm_hash: {e}") + logger.exception(f"Failed to compute mm_hash: {e}") raise # ... later ... except Exception as e: - logger.error(f"Encoder execution failed: {e}") + logger.exception(f"Encoder execution failed: {e}") raiseAlso applies to: 150-152
164-167: Consider aligning connector_metadata keys with Config field names.The keys
"ec_connector"and"storage_path"differ from the Config field namesec_connector_backendandec_storage_path. For consistency and to avoid confusion for PD workers consuming this metadata:🔎 Suggested change
connector_metadata={ - "ec_connector": self.config.ec_connector_backend, - "storage_path": self.config.ec_storage_path + "ec_connector_backend": self.config.ec_connector_backend, + "ec_storage_path": self.config.ec_storage_path }
73-75: Keepcontextparameter for interface compatibility.The
contextparameter is unused (static analysis hint ARG002), but this is intentional as it's part of the handler interface required by the endpoint serving framework. Adding an underscore prefix would suppress the warning while keeping the parameter:async def generate( - self, request, context + self, request, _context ) -> AsyncGenerator[str, None]:components/src/dynamo/vllm/main.py (2)
764-769: Prefix unused variables with underscore.
vllm_configanddefault_sampling_paramsare unpacked but never used. Prefix with underscore to indicate intentional non-use and suppress static analysis warnings:🔎 Proposed fix
( engine_client, - vllm_config, - default_sampling_params, + _vllm_config, + _default_sampling_params, prometheus_temp_dir, ) = setup_vllm_engine(config)
729-729: Uselogger.exceptionfor error logging before re-raising.Replace
logger.errorwithlogger.exceptionto include stack traces:🔎 Proposed fix
except Exception as e: - logger.error(f"Failed to serve encode worker endpoint: {e}") + logger.exception(f"Failed to serve encode worker endpoint: {e}") raise # ... in init_vllm_native_encoder ... except Exception as e: - logger.error(f"Failed to serve vLLM-native encoder endpoint: {e}") + logger.exception(f"Failed to serve vLLM-native encoder endpoint: {e}") raiseAlso applies to: 792-793
components/src/dynamo/vllm/ec_transfer_utils.py (1)
52-53: Use exception chaining withraise ... from e.This preserves the original exception context and improves debugging:
🔎 Proposed fix
except json.JSONDecodeError as e: - raise ValueError(f"Invalid JSON in --ec-extra-config: {e}") + raise ValueError(f"Invalid JSON in --ec-extra-config: {e}") from e
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
components/src/dynamo/vllm/args.pycomponents/src/dynamo/vllm/ec_transfer_utils.pycomponents/src/dynamo/vllm/main.pycomponents/src/dynamo/vllm/multimodal_handlers/__init__.pycomponents/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.pycomponents/src/dynamo/vllm/multimodal_utils/protocol.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/distributed/transfer.rs:6-6
Timestamp: 2025-09-18T21:47:44.143Z
Learning: For PR ai-dynamo/dynamo#2989, the ConnectorTransferBatcher architectural issues will be addressed in a follow-up PR by removing the duplicate batching logic and integrating distributed transfers with the existing TransferBatcher + LocalTransferManager pipeline, rather than adding bounded concurrency primitives like Semaphore.
📚 Learning: 2025-10-28T04:09:48.264Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.
Applied to files:
components/src/dynamo/vllm/multimodal_handlers/__init__.pycomponents/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.pycomponents/src/dynamo/vllm/main.py
📚 Learning: 2025-10-28T05:48:37.621Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_utils/model.py:39-42
Timestamp: 2025-10-28T05:48:37.621Z
Learning: In components/src/dynamo/vllm/multimodal_utils/model.py, the AutoModel.from_pretrained call with trust_remote_code=True in the load_vision_model function is intentional and expected for the vLLM multimodal implementation.
Applied to files:
components/src/dynamo/vllm/main.py
🧬 Code graph analysis (3)
components/src/dynamo/vllm/multimodal_handlers/__init__.py (1)
components/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.py (1)
VLLMNativeEncoderWorkerHandler(37-183)
components/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.py (1)
components/src/dynamo/vllm/multimodal_utils/protocol.py (2)
VLLMNativeEncoderRequest(153-159)VLLMNativeEncoderResponse(162-169)
components/src/dynamo/vllm/multimodal_utils/protocol.py (1)
examples/multimodal/utils/protocol.py (1)
MultiModalInput(147-150)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/5162/merge) by ayushag-nv.
components/src/dynamo/vllm/ec_transfer_utils.py
[error] 1-1: Black formatting changed this file. Run 'pre-commit run --all-files' or 'black --write' to fix code style issues in this file.
components/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.py
[error] 1-1: Black formatting changed this file. Run 'pre-commit run --all-files' or 'black --write' to fix code style issues in this file.
🪛 Ruff (0.14.10)
components/src/dynamo/vllm/args.py
317-320: Avoid specifying long messages outside the exception class
(TRY003)
323-323: Avoid specifying long messages outside the exception class
(TRY003)
331-334: Avoid specifying long messages outside the exception class
(TRY003)
components/src/dynamo/vllm/ec_transfer_utils.py
53-53: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
53-53: Avoid specifying long messages outside the exception class
(TRY003)
61-64: Avoid specifying long messages outside the exception class
(TRY003)
components/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.py
74-74: Unused method argument: context
(ARG002)
113-115: Avoid specifying long messages outside the exception class
(TRY003)
126-126: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
151-151: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
182-182: Do not catch blind exception: Exception
(BLE001)
components/src/dynamo/vllm/main.py
729-729: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
766-766: Unpacked variable vllm_config is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
767-767: Unpacked variable default_sampling_params is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
793-793: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (13)
components/src/dynamo/vllm/multimodal_handlers/__init__.py (1)
6-8: LGTM!The import and export of
VLLMNativeEncoderWorkerHandlerfollows the existing pattern and correctly exposes the new handler through the package's public API.Also applies to: 19-19
components/src/dynamo/vllm/args.py (3)
70-76: LGTM!The new Config attributes for ECConnector mode are well-structured with sensible defaults.
202-229: LGTM!CLI arguments are well-documented with clear help text explaining the purpose of each option. The ECConnector configuration flags provide good flexibility for different storage backends.
325-334: LGTM!Validation logic correctly enforces that
--ec-storage-pathis required when using the defaultECExampleConnectorbackend, preventing runtime failures due to missing configuration.components/src/dynamo/vllm/multimodal_handlers/vllm_native_encoder_handler.py (2)
154-156: Hardcoded embeddings shape may cause issues with non-Llama 3.2 Vision models.The placeholder shape
(1, 576, 4096)is specific to Llama 3.2 Vision. PD workers receiving this metadata may rely on it for memory allocation or validation. Consider adding a warning log or making this more prominent:# TODO: Get actual embeddings shape from vLLM instead of hardcoded value # For now, using typical Llama 3.2 Vision shape as placeholder + logger.warning( + "Using hardcoded embeddings_shape placeholder. " + "This may be incorrect for non-Llama 3.2 Vision models." + ) embeddings_shape = (1, 576, 4096)
1-184: Fix Black formatting issue.The pipeline indicates this file needs Black formatting. Run
pre-commit run --all-filesorblack --writeto fix formatting.⛔ Skipped due to learnings
Learnt from: ayushag-nv Repo: ai-dynamo/dynamo PR: 3634 File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72 Timestamp: 2025-10-28T04:09:48.264Z Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.components/src/dynamo/vllm/main.py (5)
32-36: LGTM!Imports for ECTransferConfig utilities and the new handler are correctly added.
Also applies to: 42-42
94-96: LGTM!The routing logic correctly prioritizes
vllm_native_encoder_workerin the initialization flow.
735-796: LGTM oninit_vllm_native_encoderstructure.The function follows established patterns from other init functions. The ECConnector producer configuration, vLLM engine setup, and endpoint serving are well-organized with appropriate logging.
817-835: LGTM on ECConnector consumer mode integration.The consumer mode configuration in
init_multimodal_workeris well-integrated and follows the same pattern as the producer configuration ininit_vllm_native_encoder.
759-761: No action needed.ec_transfer_configis a valid and supported attribute on vLLM'sAsyncEngineArgs. The code correctly instantiatesECTransferConfig(imported fromvllm.config) and assigns it toengine_argsfollowing the same integration pattern as other vLLM configuration attributes likekv_transfer_configandkv_events_config.Likely an incorrect or invalid review comment.
components/src/dynamo/vllm/ec_transfer_utils.py (2)
23-76: LGTM oncreate_ec_transfer_configimplementation.The function properly:
- Parses optional JSON extra config
- Validates required fields for ECExampleConnector
- Provides clear logging of configuration
- Returns a properly constructed ECTransferConfig
79-114: LGTM on engine ID generation functions.The functions provide clear, predictable engine IDs with appropriate logging. The naming convention
{namespace}.{component}.{role}.{instance_id}is intuitive for debugging and identification.
Signed-off-by: ayushag <[email protected]>
Signed-off-by: ayushag <[email protected]>
Signed-off-by: ayushag <[email protected]>
Signed-off-by: ayushag <[email protected]>
Signed-off-by: ayushag <[email protected]>
Signed-off-by: ayushag <[email protected]>
Signed-off-by: ayushag <[email protected]>
Signed-off-by: ayushag <[email protected]>
Signed-off-by: ayushag <[email protected]>
e15ae2c to
9f6c506
Compare
…r-example' into ayushag/ec-connector-example
Signed-off-by: ayushag <[email protected]>
Signed-off-by: ayushag <[email protected]>
Signed-off-by: Ayush Agarwal <[email protected]>
Signed-off-by: ayushag <[email protected]>
|
/ok to test 7723a39 |
Overview:
vLLM Supports Embedding Cache Connector that enables encoder disaggregation. The vLLM encoder encoded the image using mm_hash and use EC Connector to store in the Embedding Cache. In this way, encoder acts as a producer.
PD workers can act as consumer. Provided the same EC Connector config as encoder, they can read the embeddings from the cache and use for multi-modal inference.
In this PR, a vllm specific encoder handler is added which used EC Connector to perform the encoding of image payloads.
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.