
Conversation

@ixlmar (Collaborator) commented Dec 4, 2025

Description

This PR implements functionality analogous to vLLM's "image_embeds".

Test Coverage

Unit test added.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows the TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if there is a significant design change in this PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only [pytorch, cpp, tensorrt, triton] are supported. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
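
Example: /bot run --disable-fail-fast --stage-list "A10-PyTorch-1,A10-PyTorch-2" launches only the two listed test stages with fail-fast disabled.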

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since skipping tests without due care and validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since reusing a pipeline without due care and validation can break the top of tree.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added image embeddings support as an input modality, enabling users to supply base64-encoded tensor embeddings in chat requests.
  • Documentation

    • Added documentation for image embeddings modality coverage with JSON examples and guidance on tensor embedding generation.
  • Tests

    • Added test marker for L40s-specific tests.
    • Added new test coverage for image embeddings in chat completions.


@ixlmar ixlmar force-pushed the feat/image-embeds branch from dde2050 to 1352371 Compare December 4, 2025 16:40
@ixlmar ixlmar requested review from chang-l and evezhier December 4, 2025 16:40
@ixlmar (Collaborator, Author) commented Dec 4, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #26996 [ run ] triggered by Bot. Commit: 1352371

@tensorrt-cicd (Collaborator):

PR_Github #26996 [ run ] completed with state SUCCESS. Commit: 1352371
/LLM/main/L0_MergeRequest_PR pipeline #20582 completed with status: 'FAILURE'

@ixlmar ixlmar force-pushed the feat/image-embeds branch from 650abcb to 703df33 Compare December 5, 2025 12:54
@ixlmar (Collaborator, Author) commented Dec 5, 2025

/bot run --disable-fail-fast --stage-list "A10-PyTorch-1,A10-PyTorch-2,L40S-PyTorch-1,L40S-PyTorch-2"

@tensorrt-cicd (Collaborator):

PR_Github #27124 [ run ] triggered by Bot. Commit: 703df33

@ixlmar ixlmar force-pushed the feat/image-embeds branch from 079ae0d to ef89a3b Compare December 5, 2025 14:06
@ixlmar (Collaborator, Author) commented Dec 5, 2025

/bot run --disable-fail-fast --stage-list "A10-PyTorch-1,A10-PyTorch-2,L40S-PyTorch-1,L40S-PyTorch-2"

@tensorrt-cicd (Collaborator):

PR_Github #27130 [ run ] triggered by Bot. Commit: ef89a3b

@tensorrt-cicd (Collaborator):

PR_Github #27124 [ run ] completed with state ABORTED. Commit: 703df33
/LLM/main/L0_MergeRequest_PR pipeline #20694 (Partly Tested) completed with status: 'FAILURE'

@ixlmar (Collaborator, Author) commented Dec 5, 2025

/bot run --disable-fail-fast --stage-list "A10-PyTorch-1,A10-PyTorch-2,L40S-PyTorch-1,L40S-PyTorch-2"

@tensorrt-cicd (Collaborator):

PR_Github #27134 [ run ] triggered by Bot. Commit: 8a35e18

@tensorrt-cicd (Collaborator):

PR_Github #27130 [ run ] completed with state ABORTED. Commit: ef89a3b
LLM/main/L0_MergeRequest_PR #20700 (Blue Ocean) completed with status: ABORTED

@tensorrt-cicd (Collaborator):

PR_Github #27134 [ run ] completed with state SUCCESS. Commit: 8a35e18
/LLM/main/L0_MergeRequest_PR pipeline #20703 (Partly Tested) completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@ixlmar (Collaborator, Author) commented Dec 5, 2025

/bot run --disable-fail-fast

@ixlmar ixlmar marked this pull request as ready for review December 5, 2025 17:04
@ixlmar ixlmar requested a review from a team as a code owner December 5, 2025 17:04
@ixlmar ixlmar requested review from chzblych and kaiyux December 5, 2025 17:04
@coderabbitai bot (Contributor) commented Dec 5, 2025

📝 Walkthrough

Adds image embeddings support to the TensorRT LLM serving stack by introducing new types for embedding content, parsers for extracting base64-encoded embeddings, multimodal data tracker enhancements to track modality separately from media type, and corresponding chat API and server-side handling to process and propagate embeddings through the message pipeline.
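
To make the request shape concrete, here is a minimal client-side sketch of how an embedding tensor could be serialized and attached to a chat message. The content-part field names ("image_embeds" carrying a base64 "data" payload) follow this summary and the sequence diagram below and should be treated as illustrative assumptions, not the authoritative schema.

```python
# Minimal client-side sketch (not code from this PR): serialize an embedding
# tensor with torch.save, base64-encode it, and attach it as an "image_embeds"
# content part. Field names are assumptions inferred from this review summary.
import base64
import io

import torch

embedding = torch.randn(1, 256, 4096)  # placeholder; the real shape is model-specific

buf = io.BytesIO()
torch.save(embedding, buf)
embeds_b64 = base64.b64encode(buf.getvalue()).decode("utf-8")

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the image."},
        {"type": "image_embeds", "image_embeds": {"data": embeds_b64}},
    ],
}]
# `messages` can then be posted to the trtllm-serve /chat/completions endpoint.
```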

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Test Framework Setup | `cpp/kernels/fmha_v2/pytest.ini` | Adds needs_l40s pytest marker for categorizing tests requiring L40s hardware. |
| Input/Utilities API | `tensorrt_llm/inputs/__init__.py`, `tensorrt_llm/inputs/utils.py` | Exports new load_base64_image_embeds function to decode base64 image embeddings. Updates MultimodalDataTracker.add_data to accept an optional keyword-only modality parameter, separating modality tracking from media type. |
| Chat & Serve Layer | `tensorrt_llm/serve/chat_utils.py` | Introduces ChatCompletionContentPartImageEmbedsParam type and ImageEmbedsData for embedding content (see the sketch after this table). Adds _ImageEmbedsParser and extends MM_PARSER_MAP to parse image embeddings. Adds MM_EMBEDDING_MAP to map embedding types to modalities. Updates message parsing signatures and adds base64 embedding decoding via async coroutines. |
| OpenAI Server Integration | `tensorrt_llm/serve/openai_server.py` | Imports MM_EMBEDDING_MAP and integrates embedding extraction in the chat response flow, separating embeddings from multimodal data and validating that they don't coexist. |
| Documentation | `docs/source/commands/trtllm-serve/trtllm-serve.rst` | Adds documentation section for image embeddings modality support with a JSON example and base64 encoding guidance. |
| Integration Test Configuration | `tests/integration/defs/test_e2e.py`, `tests/integration/test_lists/test-db/l0_l40s.yml` | Updates test execution to exclude needs_l40s-marked tests. Adds new test_single_chat_session_image_embeds to the L40s test list. |
| Multimodal Test Utilities | `tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py` | Introduces test-time patch module implementing _attach_multimodal_embeddings to serialize embeddings for testing and attach them to Qwen2VLInputProcessorBase. |
| Multimodal Chat Tests | `tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py`, `tests/unittest/llmapi/apps/_test_openai_mmencoder.py` | Adds new test fixtures for multimodal encoder server integration and test_single_chat_session_image_embeds to verify the image embeddings workflow. Updates the test_multimodal_content_mm_encoder signature with an explicit return type and return statement. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant OpenAIServer
    participant ChatUtils
    participant InputUtils
    participant Model

    Client->>OpenAIServer: POST /chat/completions with image_embeds
    OpenAIServer->>ChatUtils: parse_chat_messages_coroutines(messages)

    ChatUtils->>ChatUtils: Detect "image_embeds" content type
    ChatUtils->>ChatUtils: _ImageEmbedsParser extracts image_embeds.data
    ChatUtils->>InputUtils: load_base64_image_embeds(base64_str)
    InputUtils->>InputUtils: Decode base64 → torch.Tensor
    InputUtils-->>ChatUtils: Tensor coroutine

    ChatUtils->>ChatUtils: MM_EMBEDDING_MAP: "image_embeds" → "image"
    ChatUtils->>ChatUtils: MultimodalDataTracker.add_data(media_type, coroutine, modality="image")
    ChatUtils-->>OpenAIServer: MultimodalData with modality="image"

    OpenAIServer->>OpenAIServer: Extract embeddings via MM_EMBEDDING_MAP
    OpenAIServer->>OpenAIServer: Separate multi_modal_data vs multi_modal_embeddings
    OpenAIServer->>Model: Pass embeddings to model inference

    Model-->>OpenAIServer: Response
    OpenAIServer-->>Client: Chat completion response
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Key areas requiring attention during review:

  • tensorrt_llm/serve/chat_utils.py: Extensive type additions and parsing logic; verify new parser correctly handles base64 decoding and modality mapping aligns across all code paths
  • MultimodalDataTracker.add_data signature change: Ensure all callers properly pass modality where needed and existing code without modality still behaves correctly
  • MM_EMBEDDING_MAP and MM_PARSER_MAP consistency: Verify mappings are aligned and no embedding types are handled inconsistently
  • tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py: New test integration with mm_encoder_server; verify fixture setup and error handling (BadRequestError fallback path)
  • Backward compatibility: Check that changes to function signatures in chat_utils.py don't break existing callers

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 20.00%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: The title clearly describes the main feature being implemented: support for image_embeds in the OpenAI API, directly matching the core changes.
  • Description check ✅ Passed: The description explains the implementation rationale (vLLM feature parity), mentions test coverage (unit test added), and includes a comprehensive checklist with all items reviewed and checked.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py (1)

29-44: Unconventional test pattern - verify this is the intended design.

This helper monkey-patches Qwen2VLInputProcessorBase to serialize embeddings to a pickle file, then raises a ValueError with the file path. While functional, this approach has some concerns:

  1. Using exceptions for control flow: Raising ValueError to communicate the file path is unconventional. Consider whether the test could be structured differently to return the path normally.

  2. Temporary file cleanup: The pickle file in the system temp directory may not be cleaned up automatically. Consider using tempfile.NamedTemporaryFile with explicit cleanup.

  3. Test isolation: Monkey-patching a global class can affect other tests. Ensure this import only occurs in isolated test contexts.

If this pattern is intentional and aligns with your testing strategy, consider adding a comment explaining why exception-based communication is used here.

Note: Static analysis warnings about unused arguments and setattr are expected for this monkey-patching pattern and can be safely ignored.

tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (2)

40-57: Temporary directory not fully cleaned up.

tempfile.mkdtemp() creates a directory that should be removed after the test. Currently, only the file is removed but the parent directory created at line 41 remains.

Consider using tempfile.TemporaryDirectory as a context manager or cleaning up the directory:

```diff
 @pytest.fixture(scope="module")
 def temp_extra_llm_api_options_file(request):
-    temp_dir = tempfile.mkdtemp()
+    temp_dir = tempfile.TemporaryDirectory()
-    temp_file_path = os.path.join(temp_dir, "extra_llm_api_options.yaml")
+    temp_file_path = os.path.join(temp_dir.name, "extra_llm_api_options.yaml")
     try:
         extra_llm_api_options_dict = {
             "kv_cache_config": {
                 "enable_block_reuse": False,
                 "free_gpu_memory_fraction": 0.6,
             },
         }

         with open(temp_file_path, 'w') as f:
             yaml.dump(extra_llm_api_options_dict, f)

         yield temp_file_path
     finally:
-        if os.path.exists(temp_file_path):
-            os.remove(temp_file_path)
+        temp_dir.cleanup()
```

26-31: Consider using __all__ or a comment instead of assertion for fixture visibility.

The assertion at line 31 works but is unconventional. A clearer approach might be a comment explaining the re-export or using __all__.

```diff
 from ._test_openai_mmencoder import RemoteMMEncoderServer
 from ._test_openai_mmencoder import server as mm_encoder_server
 from ._test_openai_mmencoder import \
     test_multimodal_content_mm_encoder as _test_multimodal_content_mm_encoder

-assert mm_encoder_server is not None  # keep 'mm_encoder_server' fixture visible in this module
+# Re-exported fixtures for pytest discovery
+__all__ = ["mm_encoder_server"]
```
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68253d9 and 8a35e18.

📒 Files selected for processing (11)
  • cpp/kernels/fmha_v2/pytest.ini (1 hunks)
  • docs/source/commands/trtllm-serve/trtllm-serve.rst (1 hunks)
  • tensorrt_llm/inputs/__init__.py (2 hunks)
  • tensorrt_llm/inputs/utils.py (2 hunks)
  • tensorrt_llm/serve/chat_utils.py (7 hunks)
  • tensorrt_llm/serve/openai_server.py (2 hunks)
  • tests/integration/defs/test_e2e.py (1 hunks)
  • tests/integration/test_lists/test-db/l0_l40s.yml (1 hunks)
  • tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py (1 hunks)
  • tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (4 hunks)
  • tests/unittest/llmapi/apps/_test_openai_mmencoder.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Always maintain the namespace when importing in Python, even if only one class or function from a module is used (e.g., use from package.subpackage import foo and then foo.SomeClass() instead of from package.subpackage.foo import SomeClass)
Python filenames should use snake_case (e.g., some_file.py)
Python class names should use PascalCase (e.g., class SomeClass)
Python function and method names should use snake_case (e.g., def my_awesome_function():)
Python local variable names should use snake_case, with prefix k for variable names that start with a number (e.g., k_99th_percentile = ...)
Python global variables should use upper snake_case with prefix G (e.g., G_MY_GLOBAL = ...)
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...)
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Python comments should be reserved for code within a function, or interfaces that are local to a file
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with type and description (e.g., self.x = 5 followed by """<type>: Description of 'x'""" )
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except clause to the smallest set of specific errors possible instead of catching all exceptions
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block to implement the logic

Files:

  • tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py
  • tensorrt_llm/inputs/__init__.py
  • tensorrt_llm/inputs/utils.py
  • tests/unittest/llmapi/apps/_test_openai_mmencoder.py
  • tests/integration/defs/test_e2e.py
  • tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py
  • tensorrt_llm/serve/chat_utils.py
  • tensorrt_llm/serve/openai_server.py
**/*.{cpp,h,cu,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header that includes the current year at the top

Files:

  • tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py
  • tensorrt_llm/inputs/__init__.py
  • tensorrt_llm/inputs/utils.py
  • tests/unittest/llmapi/apps/_test_openai_mmencoder.py
  • tests/integration/defs/test_e2e.py
  • tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py
  • tensorrt_llm/serve/chat_utils.py
  • tensorrt_llm/serve/openai_server.py
🧠 Learnings (4)
📚 Learning: 2025-09-09T09:40:45.658Z
Learnt from: fredricz-20070104
Repo: NVIDIA/TensorRT-LLM PR: 7645
File: tests/integration/test_lists/qa/llm_function_core.txt:648-648
Timestamp: 2025-09-09T09:40:45.658Z
Learning: In TensorRT-LLM test lists, it's common and intentional for the same test to appear in multiple test list files when they serve different purposes (e.g., llm_function_core.txt for comprehensive core functionality testing and llm_function_core_sanity.txt for quick sanity checks). This duplication allows tests to be run in different testing contexts.

Applied to files:

  • tests/integration/test_lists/test-db/l0_l40s.yml
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/test_lists/test-db/l0_l40s.yml
  • tests/integration/defs/test_e2e.py
  • tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py
📚 Learning: 2025-08-29T14:07:45.863Z
Learnt from: EmmaQiaoCh
Repo: NVIDIA/TensorRT-LLM PR: 7370
File: tests/unittest/trt/model_api/test_model_quantization.py:24-27
Timestamp: 2025-08-29T14:07:45.863Z
Learning: In TensorRT-LLM's CI infrastructure, pytest skip markers (pytest.mark.skip) are properly honored even when test files have __main__ blocks that call test functions directly. The testing system correctly skips tests without requiring modifications to the __main__ block execution pattern.

Applied to files:

  • tests/integration/defs/test_e2e.py
  • cpp/kernels/fmha_v2/pytest.ini
📚 Learning: 2025-09-09T13:33:32.247Z
Learnt from: EmmaQiaoCh
Repo: NVIDIA/TensorRT-LLM PR: 7497
File: tests/integration/defs/test_list_parser.py:0-0
Timestamp: 2025-09-09T13:33:32.247Z
Learning: The ISOLATE marker in test_list_parser.py is used for test list parsing and Jenkins pipeline logic, not for creating pytest markers. It allows the parser to recognize ISOLATE markers in test list files, which the Jenkins pipeline then uses to separate tests that need isolation, but no pytest.mark.isolate marker is created.

Applied to files:

  • cpp/kernels/fmha_v2/pytest.ini
🧬 Code graph analysis (8)
tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py (1)
tensorrt_llm/_torch/models/modeling_qwen2vl.py (1)
  • Qwen2VLInputProcessorBase (91-403)
tensorrt_llm/inputs/__init__.py (1)
tensorrt_llm/inputs/utils.py (4)
  • get_cache_salt_id (787-795)
  • load_base64_image_embeds (116-122)
  • load_image (125-146)
  • load_video (266-287)
tensorrt_llm/inputs/utils.py (1)
tensorrt_llm/_torch/auto_deploy/utils/pattern_matcher.py (1)
  • placeholder (229-247)
tests/unittest/llmapi/apps/_test_openai_mmencoder.py (2)
tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (2)
  • client (76-77)
  • model_name (35-36)
tensorrt_llm/llmapi/llm_args.py (1)
  • model_name (1687-1688)
tests/integration/defs/test_e2e.py (1)
tests/integration/defs/conftest.py (2)
  • llm_venv (713-730)
  • test_root (2508-2509)
tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (1)
tests/unittest/llmapi/apps/_test_openai_mmencoder.py (7)
  • RemoteMMEncoderServer (16-44)
  • server (79-88)
  • test_multimodal_content_mm_encoder (101-140)
  • extra_encoder_options (55-56)
  • temp_extra_encoder_options_file (60-75)
  • model_name (48-49)
  • client (92-93)
tensorrt_llm/serve/chat_utils.py (1)
tensorrt_llm/inputs/utils.py (5)
  • async_load_image (149-173)
  • async_load_video (290-319)
  • load_base64_image_embeds (116-122)
  • MultimodalData (430-433)
  • add_data (477-488)
tensorrt_llm/serve/openai_server.py (1)
tensorrt_llm/serve/chat_utils.py (1)
  • load_chat_template (310-341)
🪛 Ruff (0.14.7)
tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py

31-31: Unused function argument: self

(ARG001)


32-32: Unused function argument: inputs

(ARG001)


34-34: Unused function argument: sampling_params

(ARG001)


44-44: Do not call setattr with a constant attribute value. It is not any safer than normal property access.

Replace setattr with assignment

(B010)

tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py

221-221: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)

tensorrt_llm/serve/chat_utils.py

152-152: Do not catch blind exception: Exception

(BLE001)


153-153: Use explicit conversion flag

Replace with conversion flag

(RUF010)

tensorrt_llm/serve/openai_server.py

540-540: Abstract raise to an inner function

(TRY301)


540-540: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (15)
docs/source/commands/trtllm-serve/trtllm-serve.rst (1)

155-172: LGTM! Clear documentation for the new image embeddings feature.

The documentation clearly explains how to provide image embeddings directly via image_embeds, includes a helpful JSON example, and provides guidance on generating the base64-encoded content using torch.save.

cpp/kernels/fmha_v2/pytest.ini (1)

9-9: LGTM! Marker addition is correct.

The needs_l40s marker is properly declared and will be used to categorize tests that require L40s hardware for execution.

Based on learnings, pytest markers are correctly honored in TensorRT-LLM's CI infrastructure.

tests/integration/test_lists/test-db/l0_l40s.yml (1)

28-28: LGTM! Test entry is correctly configured.

The new test for image embeddings is properly added to the L40s test list with the appropriate marker filter.

Based on learnings, it's common and intentional for tests to appear in multiple test list files when they serve different purposes.

tensorrt_llm/inputs/__init__.py (1)

19-20: LGTM! Public API extension is correct.

The load_base64_image_embeds function is properly exposed in the public API alongside the existing load_image and load_video utilities. The import and __all__ export are consistent with the codebase conventions.

Also applies to: 61-61

tests/integration/defs/test_e2e.py (1)

1690-1696: LGTM! Marker filtering is correctly applied.

The test invocation now explicitly excludes tests marked with needs_l40s using the -m not needs_l40s filter. This allows the multimodal chat tests to run selectively based on available hardware.

tensorrt_llm/serve/openai_server.py (1)

529-540: LGTM! Embedding extraction logic is correct.

The code properly separates directly provided embeddings from regular multimodal data:

  1. Uses MM_EMBEDDING_MAP to identify embedding tags
  2. Pops embeddings into a separate dictionary
  3. Sets multi_modal_data and multi_modal_embeddings appropriately
  4. Validates that both types aren't mixed

The ValueError check at line 540 is reachable and necessary - it catches cases where the input contains both regular multimodal data (e.g., image URLs) and direct embeddings, which is an unsupported scenario.

Note: The static analysis hints (TRY301, TRY003) suggest stylistic improvements to error handling, but these are minor and don't affect correctness.
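As a rough illustration of the flow described above, the separation could be sketched as follows. The function and variable names here are hypothetical; only MM_EMBEDDING_MAP and the mixed-input ValueError come from the review.

```python
# Hypothetical sketch of the separation logic described above, not the actual
# openai_server.py implementation.
from typing import Any, Dict, Tuple


def split_embeddings(
    mm_data: Dict[str, Any],
    mm_embedding_map: Dict[str, str],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Pop directly provided embeddings out of the parsed multimodal data."""
    embeddings = {
        modality: mm_data.pop(tag)
        for tag, modality in mm_embedding_map.items()
        if tag in mm_data
    }
    if embeddings and mm_data:
        # Mixing raw media (e.g. image URLs) with precomputed embeddings in a
        # single request is an unsupported combination.
        raise ValueError("Cannot mix multimodal data with multimodal embeddings.")
    return mm_data, embeddings
```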

tests/unittest/llmapi/apps/_test_openai_mmencoder.py (1)

101-103: LGTM! Type annotations and return value improve test reusability.

The function signature now explicitly declares parameters and return type, and the return statement makes the multimodal encoder output (messages and mm_handle) available for use by other tests. This is a good practice for test fixtures that need to share data.

Also applies to: 140-141

tensorrt_llm/inputs/utils.py (2)

116-122: LGTM! Secure tensor loading implementation.

The function correctly uses weights_only=True to mitigate arbitrary code execution risks from untrusted pickle data, and map_location="cpu" ensures device-agnostic loading.
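For readers unfamiliar with the function, a decoder along these lines would exhibit the properties noted above. This is a sketch assuming base64 decoding followed by torch.load, not the actual utils.py code.

```python
# Sketch only: the real load_base64_image_embeds lives in
# tensorrt_llm/inputs/utils.py and may differ in details.
import base64
import io

import torch


def load_base64_image_embeds(b64_str: str) -> torch.Tensor:
    tensor_bytes = base64.b64decode(b64_str)
    # weights_only=True avoids executing arbitrary pickled code;
    # map_location="cpu" keeps loading device-agnostic.
    return torch.load(io.BytesIO(tensor_bytes), weights_only=True,
                      map_location="cpu")
```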


477-488: LGTM! Clean separation of media type from modality.

The keyword-only modality parameter allows callers to specify a different modality for placeholder resolution while still grouping data by media_type. This enables image embeddings to use the "image" modality placeholder while being tracked separately.
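Per the sequence diagram above, the resulting call pattern is roughly add_data("image_embeds", embeds_coroutine, modality="image"): the data is still grouped under the "image_embeds" media type, while the explicit modality override points placeholder resolution at "image". The exact argument names are taken from the diagram and may differ slightly from the real signature.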

tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (2)

172-227: Well-structured test with appropriate xfail handling.

The test correctly validates image embeddings flow and gracefully handles models that don't yet implement attach_multimodal_embeddings via pytest.xfail.

Regarding the pickle usage at line 221: while static analysis flags this (S301), it's acceptable here since the file path is generated by the controlled test patch infrastructure, not from untrusted external input.


139-148: Stub fixtures to satisfy dependency requirements.

These fixtures provide the required parameters for mm_encoder_server. Since extra_encoder_options returns False, the dummy path won't be used.

tensorrt_llm/serve/chat_utils.py (4)

37-48: LGTM! Well-defined TypedDict structures for image embeddings.

The types follow the existing patterns in this file and provide clear documentation for the expected structure.


84-93: LGTM! Consistent parser and modality mapping.

The image_embeds parser follows the same pattern as other entries, and MM_EMBEDDING_MAP provides a clean abstraction for mapping embedding types to their corresponding modalities.


146-157: Consistent with existing exception handling pattern.

The broad Exception catch (line 152) matches the pattern used in image_url, video_url, and audio_url handlers in this file. While catching specific exceptions would be preferable, changing the pattern here would create inconsistency.

The modality is correctly set to "image_embeds" for tracking, which will later be mapped to "image" via MM_EMBEDDING_MAP for placeholder resolution.


284-287: LGTM! Correct modality propagation for embeddings.

The MM_EMBEDDING_MAP.get() lookup with None default ensures that:

  • image_embeds → uses "image" modality for placeholder resolution
  • Other types → fall back to their own modality (via add_data's default behavior)
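A minimal sketch of the lookup pattern this comment describes; the single "image_embeds" → "image" entry is taken from the review, and the constant's real contents may differ.

```python
# Minimal sketch of the mapping/lookup pattern; the real MM_EMBEDDING_MAP is
# defined in tensorrt_llm/serve/chat_utils.py and may contain more entries.
from typing import Optional

MM_EMBEDDING_MAP = {"image_embeds": "image"}


def resolve_placeholder_modality(media_type: str) -> Optional[str]:
    # "image_embeds" resolves to the "image" placeholder; anything else returns
    # None and falls back to add_data's default (modality == media type).
    return MM_EMBEDDING_MAP.get(media_type)


assert resolve_placeholder_modality("image_embeds") == "image"
assert resolve_placeholder_modality("image_url") is None
```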

@tensorrt-cicd (Collaborator):

PR_Github #27141 [ run ] triggered by Bot. Commit: 8a35e18

@tensorrt-cicd (Collaborator):

PR_Github #27141 [ run ] completed with state SUCCESS. Commit: 8a35e18
/LLM/main/L0_MergeRequest_PR pipeline #20710 completed with status: 'FAILURE'

@ixlmar (Collaborator, Author) commented Dec 8, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #27282 [ run ] triggered by Bot. Commit: 8a35e18

@tensorrt-cicd (Collaborator):

PR_Github #27282 [ run ] completed with state SUCCESS. Commit: 8a35e18
/LLM/main/L0_MergeRequest_PR pipeline #20833 completed with status: 'SUCCESS'

@ixlmar ixlmar enabled auto-merge (squash) January 8, 2026 12:37
@ixlmar ixlmar disabled auto-merge January 8, 2026 12:37
@ixlmar ixlmar force-pushed the feat/image-embeds branch from 3fcacce to 5ff60b0 Compare January 8, 2026 14:13
@ixlmar (Collaborator, Author) commented Jan 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #31073 [ run ] triggered by Bot. Commit: 5ff60b0

@tensorrt-cicd (Collaborator):

PR_Github #31073 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 6 AM PST on 12/29.

@ixlmar (Collaborator, Author) commented Jan 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #31075 [ run ] triggered by Bot. Commit: 5ff60b0

@tensorrt-cicd (Collaborator):

PR_Github #31075 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 AM PST on 1/8.

@tensorrt-cicd (Collaborator):

PR_Github #31027 [ run ] completed with state FAILURE. Commit: 3fcacce
/LLM/main/L0_MergeRequest_PR pipeline #23972 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@ixlmar (Collaborator, Author) commented Jan 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #31087 [ run ] triggered by Bot. Commit: 5ff60b0

@tensorrt-cicd (Collaborator):

PR_Github #31087 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 AM PST on 1/8.

@ixlmar ixlmar force-pushed the feat/image-embeds branch from 5ff60b0 to ab2f9fb Compare January 9, 2026 07:55
@ixlmar (Collaborator, Author) commented Jan 9, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #31224 [ run ] triggered by Bot. Commit: ab2f9fb

@ixlmar (Collaborator, Author) commented Jan 9, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #31279 [ run ] triggered by Bot. Commit: ab2f9fb

@tensorrt-cicd (Collaborator):

PR_Github #31279 [ run ] completed with state ABORTED. Commit: ab2f9fb
LLM/main/L0_MergeRequest_PR #24173 (Blue Ocean) completed with status: ABORTED

