
Conversation

@ixlmar (Collaborator) commented Dec 4, 2025

Description

This PR implements functionality analogous to vLLM's "image_embeds".

Test Coverage

Unit test added.

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows the TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if there is a significant design change in this PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only [pytorch, cpp, tensorrt, triton] are supported. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.
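
Example: /bot run --disable-fail-fast --stage-list "A10-PyTorch-1,A10-PyTorch-2" launches only the two listed test stages with fail-fast disabled.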

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since skipping tests without due care and validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since reusing a pipeline without due care and validation can break the top of tree.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added image embeddings support as an input modality, enabling users to supply base64-encoded tensor embeddings in chat requests.
  • Documentation

    • Added documentation for image embeddings modality coverage with JSON examples and guidance on tensor embedding generation.
  • Tests

    • Added test marker for L40s-specific tests.
    • Added new test coverage for image embeddings in chat completions.


@ixlmar ixlmar force-pushed the feat/image-embeds branch from dde2050 to 1352371 Compare December 4, 2025 16:40
@ixlmar ixlmar requested review from chang-l and evezhier December 4, 2025 16:40
@ixlmar (Collaborator, Author) commented Dec 4, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #26996 [ run ] triggered by Bot. Commit: 1352371

@tensorrt-cicd (Collaborator):

PR_Github #26996 [ run ] completed with state SUCCESS. Commit: 1352371
/LLM/main/L0_MergeRequest_PR pipeline #20582 completed with status: 'FAILURE'

@ixlmar ixlmar force-pushed the feat/image-embeds branch from 650abcb to 703df33 Compare December 5, 2025 12:54
@ixlmar (Collaborator, Author) commented Dec 5, 2025

/bot run --disable-fail-fast --stage-list "A10-PyTorch-1,A10-PyTorch-2,L40S-PyTorch-1,L40S-PyTorch-2"

@tensorrt-cicd (Collaborator):

PR_Github #27124 [ run ] triggered by Bot. Commit: 703df33

@ixlmar ixlmar force-pushed the feat/image-embeds branch from 079ae0d to ef89a3b Compare December 5, 2025 14:06
@ixlmar (Collaborator, Author) commented Dec 5, 2025

/bot run --disable-fail-fast --stage-list "A10-PyTorch-1,A10-PyTorch-2,L40S-PyTorch-1,L40S-PyTorch-2"

@tensorrt-cicd (Collaborator):

PR_Github #27130 [ run ] triggered by Bot. Commit: ef89a3b

@tensorrt-cicd (Collaborator):

PR_Github #27124 [ run ] completed with state ABORTED. Commit: 703df33
/LLM/main/L0_MergeRequest_PR pipeline #20694 (Partly Tested) completed with status: 'FAILURE'

@ixlmar (Collaborator, Author) commented Dec 5, 2025

/bot run --disable-fail-fast --stage-list "A10-PyTorch-1,A10-PyTorch-2,L40S-PyTorch-1,L40S-PyTorch-2"

@tensorrt-cicd (Collaborator):

PR_Github #27134 [ run ] triggered by Bot. Commit: 8a35e18

@tensorrt-cicd (Collaborator):

PR_Github #27130 [ run ] completed with state ABORTED. Commit: ef89a3b
LLM/main/L0_MergeRequest_PR #20700 (Blue Ocean) completed with status: ABORTED

@tensorrt-cicd (Collaborator):

PR_Github #27134 [ run ] completed with state SUCCESS. Commit: 8a35e18
/LLM/main/L0_MergeRequest_PR pipeline #20703 (Partly Tested) completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

@ixlmar (Collaborator, Author) commented Dec 5, 2025

/bot run --disable-fail-fast

@ixlmar ixlmar marked this pull request as ready for review December 5, 2025 17:04
@ixlmar ixlmar requested a review from a team as a code owner December 5, 2025 17:04
@ixlmar ixlmar requested review from chzblych and kaiyux December 5, 2025 17:04
@coderabbitai bot (Contributor) commented Dec 5, 2025

📝 Walkthrough

Adds image embeddings support to the TensorRT LLM serving stack by introducing new types for embedding content, parsers for extracting base64-encoded embeddings, multimodal data tracker enhancements to track modality separately from media type, and corresponding chat API and server-side handling to process and propagate embeddings through the message pipeline.
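
To make the request shape concrete, here is a minimal client-side sketch of how an embedding tensor could be serialized and attached to a chat message. The content-part field names ("image_embeds" carrying a base64 "data" payload) follow this summary and the sequence diagram below and should be treated as illustrative assumptions, not the authoritative schema.

```python
# Minimal client-side sketch (not code from this PR): serialize an embedding
# tensor with torch.save, base64-encode it, and attach it as an "image_embeds"
# content part. Field names are assumptions inferred from this review summary.
import base64
import io

import torch

embedding = torch.randn(1, 256, 4096)  # placeholder; the real shape is model-specific

buf = io.BytesIO()
torch.save(embedding, buf)
embeds_b64 = base64.b64encode(buf.getvalue()).decode("utf-8")

messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the image."},
        {"type": "image_embeds", "image_embeds": {"data": embeds_b64}},
    ],
}]
# `messages` can then be posted to the trtllm-serve /chat/completions endpoint.
```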

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Test Framework Setup | `cpp/kernels/fmha_v2/pytest.ini` | Adds needs_l40s pytest marker for categorizing tests requiring L40s hardware. |
| Input/Utilities API | `tensorrt_llm/inputs/__init__.py`, `tensorrt_llm/inputs/utils.py` | Exports new load_base64_image_embeds function to decode base64 image embeddings. Updates MultimodalDataTracker.add_data to accept an optional keyword-only modality parameter, separating modality tracking from media type. |
| Chat & Serve Layer | `tensorrt_llm/serve/chat_utils.py` | Introduces ChatCompletionContentPartImageEmbedsParam type and ImageEmbedsData for embedding content (see the sketch after this table). Adds _ImageEmbedsParser and extends MM_PARSER_MAP to parse image embeddings. Adds MM_EMBEDDING_MAP to map embedding types to modalities. Updates message parsing signatures and adds base64 embedding decoding via async coroutines. |
| OpenAI Server Integration | `tensorrt_llm/serve/openai_server.py` | Imports MM_EMBEDDING_MAP and integrates embedding extraction in the chat response flow, separating embeddings from multimodal data and validating that they don't coexist. |
| Documentation | `docs/source/commands/trtllm-serve/trtllm-serve.rst` | Adds documentation section for image embeddings modality support with a JSON example and base64 encoding guidance. |
| Integration Test Configuration | `tests/integration/defs/test_e2e.py`, `tests/integration/test_lists/test-db/l0_l40s.yml` | Updates test execution to exclude needs_l40s-marked tests. Adds new test_single_chat_session_image_embeds to the L40s test list. |
| Multimodal Test Utilities | `tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py` | Introduces test-time patch module implementing _attach_multimodal_embeddings to serialize embeddings for testing and attach them to Qwen2VLInputProcessorBase. |
| Multimodal Chat Tests | `tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py`, `tests/unittest/llmapi/apps/_test_openai_mmencoder.py` | Adds new test fixtures for multimodal encoder server integration and test_single_chat_session_image_embeds to verify the image embeddings workflow. Updates the test_multimodal_content_mm_encoder signature with an explicit return type and return statement. |

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant OpenAIServer
    participant ChatUtils
    participant InputUtils
    participant Model

    Client->>OpenAIServer: POST /chat/completions with image_embeds
    OpenAIServer->>ChatUtils: parse_chat_messages_coroutines(messages)

    ChatUtils->>ChatUtils: Detect "image_embeds" content type
    ChatUtils->>ChatUtils: _ImageEmbedsParser extracts image_embeds.data
    ChatUtils->>InputUtils: load_base64_image_embeds(base64_str)
    InputUtils->>InputUtils: Decode base64 → torch.Tensor
    InputUtils-->>ChatUtils: Tensor coroutine

    ChatUtils->>ChatUtils: MM_EMBEDDING_MAP: "image_embeds" → "image"
    ChatUtils->>ChatUtils: MultimodalDataTracker.add_data(media_type, coroutine, modality="image")
    ChatUtils-->>OpenAIServer: MultimodalData with modality="image"

    OpenAIServer->>OpenAIServer: Extract embeddings via MM_EMBEDDING_MAP
    OpenAIServer->>OpenAIServer: Separate multi_modal_data vs multi_modal_embeddings
    OpenAIServer->>Model: Pass embeddings to model inference

    Model-->>OpenAIServer: Response
    OpenAIServer-->>Client: Chat completion response
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Key areas requiring attention during review:

  • tensorrt_llm/serve/chat_utils.py: Extensive type additions and parsing logic; verify new parser correctly handles base64 decoding and modality mapping aligns across all code paths
  • MultimodalDataTracker.add_data signature change: Ensure all callers properly pass modality where needed and existing code without modality still behaves correctly
  • MM_EMBEDDING_MAP and MM_PARSER_MAP consistency: Verify mappings are aligned and no embedding types are handled inconsistently
  • tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py: New test integration with mm_encoder_server; verify fixture setup and error handling (BadRequestError fallback path)
  • Backward compatibility: Check that changes to function signatures in chat_utils.py don't break existing callers

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 20.00%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: The title clearly describes the main feature being implemented: support for image_embeds in the OpenAI API, directly matching the core changes.
  • Description check ✅ Passed: The description explains the implementation rationale (vLLM feature parity), mentions test coverage (unit test added), and includes a comprehensive checklist with all items reviewed and checked.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py (1)

29-44: Unconventional test pattern - verify this is the intended design.

This helper monkey-patches Qwen2VLInputProcessorBase to serialize embeddings to a pickle file, then raises a ValueError with the file path. While functional, this approach has some concerns:

  1. Using exceptions for control flow: Raising ValueError to communicate the file path is unconventional. Consider whether the test could be structured differently to return the path normally.

  2. Temporary file cleanup: The pickle file in the system temp directory may not be cleaned up automatically. Consider using tempfile.NamedTemporaryFile with explicit cleanup.

  3. Test isolation: Monkey-patching a global class can affect other tests. Ensure this import only occurs in isolated test contexts.

If this pattern is intentional and aligns with your testing strategy, consider adding a comment explaining why exception-based communication is used here.

Note: Static analysis warnings about unused arguments and setattr are expected for this monkey-patching pattern and can be safely ignored.

tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (2)

40-57: Temporary directory not fully cleaned up.

tempfile.mkdtemp() creates a directory that should be removed after the test. Currently, only the file is removed but the parent directory created at line 41 remains.

Consider using tempfile.TemporaryDirectory as a context manager or cleaning up the directory:

```diff
 @pytest.fixture(scope="module")
 def temp_extra_llm_api_options_file(request):
-    temp_dir = tempfile.mkdtemp()
+    temp_dir = tempfile.TemporaryDirectory()
-    temp_file_path = os.path.join(temp_dir, "extra_llm_api_options.yaml")
+    temp_file_path = os.path.join(temp_dir.name, "extra_llm_api_options.yaml")
     try:
         extra_llm_api_options_dict = {
             "kv_cache_config": {
                 "enable_block_reuse": False,
                 "free_gpu_memory_fraction": 0.6,
             },
         }

         with open(temp_file_path, 'w') as f:
             yaml.dump(extra_llm_api_options_dict, f)

         yield temp_file_path
     finally:
-        if os.path.exists(temp_file_path):
-            os.remove(temp_file_path)
+        temp_dir.cleanup()
```

26-31: Consider using __all__ or a comment instead of assertion for fixture visibility.

The assertion at line 31 works but is unconventional. A clearer approach might be a comment explaining the re-export or using __all__.

```diff
 from ._test_openai_mmencoder import RemoteMMEncoderServer
 from ._test_openai_mmencoder import server as mm_encoder_server
 from ._test_openai_mmencoder import \
     test_multimodal_content_mm_encoder as _test_multimodal_content_mm_encoder

-assert mm_encoder_server is not None  # keep 'mm_encoder_server' fixture visible in this module
+# Re-exported fixtures for pytest discovery
+__all__ = ["mm_encoder_server"]
```
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68253d9 and 8a35e18.

📒 Files selected for processing (11)
  • cpp/kernels/fmha_v2/pytest.ini (1 hunks)
  • docs/source/commands/trtllm-serve/trtllm-serve.rst (1 hunks)
  • tensorrt_llm/inputs/__init__.py (2 hunks)
  • tensorrt_llm/inputs/utils.py (2 hunks)
  • tensorrt_llm/serve/chat_utils.py (7 hunks)
  • tensorrt_llm/serve/openai_server.py (2 hunks)
  • tests/integration/defs/test_e2e.py (1 hunks)
  • tests/integration/test_lists/test-db/l0_l40s.yml (1 hunks)
  • tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py (1 hunks)
  • tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (4 hunks)
  • tests/unittest/llmapi/apps/_test_openai_mmencoder.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+
Indent Python code with 4 spaces; do not use tabs
Always maintain the namespace when importing in Python, even if only one class or function from a module is used (e.g., use from package.subpackage import foo and then foo.SomeClass() instead of from package.subpackage.foo import SomeClass)
Python filenames should use snake_case (e.g., some_file.py)
Python class names should use PascalCase (e.g., class SomeClass)
Python function and method names should use snake_case (e.g., def my_awesome_function():)
Python local variable names should use snake_case, with prefix k for variable names that start with a number (e.g., k_99th_percentile = ...)
Python global variables should use upper snake_case with prefix G (e.g., G_MY_GLOBAL = ...)
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...)
Avoid shadowing variables declared in an outer scope in Python
Initialize all externally visible members of a Python class in the constructor
For Python interfaces that may be used outside a file, prefer docstrings over comments
Python comments should be reserved for code within a function, or interfaces that are local to a file
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx
Python attributes and variables can be documented inline with type and description (e.g., self.x = 5 followed by """<type>: Description of 'x'""" )
Avoid using reflection in Python when functionality can be easily achieved without reflection
When using try-except blocks in Python, limit the except clause to the smallest set of specific errors possible instead of catching all exceptions
When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block to implement the logic

Files:

  • tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py
  • tensorrt_llm/inputs/__init__.py
  • tensorrt_llm/inputs/utils.py
  • tests/unittest/llmapi/apps/_test_openai_mmencoder.py
  • tests/integration/defs/test_e2e.py
  • tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py
  • tensorrt_llm/serve/chat_utils.py
  • tensorrt_llm/serve/openai_server.py
**/*.{cpp,h,cu,py}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header that includes the current year at the top

Files:

  • tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py
  • tensorrt_llm/inputs/__init__.py
  • tensorrt_llm/inputs/utils.py
  • tests/unittest/llmapi/apps/_test_openai_mmencoder.py
  • tests/integration/defs/test_e2e.py
  • tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py
  • tensorrt_llm/serve/chat_utils.py
  • tensorrt_llm/serve/openai_server.py
🧠 Learnings (4)
📚 Learning: 2025-09-09T09:40:45.658Z
Learnt from: fredricz-20070104
Repo: NVIDIA/TensorRT-LLM PR: 7645
File: tests/integration/test_lists/qa/llm_function_core.txt:648-648
Timestamp: 2025-09-09T09:40:45.658Z
Learning: In TensorRT-LLM test lists, it's common and intentional for the same test to appear in multiple test list files when they serve different purposes (e.g., llm_function_core.txt for comprehensive core functionality testing and llm_function_core_sanity.txt for quick sanity checks). This duplication allows tests to be run in different testing contexts.

Applied to files:

  • tests/integration/test_lists/test-db/l0_l40s.yml
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Applied to files:

  • tests/integration/test_lists/test-db/l0_l40s.yml
  • tests/integration/defs/test_e2e.py
  • tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py
📚 Learning: 2025-08-29T14:07:45.863Z
Learnt from: EmmaQiaoCh
Repo: NVIDIA/TensorRT-LLM PR: 7370
File: tests/unittest/trt/model_api/test_model_quantization.py:24-27
Timestamp: 2025-08-29T14:07:45.863Z
Learning: In TensorRT-LLM's CI infrastructure, pytest skip markers (pytest.mark.skip) are properly honored even when test files have __main__ blocks that call test functions directly. The testing system correctly skips tests without requiring modifications to the __main__ block execution pattern.

Applied to files:

  • tests/integration/defs/test_e2e.py
  • cpp/kernels/fmha_v2/pytest.ini
📚 Learning: 2025-09-09T13:33:32.247Z
Learnt from: EmmaQiaoCh
Repo: NVIDIA/TensorRT-LLM PR: 7497
File: tests/integration/defs/test_list_parser.py:0-0
Timestamp: 2025-09-09T13:33:32.247Z
Learning: The ISOLATE marker in test_list_parser.py is used for test list parsing and Jenkins pipeline logic, not for creating pytest markers. It allows the parser to recognize ISOLATE markers in test list files, which the Jenkins pipeline then uses to separate tests that need isolation, but no pytest.mark.isolate marker is created.

Applied to files:

  • cpp/kernels/fmha_v2/pytest.ini
🧬 Code graph analysis (8)
tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py (1)
tensorrt_llm/_torch/models/modeling_qwen2vl.py (1)
  • Qwen2VLInputProcessorBase (91-403)
tensorrt_llm/inputs/__init__.py (1)
tensorrt_llm/inputs/utils.py (4)
  • get_cache_salt_id (787-795)
  • load_base64_image_embeds (116-122)
  • load_image (125-146)
  • load_video (266-287)
tensorrt_llm/inputs/utils.py (1)
tensorrt_llm/_torch/auto_deploy/utils/pattern_matcher.py (1)
  • placeholder (229-247)
tests/unittest/llmapi/apps/_test_openai_mmencoder.py (2)
tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (2)
  • client (76-77)
  • model_name (35-36)
tensorrt_llm/llmapi/llm_args.py (1)
  • model_name (1687-1688)
tests/integration/defs/test_e2e.py (1)
tests/integration/defs/conftest.py (2)
  • llm_venv (713-730)
  • test_root (2508-2509)
tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (1)
tests/unittest/llmapi/apps/_test_openai_mmencoder.py (7)
  • RemoteMMEncoderServer (16-44)
  • server (79-88)
  • test_multimodal_content_mm_encoder (101-140)
  • extra_encoder_options (55-56)
  • temp_extra_encoder_options_file (60-75)
  • model_name (48-49)
  • client (92-93)
tensorrt_llm/serve/chat_utils.py (1)
tensorrt_llm/inputs/utils.py (5)
  • async_load_image (149-173)
  • async_load_video (290-319)
  • load_base64_image_embeds (116-122)
  • MultimodalData (430-433)
  • add_data (477-488)
tensorrt_llm/serve/openai_server.py (1)
tensorrt_llm/serve/chat_utils.py (1)
  • load_chat_template (310-341)
🪛 Ruff (0.14.7)
tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py

31-31: Unused function argument: self

(ARG001)


32-32: Unused function argument: inputs

(ARG001)


34-34: Unused function argument: sampling_params

(ARG001)


44-44: Do not call setattr with a constant attribute value. It is not any safer than normal property access.

Replace setattr with assignment

(B010)

tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py

221-221: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue

(S301)

tensorrt_llm/serve/chat_utils.py

152-152: Do not catch blind exception: Exception

(BLE001)


153-153: Use explicit conversion flag

Replace with conversion flag

(RUF010)

tensorrt_llm/serve/openai_server.py

540-540: Abstract raise to an inner function

(TRY301)


540-540: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (15)
docs/source/commands/trtllm-serve/trtllm-serve.rst (1)

155-172: LGTM! Clear documentation for the new image embeddings feature.

The documentation clearly explains how to provide image embeddings directly via image_embeds, includes a helpful JSON example, and provides guidance on generating the base64-encoded content using torch.save.

cpp/kernels/fmha_v2/pytest.ini (1)

9-9: LGTM! Marker addition is correct.

The needs_l40s marker is properly declared and will be used to categorize tests that require L40s hardware for execution.

Based on learnings, pytest markers are correctly honored in TensorRT-LLM's CI infrastructure.

tests/integration/test_lists/test-db/l0_l40s.yml (1)

28-28: LGTM! Test entry is correctly configured.

The new test for image embeddings is properly added to the L40s test list with the appropriate marker filter.

Based on learnings, it's common and intentional for tests to appear in multiple test list files when they serve different purposes.

tensorrt_llm/inputs/__init__.py (1)

19-20: LGTM! Public API extension is correct.

The load_base64_image_embeds function is properly exposed in the public API alongside the existing load_image and load_video utilities. The import and __all__ export are consistent with the codebase conventions.

Also applies to: 61-61

tests/integration/defs/test_e2e.py (1)

1690-1696: LGTM! Marker filtering is correctly applied.

The test invocation now explicitly excludes tests marked with needs_l40s using the -m not needs_l40s filter. This allows the multimodal chat tests to run selectively based on available hardware.

tensorrt_llm/serve/openai_server.py (1)

529-540: LGTM! Embedding extraction logic is correct.

The code properly separates directly provided embeddings from regular multimodal data:

  1. Uses MM_EMBEDDING_MAP to identify embedding tags
  2. Pops embeddings into a separate dictionary
  3. Sets multi_modal_data and multi_modal_embeddings appropriately
  4. Validates that both types aren't mixed

The ValueError check at line 540 is reachable and necessary - it catches cases where the input contains both regular multimodal data (e.g., image URLs) and direct embeddings, which is an unsupported scenario.

Note: The static analysis hints (TRY301, TRY003) suggest stylistic improvements to error handling, but these are minor and don't affect correctness.
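As a rough illustration of the flow described above, the separation could be sketched as follows. The function and variable names here are hypothetical; only MM_EMBEDDING_MAP and the mixed-input ValueError come from the review.

```python
# Hypothetical sketch of the separation logic described above, not the actual
# openai_server.py implementation.
from typing import Any, Dict, Tuple


def split_embeddings(
    mm_data: Dict[str, Any],
    mm_embedding_map: Dict[str, str],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Pop directly provided embeddings out of the parsed multimodal data."""
    embeddings = {
        modality: mm_data.pop(tag)
        for tag, modality in mm_embedding_map.items()
        if tag in mm_data
    }
    if embeddings and mm_data:
        # Mixing raw media (e.g. image URLs) with precomputed embeddings in a
        # single request is an unsupported combination.
        raise ValueError("Cannot mix multimodal data with multimodal embeddings.")
    return mm_data, embeddings
```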

tests/unittest/llmapi/apps/_test_openai_mmencoder.py (1)

101-103: LGTM! Type annotations and return value improve test reusability.

The function signature now explicitly declares parameters and return type, and the return statement makes the multimodal encoder output (messages and mm_handle) available for use by other tests. This is a good practice for test fixtures that need to share data.

Also applies to: 140-141

tensorrt_llm/inputs/utils.py (2)

116-122: LGTM! Secure tensor loading implementation.

The function correctly uses weights_only=True to mitigate arbitrary code execution risks from untrusted pickle data, and map_location="cpu" ensures device-agnostic loading.
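For readers unfamiliar with the function, a decoder along these lines would exhibit the properties noted above. This is a sketch assuming base64 decoding followed by torch.load, not the actual utils.py code.

```python
# Sketch only: the real load_base64_image_embeds lives in
# tensorrt_llm/inputs/utils.py and may differ in details.
import base64
import io

import torch


def load_base64_image_embeds(b64_str: str) -> torch.Tensor:
    tensor_bytes = base64.b64decode(b64_str)
    # weights_only=True avoids executing arbitrary pickled code;
    # map_location="cpu" keeps loading device-agnostic.
    return torch.load(io.BytesIO(tensor_bytes), weights_only=True,
                      map_location="cpu")
```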


477-488: LGTM! Clean separation of media type from modality.

The keyword-only modality parameter allows callers to specify a different modality for placeholder resolution while still grouping data by media_type. This enables image embeddings to use the "image" modality placeholder while being tracked separately.
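Per the sequence diagram above, the resulting call pattern is roughly add_data("image_embeds", embeds_coroutine, modality="image"): the data is still grouped under the "image_embeds" media type, while the explicit modality override points placeholder resolution at "image". The exact argument names are taken from the diagram and may differ slightly from the real signature.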

tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (2)

172-227: Well-structured test with appropriate xfail handling.

The test correctly validates image embeddings flow and gracefully handles models that don't yet implement attach_multimodal_embeddings via pytest.xfail.

Regarding the pickle usage at line 221: while static analysis flags this (S301), it's acceptable here since the file path is generated by the controlled test patch infrastructure, not from untrusted external input.


139-148: Stub fixtures to satisfy dependency requirements.

These fixtures provide the required parameters for mm_encoder_server. Since extra_encoder_options returns False, the dummy path won't be used.

tensorrt_llm/serve/chat_utils.py (4)

37-48: LGTM! Well-defined TypedDict structures for image embeddings.

The types follow the existing patterns in this file and provide clear documentation for the expected structure.


84-93: LGTM! Consistent parser and modality mapping.

The image_embeds parser follows the same pattern as other entries, and MM_EMBEDDING_MAP provides a clean abstraction for mapping embedding types to their corresponding modalities.


146-157: Consistent with existing exception handling pattern.

The broad Exception catch (line 152) matches the pattern used in image_url, video_url, and audio_url handlers in this file. While catching specific exceptions would be preferable, changing the pattern here would create inconsistency.

The modality is correctly set to "image_embeds" for tracking, which will later be mapped to "image" via MM_EMBEDDING_MAP for placeholder resolution.


284-287: LGTM! Correct modality propagation for embeddings.

The MM_EMBEDDING_MAP.get() lookup with None default ensures that:

  • image_embeds → uses "image" modality for placeholder resolution
  • Other types → fall back to their own modality (via add_data's default behavior)
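A minimal sketch of the lookup pattern this comment describes; the single "image_embeds" → "image" entry is taken from the review, and the constant's real contents may differ.

```python
# Minimal sketch of the mapping/lookup pattern; the real MM_EMBEDDING_MAP is
# defined in tensorrt_llm/serve/chat_utils.py and may contain more entries.
from typing import Optional

MM_EMBEDDING_MAP = {"image_embeds": "image"}


def resolve_placeholder_modality(media_type: str) -> Optional[str]:
    # "image_embeds" resolves to the "image" placeholder; anything else returns
    # None and falls back to add_data's default (modality == media type).
    return MM_EMBEDDING_MAP.get(media_type)


assert resolve_placeholder_modality("image_embeds") == "image"
assert resolve_placeholder_modality("image_url") is None
```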

@tensorrt-cicd (Collaborator):

PR_Github #27141 [ run ] triggered by Bot. Commit: 8a35e18

@tensorrt-cicd (Collaborator):

PR_Github #27141 [ run ] completed with state SUCCESS. Commit: 8a35e18
/LLM/main/L0_MergeRequest_PR pipeline #20710 completed with status: 'FAILURE'

@ixlmar (Collaborator, Author) commented Dec 8, 2025

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #27282 [ run ] triggered by Bot. Commit: 8a35e18

@tensorrt-cicd (Collaborator):

PR_Github #27282 [ run ] completed with state SUCCESS. Commit: 8a35e18
/LLM/main/L0_MergeRequest_PR pipeline #20833 completed with status: 'SUCCESS'

@ixlmar ixlmar enabled auto-merge (squash) January 8, 2026 12:37
@ixlmar ixlmar disabled auto-merge January 8, 2026 12:37
@ixlmar ixlmar force-pushed the feat/image-embeds branch from 3fcacce to 5ff60b0 Compare January 8, 2026 14:13
@ixlmar (Collaborator, Author) commented Jan 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #31073 [ run ] triggered by Bot. Commit: 5ff60b0

@tensorrt-cicd (Collaborator):

PR_Github #31073 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 6 AM PST on 12/29.

@ixlmar (Collaborator, Author) commented Jan 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #31075 [ run ] triggered by Bot. Commit: 5ff60b0

@tensorrt-cicd (Collaborator):

PR_Github #31075 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 AM PST on 1/8.

@tensorrt-cicd (Collaborator):

PR_Github #31027 [ run ] completed with state FAILURE. Commit: 3fcacce
/LLM/main/L0_MergeRequest_PR pipeline #23972 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@ixlmar (Collaborator, Author) commented Jan 8, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #31087 [ run ] triggered by Bot. Commit: 5ff60b0

@tensorrt-cicd (Collaborator):

PR_Github #31087 [ run ] completed with state DISABLED
CI server is currently disabled for scheduled maintenance. Estimated completion time: 9 AM PST on 1/8.

@ixlmar ixlmar force-pushed the feat/image-embeds branch from 5ff60b0 to ab2f9fb Compare January 9, 2026 07:55
@ixlmar (Collaborator, Author) commented Jan 9, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #31224 [ run ] triggered by Bot. Commit: ab2f9fb

@ixlmar (Collaborator, Author) commented Jan 9, 2026

/bot run --disable-fail-fast

@tensorrt-cicd (Collaborator):

PR_Github #31279 [ run ] triggered by Bot. Commit: ab2f9fb

@tensorrt-cicd (Collaborator):

PR_Github #31279 [ run ] completed with state ABORTED. Commit: ab2f9fb
LLM/main/L0_MergeRequest_PR #24173 (Blue Ocean) completed with status: ABORTED

