[TRTLLM-9522][feat] support image_embeds in OpenAI API #9715
base: main
Conversation
Force-pushed dde2050 to 1352371

/bot run --disable-fail-fast

PR_Github #26996 [ run ] triggered by Bot. Commit:
PR_Github #26996 [ run ] completed with state

Force-pushed 650abcb to 703df33

/bot run --disable-fail-fast --stage-list "A10-PyTorch-1,A10-PyTorch-2,L40S-PyTorch-1,L40S-PyTorch-2"

PR_Github #27124 [ run ] triggered by Bot. Commit:

Force-pushed 079ae0d to ef89a3b

/bot run --disable-fail-fast --stage-list "A10-PyTorch-1,A10-PyTorch-2,L40S-PyTorch-1,L40S-PyTorch-2"

PR_Github #27130 [ run ] triggered by Bot. Commit:
PR_Github #27124 [ run ] completed with state

/bot run --disable-fail-fast --stage-list "A10-PyTorch-1,A10-PyTorch-2,L40S-PyTorch-1,L40S-PyTorch-2"

PR_Github #27134 [ run ] triggered by Bot. Commit:
PR_Github #27130 [ run ] completed with state
PR_Github #27134 [ run ] completed with state

/bot run --disable-fail-fast
📝 Walkthrough

Adds image embeddings support to the TensorRT LLM serving stack by introducing new types for embedding content, parsers for extracting base64-encoded embeddings, multimodal data tracker enhancements to track modality separately from media type, and corresponding chat API and server-side handling to process and propagate embeddings through the message pipeline.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant OpenAIServer
    participant ChatUtils
    participant InputUtils
    participant Model
    Client->>OpenAIServer: POST /chat/completions with image_embeds
    OpenAIServer->>ChatUtils: parse_chat_messages_coroutines(messages)
    ChatUtils->>ChatUtils: Detect "image_embeds" content type
    ChatUtils->>ChatUtils: _ImageEmbedsParser extracts image_embeds.data
    ChatUtils->>InputUtils: load_base64_image_embeds(base64_str)
    InputUtils->>InputUtils: Decode base64 → torch.Tensor
    InputUtils-->>ChatUtils: Tensor coroutine
    ChatUtils->>ChatUtils: MM_EMBEDDING_MAP: "image_embeds" → "image"
    ChatUtils->>ChatUtils: MultimodalDataTracker.add_data(media_type, coroutine, modality="image")
    ChatUtils-->>OpenAIServer: MultimodalData with modality="image"
    OpenAIServer->>OpenAIServer: Extract embeddings via MM_EMBEDDING_MAP
    OpenAIServer->>OpenAIServer: Separate multi_modal_data vs multi_modal_embeddings
    OpenAIServer->>Model: Pass embeddings to model inference
    Model-->>OpenAIServer: Response
    OpenAIServer-->>Client: Chat completion response
```
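For orientation, here is a sketch of what a client request following this flow could look like. The exact content schema (an `image_embeds` part carrying a base64 `data` field), the model name, and the endpoint are illustrative assumptions; the updated `trtllm-serve` documentation in this PR is the authoritative reference.

```python
# Illustrative only: the "image_embeds" content shape, model name, and URL are
# assumptions for this sketch, not a confirmed schema.
import base64
import io

import requests
import torch

# Pre-computed image embeddings, e.g. produced by a separate multimodal encoder.
embeds = torch.randn(1, 576, 1024)  # hypothetical [batch, tokens, hidden] shape

# Serialize with torch.save and base64-encode, as the documentation change suggests.
buf = io.BytesIO()
torch.save(embeds, buf)
embeds_b64 = base64.b64encode(buf.getvalue()).decode("utf-8")

payload = {
    "model": "Qwen2-VL-7B-Instruct",  # placeholder model name
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the image."},
            {"type": "image_embeds", "image_embeds": {"data": embeds_b64}},
        ],
    }],
}

resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])
```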
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py (1)

29-44: Unconventional test pattern - verify this is the intended design.

This helper monkey-patches `Qwen2VLInputProcessorBase` to serialize embeddings to a pickle file, then raises a `ValueError` with the file path. While functional, this approach has some concerns:

- Using exceptions for control flow: Raising `ValueError` to communicate the file path is unconventional. Consider whether the test could be structured differently to return the path normally.
- Temporary file cleanup: The pickle file in the system temp directory may not be cleaned up automatically. Consider using `tempfile.NamedTemporaryFile` with explicit cleanup.
- Test isolation: Monkey-patching a global class can affect other tests. Ensure this import only occurs in isolated test contexts.

If this pattern is intentional and aligns with your testing strategy, consider adding a comment explaining why exception-based communication is used here.

Note: Static analysis warnings about unused arguments and `setattr` are expected for this monkey-patching pattern and can be safely ignored.

tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (2)
40-57: Temporary directory not fully cleaned up.

`tempfile.mkdtemp()` creates a directory that should be removed after the test. Currently, only the file is removed but the parent directory created at line 41 remains.

Consider using `tempfile.TemporaryDirectory` as a context manager or cleaning up the directory:

```diff
 @pytest.fixture(scope="module")
 def temp_extra_llm_api_options_file(request):
-    temp_dir = tempfile.mkdtemp()
+    temp_dir = tempfile.TemporaryDirectory()
-    temp_file_path = os.path.join(temp_dir, "extra_llm_api_options.yaml")
+    temp_file_path = os.path.join(temp_dir.name, "extra_llm_api_options.yaml")
     try:
         extra_llm_api_options_dict = {
             "kv_cache_config": {
                 "enable_block_reuse": False,
                 "free_gpu_memory_fraction": 0.6,
             },
         }
         with open(temp_file_path, 'w') as f:
             yaml.dump(extra_llm_api_options_dict, f)
         yield temp_file_path
     finally:
-        if os.path.exists(temp_file_path):
-            os.remove(temp_file_path)
+        temp_dir.cleanup()
```
26-31: Consider using `__all__` or a comment instead of assertion for fixture visibility.

The assertion at line 31 works but is unconventional. A clearer approach might be a comment explaining the re-export or using `__all__`.

```diff
 from ._test_openai_mmencoder import RemoteMMEncoderServer
 from ._test_openai_mmencoder import server as mm_encoder_server
 from ._test_openai_mmencoder import \
     test_multimodal_content_mm_encoder as _test_multimodal_content_mm_encoder
-assert mm_encoder_server is not None  # keep 'mm_encoder_server' fixture visible in this module
+# Re-exported fixtures for pytest discovery
+__all__ = ["mm_encoder_server"]
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- cpp/kernels/fmha_v2/pytest.ini (1 hunks)
- docs/source/commands/trtllm-serve/trtllm-serve.rst (1 hunks)
- tensorrt_llm/inputs/__init__.py (2 hunks)
- tensorrt_llm/inputs/utils.py (2 hunks)
- tensorrt_llm/serve/chat_utils.py (7 hunks)
- tensorrt_llm/serve/openai_server.py (2 hunks)
- tests/integration/defs/test_e2e.py (1 hunks)
- tests/integration/test_lists/test-db/l0_l40s.yml (1 hunks)
- tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (4 hunks)
- tests/unittest/llmapi/apps/_test_openai_mmencoder.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py:

- The code developed for TensorRT-LLM should conform to Python 3.8+
- Indent Python code with 4 spaces; do not use tabs
- Always maintain the namespace when importing in Python, even if only one class or function from a module is used (e.g., use `from package.subpackage import foo` and then `foo.SomeClass()` instead of `from package.subpackage.foo import SomeClass`)
- Python filenames should use snake_case (e.g., `some_file.py`)
- Python class names should use PascalCase (e.g., `class SomeClass`)
- Python function and method names should use snake_case (e.g., `def my_awesome_function():`)
- Python local variable names should use snake_case, with prefix `k` for variable names that start with a number (e.g., `k_99th_percentile = ...`)
- Python global variables should use upper snake_case with prefix `G` (e.g., `G_MY_GLOBAL = ...`)
- Python constants should use upper snake_case (e.g., `MY_CONSTANT = ...`)
- Avoid shadowing variables declared in an outer scope in Python
- Initialize all externally visible members of a Python class in the constructor
- For Python interfaces that may be used outside a file, prefer docstrings over comments
- Python comments should be reserved for code within a function, or interfaces that are local to a file
- Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx
- Python attributes and variables can be documented inline with type and description (e.g., `self.x = 5` followed by `"""<type>: Description of 'x'"""`)
- Avoid using reflection in Python when functionality can be easily achieved without reflection
- When using try-except blocks in Python, limit the except clause to the smallest set of specific errors possible instead of catching all exceptions
- When using try-except blocks in Python to handle multiple possible variable types (duck-typing), keep the body of the try as small as possible and use the else block to implement the logic
Files:

- tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py
- tensorrt_llm/inputs/__init__.py
- tensorrt_llm/inputs/utils.py
- tests/unittest/llmapi/apps/_test_openai_mmencoder.py
- tests/integration/defs/test_e2e.py
- tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py
- tensorrt_llm/serve/chat_utils.py
- tensorrt_llm/serve/openai_server.py
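As a quick, hypothetical illustration of several of the conventions above (namespace imports, naming, Google-style docstrings, constructor-initialized members); all names are invented and the snippet is not part of the PR:

```python
# Hypothetical example illustrating the guidelines above; not part of this change.
from collections import abc  # keep the namespace: later use abc.Mapping

G_DEFAULT_DEVICE = "cpu"  # global: upper snake_case with G prefix
MAX_CACHE_ENTRIES = 128  # constant: upper snake_case


class EmbeddingCache:  # class name: PascalCase
    """Caches pre-computed embeddings keyed by request id.

    Attributes:
        device: Device on which cached tensors are kept.
    """

    def __init__(self, device: str = G_DEFAULT_DEVICE):
        # Externally visible members are initialized in the constructor.
        self.device = device
        """str: Device on which cached tensors are kept."""
        self._entries = {}

    def add_entries(self, entries: abc.Mapping) -> int:  # method name: snake_case
        """Adds entries to the cache.

        Args:
            entries: Mapping of request id to embedding payload.

        Returns:
            The number of entries currently stored.
        """
        k_95pct_budget = int(MAX_CACHE_ENTRIES * 0.95)  # digit-leading local gets a k prefix
        if len(self._entries) + len(entries) > k_95pct_budget:
            raise ValueError("cache budget exceeded")
        self._entries.update(entries)
        return len(self._entries)
```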
**/*.{cpp,h,cu,py}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code files should contain an NVIDIA copyright header that includes the current year at the top
Files:
- tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py
- tensorrt_llm/inputs/__init__.py
- tensorrt_llm/inputs/utils.py
- tests/unittest/llmapi/apps/_test_openai_mmencoder.py
- tests/integration/defs/test_e2e.py
- tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py
- tensorrt_llm/serve/chat_utils.py
- tensorrt_llm/serve/openai_server.py
🧠 Learnings (4)
📚 Learning: 2025-09-09T09:40:45.658Z
Learnt from: fredricz-20070104
Repo: NVIDIA/TensorRT-LLM PR: 7645
File: tests/integration/test_lists/qa/llm_function_core.txt:648-648
Timestamp: 2025-09-09T09:40:45.658Z
Learning: In TensorRT-LLM test lists, it's common and intentional for the same test to appear in multiple test list files when they serve different purposes (e.g., llm_function_core.txt for comprehensive core functionality testing and llm_function_core_sanity.txt for quick sanity checks). This duplication allows tests to be run in different testing contexts.
Applied to files:
tests/integration/test_lists/test-db/l0_l40s.yml
📚 Learning: 2025-07-28T17:06:08.621Z
Learnt from: moraxu
Repo: NVIDIA/TensorRT-LLM PR: 6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
- tests/integration/test_lists/test-db/l0_l40s.yml
- tests/integration/defs/test_e2e.py
- tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py
📚 Learning: 2025-08-29T14:07:45.863Z
Learnt from: EmmaQiaoCh
Repo: NVIDIA/TensorRT-LLM PR: 7370
File: tests/unittest/trt/model_api/test_model_quantization.py:24-27
Timestamp: 2025-08-29T14:07:45.863Z
Learning: In TensorRT-LLM's CI infrastructure, pytest skip markers (pytest.mark.skip) are properly honored even when test files have __main__ blocks that call test functions directly. The testing system correctly skips tests without requiring modifications to the __main__ block execution pattern.
Applied to files:
- tests/integration/defs/test_e2e.py
- cpp/kernels/fmha_v2/pytest.ini
📚 Learning: 2025-09-09T13:33:32.247Z
Learnt from: EmmaQiaoCh
Repo: NVIDIA/TensorRT-LLM PR: 7497
File: tests/integration/defs/test_list_parser.py:0-0
Timestamp: 2025-09-09T13:33:32.247Z
Learning: The ISOLATE marker in test_list_parser.py is used for test list parsing and Jenkins pipeline logic, not for creating pytest markers. It allows the parser to recognize ISOLATE markers in test list files, which the Jenkins pipeline then uses to separate tests that need isolation, but no pytest.mark.isolate marker is created.
Applied to files:
cpp/kernels/fmha_v2/pytest.ini
🧬 Code graph analysis (8)
- tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py (1)
  - tensorrt_llm/_torch/models/modeling_qwen2vl.py (1): Qwen2VLInputProcessorBase (91-403)
- tensorrt_llm/inputs/__init__.py (1)
  - tensorrt_llm/inputs/utils.py (4): get_cache_salt_id (787-795), load_base64_image_embeds (116-122), load_image (125-146), load_video (266-287)
- tensorrt_llm/inputs/utils.py (1)
  - tensorrt_llm/_torch/auto_deploy/utils/pattern_matcher.py (1): placeholder (229-247)
- tests/unittest/llmapi/apps/_test_openai_mmencoder.py (2)
  - tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (2): client (76-77), model_name (35-36)
  - tensorrt_llm/llmapi/llm_args.py (1): model_name (1687-1688)
- tests/integration/defs/test_e2e.py (1)
  - tests/integration/defs/conftest.py (2): llm_venv (713-730), test_root (2508-2509)
- tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (1)
  - tests/unittest/llmapi/apps/_test_openai_mmencoder.py (7): RemoteMMEncoderServer (16-44), server (79-88), test_multimodal_content_mm_encoder (101-140), extra_encoder_options (55-56), temp_extra_encoder_options_file (60-75), model_name (48-49), client (92-93)
- tensorrt_llm/serve/chat_utils.py (1)
  - tensorrt_llm/inputs/utils.py (5): async_load_image (149-173), async_load_video (290-319), load_base64_image_embeds (116-122), MultimodalData (430-433), add_data (477-488)
- tensorrt_llm/serve/openai_server.py (1)
  - tensorrt_llm/serve/chat_utils.py (1): load_chat_template (310-341)
🪛 Ruff (0.14.7)
tests/unittest/llmapi/apps/_attach_multimodal_embeddings_patch/__init__.py
31-31: Unused function argument: self
(ARG001)
32-32: Unused function argument: inputs
(ARG001)
34-34: Unused function argument: sampling_params
(ARG001)
44-44: Do not call setattr with a constant attribute value. It is not any safer than normal property access.
Replace setattr with assignment
(B010)
tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py
221-221: pickle and modules that wrap it can be unsafe when used to deserialize untrusted data, possible security issue
(S301)
tensorrt_llm/serve/chat_utils.py
152-152: Do not catch blind exception: Exception
(BLE001)
153-153: Use explicit conversion flag
Replace with conversion flag
(RUF010)
tensorrt_llm/serve/openai_server.py
540-540: Abstract raise to an inner function
(TRY301)
540-540: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (15)
docs/source/commands/trtllm-serve/trtllm-serve.rst (1)
155-172: LGTM! Clear documentation for the new image embeddings feature.

The documentation clearly explains how to provide image embeddings directly via `image_embeds`, includes a helpful JSON example, and provides guidance on generating the base64-encoded content using `torch.save`.

cpp/kernels/fmha_v2/pytest.ini (1)
9-9: LGTM! Marker addition is correct.

The `needs_l40s` marker is properly declared and will be used to categorize tests that require L40s hardware for execution. Based on learnings, pytest markers are correctly honored in TensorRT-LLM's CI infrastructure.

tests/integration/test_lists/test-db/l0_l40s.yml (1)
tests/integration/test_lists/test-db/l0_l40s.yml (1)
28-28: LGTM! Test entry is correctly configured.The new test for image embeddings is properly added to the L40s test list with the appropriate marker filter.
Based on learnings, it's common and intentional for tests to appear in multiple test list files when they serve different purposes.
tensorrt_llm/inputs/__init__.py (1)
19-20: LGTM! Public API extension is correct.The
load_base64_image_embedsfunction is properly exposed in the public API alongside the existingload_imageandload_videoutilities. The import and__all__export are consistent with the codebase conventions.Also applies to: 61-61
tests/integration/defs/test_e2e.py (1)
1690-1696: LGTM! Marker filtering is correctly applied.The test invocation now explicitly excludes tests marked with
needs_l40susing the-m not needs_l40sfilter. This allows the multimodal chat tests to run selectively based on available hardware.tensorrt_llm/serve/openai_server.py (1)
529-540: LGTM! Embedding extraction logic is correct.

The code properly separates directly provided embeddings from regular multimodal data:

- Uses `MM_EMBEDDING_MAP` to identify embedding tags
- Pops embeddings into a separate dictionary
- Sets `multi_modal_data` and `multi_modal_embeddings` appropriately
- Validates that both types aren't mixed
The ValueError check at line 540 is reachable and necessary - it catches cases where the input contains both regular multimodal data (e.g., image URLs) and direct embeddings, which is an unsupported scenario.
Note: The static analysis hints (TRY301, TRY003) suggest stylistic improvements to error handling, but these are minor and don't affect correctness.
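The behavior described above amounts to a fairly small piece of logic. As a non-authoritative sketch (the function name `split_multimodal` and the dict-based plumbing are invented for illustration; the actual `openai_server.py` code is structured around its own request objects):

```python
# Illustrative sketch of the described separation; names and structure are assumptions,
# not the actual openai_server.py implementation.
from typing import Any, Dict, Optional, Tuple

MM_EMBEDDING_MAP = {"image_embeds": "image"}  # embedding tag -> modality, as described


def split_multimodal(mm_data: Dict[str, Any]) -> Tuple[Optional[dict], Optional[dict]]:
    """Separate directly provided embeddings from regular multimodal data."""
    mm_embeddings: Dict[str, Any] = {}
    for tag, modality in MM_EMBEDDING_MAP.items():
        if tag in mm_data:
            # Pop embedding entries into their own dictionary, keyed by modality.
            mm_embeddings[modality] = mm_data.pop(tag)

    if mm_data and mm_embeddings:
        # Mixing raw media (e.g. image URLs) with direct embeddings is unsupported.
        raise ValueError("Cannot mix multimodal data and multimodal embeddings in one request")

    return (mm_data or None), (mm_embeddings or None)
```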
tests/unittest/llmapi/apps/_test_openai_mmencoder.py (1)
101-103: LGTM! Type annotations and return value improve test reusability.

The function signature now explicitly declares parameters and return type, and the return statement makes the multimodal encoder output (messages and mm_handle) available for use by other tests. This is a good practice for test fixtures that need to share data.
Also applies to: 140-141
tensorrt_llm/inputs/utils.py (2)
116-122: LGTM! Secure tensor loading implementation.

The function correctly uses `weights_only=True` to mitigate arbitrary code execution risks from untrusted pickle data, and `map_location="cpu"` ensures device-agnostic loading.
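For readers who have not opened the diff, a loader with roughly this shape would match the behavior described here; this is a sketch, and the actual `load_base64_image_embeds` in `tensorrt_llm/inputs/utils.py` may differ in details.

```python
# Sketch of a base64 tensor loader with the reviewed properties
# (weights_only=True, map_location="cpu"); not the actual utils.py source.
import base64
import io

import torch


def load_base64_image_embeds(data: str) -> torch.Tensor:
    """Decode a base64 string produced by torch.save into a CPU tensor."""
    raw = base64.b64decode(data)
    # weights_only=True restricts unpickling to tensor data, mitigating
    # arbitrary-code-execution risks from untrusted payloads.
    return torch.load(io.BytesIO(raw), weights_only=True, map_location="cpu")
```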
477-488: LGTM! Clean separation of media type from modality.

The keyword-only `modality` parameter allows callers to specify a different modality for placeholder resolution while still grouping data by `media_type`. This enables image embeddings to use the `"image"` modality placeholder while being tracked separately.

tests/unittest/llmapi/apps/_test_openai_chat_multimodal.py (2)
172-227: Well-structured test with appropriate xfail handling.

The test correctly validates the image embeddings flow and gracefully handles models that don't yet implement `attach_multimodal_embeddings` via `pytest.xfail`.

Regarding the pickle usage at line 221: while static analysis flags this (S301), it's acceptable here since the file path is generated by the controlled test patch infrastructure, not from untrusted external input.

139-148: Stub fixtures to satisfy dependency requirements.

These fixtures provide the required parameters for `mm_encoder_server`. Since `extra_encoder_options` returns `False`, the dummy path won't be used.

tensorrt_llm/serve/chat_utils.py (4)

37-48: LGTM! Well-defined TypedDict structures for image embeddings.

The types follow the existing patterns in this file and provide clear documentation for the expected structure.

84-93: LGTM! Consistent parser and modality mapping.

The `image_embeds` parser follows the same pattern as other entries, and `MM_EMBEDDING_MAP` provides a clean abstraction for mapping embedding types to their corresponding modalities.

146-157: Consistent with existing exception handling pattern.

The broad `Exception` catch (line 152) matches the pattern used in `image_url`, `video_url`, and `audio_url` handlers in this file. While catching specific exceptions would be preferable, changing the pattern here would create inconsistency.

The modality is correctly set to `"image_embeds"` for tracking, which will later be mapped to `"image"` via `MM_EMBEDDING_MAP` for placeholder resolution.

284-287: LGTM! Correct modality propagation for embeddings.

The `MM_EMBEDDING_MAP.get()` lookup with `None` default ensures that:

- `image_embeds` → uses `"image"` modality for placeholder resolution
- Other types → fall back to their own modality (via `add_data`'s default behavior)
PR_Github #27141 [ run ] triggered by Bot. Commit:
PR_Github #27141 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #27282 [ run ] triggered by Bot. Commit:
PR_Github #27282 [ run ] completed with state

Force-pushed 3fcacce to 5ff60b0

/bot run --disable-fail-fast

PR_Github #31073 [ run ] triggered by Bot. Commit:
PR_Github #31073 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #31075 [ run ] triggered by Bot. Commit:
PR_Github #31075 [ run ] completed with state
PR_Github #31027 [ run ] completed with state

/bot run --disable-fail-fast

PR_Github #31087 [ run ] triggered by Bot. Commit:
PR_Github #31087 [ run ] completed with state
Signed-off-by: ixlmar <[email protected]>
Signed-off-by: ixlmar <[email protected]>
Signed-off-by: ixlmar <[email protected]>
Signed-off-by: ixlmar <[email protected]>
Signed-off-by: ixlmar <[email protected]>
Signed-off-by: ixlmar <[email protected]>
Signed-off-by: ixlmar <[email protected]>
Signed-off-by: ixlmar <[email protected]>
Signed-off-by: ixlmar <[email protected]>
Signed-off-by: ixlmar <[email protected]>
Signed-off-by: ixlmar <[email protected]>
Force-pushed 5ff60b0 to ab2f9fb

/bot run --disable-fail-fast

PR_Github #31224 [ run ] triggered by Bot. Commit:

/bot run --disable-fail-fast

PR_Github #31279 [ run ] triggered by Bot. Commit:
PR_Github #31279 [ run ] completed with state
Description
This PR implements functionality analogous to vLLM's "image_embeds".
Test Coverage
Unit test added.
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

- --reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- --disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- --disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.
- --skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- --stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- --test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- --only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
- --post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- --detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- --debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill

Kill all running builds associated with pull request.

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
Summary by CodeRabbit

Release Notes

New Features
- Support for passing pre-computed image embeddings (image_embeds) to the OpenAI-compatible chat completions API.

Documentation
- trtllm-serve documentation updated to describe the image_embeds input format.

Tests
- Unit and integration tests added covering the image embeddings path.