[TRTLLM-9089][chore] Port prepare_dataset into trtllm-bench #9250

FrankD412 merged 12 commits into NVIDIA:main from
Conversation
📝 Walkthrough

This pull request adds dataset preparation infrastructure to the TensorRT LLM benchmarking suite, introducing CLI commands to generate synthetic datasets and prepare real HuggingFace datasets, along with supporting data models and utilities. It also updates the main bench CLI to register the new prepare-dataset command and adds .DS_Store to .gitignore.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant PrepareDataset as prepare_dataset<br/>(CLI Group)
    participant RootArgs as RootArgs<br/>(Validation)
    participant Subcommands as Subcommands<br/>(real-dataset,<br/>token_norm_dist,<br/>token_unif_dist)
    participant Utils as Dataset Utils<br/>(Serialization &<br/>Sampling)
    participant Output as Output File

    User->>PrepareDataset: Run CLI with options
    PrepareDataset->>RootArgs: Parse CLI args & build model
    activate RootArgs
    RootArgs->>RootArgs: Load & validate tokenizer
    RootArgs-->>PrepareDataset: Return validated args
    deactivate RootArgs
    PrepareDataset->>Subcommands: Invoke subcommand<br/>(real-dataset or synthetic)
    activate Subcommands
    alt Real Dataset Path
        Subcommands->>Subcommands: Load HF dataset<br/>& DatasetConfig
        Subcommands->>Utils: Generate samples<br/>(text or multimodal)
    else Synthetic Dataset Path
        Subcommands->>Utils: Sample lengths from<br/>distribution
        Subcommands->>Utils: Generate random tokens
    end
    Subcommands->>Utils: Serialize Workload<br/>(TextSample/MultimodalSample)
    Utils->>Output: Write JSON to file
    Subcommands-->>User: Complete
    deactivate Subcommands
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:
Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Actionable comments posted: 6
🧹 Nitpick comments (8)
tensorrt_llm/bench/dataset/prepare_dataset.py (1)
34-49: Improve exception chaining for better debugging.

The exception handling would benefit from using `raise ... from e` to preserve the original traceback context. Apply this diff:

```diff
 except EnvironmentError as e:
     raise ValueError(
         "Cannot find a tokenizer from the given string because of "
         f"{e}\nPlease set tokenizer to the directory that contains "
         "the tokenizer, or set to a model name in HuggingFace."
-    )
+    ) from e
```

tensorrt_llm/bench/dataset/prepare_synthetic_data.py (3)
15-38: Add stacklevel to warning for better debugging.

The warning should include `stacklevel=2` to point to the caller's code rather than this helper function. Apply this diff:

```diff
 if use_task_ids and not use_lora:
     warnings.warn(
-        "Task IDs require LoRA directory. Use --lora-dir or omit task IDs.", UserWarning
+        "Task IDs require LoRA directory. Use --lora-dir or omit task IDs.",
+        UserWarning,
+        stacklevel=2,
     )
```
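A self-contained illustration (not code from this PR) of what `stacklevel=2` changes: the recorded warning is attributed to the caller's frame rather than to the helper's internals, which is what `-W error` filters and log tooling surface.

```python
import warnings

def check_flags(use_task_ids: bool, use_lora: bool):
    if use_task_ids and not use_lora:
        # stacklevel=2 attributes the warning to whoever called check_flags,
        # not to this line inside the helper.
        warnings.warn("Task IDs require a LoRA directory.", UserWarning, stacklevel=2)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_flags(use_task_ids=True, use_lora=False)

assert len(caught) == 1
assert caught[0].category is UserWarning
# The reported location is the call site (this file), not an internal frame.
print(caught[0].filename, caught[0].lineno)
```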
50-52: Remove unused variable initializations.

These variables are immediately overwritten and never used in their initial state. Apply this diff:

```diff
-    input_ids = []
-    input_lens = []
-    output_lens = []
     input_lens = get_norm_dist_lengths(
```
110-112: Remove unused variable initializations.

These variables are immediately overwritten and never used in their initial state. Apply this diff:

```diff
-    input_ids = []
-    input_lens = []
-    output_lens = []
     input_lens = get_unif_dist_lengths(
```

tensorrt_llm/bench/dataset/prepare_real_data.py (2)
21-32: Use Click's BadParameter exception for better error reporting.

Using `click.BadParameter` provides better integration with Click's error handling and gives users clearer feedback about which parameter failed. Apply this diff:

```diff
 def validate_output_len_dist(ctx, param, value):
     """Validate the --output-len-dist option."""
     if value is None:
         return value
     m = re.match(r"(\d+),(\d+)", value)
     if m:
         return int(m.group(1)), int(m.group(2))
     else:
-        raise AssertionError(
+        raise click.BadParameter(
             "Incorrect specification for --output-len-dist. Correct format: "
             "--output-len-dist <output_len_mean>,<output_len_stdev>"
         )
```
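To sanity-check the `<mean>,<stdev>` format outside of Click, here is a minimal stdlib-only sketch (`parse_len_dist` is a hypothetical name; unlike the original pattern, it anchors the regex with `$` so trailing garbage is rejected):

```python
import re

def parse_len_dist(value: str) -> tuple[int, int]:
    # Mirrors the validator's format: "<mean>,<stdev>", both non-negative ints.
    # The trailing $ rejects inputs like "128,16abc" that an unanchored
    # re.match would silently accept.
    m = re.match(r"(\d+),(\d+)$", value)
    if m is None:
        raise ValueError(
            "Correct format: --output-len-dist <output_len_mean>,<output_len_stdev>"
        )
    return int(m.group(1)), int(m.group(2))

print(parse_len_dist("128,16"))  # → (128, 16)
```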
55-61: Use ValueError instead of AssertionError in validator.

Validators should raise `ValueError` for invalid data rather than `AssertionError`, which is typically reserved for internal invariants. Apply this diff:

```diff
 @model_validator(mode="after")
 def check_prompt(self) -> "DatasetConfig":
     if self.prompt_key and self.prompt:
-        raise AssertionError("--prompt-key and --prompt cannot be set at the same time.")
+        raise ValueError("--prompt-key and --prompt cannot be set at the same time.")
     if (not self.prompt_key) and (not self.prompt):
-        raise AssertionError("Either --prompt-key or --prompt must be set.")
+        raise ValueError("Either --prompt-key or --prompt must be set.")
     return self
```

tensorrt_llm/bench/dataset/utils.py (2)
29-40: Consider using Pydantic's standard patterns.

The custom `__init__` could be replaced with a `@model_validator(mode='after')` or `model_post_init` hook for better alignment with Pydantic conventions. Example using `@model_validator`:

```diff
 class Workload(BaseModel):
     metadata: dict
     samples: List[Union[TextSample, MultimodalSample]] = []

-    def __init__(self, **kwargs) -> None:
-        super().__init__(**kwargs)
-        self.setup_workload_name()
-
-    def setup_workload_name(self):
+    @model_validator(mode='after')
+    def setup_workload_name(self):
         # Keys to ignore
         ignore_keys = ["tokenizer"]
         # Create a string by concatenating keys and values with "__"
         workload_name = "__".join(
             f"{key}:{value}" for key, value in self.metadata.items() if key not in ignore_keys
         )
         self.metadata.setdefault("workload_name", workload_name)
+        return self
```
96-104: Add strict parameter to zip for safety.

Without `strict=True`, mismatched list lengths would be silently truncated, potentially hiding bugs. Apply this diff:

```diff
 def print_multimodal_dataset(multimodal_texts, multimodal_image_paths, output_lens):
-    for i, (text, image_paths) in enumerate(zip(multimodal_texts, multimodal_image_paths)):
+    for i, (text, image_paths) in enumerate(zip(multimodal_texts, multimodal_image_paths, strict=True)):
         d = {
             "task_id": i,
             "prompt": text,
             "media_paths": image_paths,
             "output_tokens": output_lens[i],
         }
         yield json.dumps(d, separators=(",", ":"), ensure_ascii=False)
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- .gitignore (1 hunks)
- tensorrt_llm/bench/dataset/prepare_dataset.py (1 hunks)
- tensorrt_llm/bench/dataset/prepare_real_data.py (1 hunks)
- tensorrt_llm/bench/dataset/prepare_synthetic_data.py (1 hunks)
- tensorrt_llm/bench/dataset/utils.py (1 hunks)
- tensorrt_llm/commands/bench.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-18T08:42:02.640Z
Learnt from: samuellees
Repo: NVIDIA/TensorRT-LLM PR: 6974
File: tensorrt_llm/serve/scripts/benchmark_dataset.py:558-566
Timestamp: 2025-08-18T08:42:02.640Z
Learning: In TensorRT-LLM's RandomDataset (tensorrt_llm/serve/scripts/benchmark_dataset.py), when using --random-token-ids option, sequence length accuracy is prioritized over semantic correctness for benchmarking purposes. The encode/decode operations should use skip_special_tokens=True and add_special_tokens=False to ensure exact target token lengths.
Applied to files:
- tensorrt_llm/bench/dataset/prepare_dataset.py
- tensorrt_llm/bench/dataset/utils.py
- tensorrt_llm/bench/dataset/prepare_synthetic_data.py
🧬 Code graph analysis (4)
tensorrt_llm/commands/bench.py (1)
tensorrt_llm/bench/dataset/prepare_dataset.py (1)
- prepare_dataset (73-84)
tensorrt_llm/bench/dataset/prepare_dataset.py (4)
tensorrt_llm/bench/dataset/prepare_real_data.py (1)
- real_dataset (209-314)

tensorrt_llm/bench/dataset/prepare_synthetic_data.py (2)
- token_norm_dist (48-90)
- token_unif_dist (108-150)

tensorrt_llm/builder.py (1)
- default (45-50)

tensorrt_llm/bench/benchmark/__init__.py (1)
- checkpoint_path (75-76)
tensorrt_llm/bench/dataset/prepare_real_data.py (1)
tensorrt_llm/bench/dataset/utils.py (3)
- get_norm_dist_lengths (122-126)
- print_multimodal_dataset (96-104)
- print_text_dataset (79-93)
tensorrt_llm/bench/dataset/prepare_synthetic_data.py (1)
tensorrt_llm/bench/dataset/utils.py (5)
- gen_random_tokens (136-157)
- get_norm_dist_lengths (122-126)
- get_unif_dist_lengths (129-133)
- print_text_dataset (79-93)
- text_dataset_dump (43-57)
🪛 Ruff (0.14.5)
tensorrt_llm/bench/dataset/prepare_dataset.py
41-45: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
41-45: Avoid specifying long messages outside the exception class
(TRY003)
tensorrt_llm/bench/dataset/utils.py
97-97: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
142-142: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
tensorrt_llm/bench/dataset/prepare_real_data.py
21-21: Unused function argument: ctx
(ARG001)
21-21: Unused function argument: param
(ARG001)
29-32: Avoid specifying long messages outside the exception class
(TRY003)
58-58: Avoid specifying long messages outside the exception class
(TRY003)
60-60: Avoid specifying long messages outside the exception class
(TRY003)
109-115: Avoid specifying long messages outside the exception class
(TRY003)
150-150: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
247-247: Avoid specifying long messages outside the exception class
(TRY003)
268-268: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
tensorrt_llm/bench/dataset/prepare_synthetic_data.py
21-21: Standard pseudo-random generators are not suitable for cryptographic purposes
(S311)
30-30: No explicit stacklevel keyword argument found
Set stacklevel=2
(B028)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (6)
.gitignore (1)
88-90: ✓ Sensible addition to ignore macOS system files.

Adding `.DS_Store` to `.gitignore` is a standard best practice that prevents macOS Finder metadata from being committed. The comment clearly delineates the macOS-specific section, and the placement is logical.

tensorrt_llm/commands/bench.py (1)
9-9: LGTM! Clean integration of the new dataset preparation command.

The import and command registration follow the existing pattern consistently.
Also applies to: 61-61
tensorrt_llm/bench/dataset/prepare_dataset.py (2)
25-32: LGTM! Well-structured data model.

The RootArgs model cleanly captures all necessary configuration parameters for dataset preparation.

87-89: LGTM! Command registration follows Click conventions.

tensorrt_llm/bench/dataset/utils.py (2)
43-76: LGTM! Clean serialization functions.

Both dump functions properly handle directory creation and JSON serialization.
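As a hedged, stdlib-only sketch of the general pattern the review is approving (the function name and record fields here are illustrative, not the actual utils.py API): create the output directory if needed, then write one compact JSON object per line.

```python
import json
import os
import tempfile

def text_dataset_dump_sketch(samples, output_file):
    """Sketch of a dataset dump: ensure the parent directory exists,
    then write one compact JSON object per line (JSONL)."""
    os.makedirs(os.path.dirname(output_file), exist_ok=True)
    with open(output_file, "w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample, separators=(",", ":"), ensure_ascii=False) + "\n")

tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "datasets", "workload.jsonl")  # directory does not exist yet
text_dataset_dump_sketch([{"task_id": 0, "prompt": "hi", "output_tokens": 8}], path)

with open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
assert len(lines) == 1
assert json.loads(lines[0])["prompt"] == "hi"
```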
136-157: LGTM! Robust random token generation.

The function correctly handles EOS token exclusion and uses deterministic seeding for reproducibility. Based on learnings, this aligns with TensorRT-LLM's approach of prioritizing sequence length accuracy over semantic correctness for benchmarking.
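The general pattern being praised — deterministic seeding plus EOS exclusion with exact target lengths — can be sketched as follows (a hypothetical stand-in, not the actual `gen_random_tokens` implementation):

```python
import random

def gen_random_tokens_sketch(lengths, vocab_size, eos_id, seed=0):
    """Sample token IDs uniformly at random, skipping the EOS ID,
    reproducibly for a given seed."""
    rng = random.Random(seed)  # local RNG: same seed -> same dataset
    candidates = [t for t in range(vocab_size) if t != eos_id]
    return [[rng.choice(candidates) for _ in range(n)] for n in lengths]

samples = gen_random_tokens_sketch(lengths=[4, 2], vocab_size=10, eos_id=2, seed=42)
assert all(2 not in s for s in samples)                         # EOS never emitted
assert [len(s) for s in samples] == [4, 2]                      # exact target lengths
assert samples == gen_random_tokens_sketch([4, 2], 10, 2, 42)   # reproducible
```

Excluding EOS matters for benchmarking because an early EOS would let the model stop before the requested output length, skewing sequence-length accuracy.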
Force-pushed 09f8d9e to c0368cb (Compare)
/bot run

PR_Github #25125 [ run ] triggered by Bot. Commit:

PR_Github #25125 [ run ] completed with state
Force-pushed e3303e4 to 61ec730 (Compare)
/bot run

PR_Github #25237 [ run ] triggered by Bot. Commit:

PR_Github #25237 [ run ] completed with state

/bot run

PR_Github #25244 [ run ] triggered by Bot. Commit:

PR_Github #25244 [ run ] completed with state
Force-pushed 9fac95c to 2523e3d (Compare)
/bot run

PR_Github #25260 [ run ] triggered by Bot. Commit:

PR_Github #25260 [ run ] completed with state
Force-pushed 05962eb to 5e6a1ad (Compare)
PR_Github #26700 [ run ] completed with state

/bot run

PR_Github #26706 [ run ] triggered by Bot. Commit:

PR_Github #26703 [ run ] completed with state

PR_Github #26706 [ run ] completed with state

/bot run

PR_Github #26750 [ run ] triggered by Bot. Commit:

PR_Github #26750 [ run ] completed with state

/bot run

PR_Github #27171 [ run ] triggered by Bot. Commit:

PR_Github #27171 [ run ] completed with state
Commit messages (each signed off by Frank Di Natale <3429989+FrankD412@users.noreply.github.com>):

- Add MacOSX DS_Store to gitignore.
- Update imports.
- Update click group.
- Updates to CLI.
- Rename.
- Add name.
- Renamed real dataset command.
- Change to group.
- Add docstring.
- Remove pass_obj.
- Fix context subscription.
- Updates to output.
- Updates to remove stdout.
- Add deprecation flag.
- Code clean up.
- Fix generator call.
- Update prepare_dataset in docs.
- Update examples.
- Update testing for trtllm-bench dataset.
- Remove trtllm-bench dataset from run_ex.
- Add missed __init__.py
- Re-add check for dataset subcommand.
- Fix execution of trtllm-bench dataset.
/bot reuse-pipeline 20733

PR_Github #27338 Bot args parsing error: usage: /bot [-h]

/bot skip --comment "Passed previously, rebased to fix conflict in docs."

PR_Github #27341 [ skip ] triggered by Bot. Commit:

PR_Github #27341 [ skip ] completed with state
Summary by CodeRabbit
New Features
Chores
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
`/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...`

Provide a user-friendly way for developers to interact with a Jenkins server.
Run `/bot [-h|--help]` to print this help message.

See details below for each supported subcommand.
Details
run

`run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]`

Launch build/test pipelines. All previously running jobs will be killed.

- `--reuse-test (optional)pipeline-id` (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL): Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL): Skip all test stages, but still run build stages, package stages, and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL): Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL): Force run the multi-GPU tests in addition to running the L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL): Run the ordinary L0 pre-merge pipeline and the specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see
docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill

`kill`

Kill all running builds associated with the pull request.
skip
`skip --comment COMMENT`

Skip testing for the latest commit on the pull request.
`--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

`reuse-pipeline`

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.