Conversation
|
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
Summary of ChangesHello @arekay-nv, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant enhancement to the benchmarking tool, allowing users to efficiently test performance across various concurrency levels in a single execution. By enabling the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Pull request overview
Enables specifying multiple concurrency values (via YAML or CLI) to run sequential benchmarks on the same server, saving results to per-concurrency subdirectories.
Changes:
- Extend
target_concurrencyto acceptintorlist[int]in config schema + update validation. - Update CLI
--concurrencyto accept comma-separated lists and plumb parsed values into config. - Add benchmark orchestration to run multiple sequential benchmarks with separate report directories.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/inference_endpoint/load_generator/scheduler.py | Adjusts scheduler validation to accept list-based target_concurrency inputs. |
| src/inference_endpoint/config/schema.py | Updates LoadPattern.target_concurrency type + adds validation for list-based concurrency. |
| src/inference_endpoint/commands/benchmark.py | Adds multi-run orchestration and per-concurrency report directories; parses CLI concurrency list. |
| src/inference_endpoint/cli.py | Changes --concurrency to a string to allow comma-separated lists. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if ( | ||
| target_concurrency is None | ||
| or (isinstance(target_concurrency, int) and target_concurrency <= 0) | ||
| or (isinstance(target_concurrency, list) and len(target_concurrency) == 0) | ||
| ): | ||
| raise ValueError( | ||
| f"target_concurrency must be > 0 for CONCURRENCY load pattern, got {target_concurrency}" | ||
| ) |
There was a problem hiding this comment.
The error message is now misleading for the list case: an empty list fails the check but the message says “must be > 0”. Update the message to reflect the accepted shapes (positive integer or non-empty list of positive integers), or split the exception messages by failure mode (e.g., None vs empty list vs non-positive int) to make debugging easier.
| @field_validator("target_concurrency", mode="before") | ||
| @classmethod | ||
| def validate_target_concurrency(cls, v): | ||
| """Validate target_concurrency accepts int or list of ints.""" | ||
| if v is None: | ||
| return v | ||
| # Accept single int | ||
| if isinstance(v, int): | ||
| return v | ||
| # Accept list of ints | ||
| if isinstance(v, list): | ||
| return v | ||
| # Try to convert if it's something else (shouldn't happen with proper YAML) | ||
| raise ValueError( | ||
| f"target_concurrency must be an integer or list of integers, got {type(v)}" | ||
| ) |
There was a problem hiding this comment.
This validator’s docstring and comments claim it validates “list of ints” and “tries to convert”, but it currently returns any list unchanged (including list[str], list[float], etc.) and performs no conversion. Either (a) actually validate/convert list elements here (e.g., ensure each element is an int and/or coerce string ints if that’s desired), or (b) remove this validator and rely on the later validate_load_pattern logic (to avoid duplicated/partial validation paths).
| config = BenchmarkConfig( | ||
| **{ | ||
| **config.model_dump(), | ||
| "settings": Settings( | ||
| **{ | ||
| **config.settings.model_dump(), | ||
| "load_pattern": LoadPattern( | ||
| **{ | ||
| **config.settings.load_pattern.model_dump(), | ||
| "target_concurrency": concurrency_value, | ||
| } | ||
| ), | ||
| } | ||
| ), |
There was a problem hiding this comment.
Reconstructing nested Pydantic models via model_dump() + manual re-wrapping is hard to read and easy to get wrong as the schema evolves. Prefer using Pydantic’s copy/update utilities (e.g., model_copy(update=...) on BenchmarkConfig, Settings, and LoadPattern) to express “same config except target_concurrency=…” more directly and reduce the risk of losing defaults/aliases or introducing subtle serialization differences.
| config = BenchmarkConfig( | |
| **{ | |
| **config.model_dump(), | |
| "settings": Settings( | |
| **{ | |
| **config.settings.model_dump(), | |
| "load_pattern": LoadPattern( | |
| **{ | |
| **config.settings.load_pattern.model_dump(), | |
| "target_concurrency": concurrency_value, | |
| } | |
| ), | |
| } | |
| ), | |
| config = config.model_copy( | |
| update={ | |
| "settings": config.settings.model_copy( | |
| update={ | |
| "load_pattern": config.settings.load_pattern.model_copy( | |
| update={"target_concurrency": concurrency_value} | |
| ) | |
| } | |
| ) |
| shutil.rmtree(tmp_dir, ignore_errors=True) | ||
|
|
||
| # Clear event hooks to ensure clean state for next benchmark run | ||
| # This is critical when running multiple concurrency benchmarks | ||
| SampleEventHandler.clear_hooks() | ||
|
|
There was a problem hiding this comment.
SampleEventHandler.clear_hooks() is only called in this cleanup path (inside signal_handler). For the new multi-run flow, hooks also need to be cleared after a normal successful run (and likely after non-signal exceptions) to prevent cross-run contamination as described by the comment. Consider moving/duplicating this into the standard per-run cleanup block in _run_single_benchmark that always executes (e.g., a finally), so each sequential run starts from a clean hook state.
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable feature for running benchmarks across multiple concurrency values, enhancing performance tuning capabilities. However, a high-severity Path Traversal vulnerability was identified in the handling of the report_dir parameter, which could allow files to be written outside the intended directory. Additionally, a critical TypeError risk exists in ConcurrencyScheduler due to incorrect validation, and there are opportunities to improve the robustness of multi-concurrency runs and simplify configuration update logic using Pydantic's features.
| if ( | ||
| target_concurrency is None | ||
| or (isinstance(target_concurrency, int) and target_concurrency <= 0) | ||
| or (isinstance(target_concurrency, list) and len(target_concurrency) == 0) | ||
| ): | ||
| raise ValueError( | ||
| f"target_concurrency must be > 0 for CONCURRENCY load pattern, got {target_concurrency}" | ||
| ) |
There was a problem hiding this comment.
The validation for target_concurrency in ConcurrencyScheduler is incorrect. It allows target_concurrency to be a list, but the rest of the scheduler's logic expects it to be an integer. Specifically, the comparison self._inflight >= self._target_concurrency on line 421 will raise a TypeError if self._target_concurrency is a list.
Given the orchestration logic in _run_single_benchmark, this scheduler should only ever receive a single integer for target_concurrency. The validation should enforce this to prevent this runtime error.
if not isinstance(target_concurrency, int) or target_concurrency <= 0:
raise ValueError(
f"target_concurrency must be a positive integer for CONCURRENCY load pattern, got {target_concurrency}"
)| if config.report_dir: | ||
| base_report_dir = Path(config.report_dir) | ||
| else: | ||
| base_report_dir = get_default_report_path() |
There was a problem hiding this comment.
The report_dir parameter, which can be controlled by a user, is used to construct file paths for benchmark reports without proper sanitization. This creates a path traversal vulnerability, allowing an attacker to write files outside of the intended directory. This could lead to overwriting arbitrary files, denial of service, or placing malicious files in sensitive locations.
| if config.report_dir: | |
| base_report_dir = Path(config.report_dir) | |
| else: | |
| base_report_dir = get_default_report_path() | |
| if config.report_dir: | |
| user_path = Path(config.report_dir) | |
| # Resolve the user-provided path to an absolute path to prevent traversal attacks | |
| base_report_dir = user_path.resolve() | |
| # Define a safe base directory (e.g., current working directory) and ensure the path is within it | |
| safe_base_dir = Path.cwd().resolve() | |
| if not str(base_report_dir).startswith(str(safe_base_dir)): | |
| raise InputValidationError(f"Invalid report_dir: '{config.report_dir}'. Path must be within '{safe_base_dir}'.") | |
| else: | |
| base_report_dir = get_default_report_path() |
| config = BenchmarkConfig( | ||
| **{ | ||
| **config.model_dump(), | ||
| "settings": Settings( | ||
| **{ | ||
| **config.settings.model_dump(), | ||
| "load_pattern": LoadPattern( | ||
| **{ | ||
| **config.settings.load_pattern.model_dump(), | ||
| "target_concurrency": concurrency_value, | ||
| } | ||
| ), | ||
| } | ||
| ), | ||
| } | ||
| ) |
There was a problem hiding this comment.
The method used to update the BenchmarkConfig with a new concurrency_value is quite complex and hard to maintain. It involves deeply nested dictionary unpacking and model reconstruction.
Pydantic's model_copy(update=...) method provides a much cleaner and more robust way to create a modified copy of a frozen model. This would make the code more readable and less prone to errors if the model structure changes.
config = config.model_copy(
update={
"settings": config.settings.model_copy(
update={
"load_pattern": config.settings.load_pattern.model_copy(
update={"target_concurrency": concurrency_value}
)
}
)
}
)| if concurrency_str := getattr(args, "concurrency", None): | ||
| if "," in concurrency_str: | ||
| # Parse comma-separated list | ||
| try: | ||
| concurrency_value = [int(c.strip()) for c in concurrency_str.split(",")] | ||
| except ValueError as e: | ||
| raise InputValidationError( | ||
| f"Invalid concurrency value '{concurrency_str}': all values must be integers" | ||
| ) from e | ||
| else: | ||
| # Parse single integer | ||
| try: | ||
| concurrency_value = int(concurrency_str) | ||
| except ValueError as e: | ||
| raise InputValidationError( | ||
| f"Invalid concurrency value '{concurrency_str}': must be an integer" | ||
| ) from e |
There was a problem hiding this comment.
The logic for parsing the concurrency string has separate try-except blocks for handling a single value and a comma-separated list, which is a bit repetitive. This can be simplified by always splitting the string by commas and then checking the length of the resulting list. This would reduce code duplication and make the logic more concise.
| if concurrency_str := getattr(args, "concurrency", None): | |
| if "," in concurrency_str: | |
| # Parse comma-separated list | |
| try: | |
| concurrency_value = [int(c.strip()) for c in concurrency_str.split(",")] | |
| except ValueError as e: | |
| raise InputValidationError( | |
| f"Invalid concurrency value '{concurrency_str}': all values must be integers" | |
| ) from e | |
| else: | |
| # Parse single integer | |
| try: | |
| concurrency_value = int(concurrency_str) | |
| except ValueError as e: | |
| raise InputValidationError( | |
| f"Invalid concurrency value '{concurrency_str}': must be an integer" | |
| ) from e | |
| if concurrency_str := getattr(args, "concurrency", None): | |
| try: | |
| concurrencies = [int(c.strip()) for c in concurrency_str.split(",")] | |
| if len(concurrencies) == 1: | |
| concurrency_value = concurrencies[0] | |
| else: | |
| concurrency_value = concurrencies | |
| except ValueError as e: | |
| raise InputValidationError( | |
| f"Invalid concurrency value '{concurrency_str}': all values must be integers" | |
| ) from e |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if isinstance(concurrency_str, int): | ||
| concurrency_value = concurrency_str | ||
| elif "," in concurrency_str: |
There was a problem hiding this comment.
--concurrency is defined as type=str in argparse, so concurrency_str will never be an int here. Dropping the dead isinstance(concurrency_str, int) branch will simplify the logic and avoid misleading future readers.
| if isinstance(concurrency_str, int): | |
| concurrency_value = concurrency_str | |
| elif "," in concurrency_str: | |
| if "," in concurrency_str: |
| # Create subdirectory for this concurrency value | ||
| concurrency_report_dir = base_report_dir / f"concurrency_{concurrency_val}" |
There was a problem hiding this comment.
If target_concurrency contains duplicate values (e.g., [10, 10, 20]), runs will write into the same concurrency_{value} directory and overwrite/mix artifacts (including config.yaml). Consider either validating that the list is unique, or making the directory name unambiguous (e.g., prefix with the index like run_{i+1}_concurrency_{value}).
| # Create subdirectory for this concurrency value | |
| concurrency_report_dir = base_report_dir / f"concurrency_{concurrency_val}" | |
| # Create subdirectory for this concurrency value (include run index to avoid collisions) | |
| concurrency_report_dir = base_report_dir / f"run_{i+1}_concurrency_{concurrency_val}" |
| # Create a new config with the specific concurrency value | ||
| config = BenchmarkConfig( | ||
| **{ | ||
| **config.model_dump(), | ||
| "settings": Settings( | ||
| **{ | ||
| **config.settings.model_dump(), | ||
| "load_pattern": LoadPattern( | ||
| **{ | ||
| **config.settings.load_pattern.model_dump(), | ||
| "target_concurrency": concurrency_value, | ||
| } | ||
| ), | ||
| } | ||
| ), | ||
| } |
There was a problem hiding this comment.
This manual rebuild of BenchmarkConfig is quite verbose/error-prone. Since this is a Pydantic v2 model, it’s typically simpler (and less likely to miss fields) to use config.model_copy(update=..., deep=True) to override only settings.load_pattern.target_concurrency.
| # Create a new config with the specific concurrency value | |
| config = BenchmarkConfig( | |
| **{ | |
| **config.model_dump(), | |
| "settings": Settings( | |
| **{ | |
| **config.settings.model_dump(), | |
| "load_pattern": LoadPattern( | |
| **{ | |
| **config.settings.load_pattern.model_dump(), | |
| "target_concurrency": concurrency_value, | |
| } | |
| ), | |
| } | |
| ), | |
| } | |
| # Create a new config with the specific concurrency value by copying and updating | |
| config = config.model_copy( | |
| update={ | |
| "settings": config.settings.model_copy( | |
| update={ | |
| "load_pattern": config.settings.load_pattern.model_copy( | |
| update={"target_concurrency": concurrency_value}, | |
| deep=True, | |
| ) | |
| }, | |
| deep=True, | |
| ) | |
| }, | |
| deep=True, |
| # Accept list of ints | ||
| if isinstance(v, list): | ||
| return v | ||
| # Try to convert if it's something else (shouldn't happen with proper YAML) |
There was a problem hiding this comment.
The comment says the validator will 'Try to convert' other types, but the implementation immediately raises. Either remove/update the comment, or implement the described conversion (e.g., accepting numeric strings) so the code and documentation match.
| # Try to convert if it's something else (shouldn't happen with proper YAML) | |
| # Reject any other types (this should not happen with proper YAML) |
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
805a3b5 to
793491a
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| # Create subdirectory for this concurrency value | ||
| concurrency_report_dir = base_report_dir / f"concurrency_{concurrency_val}" |
What does this PR do?
Enables a list of concurrency values to be provided in the config yaml to run multiple benchmark runs on the same server. This is useful for finding optimal configs.
Type of change
Related issues
Testing
Checklist