Skip to content

feat: Enable multi-concurrency scans#125

Open
arekay-nv wants to merge 2 commits intomainfrom
arekay/add_concurrency_sweep
Open

feat: Enable multi-concurrency scans#125
arekay-nv wants to merge 2 commits intomainfrom
arekay/add_concurrency_sweep

Conversation

@arekay-nv
Copy link
Copy Markdown
Collaborator

What does this PR do?

Enables a list of concurrency values to be provided in the config yaml to run multiple benchmark runs on the same server. This is useful for finding optimal configs.

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@arekay-nv arekay-nv requested a review from a team as a code owner February 11, 2026 23:08
Copilot AI review requested due to automatic review settings February 11, 2026 23:08
@github-actions
Copy link
Copy Markdown

github-actions bot commented Feb 11, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@github-actions github-actions bot requested a review from nvzhihanj February 11, 2026 23:08
@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello @arekay-nv, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the benchmarking tool, allowing users to efficiently test performance across various concurrency levels in a single execution. By enabling the --concurrency argument to accept a list of values, the system can now sequentially run multiple benchmarks, each with a different concurrency setting. This facilitates the identification of optimal configurations by providing isolated results for each concurrency level in dedicated report directories, streamlining the performance tuning process.

Highlights

  • Multi-concurrency Benchmarking: Enabled the ability to run benchmarks with multiple concurrency values sequentially, specified via a comma-separated list.
  • CLI Argument Update: The --concurrency command-line argument now accepts a string that can be parsed into a single integer or a list of integers.
  • Config Schema Enhancement: The target_concurrency field in the LoadPattern configuration schema now supports both single integer and list of integers.
  • Benchmark Orchestration: The core benchmark execution logic was refactored to orchestrate multiple benchmark runs when a list of concurrency values is provided, ensuring isolation and dedicated reporting directories for each run.
  • Robust Validation: Enhanced validation for target_concurrency to correctly handle single integer and list of integer inputs, preventing invalid configurations.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • src/inference_endpoint/cli.py
    • The --concurrency argument's type was changed from int to str to allow parsing of comma-separated lists.
    • The help text for --concurrency was updated to reflect support for single values or comma-separated lists.
  • src/inference_endpoint/commands/benchmark.py
    • Added logic to parse the concurrency argument, converting it into either a single integer or a list of integers.
    • Refactored the _run_benchmark function into an orchestrator that handles single or multiple concurrency runs.
    • Introduced _run_single_benchmark to encapsulate the logic for a single benchmark execution.
    • Implemented a loop to iterate through a list of concurrency values, calling _run_single_benchmark for each.
    • Ensured each benchmark run creates a dedicated report subdirectory and that event hooks are cleared for isolation.
    • Modified the target_concurrency assignment to use the newly parsed concurrency_value.
  • src/inference_endpoint/config/schema.py
    • Updated the target_concurrency field in LoadPattern to accept int | list[int] | None.
    • Added a field_validator to LoadPattern to explicitly validate that target_concurrency is an integer or a list of integers.
    • Expanded the validate_load_pattern method to correctly validate target_concurrency when it is a list, ensuring all values are positive integers.
  • src/inference_endpoint/load_generator/scheduler.py
    • Modified the ConcurrencyScheduler's __init__ method to validate target_concurrency against both single integer and list of integer types, ensuring all values are positive.
Activity
  • The pull request introduces a new feature.
  • The author has indicated that tests have been added/updated, all tests pass locally, and manual testing has been completed.
  • Pre-commit hooks pass, and code follows project style.
  • No specific review comments or discussions have been recorded yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Enables specifying multiple concurrency values (via YAML or CLI) to run sequential benchmarks on the same server, saving results to per-concurrency subdirectories.

Changes:

  • Extend target_concurrency to accept int or list[int] in config schema + update validation.
  • Update CLI --concurrency to accept comma-separated lists and plumb parsed values into config.
  • Add benchmark orchestration to run multiple sequential benchmarks with separate report directories.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
src/inference_endpoint/load_generator/scheduler.py Adjusts scheduler validation to accept list-based target_concurrency inputs.
src/inference_endpoint/config/schema.py Updates LoadPattern.target_concurrency type + adds validation for list-based concurrency.
src/inference_endpoint/commands/benchmark.py Adds multi-run orchestration and per-concurrency report directories; parses CLI concurrency list.
src/inference_endpoint/cli.py Changes --concurrency to a string to allow comma-separated lists.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +382 to 389
if (
target_concurrency is None
or (isinstance(target_concurrency, int) and target_concurrency <= 0)
or (isinstance(target_concurrency, list) and len(target_concurrency) == 0)
):
raise ValueError(
f"target_concurrency must be > 0 for CONCURRENCY load pattern, got {target_concurrency}"
)
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message is now misleading for the list case: an empty list fails the check but the message says “must be > 0”. Update the message to reflect the accepted shapes (positive integer or non-empty list of positive integers), or split the exception messages by failure mode (e.g., None vs empty list vs non-positive int) to make debugging easier.

Copilot uses AI. Check for mistakes.
Comment on lines +283 to +298
@field_validator("target_concurrency", mode="before")
@classmethod
def validate_target_concurrency(cls, v):
"""Validate target_concurrency accepts int or list of ints."""
if v is None:
return v
# Accept single int
if isinstance(v, int):
return v
# Accept list of ints
if isinstance(v, list):
return v
# Try to convert if it's something else (shouldn't happen with proper YAML)
raise ValueError(
f"target_concurrency must be an integer or list of integers, got {type(v)}"
)
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validator’s docstring and comments claim it validates “list of ints” and “tries to convert”, but it currently returns any list unchanged (including list[str], list[float], etc.) and performs no conversion. Either (a) actually validate/convert list elements here (e.g., ensure each element is an int and/or coerce string ints if that’s desired), or (b) remove this validator and rely on the later validate_load_pattern logic (to avoid duplicated/partial validation paths).

Copilot uses AI. Check for mistakes.
Comment on lines +527 to +540
config = BenchmarkConfig(
**{
**config.model_dump(),
"settings": Settings(
**{
**config.settings.model_dump(),
"load_pattern": LoadPattern(
**{
**config.settings.load_pattern.model_dump(),
"target_concurrency": concurrency_value,
}
),
}
),
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reconstructing nested Pydantic models via model_dump() + manual re-wrapping is hard to read and easy to get wrong as the schema evolves. Prefer using Pydantic’s copy/update utilities (e.g., model_copy(update=...) on BenchmarkConfig, Settings, and LoadPattern) to express “same config except target_concurrency=…” more directly and reduce the risk of losing defaults/aliases or introducing subtle serialization differences.

Suggested change
config = BenchmarkConfig(
**{
**config.model_dump(),
"settings": Settings(
**{
**config.settings.model_dump(),
"load_pattern": LoadPattern(
**{
**config.settings.load_pattern.model_dump(),
"target_concurrency": concurrency_value,
}
),
}
),
config = config.model_copy(
update={
"settings": config.settings.model_copy(
update={
"load_pattern": config.settings.load_pattern.model_copy(
update={"target_concurrency": concurrency_value}
)
}
)

Copilot uses AI. Check for mistakes.
Comment on lines +858 to +863
shutil.rmtree(tmp_dir, ignore_errors=True)

# Clear event hooks to ensure clean state for next benchmark run
# This is critical when running multiple concurrency benchmarks
SampleEventHandler.clear_hooks()

Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SampleEventHandler.clear_hooks() is only called in this cleanup path (inside signal_handler). For the new multi-run flow, hooks also need to be cleared after a normal successful run (and likely after non-signal exceptions) to prevent cross-run contamination as described by the comment. Consider moving/duplicating this into the standard per-run cleanup block in _run_single_benchmark that always executes (e.g., a finally), so each sequential run starts from a clean hook state.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a valuable feature for running benchmarks across multiple concurrency values, enhancing performance tuning capabilities. However, a high-severity Path Traversal vulnerability was identified in the handling of the report_dir parameter, which could allow files to be written outside the intended directory. Additionally, a critical TypeError risk exists in ConcurrencyScheduler due to incorrect validation, and there are opportunities to improve the robustness of multi-concurrency runs and simplify configuration update logic using Pydantic's features.

Comment on lines +382 to 389
if (
target_concurrency is None
or (isinstance(target_concurrency, int) and target_concurrency <= 0)
or (isinstance(target_concurrency, list) and len(target_concurrency) == 0)
):
raise ValueError(
f"target_concurrency must be > 0 for CONCURRENCY load pattern, got {target_concurrency}"
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The validation for target_concurrency in ConcurrencyScheduler is incorrect. It allows target_concurrency to be a list, but the rest of the scheduler's logic expects it to be an integer. Specifically, the comparison self._inflight >= self._target_concurrency on line 421 will raise a TypeError if self._target_concurrency is a list.

Given the orchestration logic in _run_single_benchmark, this scheduler should only ever receive a single integer for target_concurrency. The validation should enforce this to prevent this runtime error.

        if not isinstance(target_concurrency, int) or target_concurrency <= 0:
            raise ValueError(
                f"target_concurrency must be a positive integer for CONCURRENCY load pattern, got {target_concurrency}"
            )

Comment on lines +396 to +399
if config.report_dir:
base_report_dir = Path(config.report_dir)
else:
base_report_dir = get_default_report_path()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The report_dir parameter, which can be controlled by a user, is used to construct file paths for benchmark reports without proper sanitization. This creates a path traversal vulnerability, allowing an attacker to write files outside of the intended directory. This could lead to overwriting arbitrary files, denial of service, or placing malicious files in sensitive locations.

Suggested change
if config.report_dir:
base_report_dir = Path(config.report_dir)
else:
base_report_dir = get_default_report_path()
if config.report_dir:
user_path = Path(config.report_dir)
# Resolve the user-provided path to an absolute path to prevent traversal attacks
base_report_dir = user_path.resolve()
# Define a safe base directory (e.g., current working directory) and ensure the path is within it
safe_base_dir = Path.cwd().resolve()
if not str(base_report_dir).startswith(str(safe_base_dir)):
raise InputValidationError(f"Invalid report_dir: '{config.report_dir}'. Path must be within '{safe_base_dir}'.")
else:
base_report_dir = get_default_report_path()

Comment on lines +527 to +542
config = BenchmarkConfig(
**{
**config.model_dump(),
"settings": Settings(
**{
**config.settings.model_dump(),
"load_pattern": LoadPattern(
**{
**config.settings.load_pattern.model_dump(),
"target_concurrency": concurrency_value,
}
),
}
),
}
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The method used to update the BenchmarkConfig with a new concurrency_value is quite complex and hard to maintain. It involves deeply nested dictionary unpacking and model reconstruction.

Pydantic's model_copy(update=...) method provides a much cleaner and more robust way to create a modified copy of a frozen model. This would make the code more readable and less prone to errors if the model structure changes.

        config = config.model_copy(
            update={
                "settings": config.settings.model_copy(
                    update={
                        "load_pattern": config.settings.load_pattern.model_copy(
                            update={"target_concurrency": concurrency_value}
                        )
                    }
                )
            }
        )

Comment on lines +277 to +293
if concurrency_str := getattr(args, "concurrency", None):
if "," in concurrency_str:
# Parse comma-separated list
try:
concurrency_value = [int(c.strip()) for c in concurrency_str.split(",")]
except ValueError as e:
raise InputValidationError(
f"Invalid concurrency value '{concurrency_str}': all values must be integers"
) from e
else:
# Parse single integer
try:
concurrency_value = int(concurrency_str)
except ValueError as e:
raise InputValidationError(
f"Invalid concurrency value '{concurrency_str}': must be an integer"
) from e
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for parsing the concurrency string has separate try-except blocks for handling a single value and a comma-separated list, which is a bit repetitive. This can be simplified by always splitting the string by commas and then checking the length of the resulting list. This would reduce code duplication and make the logic more concise.

Suggested change
if concurrency_str := getattr(args, "concurrency", None):
if "," in concurrency_str:
# Parse comma-separated list
try:
concurrency_value = [int(c.strip()) for c in concurrency_str.split(",")]
except ValueError as e:
raise InputValidationError(
f"Invalid concurrency value '{concurrency_str}': all values must be integers"
) from e
else:
# Parse single integer
try:
concurrency_value = int(concurrency_str)
except ValueError as e:
raise InputValidationError(
f"Invalid concurrency value '{concurrency_str}': must be an integer"
) from e
if concurrency_str := getattr(args, "concurrency", None):
try:
concurrencies = [int(c.strip()) for c in concurrency_str.split(",")]
if len(concurrencies) == 1:
concurrency_value = concurrencies[0]
else:
concurrency_value = concurrencies
except ValueError as e:
raise InputValidationError(
f"Invalid concurrency value '{concurrency_str}': all values must be integers"
) from e

Copilot AI review requested due to automatic review settings February 12, 2026 16:20
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +278 to +280
if isinstance(concurrency_str, int):
concurrency_value = concurrency_str
elif "," in concurrency_str:
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--concurrency is defined as type=str in argparse, so concurrency_str will never be an int here. Dropping the dead isinstance(concurrency_str, int) branch will simplify the logic and avoid misleading future readers.

Suggested change
if isinstance(concurrency_str, int):
concurrency_value = concurrency_str
elif "," in concurrency_str:
if "," in concurrency_str:

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems valid?

Comment on lines +420 to +421
# Create subdirectory for this concurrency value
concurrency_report_dir = base_report_dir / f"concurrency_{concurrency_val}"
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If target_concurrency contains duplicate values (e.g., [10, 10, 20]), runs will write into the same concurrency_{value} directory and overwrite/mix artifacts (including config.yaml). Consider either validating that the list is unique, or making the directory name unambiguous (e.g., prefix with the index like run_{i+1}_concurrency_{value}).

Suggested change
# Create subdirectory for this concurrency value
concurrency_report_dir = base_report_dir / f"concurrency_{concurrency_val}"
# Create subdirectory for this concurrency value (include run index to avoid collisions)
concurrency_report_dir = base_report_dir / f"run_{i+1}_concurrency_{concurrency_val}"

Copilot uses AI. Check for mistakes.
Comment on lines +528 to +543
# Create a new config with the specific concurrency value
config = BenchmarkConfig(
**{
**config.model_dump(),
"settings": Settings(
**{
**config.settings.model_dump(),
"load_pattern": LoadPattern(
**{
**config.settings.load_pattern.model_dump(),
"target_concurrency": concurrency_value,
}
),
}
),
}
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This manual rebuild of BenchmarkConfig is quite verbose/error-prone. Since this is a Pydantic v2 model, it’s typically simpler (and less likely to miss fields) to use config.model_copy(update=..., deep=True) to override only settings.load_pattern.target_concurrency.

Suggested change
# Create a new config with the specific concurrency value
config = BenchmarkConfig(
**{
**config.model_dump(),
"settings": Settings(
**{
**config.settings.model_dump(),
"load_pattern": LoadPattern(
**{
**config.settings.load_pattern.model_dump(),
"target_concurrency": concurrency_value,
}
),
}
),
}
# Create a new config with the specific concurrency value by copying and updating
config = config.model_copy(
update={
"settings": config.settings.model_copy(
update={
"load_pattern": config.settings.load_pattern.model_copy(
update={"target_concurrency": concurrency_value},
deep=True,
)
},
deep=True,
)
},
deep=True,

Copilot uses AI. Check for mistakes.
# Accept list of ints
if isinstance(v, list):
return v
# Try to convert if it's something else (shouldn't happen with proper YAML)
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says the validator will 'Try to convert' other types, but the implementation immediately raises. Either remove/update the comment, or implement the described conversion (e.g., accepting numeric strings) so the code and documentation match.

Suggested change
# Try to convert if it's something else (shouldn't happen with proper YAML)
# Reject any other types (this should not happen with proper YAML)

Copilot uses AI. Check for mistakes.
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 13, 2026 22:41
@arekay-nv arekay-nv force-pushed the arekay/add_concurrency_sweep branch from 805a3b5 to 793491a Compare March 13, 2026 22:41
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +421 to +422
# Create subdirectory for this concurrency value
concurrency_report_dir = base_report_dir / f"concurrency_{concurrency_val}"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants