Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a new, self-contained example for benchmarking the Qwen2.5-0.5B-Instruct large language model. It provides a complete workflow for evaluating the model's performance using either vLLM or SGLang inference servers, covering both maximum-throughput (offline) and concurrency-profiling (online) scenarios. The addition aims to simplify the process for users to assess the model's capabilities and identify optimal deployment configurations.
Code Review
This pull request introduces a new comprehensive example for benchmarking the Qwen2.5-0.5B model, including scripts for running benchmarks, documentation, and configuration files. The changes are well-structured and add significant value. My review includes suggestions to improve the clarity of the documentation, enhance the robustness of the shell script, and make the Python summary script more resilient to missing optional dependencies.
```bash
if [[ "$SERVER_TYPE" == "vllm" ]]; then
    READY_PATTERN="Uvicorn running\|Application startup complete"
else
    READY_PATTERN="Uvicorn running\|Server is ready"
fi
```
When using `grep -E` (as suggested for line 138), the alternation operator is `|`, not `\|`. The backslash should be removed for the pattern to be interpreted correctly by extended regex.
Suggested change:

```diff
 if [[ "$SERVER_TYPE" == "vllm" ]]; then
-    READY_PATTERN="Uvicorn running\|Application startup complete"
+    READY_PATTERN="Uvicorn running|Application startup complete"
 else
-    READY_PATTERN="Uvicorn running\|Server is ready"
+    READY_PATTERN="Uvicorn running|Server is ready"
 fi
```
```bash
fi

while [ $RETRY_COUNT -lt $MAX_RETRIES ]; do
    if docker logs ${CONTAINER_NAME} 2>&1 | grep -q "$READY_PATTERN"; then
```
The `grep -q` command uses Basic Regular Expressions (BRE) by default, where `|` (from the `READY_PATTERN`) is not a special character for alternation. To correctly match either part of the pattern, use `grep -qE` for Extended Regular Expressions, which is also used on line 142.
Suggested change:

```diff
-if docker logs ${CONTAINER_NAME} 2>&1 | grep -q "$READY_PATTERN"; then
+if docker logs ${CONTAINER_NAME} 2>&1 | grep -qE "$READY_PATTERN"; then
```
```bash
if curl -s http://localhost:${SERVER_PORT}/v1/models 2>/dev/null | grep -q "${MODEL_NAME}"; then
    echo "✅ Server is responding correctly"
elif curl -s http://localhost:${SERVER_PORT}/health 2>/dev/null | grep -q "ok\|healthy"; then
```
For portability and correct regex evaluation, use `grep -qE` with `|` for alternation instead of `grep -q` with `\|`.
Suggested change:

```diff
-elif curl -s http://localhost:${SERVER_PORT}/health 2>/dev/null | grep -q "ok\|healthy"; then
+elif curl -s http://localhost:${SERVER_PORT}/health 2>/dev/null | grep -qE "ok|healthy"; then
```
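The BRE/ERE distinction behind these suggestions is easy to verify in isolation (GNU grep semantics assumed; the test strings are illustrative):

```shell
# BRE (grep -q): "|" is a literal character; "\|" is alternation (a GNU extension).
# ERE (grep -qE): "|" is alternation; no backslash needed.
echo "Server is ready" | grep -q  "Uvicorn running|Server is ready" && echo "BRE: match" || echo "BRE: no match"
echo "Server is ready" | grep -qE "Uvicorn running|Server is ready" && echo "ERE: match" || echo "ERE: no match"
# With GNU grep this prints "BRE: no match" then "ERE: match".
```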
```python
def write_plots(rows: list[dict], path: Path) -> None:
    import matplotlib.pyplot as plt

    successful = [r for r in rows if r["status"] == "ok"]
    if not successful:
        print("No successful runs to plot.", file=sys.stderr)
        return

    x = [r["concurrency"] for r in successful]
    tps = [r["tps"] for r in successful]
    ttft_p99 = [r["ttft_p99_ms"] for r in successful]
    tpot_p50 = [r["tpot_p50_ms"] for r in successful]

    fig, axes = plt.subplots(3, 1, figsize=(9, 11), sharex=True)
    fig.suptitle("Concurrency Sweep Performance", fontsize=14, fontweight="bold")

    metrics = [
        (axes[0], tps, "TPS", "Tokens / s", "tab:blue"),
        (axes[1], ttft_p99, "TTFT P99 (ms)", "Latency (ms)", "tab:orange"),
        (axes[2], tpot_p50, "TPOT P50 (ms)", "Latency (ms)", "tab:green"),
    ]
    for ax, values, label, ylabel, color in metrics:
        ax.plot(x, values, marker="o", linewidth=2, color=color, label=label)
        ax.set_ylabel(ylabel)
        ax.set_xscale("log", base=2)
        ax.set_xticks(x)
        ax.get_xaxis().set_major_formatter(plt.FuncFormatter(lambda v, _: str(int(v))))
        ax.legend(loc="upper left")
        ax.grid(True, which="both", linestyle="--", alpha=0.5)

    axes[-1].set_xlabel("Concurrency")
    fig.tight_layout()
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
```
The script will crash if matplotlib is not installed because of the direct import. Since plotting is an optional enhancement, it's better to handle this dependency gracefully. I suggest moving the import inside the `write_plots` function and wrapping it in a `try...except ImportError` block. This allows the script to produce text and CSV summaries even if matplotlib is unavailable.
```python
def write_plots(rows: list[dict], path: Path) -> None:
    try:
        import matplotlib.pyplot as plt
    except ImportError:
        print("Warning: matplotlib not found. Skipping plot generation.", file=sys.stderr)
        print("Install it with: pip install matplotlib", file=sys.stderr)
        return

    successful = [r for r in rows if r["status"] == "ok"]
    if not successful:
        print("No successful runs to plot.", file=sys.stderr)
        return

    x = [r["concurrency"] for r in successful]
    tps = [r["tps"] for r in successful]
    ttft_p99 = [r["ttft_p99_ms"] for r in successful]
    tpot_p50 = [r["tpot_p50_ms"] for r in successful]

    fig, axes = plt.subplots(3, 1, figsize=(9, 11), sharex=True)
    fig.suptitle("Concurrency Sweep Performance", fontsize=14, fontweight="bold")

    metrics = [
        (axes[0], tps, "TPS", "Tokens / s", "tab:blue"),
        (axes[1], ttft_p99, "TTFT P99 (ms)", "Latency (ms)", "tab:orange"),
        (axes[2], tpot_p50, "TPOT P50 (ms)", "Latency (ms)", "tab:green"),
    ]
    for ax, values, label, ylabel, color in metrics:
        ax.plot(x, values, marker="o", linewidth=2, color=color, label=label)
        ax.set_ylabel(ylabel)
        ax.set_xscale("log", base=2)
        ax.set_xticks(x)
        ax.get_xaxis().set_major_formatter(plt.FuncFormatter(lambda v, _: str(int(v))))
        ax.legend(loc="upper left")
        ax.grid(True, which="both", linestyle="--", alpha=0.5)

    axes[-1].set_xlabel("Concurrency")
    fig.tight_layout()
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)
```
Pull request overview
Adds a Qwen2.5-0.5B example workflow plus new concurrency sweep helper scripts to run multi-concurrency online benchmarks and summarize the resulting result_summary.json metrics into tables/files.
Changes:
- Add `scripts/concurrency_sweep/run.py` to execute a benchmark repeatedly over a list of concurrency targets (one subdir per run).
- Add `scripts/concurrency_sweep/summarize.py` to aggregate `result_summary.json` files into terminal tables + CSV/Markdown/PNG outputs.
- Add a Qwen example package (YAML configs, dataset prep, run script, docs) for vLLM and SGLang.
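The sweep-runner behavior described above — one subdirectory per concurrency point, with targets deduplicated and sorted — can be sketched independently of the repo's actual CLI. The function name and callback signature here are illustrative, not the PR's real interface:

```python
from pathlib import Path
from typing import Callable

def run_sweep(
    sweep_dir: Path,
    concurrencies: list[int],
    run_one: Callable[[int, Path], None],
) -> list[Path]:
    """Invoke run_one once per unique concurrency level, each in its own subdir."""
    run_dirs: list[Path] = []
    for c in sorted(set(concurrencies)):          # dedupe, run in ascending order
        run_dir = sweep_dir / f"concurrency_{c}"  # one subdirectory per sweep point
        run_dir.mkdir(parents=True, exist_ok=True)
        run_one(c, run_dir)                       # e.g. shell out to the benchmark CLI
        run_dirs.append(run_dir)
    return run_dirs
```

In the PR, `run_one` would wrap the benchmark CLI invocation with a per-run config; the subdirectory naming is what `summarize.py` later globs for.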
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/concurrency_sweep/summarize.py | Parses per-run result_summary.json metrics, prints tables, and writes CSV/MD/PNG summaries. |
| scripts/concurrency_sweep/run.py | Generates per-concurrency configs from a template YAML and runs the benchmark CLI for each sweep point. |
| examples/08_Qwen2.5-0.5B_Example/sglang_online_qwen_benchmark.yaml | Online SGLang benchmark config used as sweep template input. |
| examples/08_Qwen2.5-0.5B_Example/sglang_offline_qwen_benchmark.yaml | Offline SGLang benchmark config. |
| examples/08_Qwen2.5-0.5B_Example/online_qwen_benchmark.yaml | Online vLLM benchmark config used as sweep template input. |
| examples/08_Qwen2.5-0.5B_Example/offline_qwen_benchmark.yaml | Offline vLLM benchmark config. |
| examples/08_Qwen2.5-0.5B_Example/run_benchmark.sh | End-to-end runner: prepares dataset, launches server container, runs offline benchmark or online sweep, and prints next steps. |
| examples/08_Qwen2.5-0.5B_Example/prepare_dataset.py | Produces a Qwen-specific dataset pickle with the expected prompt column. |
| examples/08_Qwen2.5-0.5B_Example/README.md | Full walkthrough for running Qwen benchmarks (manual + scripted). |
| examples/08_Qwen2.5-0.5B_Example/QUICKSTART.md | Short-form “one command” quick start instructions. |
| examples/08_Qwen2.5-0.5B_Example/.gitignore | Ignores example-local generated artifacts (data pickle, logs, etc.). |
```python
def collect_results(sweep_dir: Path) -> list[dict]:
    rows = []
    for concurrency_dir in sorted(
        sweep_dir.glob("concurrency_*"),
        key=lambda p: int(p.name.split("_")[1]),
```
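The numeric sort key matters here: plain lexicographic ordering of the `concurrency_*` directory names would interleave values. A quick illustration (directory names are examples):

```python
# Directory names sorted as strings vs. by their numeric suffix.
names = ["concurrency_16", "concurrency_2", "concurrency_4", "concurrency_128"]

lexicographic = sorted(names)
numeric = sorted(names, key=lambda n: int(n.split("_")[1]))

print(lexicographic)  # ['concurrency_128', 'concurrency_16', 'concurrency_2', 'concurrency_4']
print(numeric)        # ['concurrency_2', 'concurrency_4', 'concurrency_16', 'concurrency_128']
```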
```yaml
api_key: null
api_type: "sglang"

report_dir: "results/qwen_sglang_online_benchmark/"
```
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Force-pushed e2ea7dc to 2fd76c4
Pull request overview
Adds a concurrency sweep runner + result summarizer, along with a Qwen2.5-0.5B example (configs + scripts) to demonstrate offline and online benchmarking workflows in this repo.
Changes:
- Add `scripts/concurrency_sweep/run.py` to execute a benchmark sweep across multiple concurrency targets using a YAML config template.
- Add `scripts/concurrency_sweep/summarize.py` to aggregate `result_summary.json` files into tables plus CSV/Markdown/plot outputs.
- Add `examples/08_Qwen2.5-0.5B_Example/` with benchmark configs, dataset prep script, and a wrapper shell script for end-to-end runs on vLLM or SGLang.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/concurrency_sweep/summarize.py | Parses per-run result_summary.json and emits terminal tables + CSV/MD/PNG summaries. |
| scripts/concurrency_sweep/run.py | Runs inference-endpoint benchmark from-config repeatedly with varying concurrency and writes a sweep summary. |
| examples/08_Qwen2.5-0.5B_Example/sglang_online_qwen_benchmark.yaml | Online (concurrency) config for SGLang endpoint. |
| examples/08_Qwen2.5-0.5B_Example/sglang_offline_qwen_benchmark.yaml | Offline config for SGLang endpoint. |
| examples/08_Qwen2.5-0.5B_Example/online_qwen_benchmark.yaml | Online (concurrency) config for vLLM/OpenAI-compatible endpoint. |
| examples/08_Qwen2.5-0.5B_Example/offline_qwen_benchmark.yaml | Offline config for vLLM/OpenAI-compatible endpoint. |
| examples/08_Qwen2.5-0.5B_Example/prepare_dataset.py | Converts the repo dummy dataset into the example’s required prompt format. |
| examples/08_Qwen2.5-0.5B_Example/run_benchmark.sh | Automates dataset prep, container startup, readiness checks, and benchmark execution. |
| examples/08_Qwen2.5-0.5B_Example/README.md | Full walkthrough for running the example manually or via wrapper script. |
| examples/08_Qwen2.5-0.5B_Example/QUICKSTART.md | Short “fast path” instructions for running the example. |
| examples/08_Qwen2.5-0.5B_Example/.gitignore | Ignores example-local generated artifacts/logs. |
```python
        )
        return 1

    concurrency_values = sorted(set(args.concurrency))
```
```python
def f(v: object) -> float:
    """Return v as float, falling back to 0.0 for None/missing."""
    return float(v) if v is not None else 0.0

pct = lambda d, k: f(d.get("percentiles", {}).get(k))  # noqa: E731

return {
    "qps": round(f(data.get("qps")), 2),
    "tps": round(f(data.get("tps")), 2),
    "latency_mean_ms": ns_to_ms(f(lat.get("avg"))),
    "latency_p50_ms": ns_to_ms(f(lat.get("median"))),
    "latency_p90_ms": ns_to_ms(pct(lat, "90")),
    "latency_p95_ms": ns_to_ms(pct(lat, "95")),
    "latency_p99_ms": ns_to_ms(pct(lat, "99")),
    "ttft_mean_ms": ns_to_ms(f(ttft.get("avg"))),
    "ttft_p50_ms": ns_to_ms(f(ttft.get("median"))),
    "ttft_p90_ms": ns_to_ms(pct(ttft, "90")),
    "ttft_p99_ms": ns_to_ms(pct(ttft, "99")),
    "tpot_mean_ms": ns_to_ms(f(tpot.get("avg"))),
    "tpot_p50_ms": ns_to_ms(f(tpot.get("median"))),
    "tpot_p90_ms": ns_to_ms(pct(tpot, "90")),
    "tpot_p99_ms": ns_to_ms(pct(tpot, "99")),
    "n_completed": data.get("n_samples_completed", 0),
    "duration_s": round(f(data.get("duration_ns")) / 1e9, 1),
    "avg_output_tokens": round(f(osl.get("avg")), 1),
```
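`ns_to_ms` is referenced but not shown in this excerpt. A plausible definition, assuming nanosecond inputs and two-decimal rounding to match the other `round(...)` calls — the name comes from the excerpt, the body is an assumption:

```python
def ns_to_ms(ns: float) -> float:
    """Convert nanoseconds to milliseconds, rounded to two decimal places."""
    return round(ns / 1e6, 2)
```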
```python
cells = [str(r.get(k, r.get("status", "-"))) for k in col_keys]
lines.append("|" + "|".join(cells) + "|")
```
```python
if r["status"] != "ok":
    vals = [f"{r['concurrency']:>6}", f" {'-- ' + r['status']:>12}"]
    print(" ".join(vals))
    continue
parts = [f"{r['concurrency']:>6}"]
for _, key in columns:
    parts.append(f"{r[key]:>12}")
```
Force-pushed 53adb81 to 9cd8039
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Force-pushed 9cd8039 to 644f755
Pull request overview
Adds a new concurrency sweep workflow (runner + summarizer) and a complete Qwen2.5-0.5B example package to run offline/online benchmarks and visualize sweep results.
Changes:
- Introduces `scripts/concurrency_sweep/run.py` to execute a benchmark across multiple concurrency levels and record per-run logs plus a sweep summary.
- Introduces `scripts/concurrency_sweep/summarize.py` to tabulate sweep metrics and emit CSV/Markdown plus an optional matplotlib plot.
- Adds a full Qwen2.5-0.5B example (configs, dataset prep, wrapper script, README/quickstart) for vLLM and SGLang.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/concurrency_sweep/summarize.py | Parses result_summary.json per concurrency and outputs tables/CSV/MD/PNG summary artifacts. |
| scripts/concurrency_sweep/run.py | Runs inference-endpoint benchmark from-config repeatedly with concurrency overrides and writes sweep-level summaries/logs. |
| examples/08_Qwen2.5-0.5B_Example/online_qwen_benchmark.yaml | vLLM online benchmark template config used as sweep input. |
| examples/08_Qwen2.5-0.5B_Example/offline_qwen_benchmark.yaml | vLLM offline benchmark template config. |
| examples/08_Qwen2.5-0.5B_Example/sglang_online_qwen_benchmark.yaml | SGLang online benchmark template config (includes api_type: sglang). |
| examples/08_Qwen2.5-0.5B_Example/sglang_offline_qwen_benchmark.yaml | SGLang offline benchmark template config (includes api_type: sglang). |
| examples/08_Qwen2.5-0.5B_Example/prepare_dataset.py | Converts the repo dummy dataset into a prompt-column dataset for the example configs. |
| examples/08_Qwen2.5-0.5B_Example/run_benchmark.sh | Automated end-to-end wrapper (dataset prep, container start, run benchmark/sweep). |
| examples/08_Qwen2.5-0.5B_Example/README.md | Full step-by-step guide for running the sweep, summarizing, and running offline benchmarks. |
| examples/08_Qwen2.5-0.5B_Example/QUICKSTART.md | Short-form quickstart instructions. |
| examples/08_Qwen2.5-0.5B_Example/.gitignore | Ignores generated datasets, logs, and results under the example directory. |
```python
if verbose:
    with log_path.open("w") as log_file:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        assert proc.stdout is not None
        for line in proc.stdout:
            print(line, end="", flush=True)
            log_file.write(line)
        try:
            proc.wait(timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()
            raise
        returncode = proc.returncode
```
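The stream-and-tee pattern above can be exercised in isolation. This standalone sketch (the `run_and_tee` name is illustrative) mirrors the same Popen/read-loop/wait/kill flow; note the read loop blocks until the child closes stdout, so the `wait` timeout only applies after EOF:

```python
import subprocess
from pathlib import Path

def run_and_tee(cmd: list[str], log_path: Path, timeout_seconds: float = 60) -> int:
    """Stream a child's combined stdout/stderr to the console and a log file."""
    with log_path.open("w") as log_file:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        assert proc.stdout is not None
        for line in proc.stdout:        # blocks until the child closes stdout
            print(line, end="", flush=True)
            log_file.write(line)
        try:
            proc.wait(timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()
            raise
    return proc.returncode
```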
```python
hdr = "|" + "|".join(headers) + "|"
lines = [hdr, sep]
for r in data:
    cells = [str(r.get(k, r.get("status", "-"))) for k in col_keys]
```
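Filled out, a Markdown table writer built from the excerpted lines might look like this sketch (the `sep` construction and function name are assumptions, not the PR's code):

```python
def write_markdown(data: list[dict], col_keys: list[str], headers: list[str]) -> str:
    """Render rows of metric dicts as a GitHub-flavored Markdown table."""
    hdr = "|" + "|".join(headers) + "|"
    sep = "|" + "|".join("---" for _ in headers) + "|"
    lines = [hdr, sep]
    for r in data:
        # Fall back to the run's status (or "-") for metrics missing after a failure.
        cells = [str(r.get(k, r.get("status", "-"))) for k in col_keys]
        lines.append("|" + "|".join(cells) + "|")
    return "\n".join(lines)
```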
```python
    return 1

base_config = load_config(args.config)
```