…e scripts Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Pull request overview
This PR updates the Azure Pipelines performance test scripts to separate “current” vs “baseline” outputs and expand performance validation beyond tuning time to include memory and output size metrics.
Changes:
- Write quantization outputs into mode-specific directories (`./current`, `./baseline`) instead of a shared `./saved` directory.
- Refactor `check_performance.py` to parse multiple metrics (tuning time, peak RAM/VRAM, output size) and compare current vs baseline with tolerances.
- Improve performance check output formatting via structured logging.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `.azure-pipelines/scripts/performance/perf_test.sh` | Changes output directory per test mode and retains logs/artifacts between runs. |
| `.azure-pipelines/scripts/performance/check_performance.py` | Adds parsing/comparison for multiple performance metrics and computes output directory sizes. |
```diff
 local log_file="perf_test_${test_mode}.log"
-rm -rf "saved" "${LOG_DIR}/${log_file}"
 echo "##[group]run ${test_mode} performance test..."
-auto-round --model_name ${model_name} --bits 4 --iters 200 --enable_torch_compile --device hpu --output_dir ./saved 2>&1 | tee -a "${LOG_DIR}/${log_file}"
+auto-round --model_name ${model_name} --bits 4 --iters 200 --enable_torch_compile --device hpu --output_dir "./${test_mode}" 2>&1 | tee -a "${LOG_DIR}/${log_file}"
 echo "##[endgroup]"
```
`tee -a` appends to the existing log file, and the script no longer removes prior logs/output directories. This can cause performance parsing to pick up stale runs and inflate output-size measurements across retries. Consider either deleting `${LOG_DIR}/${log_file}` and `./${test_mode}` before running, or switching `tee` to overwrite (no `-a`) and ensuring output directories are cleaned per run.
```python
ram_match = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", content)
if ram_match:
    metrics.peak_ram_gb = round(float(ram_match.group(1)), 4)
    metrics.peak_vram_gb = round(float(ram_match.group(2)), 4)
```
The `peak_ram`/`peak_vram` regex is too strict for the log format produced by `PeakMemory.get_summary()` (it includes a space after the comma: `..., 'peak_vram': ...`). As written, it will likely never match, leaving the RAM/VRAM metrics as `None` and failing the check. Update the pattern to allow whitespace after commas, and consider handling the multi-device `peak_vram` dict case (e.g., choose the max value).
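To illustrate the point, a minimal sketch (the sample summary line below is hypothetical, modeled on the format described above) comparing the strict pattern against a whitespace-tolerant variant:

```python
import re

# Hypothetical summary line with a space after the comma, as described above.
line = "{'peak_ram': 12.3456GB, 'peak_vram': 45.6789GB}"

# Strict pattern from the PR: the comma must be immediately followed by
# 'peak_vram', so the space makes it fail.
strict = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", line)

# Tolerant variant: allow optional whitespace after the comma.
tolerant = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,\s*'peak_vram':\s*([\d.]+)GB?", line)

print(strict)  # None
print(tolerant.group(1), tolerant.group(2))  # 12.3456 45.6789
```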
Suggested change:

```diff
-ram_match = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", content)
-if ram_match:
-    metrics.peak_ram_gb = round(float(ram_match.group(1)), 4)
-    metrics.peak_vram_gb = round(float(ram_match.group(2)), 4)
+# Parse peak RAM independently of VRAM and whitespace formatting
+ram_match = re.search(r"'peak_ram':\s*([\d.]+)GB?", content)
+if ram_match:
+    metrics.peak_ram_gb = round(float(ram_match.group(1)), 4)
+# Parse peak VRAM; handle both scalar and multi-device dict formats
+vram_simple_match = re.search(r"'peak_vram':\s*([\d.]+)GB?", content)
+if vram_simple_match:
+    metrics.peak_vram_gb = round(float(vram_simple_match.group(1)), 4)
+else:
+    # Example dict format: 'peak_vram': {'cuda:0': 1.2GB, 'cuda:1': 1.5GB}
+    vram_dict_match = re.search(r"'peak_vram':\s*\{([^}]+)\}", content)
+    if vram_dict_match:
+        vram_body = vram_dict_match.group(1)
+        values = re.findall(r"([\d.]+)GB?", vram_body)
+        if values:
+            max_vram = max(float(v) for v in values)
+            metrics.peak_vram_gb = round(max_vram, 4)
```
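As a quick sanity check of the suggested dict handling, a sketch (the multi-device line is hypothetical, taken from the example in the suggestion) that picks the maximum value across devices:

```python
import re

# Hypothetical multi-device summary line, per the example in the suggestion.
content = "'peak_vram': {'cuda:0': 1.2GB, 'cuda:1': 1.5GB}"

m = re.search(r"'peak_vram':\s*\{([^}]+)\}", content)
# Device indices like 'cuda:0' are not followed by a literal 'G', so only
# the GB-suffixed values match.
values = [float(v) for v in re.findall(r"([\d.]+)GB?", m.group(1))]
peak_vram_gb = round(max(values), 4)
print(peak_vram_gb)  # 1.5
```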
```python
time_match = re.search(r"tuning time ([0-9]+\.[0-9]+)", content)
if time_match:
    metrics.tuning_time_s = round(float(time_match.group(1)), 4)
ram_match = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", content)
if ram_match:
    metrics.peak_ram_gb = round(float(ram_match.group(1)), 4)
    metrics.peak_vram_gb = round(float(ram_match.group(2)), 4)
```
`parse_log_file()` uses `re.search(...)`, which returns the first match in the file. If logs are appended (or contain multiple iterations), this will capture an older run rather than the most recent one. Prefer `re.findall(...)` and use the last match, or ensure the log file is overwritten/cleared before each run.
Suggested change:

```diff
-time_match = re.search(r"tuning time ([0-9]+\.[0-9]+)", content)
-if time_match:
-    metrics.tuning_time_s = round(float(time_match.group(1)), 4)
-ram_match = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", content)
-if ram_match:
-    metrics.peak_ram_gb = round(float(ram_match.group(1)), 4)
-    metrics.peak_vram_gb = round(float(ram_match.group(2)), 4)
+# Use findall to capture all occurrences and take the most recent one.
+time_matches = re.findall(r"tuning time ([0-9]+\.[0-9]+)", content)
+if time_matches:
+    metrics.tuning_time_s = round(float(time_matches[-1]), 4)
+ram_matches = re.findall(
+    r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", content
+)
+if ram_matches:
+    last_ram, last_vram = ram_matches[-1]
+    metrics.peak_ram_gb = round(float(last_ram), 4)
+    metrics.peak_vram_gb = round(float(last_vram), 4)
```
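The `re.search` vs `re.findall` difference can be sketched with a hypothetical log containing two appended runs:

```python
import re

# Hypothetical log content: a stale first run followed by the latest run.
content = (
    "tuning time 100.5 s\n"
    "tuning time 42.7 s\n"
)

# re.search returns the first (stale) match in the file.
first = re.search(r"tuning time ([0-9]+\.[0-9]+)", content).group(1)

# re.findall collects every match, so [-1] is the most recent run.
latest = re.findall(r"tuning time ([0-9]+\.[0-9]+)", content)[-1]

print(first, latest)  # 100.5 42.7
```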
for more information, see https://pre-commit.ci
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>