Skip to content

Enhance performance test#1610

Open
XuehaoSun wants to merge 3 commits intomainfrom
xuehao/fix_hpu_perf_test
Open

Enhance performance test#1610
XuehaoSun wants to merge 3 commits intomainfrom
xuehao/fix_hpu_perf_test

Conversation

@XuehaoSun
Copy link
Contributor

Description

Please briefly describe your main changes, the motivation.

Type of Change

  • Bug fix
  • New feature
  • Documentation update
  • Performance improvement
  • Code refactoring
  • Other (please specify):

Related Issues

Fixes or relates to #

Checklist Before Submitting

  • My code has been tested locally.
  • Documentation has been updated as needed.
  • New or updated tests are included where applicable.

…e scripts

Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Copilot AI review requested due to automatic review settings March 25, 2026 06:30
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Azure Pipelines performance test scripts to separate “current” vs “baseline” outputs and expand performance validation beyond tuning time to include memory and output size metrics.

Changes:

  • Write quantization outputs into mode-specific directories (./current, ./baseline) instead of a shared ./saved directory.
  • Refactor check_performance.py to parse multiple metrics (tuning time, peak RAM/VRAM, output size) and compare current vs baseline with tolerances.
  • Improve performance check output formatting via structured logging.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
.azure-pipelines/scripts/performance/perf_test.sh Changes output directory per test mode and retains logs/artifacts between runs.
.azure-pipelines/scripts/performance/check_performance.py Adds parsing/comparison for multiple performance metrics and computes output directory sizes.

Comment on lines 39 to 42
local log_file="perf_test_${test_mode}.log"
rm -rf "saved" "${LOG_DIR}/${log_file}"
echo "##[group]run ${test_mode} performance test..."
auto-round --model_name ${model_name} --bits 4 --iters 200 --enable_torch_compile --device hpu --output_dir ./saved 2>&1 | tee -a "${LOG_DIR}/${log_file}"
auto-round --model_name ${model_name} --bits 4 --iters 200 --enable_torch_compile --device hpu --output_dir "./${test_mode}" 2>&1 | tee -a "${LOG_DIR}/${log_file}"
echo "##[endgroup]"
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tee -a appends to the existing log file, and the script no longer removes prior logs/output directories. This can cause performance parsing to pick up stale runs and inflate output-size measurements across retries. Consider either deleting ${LOG_DIR}/${log_file} and ./${test_mode} before running, or switch tee to overwrite (no -a) and ensure output dirs are cleaned per run.

Copilot uses AI. Check for mistakes.
Comment on lines +43 to +47
ram_match = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", content)
if ram_match:
metrics.peak_ram_gb = round(float(ram_match.group(1)), 4)
metrics.peak_vram_gb = round(float(ram_match.group(2)), 4)

Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The peak_ram/peak_vram regex is too strict for the log format produced by PeakMemory.get_summary() (it includes a space after the comma: ..., 'peak_vram': ...). As written, this will likely never match and will leave RAM/VRAM metrics as None, failing the check. Update the pattern to allow whitespace after commas and consider handling the multi-device peak_vram dict case (e.g., choose max value).

Suggested change
ram_match = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", content)
if ram_match:
metrics.peak_ram_gb = round(float(ram_match.group(1)), 4)
metrics.peak_vram_gb = round(float(ram_match.group(2)), 4)
# Parse peak RAM independently of VRAM and whitespace formatting
ram_match = re.search(r"'peak_ram':\s*([\d.]+)GB?", content)
if ram_match:
metrics.peak_ram_gb = round(float(ram_match.group(1)), 4)
# Parse peak VRAM; handle both scalar and multi-device dict formats
vram_simple_match = re.search(r"'peak_vram':\s*([\d.]+)GB?", content)
if vram_simple_match:
metrics.peak_vram_gb = round(float(vram_simple_match.group(1)), 4)
else:
# Example dict format: 'peak_vram': {'cuda:0': 1.2GB, 'cuda:1': 1.5GB}
vram_dict_match = re.search(r"'peak_vram':\s*\{([^}]+)\}", content)
if vram_dict_match:
vram_body = vram_dict_match.group(1)
values = re.findall(r"([\d.]+)GB?", vram_body)
if values:
max_vram = max(float(v) for v in values)
metrics.peak_vram_gb = round(max_vram, 4)

Copilot uses AI. Check for mistakes.
Comment on lines +39 to +46
time_match = re.search(r"tuning time ([0-9]+\.[0-9]+)", content)
if time_match:
metrics.tuning_time_s = round(float(time_match.group(1)), 4)

ram_match = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", content)
if ram_match:
metrics.peak_ram_gb = round(float(ram_match.group(1)), 4)
metrics.peak_vram_gb = round(float(ram_match.group(2)), 4)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_log_file() uses re.search(...), which returns the first match in the file. If logs are appended (or contain multiple iterations), this will capture an older run rather than the most recent one. Prefer re.findall(...) and use the last match, or ensure the log file is overwritten/cleared before each run.

Suggested change
time_match = re.search(r"tuning time ([0-9]+\.[0-9]+)", content)
if time_match:
metrics.tuning_time_s = round(float(time_match.group(1)), 4)
ram_match = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", content)
if ram_match:
metrics.peak_ram_gb = round(float(ram_match.group(1)), 4)
metrics.peak_vram_gb = round(float(ram_match.group(2)), 4)
# Use findall to capture all occurrences and take the most recent one.
time_matches = re.findall(r"tuning time ([0-9]+\.[0-9]+)", content)
if time_matches:
metrics.tuning_time_s = round(float(time_matches[-1]), 4)
ram_matches = re.findall(
r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", content
)
if ram_matches:
last_ram, last_vram = ram_matches[-1]
metrics.peak_ram_gb = round(float(last_ram), 4)
metrics.peak_vram_gb = round(float(last_vram), 4)

Copilot uses AI. Check for mistakes.
pre-commit-ci bot and others added 2 commits March 25, 2026 06:33
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants