…e scripts Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>
Pull request overview
This PR updates the Azure Pipelines performance test scripts to separate “current” vs “baseline” outputs and expand performance validation beyond tuning time to include memory and output size metrics.
Changes:
- Write quantization outputs into mode-specific directories (`./current`, `./baseline`) instead of a shared `./saved` directory.
- Refactor `check_performance.py` to parse multiple metrics (tuning time, peak RAM/VRAM, output size) and compare current vs baseline with tolerances.
- Improve performance check output formatting via structured logging.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `.azure-pipelines/scripts/performance/perf_test.sh` | Changes output directory per test mode and retains logs/artifacts between runs. |
| `.azure-pipelines/scripts/performance/check_performance.py` | Adds parsing/comparison for multiple performance metrics and computes output directory sizes. |
```diff
 local log_file="perf_test_${test_mode}.log"
-rm -rf "saved" "${LOG_DIR}/${log_file}"
 echo "##[group]run ${test_mode} performance test..."
-auto-round --model_name ${model_name} --bits 4 --iters 200 --enable_torch_compile --device hpu --output_dir ./saved 2>&1 | tee -a "${LOG_DIR}/${log_file}"
+auto-round --model_name ${model_name} --bits 4 --iters 200 --enable_torch_compile --device hpu --output_dir "./${test_mode}" 2>&1 | tee -a "${LOG_DIR}/${log_file}"
 echo "##[endgroup]"
```
`tee -a` appends to the existing log file, and the script no longer removes prior logs/output directories. This can cause performance parsing to pick up stale runs and inflate output-size measurements across retries. Consider either deleting `${LOG_DIR}/${log_file}` and `./${test_mode}` before running, or switching `tee` to overwrite (no `-a`) and ensuring output directories are cleaned per run.
```python
ram_match = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", content)
if ram_match:
    metrics.peak_ram_gb = round(float(ram_match.group(1)), 4)
    metrics.peak_vram_gb = round(float(ram_match.group(2)), 4)
```
The `peak_ram`/`peak_vram` regex is too strict for the log format produced by `PeakMemory.get_summary()` (it includes a space after the comma: `..., 'peak_vram': ...`). As written, it will likely never match, leaving the RAM/VRAM metrics as `None` and failing the check. Update the pattern to allow whitespace after commas, and consider handling the multi-device `peak_vram` dict case (e.g., choose the max value).
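To illustrate the point, a minimal sketch (the sample summary line below is hypothetical, modeled on the format described above) comparing the strict pattern against a whitespace-tolerant variant:

```python
import re

# Hypothetical summary line with a space after the comma, as described above.
line = "{'peak_ram': 12.3456GB, 'peak_vram': 45.6789GB}"

# Strict pattern from the PR: the comma must be immediately followed by
# 'peak_vram', so the space makes it fail.
strict = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", line)

# Tolerant variant: allow optional whitespace after the comma.
tolerant = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,\s*'peak_vram':\s*([\d.]+)GB?", line)

print(strict)  # None
print(tolerant.group(1), tolerant.group(2))  # 12.3456 45.6789
```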
Suggested change:

```diff
-ram_match = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", content)
-if ram_match:
-    metrics.peak_ram_gb = round(float(ram_match.group(1)), 4)
-    metrics.peak_vram_gb = round(float(ram_match.group(2)), 4)
+# Parse peak RAM independently of VRAM and whitespace formatting
+ram_match = re.search(r"'peak_ram':\s*([\d.]+)GB?", content)
+if ram_match:
+    metrics.peak_ram_gb = round(float(ram_match.group(1)), 4)
+# Parse peak VRAM; handle both scalar and multi-device dict formats
+vram_simple_match = re.search(r"'peak_vram':\s*([\d.]+)GB?", content)
+if vram_simple_match:
+    metrics.peak_vram_gb = round(float(vram_simple_match.group(1)), 4)
+else:
+    # Example dict format: 'peak_vram': {'cuda:0': 1.2GB, 'cuda:1': 1.5GB}
+    vram_dict_match = re.search(r"'peak_vram':\s*\{([^}]+)\}", content)
+    if vram_dict_match:
+        vram_body = vram_dict_match.group(1)
+        values = re.findall(r"([\d.]+)GB?", vram_body)
+        if values:
+            max_vram = max(float(v) for v in values)
+            metrics.peak_vram_gb = round(max_vram, 4)
```
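As a quick sanity check of the suggested dict handling, a sketch (the multi-device line is hypothetical, taken from the example in the suggestion) that picks the maximum value across devices:

```python
import re

# Hypothetical multi-device summary line, per the example in the suggestion.
content = "'peak_vram': {'cuda:0': 1.2GB, 'cuda:1': 1.5GB}"

m = re.search(r"'peak_vram':\s*\{([^}]+)\}", content)
# Device indices like 'cuda:0' are not followed by a literal 'G', so only
# the GB-suffixed values match.
values = [float(v) for v in re.findall(r"([\d.]+)GB?", m.group(1))]
peak_vram_gb = round(max(values), 4)
print(peak_vram_gb)  # 1.5
```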
```python
time_match = re.search(r"tuning time ([0-9]+\.[0-9]+)", content)
if time_match:
    metrics.tuning_time_s = round(float(time_match.group(1)), 4)
ram_match = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", content)
if ram_match:
    metrics.peak_ram_gb = round(float(ram_match.group(1)), 4)
    metrics.peak_vram_gb = round(float(ram_match.group(2)), 4)
```
`parse_log_file()` uses `re.search(...)`, which returns the first match in the file. If logs are appended (or contain multiple iterations), this will capture an older run rather than the most recent one. Prefer `re.findall(...)` and use the last match, or ensure the log file is overwritten/cleared before each run.
Suggested change:

```diff
-time_match = re.search(r"tuning time ([0-9]+\.[0-9]+)", content)
-if time_match:
-    metrics.tuning_time_s = round(float(time_match.group(1)), 4)
-ram_match = re.search(r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", content)
-if ram_match:
-    metrics.peak_ram_gb = round(float(ram_match.group(1)), 4)
-    metrics.peak_vram_gb = round(float(ram_match.group(2)), 4)
+# Use findall to capture all occurrences and take the most recent one.
+time_matches = re.findall(r"tuning time ([0-9]+\.[0-9]+)", content)
+if time_matches:
+    metrics.tuning_time_s = round(float(time_matches[-1]), 4)
+ram_matches = re.findall(
+    r"'peak_ram':\s*([\d.]+)GB?.*?,'peak_vram':\s*([\d.]+)GB?", content
+)
+if ram_matches:
+    last_ram, last_vram = ram_matches[-1]
+    metrics.peak_ram_gb = round(float(last_ram), 4)
+    metrics.peak_vram_gb = round(float(last_vram), 4)
```
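The `re.search` vs `re.findall` difference can be sketched with a hypothetical log containing two appended runs:

```python
import re

# Hypothetical log content: a stale first run followed by the latest run.
content = (
    "tuning time 100.5 s\n"
    "tuning time 42.7 s\n"
)

# re.search returns the first (stale) match in the file.
first = re.search(r"tuning time ([0-9]+\.[0-9]+)", content).group(1)

# re.findall collects every match, so [-1] is the most recent run.
latest = re.findall(r"tuning time ([0-9]+\.[0-9]+)", content)[-1]

print(first, latest)  # 100.5 42.7
```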
for more information, see https://pre-commit.ci
Signed-off-by: Sun, Xuehao <xuehao.sun@intel.com>