Commit 4d30776

Commit message: final

Signed-off-by: Yang Wang <[email protected]>
1 parent 908ee26 commit 4d30776

2 files changed: +178 -54 lines changed
Lines changed: 91 additions & 39 deletions

@@ -1,74 +1,126 @@
 # Benchmark Tooling
 
-A library providing tools for benchmarking ExecutorchBenchmark data.
+A library providing tools for fetching, processing, and analyzing ExecutorchBenchmark data from the HUD Open API.
 
-## Read Benchmark Data
-`get_benchmark_analysis_data.py` fetches benchmark data from HUD Open API, clean the data that only contains FAILURE_REPORT column,
-and get all private device metrics and associated public device metrics if any based on [model,backend,device,ios]
-
-### Quick Start
+## Installation
 
 Install dependencies:
 ```bash
 pip install -r requirements.txt
 ```
 
-Run with csv output (CLI):
+## Tools
+
+### get_benchmark_analysis_data.py
+
+This script fetches benchmark data from the HUD Open API, drops records that contain only FAILURE_REPORT entries, and retrieves all private device metrics along with any associated public device metrics, matched on [model, backend, device, arch].
+
+#### Quick Start
+
 ```bash
-python3 .ci/scripts/benchmark_tooling/get_benchmark_analysis_data.py --startTime "2025-06-11T00:00:00" --endTime "2025-06-17T18:00:00" --outputType "csv"
+python3 .ci/scripts/benchmark_tooling/get_benchmark_analysis_data.py \
+  --startTime "2025-06-11T00:00:00" \
+  --endTime "2025-06-17T18:00:00" \
+  --outputType "csv"
 ```
 
-Additional options:
-- `--not-silent`: show processing logs, otherwise only show results & minimum loggings
-- `--outputType df`: Display results in DataFrame format
-- `--outputType excel --outputDir "{YOUR_LOCAL_DIRECTORY}"`: Generate Excel file with multiple sheets (`res_private.xlsx` and `res_public.xlsx`)
-- `--outputType csv --outputDir "{YOUR_LOCAL_DIRECTORY}"`: Generate CSV files in folders (`private` and `public`)
+#### Command Line Options
+
+##### Basic Options:
+- `--startTime`: Start time in ISO format (e.g., "2025-06-11T00:00:00") (required)
+- `--endTime`: End time in ISO format (e.g., "2025-06-17T18:00:00") (required)
+- `--env`: Choose environment ("local" or "prod", default: "prod")
+- `--not-silent`: Show processing logs (default: show only results and minimal logging)
+
+##### Output Options:
+- `--outputType`: Choose output format (default: "print")
+  - `print`: Display results in the console
+  - `json`: Generate a JSON file
+  - `df`: Display results in DataFrame format
+  - `excel`: Generate Excel files with multiple sheets
+  - `csv`: Generate CSV files in separate folders
+- `--outputDir`: Directory to save output files (default: current directory)
+
+##### Filtering Options:
+- `--devices`: Filter by specific device names (e.g., "samsung-galaxy-s22-5g", "samsung-galaxy-s22plus-5g")
+- `--backends`: Filter by specific backend names
+- `--models`: Filter by specific model names
 
-you can then call methods in common.py to convert the file date back to df version
-```python3
+#### Working with Output Files
+
+You can use methods in `common.py` to convert the file data back to DataFrame format:
+
+```python
 import logging
 logging.basicConfig(level=logging.INFO)
-from common.py import
+from common import read_all_csv_with_metadata, read_excel_with_json_header  # run from .ci/scripts/benchmark_tooling
 
-# assume the folder private for csv is in cunrrent directory
+# For CSV files (assuming the 'private' folder is in the current directory)
 folder_path = './private'
 res = read_all_csv_with_metadata(folder_path)
 logging.info(res)
 
-# assume the excel file for private device is in cunrrent directory
-folder_path = "./private.xlsx"
-res = read_excel_with_json_header(folder_path)
+# For Excel files (assuming the Excel file is in the current directory)
+file_path = "./private.xlsx"
+res = read_excel_with_json_header(file_path)
 logging.info(res)
 ```
 
-### Python API Usage
+#### Python API Usage
 
 To use the benchmark fetcher in your own scripts:
 
 ```python
-import ExecutorchBenchmarkFetcher from benchmark_tooling.get_benchmark_analysis_data
-fetcher = ExecutorchBenchmarkFetcher()
-# Must call run first
-fetcher.run()
-res = fetcher.
-```
+from get_benchmark_analysis_data import BenchmarkFilters, ExecutorchBenchmarkFetcher  # with .ci/scripts/benchmark_tooling on sys.path
 
-## analyze_benchmark_stability.py
-`analyze_benchmark_stability.py` analyzes the stability of benchmark data, comparing the results of private and public devices.
+# Initialize the fetcher
+fetcher = ExecutorchBenchmarkFetcher(env="prod", disable_logging=False)
 
-### Quick Start
-Install dependencies:
-```bash
-pip install -r requirements.txt
-```
+# Fetch data for a specific time range (filters is required; None values mean no filtering)
+fetcher.run(
+    start_time="2025-06-11T00:00:00",
+    end_time="2025-06-17T18:00:00",
+    filters=BenchmarkFilters(models=None, backends=None, devices=None),
+)
 
+# Get results in different formats
+# As DataFrames
+df_results = fetcher.to_df()
+
+# Export to Excel
+fetcher.to_excel(output_dir="./results")
+
+# Export to CSV
+fetcher.to_csv(output_dir="./results")
+
+# Export to JSON
+json_path = fetcher.to_json(output_dir="./results")
+
+# Get raw dictionary results
+dict_results = fetcher.to_dict()
+
+# Use the output_data method for flexible output
+results = fetcher.output_data(output_type="excel", output_dir="./results")
 ```
+
+### analyze_benchmark_stability.py
+
+This script analyzes the stability of benchmark data, comparing results from private and public devices.
+
+#### Quick Start
+
+```bash
 python .ci/scripts/benchmark_tooling/analyze_benchmark_stability.py \
-    Benchmark\ Dataset\ with\ Private\ AWS\ Devices.xlsx \
-    --reference_file Benchmark\ Dataset\ with\ Public\ AWS\ Devices.xlsx
+    "Benchmark Dataset with Private AWS Devices.xlsx" \
+    --reference_file "Benchmark Dataset with Public AWS Devices.xlsx"
 ```
-## Run unittest
-```
-cd execuTorch/
+
+## Running Unit Tests
+
+The benchmark tooling includes unit tests that cover the main functionality.
+
+### Using pytest
+
+```bash
+# From the executorch root directory
 pytest -c /dev/null .ci/scripts/tests/test_get_benchmark_analysis_data.py
 ```
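Taken together, the README and script changes enable a filtered fetch from Python. Below is a minimal end-to-end sketch; it assumes the code runs from the executorch root with `.ci/scripts/benchmark_tooling` added to `sys.path`, and the model and device values are illustrative placeholders, not guaranteed to exist in the data.

```python
# A sketch combining the new BenchmarkFilters with ExecutorchBenchmarkFetcher.run().
import sys

sys.path.append(".ci/scripts/benchmark_tooling")

from get_benchmark_analysis_data import BenchmarkFilters, ExecutorchBenchmarkFetcher

fetcher = ExecutorchBenchmarkFetcher(env="prod", disable_logging=True)
fetcher.run(
    start_time="2025-06-11T00:00:00",
    end_time="2025-06-17T18:00:00",
    # Keep only rows whose model is "mv3" AND whose device name contains
    # "samsung-galaxy-s22-5g"; within each list, any value matches (OR).
    filters=BenchmarkFilters(
        models=["mv3"],
        backends=None,
        devices=["samsung-galaxy-s22-5g"],
    ),
)
fetcher.to_csv(output_dir="./results")
```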

.ci/scripts/benchmark_tooling/get_benchmark_analysis_data.py

Lines changed: 87 additions & 15 deletions
@@ -7,7 +7,6 @@
 and customizing data retrieval parameters.
 """
 
-from yaspin import yaspin
 import argparse
 import json
 import logging
@@ -21,6 +20,7 @@
 
 import pandas as pd
 import requests
+from yaspin import yaspin
 
 logging.basicConfig(level=logging.INFO)
 
@@ -80,6 +80,13 @@ class MatchingGroupResult:
     data: list
 
 
+@dataclass
+class BenchmarkFilters:
+    models: list
+    backends: list
+    devices: list
+
+
 BASE_URLS = {
     "local": "http://localhost:3000",
     "prod": "https://hud.pytorch.org",
@@ -156,19 +163,21 @@ def run(
         self,
         start_time: str,
         end_time: str,
+        filters: BenchmarkFilters,
     ) -> None:
+        # reset group & raw data for new run
+        self.matching_groups = {}
+        self.data = None
+
         data = self._fetch_execu_torch_data(start_time, end_time)
         if data is None:
             logging.warning("no data fetched from the HUD API")
             return None
-
-        res = self._process(data)
+        res = self._process(data, filters)
         self.data = res.get("data", [])
         private_list = res.get("private", [])
         public_list = self._filter_public_result(private_list, res["public"])
 
-        # reset group
-        self.matching_groups = {}
         self.matching_groups["private"] = MatchingGroupResult(
             category="private", data=private_list
         )
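Moving the reset of `matching_groups` and `self.data` to the top of `run()` means a single fetcher instance can be reused across time windows without results from an earlier run leaking into the next. A sketch of that usage (window boundaries and output paths are arbitrary examples):

```python
# Sketch: reusing one fetcher across two windows; run() resets state each time.
fetcher = ExecutorchBenchmarkFetcher(env="prod", disable_logging=True)
no_filters = BenchmarkFilters(models=None, backends=None, devices=None)

for start, end in [
    ("2025-06-11T00:00:00", "2025-06-14T00:00:00"),
    ("2025-06-14T00:00:00", "2025-06-17T18:00:00"),
]:
    fetcher.run(start, end, filters=no_filters)
    # Each window's results go to their own folder (path is illustrative).
    fetcher.to_json(output_dir=f"./results/{start[:10]}")
```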
@@ -456,13 +465,18 @@ def print_all_groups_info(self) -> None:
         if not self.data or not self.matching_groups:
             logging.info("No data found, please call get_data() first")
             return
-        logging.info(f" all clean benchmark table info from HUD")
+        logging.info(
+            "=========== Full list of table info from HUD API =============\n"
+            " please use values in field `info` for filtering, "
+            "while `groupInfo` holds the original benchmark metadata"
+        )
         names = []
         for item in self.data:
             names.append(
                 {
                     "table_name": item.get("table_name", ""),
-                    "groupInfo": item.get("groupInfo", ""),
+                    "groupInfo": item.get("groupInfo", {}),
+                    "info": item.get("info", {}),
                     "counts": len(item.get("rows", [])),
                 }
             )
@@ -492,7 +506,7 @@ def _generate_matching_name(self, group_info: dict, fields: list[str]) -> str:
         # name = name +'(private)'
         return name
 
-    def _process(self, input_data: List[Dict[str, Any]]):
+    def _process(self, input_data: List[Dict[str, Any]], filters: BenchmarkFilters):
         """
         Process raw benchmark data.
 
@@ -509,9 +523,9 @@ def _process(self, input_data: List[Dict[str, Any]]):
         # filter out rows whose arch is exactly "", "ios", or "android"; these normally indicate a job-level failure
         logging.info(f"fetched {len(input_data)} data from HUD")
         data = self._clean_data(input_data)
-
         private = []
         public = []
+
         for item in data:
             # normalize string values from groupInfo into info
             item["info"] = {
@@ -528,17 +542,30 @@ def _process(self, input_data: List[Dict[str, Any]]):
             # Mark aws_type: private or public
             if group.get("device", "").find("private") != -1:
                 item["info"]["aws_type"] = "private"
-                private.append(item)
             else:
                 item["info"]["aws_type"] = "public"
                 public.append(item)
-        data.sort(key=lambda x: x["table_name"])
-        private.sort(key=lambda x: x["table_name"])
-        public.sort(key=lambda x: x["table_name"])
+        raw_data = deepcopy(data)
+
+        # apply customized filters, if any
+        data = self.filter_results(data, filters)
+        # generate private and public results
+        private = sorted(
+            (
+                item
+                for item in data
+                if item.get("info", {}).get("aws_type") == "private"
+            ),
+            key=lambda x: x["table_name"],
+        )
+        public = sorted(
+            (item for item in data if item.get("info", {}).get("aws_type") == "public"),
+            key=lambda x: x["table_name"],
+        )
         logging.info(
             f"fetched clean data {len(data)}, private:{len(private)}, public:{len(public)}"
         )
-        return {"data": data, "private": private, "public": public}
+        return {"data": raw_data, "private": private, "public": public}
 
     def _clean_data(self, data_list):
         removed_gen_arch = [
@@ -575,6 +602,7 @@ def _fetch_execu_torch_data(self, start_time, end_time):
 
     def normalize_string(self, s: str) -> str:
         s = s.lower().strip()
+        s = s.replace("+", "plus")
         s = s.replace("_", "-")
         s = s.replace(" ", "-")
         s = re.sub(r"[^\w\-\.\(\)]", "-", s)
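The new `+` to `plus` rule is what lets a CLI filter like `samsung-galaxy-s22plus-5g` match a raw device string such as "Samsung Galaxy S22+ 5G". A standalone sketch of just the steps visible in this hunk (the real method applies further cleanup after the `re.sub`):

```python
import re

def normalize_string_sketch(s: str) -> str:
    # Reproduces only the transformations shown in the hunk above.
    s = s.lower().strip()
    s = s.replace("+", "plus")
    s = s.replace("_", "-")
    s = s.replace(" ", "-")
    s = re.sub(r"[^\w\-\.\(\)]", "-", s)
    return s

print(normalize_string_sketch("Samsung Galaxy S22+ 5G"))
# samsung-galaxy-s22plus-5g
```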
@@ -583,6 +611,37 @@ def normalize_string(self, s: str) -> str:
         s = s.replace(")-", ")").replace("-)", ")")
         return s
 
+    def filter_results(self, data: List, filters: BenchmarkFilters):
+        backends = filters.backends
+        devices = filters.devices
+        models = filters.models
+
+        if not backends and not devices and not models:
+            return data
+        logging.info(
+            f"applying OR filter: backends={backends}, devices={devices}, models={models}"
+        )
+        pre_len = len(data)
+        results = []
+        for item in data:
+            info = item.get("info", {})
+            if backends and info.get("backend") not in backends:
+                continue
+            if devices and not any(dev in info.get("device", "") for dev in devices):
+                continue
+            if models and info.get("model", "") not in models:
+                continue
+            results.append(item)
+        after_len = len(results)
+        logging.info(f"applied customized filter before: {pre_len}, after: {after_len}")
+        if after_len == 0:
+            logging.info(
+                "it seems no results match the filter values; "
+                "please rerun with --not-silent and check the field "
+                "'info' for the right format"
+            )
+        return results
+
 
 def argparsers():
     parser = argparse.ArgumentParser(description="Benchmark Analysis Runner")
622681
parser.add_argument(
623682
"--outputDir", default=".", help="Output directory, default is ."
624683
)
625-
684+
parser.add_argument(
685+
"--backends",
686+
nargs="+",
687+
help="Filter results by one or more backend full name(e.g. --backend qlora mv3) (OR logic)",
688+
)
689+
parser.add_argument(
690+
"--devices",
691+
nargs="+",
692+
help="Filter results by device names (e.g. --devices samsung-galaxy-s22-5g)(OR logic)",
693+
)
694+
parser.add_argument("--models", nargs="+", help="Filter by models (OR logic)")
626695
return parser.parse_args()
627696

628697

@@ -632,6 +701,9 @@ def argparsers():
632701
result = fetcher.run(
633702
args.startTime,
634703
args.endTime,
704+
filters=BenchmarkFilters(
705+
models=args.models, backends=args.backends, devices=args.devices
706+
),
635707
)
636708
if not args.silent:
637709
fetcher.print_all_groups_info()
