[Backend Tester] Write report progressively #13308

Merged Aug 13, 2025 (140 commits)

Commits
f120e70
Update
GregoryComer Jul 18, 2025
0fb85e6
Update
GregoryComer Jul 18, 2025
4d8d844
Update
GregoryComer Jul 19, 2025
dc12b40
Update
GregoryComer Jul 21, 2025
ead0616
Update
GregoryComer Jul 22, 2025
0f13676
Update
GregoryComer Jul 22, 2025
b0b01f2
Update
GregoryComer Jul 22, 2025
8b9c9ef
Update
GregoryComer Jul 22, 2025
06bf03a
Update
GregoryComer Jul 22, 2025
2f8f49b
Update
GregoryComer Jul 22, 2025
8ca7766
Update
GregoryComer Jul 22, 2025
bffb95f
Update
GregoryComer Jul 22, 2025
d21492b
Update
GregoryComer Jul 22, 2025
e2c4ea5
Update
GregoryComer Jul 22, 2025
8230848
Update
GregoryComer Jul 22, 2025
2a1f564
Update
GregoryComer Jul 22, 2025
b35e7b1
Update
GregoryComer Jul 22, 2025
5c4c6ce
Update
GregoryComer Jul 22, 2025
9397803
Update
GregoryComer Jul 22, 2025
9dfeb5a
Update
GregoryComer Jul 22, 2025
ff5c4a5
Update
GregoryComer Jul 22, 2025
42a5de5
Update
GregoryComer Jul 22, 2025
402d8f5
Update
GregoryComer Jul 22, 2025
34d3ab3
Update
GregoryComer Jul 22, 2025
1105e04
Update
GregoryComer Jul 22, 2025
482bd21
Update
GregoryComer Jul 22, 2025
ea548b7
Update
GregoryComer Jul 23, 2025
4108f54
Update
GregoryComer Jul 23, 2025
7ef236b
Update
GregoryComer Jul 23, 2025
4a58c9d
Update
GregoryComer Jul 23, 2025
3b866b4
Update
GregoryComer Jul 23, 2025
5ba25cb
Update
GregoryComer Jul 23, 2025
19760fc
Update
GregoryComer Jul 23, 2025
81dfb07
Update
GregoryComer Jul 23, 2025
4d50265
Update
GregoryComer Jul 23, 2025
5f66043
Update
GregoryComer Jul 23, 2025
24e919d
Update
GregoryComer Jul 23, 2025
523cc20
Update
GregoryComer Jul 23, 2025
74c95fe
Update
GregoryComer Jul 23, 2025
5d437b1
Update
GregoryComer Jul 23, 2025
89757ce
Update
GregoryComer Jul 23, 2025
423f79a
Update
GregoryComer Jul 23, 2025
69f7f9c
Update
GregoryComer Jul 23, 2025
c0f6224
Update
GregoryComer Jul 23, 2025
e2ea2a3
Update
GregoryComer Jul 23, 2025
7a2fab5
Update
GregoryComer Jul 23, 2025
033c231
Update
GregoryComer Jul 23, 2025
a9ed762
Update
GregoryComer Jul 23, 2025
64b174a
Update
GregoryComer Jul 23, 2025
3976629
Update
GregoryComer Jul 23, 2025
27cd171
Update
GregoryComer Jul 23, 2025
7bdd3e5
Update
GregoryComer Jul 23, 2025
b1254cd
Update
GregoryComer Jul 23, 2025
f2e2289
Update
GregoryComer Jul 23, 2025
cdd15c1
Update
GregoryComer Jul 23, 2025
e2df06e
Update
GregoryComer Jul 23, 2025
4461bd8
Update
GregoryComer Jul 23, 2025
7e97fd0
Update
GregoryComer Jul 23, 2025
bcb697c
Update
GregoryComer Jul 23, 2025
11a5a02
Update
GregoryComer Jul 24, 2025
244b146
Update
GregoryComer Jul 24, 2025
de21ac2
Update
GregoryComer Jul 24, 2025
fd26fc7
Update
GregoryComer Jul 24, 2025
4ae840d
Update
GregoryComer Jul 24, 2025
710ea49
Update
GregoryComer Jul 24, 2025
32f54b0
Update
GregoryComer Jul 24, 2025
a27d18c
Update
GregoryComer Jul 24, 2025
2eb59fc
Update
GregoryComer Jul 24, 2025
5cc4941
Update
GregoryComer Jul 24, 2025
ef7af5c
Update
GregoryComer Jul 24, 2025
18e89c1
Update
GregoryComer Jul 24, 2025
4719c90
Update
GregoryComer Jul 25, 2025
dd09555
Update
GregoryComer Aug 8, 2025
f1db3a0
Update
GregoryComer Aug 8, 2025
e0700b2
Update
GregoryComer Aug 8, 2025
f260b50
Update
GregoryComer Aug 8, 2025
d62ee60
Update
GregoryComer Aug 8, 2025
b2ab3a5
Update
GregoryComer Aug 8, 2025
c23c3e9
Update
GregoryComer Aug 8, 2025
c99c41a
Update
GregoryComer Aug 9, 2025
bf57d6c
Update
GregoryComer Aug 9, 2025
f261355
Update
GregoryComer Aug 11, 2025
c3a24f9
Update
GregoryComer Aug 11, 2025
1697cbc
Update
GregoryComer Aug 11, 2025
b94b45e
Update
GregoryComer Aug 11, 2025
5740f0a
Update
GregoryComer Aug 11, 2025
ed6840d
Update
GregoryComer Aug 11, 2025
f2a7e1f
Update
GregoryComer Aug 11, 2025
0e162ab
Update
GregoryComer Aug 11, 2025
c6bd56b
Update
GregoryComer Aug 11, 2025
144a8ae
Update
GregoryComer Aug 12, 2025
6f85fc1
Update
GregoryComer Aug 12, 2025
2439022
Update
GregoryComer Aug 12, 2025
bd79ef2
Update
GregoryComer Aug 12, 2025
8932c29
Update
GregoryComer Aug 12, 2025
ea2549c
Update
GregoryComer Aug 12, 2025
ffaa1c3
Update
GregoryComer Aug 12, 2025
bba2fa9
Update
GregoryComer Aug 12, 2025
3a3e026
Update
GregoryComer Aug 12, 2025
78086b4
Update
GregoryComer Aug 12, 2025
f4b0dc2
Update
GregoryComer Aug 12, 2025
5e92884
Update
GregoryComer Aug 12, 2025
aa27776
Update
GregoryComer Aug 12, 2025
54563ee
Update
GregoryComer Aug 12, 2025
7e1a002
Update
GregoryComer Aug 12, 2025
a628d29
Update
GregoryComer Aug 12, 2025
3615d89
Update
GregoryComer Aug 12, 2025
e994bc1
Update
GregoryComer Aug 12, 2025
0aba8e1
Update
GregoryComer Aug 12, 2025
4329bf6
Update
GregoryComer Aug 12, 2025
105aabc
Update
GregoryComer Aug 12, 2025
c1a51ee
Update
GregoryComer Aug 12, 2025
1d34f49
Update
GregoryComer Aug 12, 2025
933fba2
Update
GregoryComer Aug 12, 2025
d468ae4
Update
GregoryComer Aug 12, 2025
acbd480
Update
GregoryComer Aug 12, 2025
e515bf1
Update
GregoryComer Aug 12, 2025
803db00
Update
GregoryComer Aug 12, 2025
ab18089
Update
GregoryComer Aug 12, 2025
1897d4e
Update
GregoryComer Aug 12, 2025
f65d80f
Update
GregoryComer Aug 12, 2025
0d1f097
Update
GregoryComer Aug 12, 2025
f0c2490
Update
GregoryComer Aug 12, 2025
0046b02
Update
GregoryComer Aug 12, 2025
32e1029
Update
GregoryComer Aug 12, 2025
871312a
Update
GregoryComer Aug 12, 2025
53990fe
Update
GregoryComer Aug 12, 2025
567d055
Update
GregoryComer Aug 12, 2025
cd998cf
Update
GregoryComer Aug 12, 2025
2a837ab
Update
GregoryComer Aug 12, 2025
31bc137
Update
GregoryComer Aug 12, 2025
dae5d43
Update
GregoryComer Aug 12, 2025
06b5532
Update
GregoryComer Aug 12, 2025
a343abc
Update
GregoryComer Aug 12, 2025
637b8a2
Update
GregoryComer Aug 12, 2025
7141f6c
Update
GregoryComer Aug 12, 2025
4b43363
Update
GregoryComer Aug 12, 2025
995c4b5
Update
GregoryComer Aug 13, 2025
0fc4475
Update
GregoryComer Aug 13, 2025
a3c2dbe
Update
GregoryComer Aug 13, 2025
229 changes: 140 additions & 89 deletions backends/test/suite/reporting.py
@@ -1,7 +1,7 @@
import csv

from collections import Counter
from dataclasses import dataclass
from dataclasses import dataclass, field
from datetime import timedelta
from enum import IntEnum
from functools import reduce
@@ -11,6 +11,40 @@
from torch.export import ExportedProgram


# The maximum number of model output tensors to log statistics for. Most model tests will
# only have one output, but some may return more than one tensor. This upper bound is needed
# upfront since the file is written progressively. Outputs beyond this limit will not have stats logged.
MAX_LOGGED_MODEL_OUTPUTS = 2


# Field names for the CSV report.
CSV_FIELD_NAMES = [
"Test ID",
"Test Case",
"Flow",
"Params",
"Result",
"Result Detail",
"Delegated",
"Quantize Time (s)",
"Lower Time (s)",
"Delegated Nodes",
"Undelegated Nodes",
"Delegated Ops",
"Undelegated Ops",
"PTE Size (Kb)",
]

for i in range(MAX_LOGGED_MODEL_OUTPUTS):
CSV_FIELD_NAMES.extend(
[
f"Output {i} Error Max",
f"Output {i} Error MAE",
f"Output {i} SNR",
]
)
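# With MAX_LOGGED_MODEL_OUTPUTS = 2, the loop above appends six columns:
# "Output 0 Error Max", "Output 0 Error MAE", "Output 0 SNR",
# "Output 1 Error Max", "Output 1 Error MAE", "Output 1 SNR".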


# Operators that are excluded from the counts returned by count_ops. These are used to
# exclude operations that are not logically relevant or delegatable to backends.
OP_COUNT_IGNORED_OPS = {
@@ -58,6 +92,36 @@ def is_non_backend_failure(self):
def is_backend_failure(self):
return not self.is_success() and not self.is_non_backend_failure()

def to_short_str(self):
if self in {TestResult.SUCCESS, TestResult.SUCCESS_UNDELEGATED}:
return "Pass"
elif self == TestResult.SKIPPED:
return "Skip"
else:
return "Fail"

def to_detail_str(self):
if self == TestResult.SUCCESS:
return ""
elif self == TestResult.SUCCESS_UNDELEGATED:
return ""
elif self == TestResult.SKIPPED:
return ""
elif self == TestResult.QUANTIZE_FAIL:
return "Quantization Failed"
elif self == TestResult.LOWER_FAIL:
return "Lowering Failed"
elif self == TestResult.PTE_LOAD_FAIL:
return "PTE Load Failed"
elif self == TestResult.PTE_RUN_FAIL:
return "PTE Run Failed"
elif self == TestResult.OUTPUT_MISMATCH_FAIL:
return "Output Mismatch"
elif self == TestResult.UNKNOWN_FAIL:
return "Unknown Failure"
else:
raise ValueError(f"Invalid TestResult value: {self}.")

def display_name(self):
if self == TestResult.SUCCESS:
return "Success (Delegated)"
@@ -129,12 +193,23 @@ class TestCaseSummary:
pte_size_bytes: int | None = None
""" The size of the PTE file in bytes. """

def is_delegated(self):
return (
any(v > 0 for v in self.delegated_op_counts.values())
if self.delegated_op_counts
else False
)


@dataclass
class TestSessionState:
test_case_summaries: list[TestCaseSummary]
# True if the CSV header has been written to report_path.
has_written_report_header: bool = False

# The file path to write the detail report to, if enabled.
report_path: str | None = None

def __init__(self):
self.test_case_summaries = []
test_case_summaries: list[TestCaseSummary] = field(default_factory=list)


@dataclass
@@ -212,11 +287,11 @@ def count_ops(program: dict[str, ExportedProgram] | ExportedProgram) -> Counter:
)


def begin_test_session():
def begin_test_session(report_path: str | None):
global _active_session

assert _active_session is None, "A test session is already active."
_active_session = TestSessionState()
_active_session = TestSessionState(report_path=report_path)


def log_test_summary(summary: TestCaseSummary):
@@ -225,6 +300,15 @@ def log_test_summary(summary: TestCaseSummary):
if _active_session is not None:
_active_session.test_case_summaries.append(summary)

if _active_session.report_path is not None:
digantdesai (Contributor) commented on Aug 12, 2025:
Can multiple subprocesses write to this simultaneously?

GregoryComer (Member, Author) replied:
Probably not safely. What is the use case? Are you thinking we parallelize tests between processes? That seems nice to have, though I'd be inclined to deal with concurrency issues if/when we add that feature.

digantdesai (Contributor):
Yeah, all popular test frameworks run tests in parallel, so perhaps in the future.

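# The first summary of a session creates the report file ("w") and writes the
# CSV header; every later summary appends ("a") a single row, so an abnormal
# exit mid-run still leaves a valid partial report on disk.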
file_mode = "a" if _active_session.has_written_report_header else "w"
with open(_active_session.report_path, file_mode) as f:
if not _active_session.has_written_report_header:
write_csv_header(f)
_active_session.has_written_report_header = True

write_csv_row(summary, f)
digantdesai (Contributor) commented on Aug 12, 2025:
This implies we crash when we run into some failure? Can we try to catch and fail gracefully, instead of assuming that we can crash at any time?

GregoryComer (Member, Author) replied:
This is mainly for memory corruption, which happens in a few cases. It might be possible to catch the SIGSEGV or other native fault, but I don't know what state the process is in, so I'm not sure if it's recoverable. Open to suggestions.

digantdesai (Contributor):
Is this through pybinding? Does it crash the CPython process, or is this in a subprocess? If it's the main process, then I don't know if we can try/catch, but if it's a subprocess, then maybe. OK with unblocking you for now.

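If test execution is ever parallelized across processes, as the first thread above raises, concurrent appends to the report file would need explicit serialization. A minimal sketch of one option using a POSIX advisory lock; fcntl.flock and the helper below are this sketch's assumptions, not part of the PR:

import csv
import fcntl


def append_row_locked(path: str, field_names: list[str], row: dict) -> None:
    # Append one CSV row under an exclusive advisory lock so that concurrent
    # writers emit whole rows rather than interleaved fragments.
    # POSIX-only; Windows would need msvcrt.locking instead.
    with open(path, "a", newline="") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            csv.DictWriter(f, field_names).writerow(row)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)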

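For the native-fault question in the second thread, a common mitigation is to run the crash-prone step in a worker process, so that a SIGSEGV kills only the worker and the parent can record a failure row and continue. A hedged sketch; run_pte below is a hypothetical stand-in for the actual PTE load-and-run step:

import multiprocessing as mp


def run_pte() -> str:
    # Hypothetical stand-in for loading and executing the PTE file.
    return "ok"


def _child(q) -> None:
    q.put(run_pte())


def run_isolated(timeout_s: float = 60.0) -> str | None:
    # A negative exitcode means the child was killed by a signal (such as
    # SIGSEGV); report that as a failure instead of crashing the runner.
    ctx = mp.get_context("spawn")
    q = ctx.Queue()
    p = ctx.Process(target=_child, args=(q,))
    p.start()
    p.join(timeout_s)
    if p.is_alive():
        p.kill()
        p.join()
        return None
    return q.get(timeout=1.0) if p.exitcode == 0 else None


if __name__ == "__main__":
    print(run_isolated())  # "ok" on success, None on crash or timeout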

def complete_test_session() -> RunSummary:
global _active_session
Expand All @@ -243,6 +327,13 @@ def _sum_op_counts(counter: Counter | None) -> int | None:
return sum(counter.values()) if counter is not None else None


def _serialize_params(params: dict[str, Any] | None) -> str:
if params is not None:
return str(dict(sorted(params.items())))
else:
return ""
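# Example: _serialize_params({"use_dynamic_shapes": True, "dtype": torch.float32})
# returns "{'dtype': torch.float32, 'use_dynamic_shapes': True}"; sorting keys
# keeps the Params column stable across runs.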


def _serialize_op_counts(counter: Counter | None) -> str:
"""
A utility function to serialize op counts to a string, for the purpose of including
@@ -254,89 +345,49 @@ def _serialize_op_counts(counter: Counter | None) -> str:
return ""


def generate_csv_report(summary: RunSummary, output: TextIO):
"""Write a run summary report to a file in CSV format."""

field_names = [
"Test ID",
"Test Case",
"Backend",
"Flow",
"Result",
"Quantize Time (s)",
"Lowering Time (s)",
]

# Tests can have custom parameters. We'll want to report them here, so we need
# a list of all unique parameter names.
param_names = reduce(
lambda a, b: a.union(b),
(
set(s.params.keys())
for s in summary.test_case_summaries
if s.params is not None
),
set(),
)
field_names += (s.capitalize() for s in param_names)

# Add tensor error statistic field names for each output index.
max_outputs = max(
len(s.tensor_error_statistics) for s in summary.test_case_summaries
)
for i in range(max_outputs):
field_names.extend(
[
f"Output {i} Error Max",
f"Output {i} Error MAE",
f"Output {i} Error MSD",
f"Output {i} Error L2",
f"Output {i} SQNR",
]
)
field_names.extend(
[
"Delegated Nodes",
"Undelegated Nodes",
"Delegated Ops",
"Undelegated Ops",
"PTE Size (Kb)",
]
)

writer = csv.DictWriter(output, field_names)
def write_csv_header(output: TextIO):
writer = csv.DictWriter(output, CSV_FIELD_NAMES)
writer.writeheader()

for record in summary.test_case_summaries:
row = {
"Test ID": record.name,
"Test Case": record.base_name,
"Backend": record.backend,
"Flow": record.flow,
"Result": record.result.display_name(),
"Quantize Time (s)": (
record.quantize_time.total_seconds() if record.quantize_time else None
),
"Lowering Time (s)": (
record.lower_time.total_seconds() if record.lower_time else None
),
}
if record.params is not None:
row.update({k.capitalize(): v for k, v in record.params.items()})

for output_idx, error_stats in enumerate(record.tensor_error_statistics):
row[f"Output {output_idx} Error Max"] = error_stats.error_max
row[f"Output {output_idx} Error MAE"] = error_stats.error_mae
row[f"Output {output_idx} Error MSD"] = error_stats.error_msd
row[f"Output {output_idx} Error L2"] = error_stats.error_l2_norm
row[f"Output {output_idx} SQNR"] = error_stats.sqnr

row["Delegated Nodes"] = _sum_op_counts(record.delegated_op_counts)
row["Undelegated Nodes"] = _sum_op_counts(record.undelegated_op_counts)
row["Delegated Ops"] = _serialize_op_counts(record.delegated_op_counts)
row["Undelegated Ops"] = _serialize_op_counts(record.undelegated_op_counts)
row["PTE Size (Kb)"] = (
record.pte_size_bytes / 1000.0 if record.pte_size_bytes else ""
)

writer.writerow(row)
def write_csv_row(record: TestCaseSummary, output: TextIO):
writer = csv.DictWriter(output, CSV_FIELD_NAMES)

row = {
"Test ID": record.name,
"Test Case": record.base_name,
"Flow": record.flow,
"Params": _serialize_params(record.params),
"Result": record.result.to_short_str(),
"Result Detail": record.result.to_detail_str(),
"Delegated": "True" if record.is_delegated() else "False",
"Quantize Time (s)": (
f"{record.quantize_time.total_seconds():.3f}"
if record.quantize_time
else None
),
"Lower Time (s)": (
f"{record.lower_time.total_seconds():.3f}" if record.lower_time else None
),
}

for output_idx, error_stats in enumerate(record.tensor_error_statistics):
if output_idx >= MAX_LOGGED_MODEL_OUTPUTS:
print(
f"Model output stats are truncated as model has more than {MAX_LOGGED_MODEL_OUTPUTS} outputs. Consider increasing MAX_LOGGED_MODEL_OUTPUTS."
)
break

row[f"Output {output_idx} Error Max"] = f"{error_stats.error_max:.3f}"
row[f"Output {output_idx} Error MAE"] = f"{error_stats.error_mae:.3f}"
row[f"Output {output_idx} SNR"] = f"{error_stats.sqnr:.3f}"

row["Delegated Nodes"] = _sum_op_counts(record.delegated_op_counts)
row["Undelegated Nodes"] = _sum_op_counts(record.undelegated_op_counts)
row["Delegated Ops"] = _serialize_op_counts(record.delegated_op_counts)
row["Undelegated Ops"] = _serialize_op_counts(record.undelegated_op_counts)
row["PTE Size (Kb)"] = (
f"{record.pte_size_bytes / 1000.0:.3f}" if record.pte_size_bytes else ""
)

writer.writerow(row)
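Taken together, the reporting flow becomes: open a session with a report path, log each summary as its test completes (which appends a CSV row immediately), then close the session for the aggregate summary. A minimal sketch of the call sequence; the import path is assumed from the repo layout, and TestCaseSummary construction is elided:

from executorch.backends.test.suite.reporting import (
    begin_test_session,
    complete_test_session,
    log_test_summary,
)

begin_test_session("backend_test_report.csv")

# For each test case, build a TestCaseSummary and log it. The first call
# writes the header plus one row; later calls append one row each.
# log_test_summary(case_summary)

run_summary = complete_test_session()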
8 changes: 1 addition & 7 deletions backends/test/suite/runner.py
@@ -25,7 +25,6 @@
begin_test_session,
complete_test_session,
count_ops,
generate_csv_report,
RunSummary,
TestCaseSummary,
TestResult,
@@ -248,7 +247,7 @@ def build_test_filter(args: argparse.Namespace) -> TestFilter:
def runner_main():
args = parse_args()

begin_test_session()
begin_test_session(args.report)

if len(args.suite) > 1:
raise NotImplementedError("TODO Support multiple suites.")
@@ -263,11 +262,6 @@ def runner_main():
summary = complete_test_session()
print_summary(summary)

if args.report is not None:
with open(args.report, "w") as f:
print(f"Writing CSV report to {args.report}.")
generate_csv_report(summary, f)


if __name__ == "__main__":
runner_main()
31 changes: 13 additions & 18 deletions backends/test/suite/tests/test_reporting.py
@@ -9,11 +9,12 @@

from ..reporting import (
count_ops,
generate_csv_report,
RunSummary,
TestCaseSummary,
TestResult,
TestSessionState,
write_csv_header,
write_csv_row,
)

# Test data for simulated test results.
@@ -69,7 +70,9 @@ def test_csv_report_simple(self):
run_summary = RunSummary.from_session(session_state)

strio = StringIO()
generate_csv_report(run_summary, strio)
write_csv_header(strio)
for case_summary in run_summary.test_case_summaries:
write_csv_row(case_summary, strio)

# Attempt to deserialize and validate the CSV report.
report = DictReader(StringIO(strio.getvalue()))
@@ -79,38 +82,30 @@
# Validate first record: test1, backend1, SUCCESS
self.assertEqual(records[0]["Test ID"], "test1_backend1_flow1")
self.assertEqual(records[0]["Test Case"], "test1")
self.assertEqual(records[0]["Backend"], "backend1")
self.assertEqual(records[0]["Flow"], "flow1")
self.assertEqual(records[0]["Result"], "Success (Delegated)")
self.assertEqual(records[0]["Dtype"], "")
self.assertEqual(records[0]["Use_dynamic_shapes"], "")
self.assertEqual(records[0]["Result"], "Pass")
self.assertEqual(records[0]["Params"], "")

# Validate second record: test1, backend2, LOWER_FAIL
self.assertEqual(records[1]["Test ID"], "test1_backend2_flow1")
self.assertEqual(records[1]["Test Case"], "test1")
self.assertEqual(records[1]["Backend"], "backend2")
self.assertEqual(records[1]["Flow"], "flow1")
self.assertEqual(records[1]["Result"], "Fail (Lowering)")
self.assertEqual(records[1]["Dtype"], "")
self.assertEqual(records[1]["Use_dynamic_shapes"], "")
self.assertEqual(records[1]["Result"], "Fail")
self.assertEqual(records[1]["Params"], "")

# Validate third record: test2, backend1, SUCCESS_UNDELEGATED with dtype param
self.assertEqual(records[2]["Test ID"], "test2_backend1_flow1")
self.assertEqual(records[2]["Test Case"], "test2")
self.assertEqual(records[2]["Backend"], "backend1")
self.assertEqual(records[2]["Flow"], "flow1")
self.assertEqual(records[2]["Result"], "Success (Undelegated)")
self.assertEqual(records[2]["Dtype"], str(torch.float32))
self.assertEqual(records[2]["Use_dynamic_shapes"], "")
self.assertEqual(records[2]["Result"], "Pass")
self.assertEqual(records[2]["Params"], str({"dtype": torch.float32}))

# Validate fourth record: test2, backend2, EXPORT_FAIL with use_dynamic_shapes param
self.assertEqual(records[3]["Test ID"], "test2_backend2_flow1")
self.assertEqual(records[3]["Test Case"], "test2")
self.assertEqual(records[3]["Backend"], "backend2")
self.assertEqual(records[3]["Flow"], "flow1")
self.assertEqual(records[3]["Result"], "Skipped")
self.assertEqual(records[3]["Dtype"], "")
self.assertEqual(records[3]["Use_dynamic_shapes"], "True")
self.assertEqual(records[3]["Result"], "Skip")
self.assertEqual(records[3]["Params"], str({"use_dynamic_shapes": True}))

def test_count_ops(self):
"""