Commit b4916a8

fix a few things
1 parent 49ea518 commit b4916a8

8 files changed, +398 -202 lines changed


CHANGELOGS.rst

Lines changed: 15 additions & 36 deletions
@@ -8,7 +8,7 @@ Change Logs
 * :pr:`311`: use custom and local function to use PackedMultiHeadAttention from onnxruntime
 * :pr:`310`: splits patches into multiple files
 * :pr:`308`: add option --save_ep to dump the exported program as well as torch input
-* :pr:`304`, :pr:`306`: improves side-by-side comparison, creates command line sbs
+* :pr:`304`, :pr:`306`, :pr:`316`: improves side-by-side comparison, creates command line sbs
 
 0.8.2
 +++++
@@ -112,8 +112,7 @@ Change Logs
 * :pr:`203`: Add option to disable patches for torch in command line validate
 * :pr:`202`: add models DeepseekV3ForCausalLM, Gemma3ForCausalLM, Glm4vMoeForConditionalGeneration
 * :pr:`201`: switch CI to 4.55.4
-* :pr:`200`: fixes patches for 4.55.1+, DynamicCache is no longer registered by default,
-  this code moved to executorch.py in transformers
+* :pr:`200`: fixes patches for 4.55.1+, DynamicCache is no longer registered by default, this code moved to executorch.py in transformers
 * :pr:`199`: delete hidden_size and num_attention_heads modification in a config
 * :pr:`198`: support gpt-oss
 * :pr:`197`: updates CI for torch 2.8
@@ -124,15 +123,13 @@ Change Logs
 
 * :pr:`193`: validates with 4.53.3
 * :pr:`189`: support for task mask-generation
-* :pr:`192`: add support for Gemma-3, add serialization for HybridCache,
-  changes to support ``transformers>=4.54``
+* :pr:`192`: add support for Gemma-3, add serialization for HybridCache, changes to support ``transformers>=4.54``
 
 0.7.5
 +++++
 
 * :pr:`186`: add parameter --output_names to command line validate to change the output names of the onnx exported model
-* :pr:`185`: remove the use of _seen_tokens in DynamicCache (removed in transformers>4.53),
-  updates dummpy inputs for feature-extraction
+* :pr:`185`: remove the use of _seen_tokens in DynamicCache (removed in ``transformers>4.53``), updates dummy inputs for feature-extraction
 * :pr:`184`: implements side-by-side
 
 0.7.4
@@ -172,12 +169,8 @@ Change Logs
 * :pr:`147`: simplified log processing
 * :pr:`146`: patch for IdeficsAttention, IdeficsEmbedding
 * :pr:`145`: patch for _compute_dynamic_ntk_parameters (Phi3RotaryEmbedding)
-* :pr:`144`: support for second inputs with different dimension,
-  rename test_helper into validate,
-  support ``interpolate_pos_encoding`` for ``VitModel``,
-  update model builder helpers for this PR
-  `Use ONNX IR for model builder
-  <https://github.com/microsoft/onnxruntime-genai/pull/1416>`_
+* :pr:`144`: support for second inputs with different dimension, rename test_helper into validate, support ``interpolate_pos_encoding`` for ``VitModel``, update model builder helpers for this PR
+  `Use ONNX IR for model builder <https://github.com/microsoft/onnxruntime-genai/pull/1416>`_
 * :pr:`143`: compares intermediate results,
 
 0.6.3
@@ -199,8 +192,7 @@ Change Logs
 * :pr:`123`: add subgraphs to TorchOnnxEvaluator
 * :pr:`122`: add local functions to TorchOnnxEvaluator
 * :pr:`120`: enables TorchOnnxEvaluator in command line ``python -m onnx_diagnostic validate ...``
-* :pr:`115`, :pr:`116`, :pr:`117`, :pr:`118`, :pr:`119`, :pr:`127`:
-  first steps for TorchOnnxEvaluator
+* :pr:`115`, :pr:`116`, :pr:`117`, :pr:`118`, :pr:`119`, :pr:`127`: first steps for TorchOnnxEvaluator
 * :pr:`114`: extends the list of known rewritings
 * :pr:`113`: fixes a couple of issues with ModelBuilder
 
@@ -257,10 +249,7 @@ Change Logs
 * :pr:`65`: support SlidingWindowCache
 * :pr:`63`: support option ``--trained``
 * :pr:`61`: improves dynamic shapes for EncoderDecoderCache
-* :pr:`58`: add function use_dyn_not_str to replace string by ``torch.export.Dim.DYNAMIC``,
-  use string instead of ``torch.export.Dim.DYNAMIC`` when returning the dynamic shapes
-  for a specific models, it is a valid definition for ``torch.onnx.export``
-  which can reuse the names
+* :pr:`58`: add function use_dyn_not_str to replace string by ``torch.export.Dim.DYNAMIC``, use string instead of ``torch.export.Dim.DYNAMIC`` when returning the dynamic shapes for specific models, it is a valid definition for ``torch.onnx.export`` which can reuse the names
 * :pr:`55`: add support for text-classification
 * :pr:`54`: add support for fill-mask, refactoring
 * :pr:`52`: add support for zero-shot-image-classification
@@ -274,28 +263,18 @@ Change Logs
 * :pr:`43`: uses custom patches
 * :pr:`38`: uses the registered serialization functions when it is available
 * :pr:`30`, :pr:`31`: adds command to test a model id, validate the export
-* :pr:`29`: adds helpers to measure the memory peak and run benchmark
-  on different processes
-* :pr:`28`: adds command line to print out the configuration for a model id,
-  support image-text-to-text
-* :pr:`26`: creates a folder ``helpers`` to gather all the functions
-  used in many places
-* :pr:`25`: improve patches for DynamicCache
-  (issue with register_pytree_flatten_spec being deprecated)
-* :pr:`24`: dummy inputs for ``text2text-generation``, add new function
-  ``convert_dynamic_axes_into_dynamic_shapes`` to convert dynamic axes
-  into dynamic shapes, add support for ``T5ForConditionalGeneration``
+* :pr:`29`: adds helpers to measure the memory peak and run benchmark on different processes
+* :pr:`28`: adds command line to print out the configuration for a model id, support image-text-to-text
+* :pr:`26`: creates a folder ``helpers`` to gather all the functions used in many places
+* :pr:`25`: improve patches for DynamicCache (issue with register_pytree_flatten_spec being deprecated)
+* :pr:`24`: dummy inputs for ``text2text-generation``, add new function ``convert_dynamic_axes_into_dynamic_shapes`` to convert dynamic axes into dynamic shapes, add support for ``T5ForConditionalGeneration``
 * :pr:`23`: dummy inputs for ``image-classification``
-* :pr:`22`, :pr:`27`: api to create untrained model copying the architecture
-  of the trained models and dummy inputs for them,
-  support for ``text-generation``
+* :pr:`22`, :pr:`27`: api to create untrained model copying the architecture of the trained models and dummy inputs for them, support for ``text-generation``
 
 0.2.1
 +++++
 
-* :pr:`16`: refactors patches, add model Phi2, implements
-  a tweak to raise an exception with a dynamic dimension
-  becomes static when exporting a model
+* :pr:`16`: refactors patches, add model Phi2, implements a tweak to raise an exception if a dynamic dimension becomes static when exporting a model
 
 0.2.0
 +++++

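Most CHANGELOGS.rst hunks in this commit only join a wrapped bullet and its indented continuation lines into a single line. A minimal sketch of that transformation, assuming reST bullets start with `* ` and continuation lines are indented; `unwrap_bullets` is a hypothetical helper written for illustration, not a function of onnx_diagnostic.

```python
def unwrap_bullets(text: str) -> str:
    """Join indented continuation lines onto the "* " bullet they continue.

    Blank lines, section titles and their underlines are left untouched.
    """
    out = []
    for line in text.splitlines():
        continues_bullet = (
            line[:1].isspace()            # indented ...
            and line.strip() != ""        # ... non-blank line
            and bool(out)
            and out[-1].startswith("* ")  # right after a (possibly joined) bullet
        )
        if continues_bullet:
            out[-1] = out[-1].rstrip() + " " + line.strip()
        else:
            out.append(line)
    return "\n".join(out)


wrapped = (
    "Change Logs\n"
    "===========\n"
    "\n"
    "* :pr:`29`: adds helpers to measure the memory peak and run benchmark\n"
    "  on different processes\n"
    "* :pr:`23`: dummy inputs for ``image-classification``\n"
)
print(unwrap_bullets(wrapped))
```

Joining onto `out[-1]` repeatedly handles bullets wrapped over more than two lines, as in the entries for :pr:`24` or :pr:`144`.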
_doc/cmds/index.rst

Lines changed: 1 addition & 0 deletions
@@ -9,4 +9,5 @@ Command Lines
     :maxdepth: 1
 
     config
+    sbs
     validate

_doc/cmds/sbs.rst

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+-m onnx_diagnostic sbs ... runs a side-by-side torch/onnx
+=========================================================
+
+Description
++++++++++++
+
+It compares the intermediate results between an exported program saved with
+:func:`torch.export.save` and the exported ONNX model, both run on inputs saved
+with :func:`torch.save`. It assumes intermediate results share the same
+names.
+
+.. runpython::
+
+    from onnx_diagnostic._command_lines_parser import get_parser_sbs
+
+    get_parser_sbs().print_help()
+
+CPU, CUDA
++++++++++
+
+Inputs are saved with :func:`torch.save`. The execution runs on CUDA
+if the inputs are on a CUDA device, and on CPU if they are on CPU.
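The sbs description states that intermediate results are matched by name. A simplified sketch of such a name-aligned comparison, assuming each side is a dict mapping a result name to a flat list of floats (the real results are tensors); `compare_by_name` is hypothetical and is not how sbs is implemented.

```python
def compare_by_name(ep_results, onnx_results):
    """Pair intermediate results by name and report the max absolute error.

    Both arguments map a result name to a flat list of floats; names found
    on only one side, or with a different length, are skipped.
    """
    rows = []
    for name, expected in ep_results.items():
        got = onnx_results.get(name)
        if got is None or len(got) != len(expected):
            continue  # no counterpart with a compatible size
        err_abs = max((abs(e - g) for e, g in zip(expected, got)), default=0.0)
        rows.append((name, err_abs))
    return rows


ep = {"fc1": [0.5, -1.0], "relu": [0.5, 0.0], "fc2": [0.25]}
ort = {"fc1": [0.5, -1.25], "relu": [0.5, 0.0], "output": [0.25]}
for name, err in compare_by_name(ep, ort):
    print(name, err)  # prints "fc1 0.25" then "relu 0.0"
```

Each row holds one shared name and its maximum absolute difference, similar in spirit to the `err_abs` column that `test_h_parser_sbs` asserts on; `fc2` and `output` have no counterpart and are dropped.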
Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
+"""
+.. _l-plot-matmul-reverse-engineering:
+
+=================
+More about Linear
+=================
+
+"""
+
+import cpuinfo
+import pandas
+import onnx
+import onnx.helper as oh
+from tqdm import tqdm
+import torch
+from onnx_diagnostic.ext_test_case import unit_test_going
+from onnx_diagnostic.helpers import max_diff
+from onnx_diagnostic.reference import OnnxruntimeEvaluator
+from onnxruntime import __version__ as version_onnxruntime
+
+print(f"onnxruntime version = {version_onnxruntime}")
+print(f"cpu name = {cpuinfo.get_cpu_info()['brand_raw']}")
+if torch.cuda.is_available():
+    print(f"gpu name = {torch.cuda.get_device_name(0)}")
+    print(f"cuda version = {torch.version.cuda}")
+
+# %%
+# The version is important. Numerical differences are observed
+# with onnxruntime<=1.22. Let's see how to make them happen.
+
+
+def make_model_gemm(itype: int) -> onnx.ModelProto:
+    return oh.make_model(
+        oh.make_graph(
+            [oh.make_node("Gemm", ["A", "X", "B"], ["Y"])],
+            "test",
+            [
+                oh.make_tensor_value_info("A", itype, ["a", "b"]),
+                oh.make_tensor_value_info("X", itype, ["b", "c"]),
+                oh.make_tensor_value_info("B", itype, ["c"]),
+            ],
+            [oh.make_tensor_value_info("Y", itype, ["a", "c"])],
+        ),
+        opset_imports=[oh.make_opsetid("", 22)],
+        ir_version=10,
+    )
+
+
+def make_grid(N, bucket):
+    a = torch.ones((N, N), dtype=torch.float32)
+    n = N // bucket + (1 if N % bucket else 0)
+    b = torch.ones((N,), dtype=torch.float32)
+    mp = 8
+    for i in range(n):
+        for j in range(n):
+            p = (i + j) % mp + 2
+            val = float(2**p) * 0.1234
+            a[
+                i * bucket : min((i + 1) * bucket, N),
+                (n - j - 2) * bucket : min((n - j - 1) * bucket, N),
+            ] = val
+        val = float(2 ** (i % mp)) + 0.1234
+        b[i * bucket : min((i + 1) * bucket, N)] = val
+    a -= a.mean()
+    b -= b.mean()
+    a /= a.std()
+    b /= b.std()
+    return a, -a, -b
+
+
+print("N = 8, bucket = 2")
+print(make_grid(8, 2)[0])
+
+# %%
+# We try different grid settings.
+
+if torch.cuda.is_available():
+    itype, dtype, device = onnx.TensorProto.FLOAT16, torch.float16, "cuda"
+    data = []
+    bar = tqdm(list(range(20, 1200, 100 if unit_test_going() else 1)))
+    for i in bar:
+        A, X, B = make_grid(1280, i)
+        a = A.to(dtype).to(device)
+        x = X.to(dtype).to(device)
+        b = B.to(dtype).to(device)
+        feeds = dict(A=a, X=x, B=b)
+        model = make_model_gemm(itype)
+        expected = torch.nn.functional.linear(a, x.T, b)
+        sess = OnnxruntimeEvaluator(model, whole=True)
+        results = sess.run(None, feeds)
+        diff = max_diff(expected, results[0], hist=[0.1, 1.0])
+        e32 = expected.to(torch.double)
+        bar.set_description(f"err={diff['abs']:1.3f}")
+        data.append(
+            dict(
+                M=A.shape[0],
+                N=X.shape[1],
+                K=A.shape[1],
+                B=i,
+                err=diff["abs"],
+                nerr1=diff["rep"][">0.1"],
+                mean=expected.to(torch.float32).mean().item(),
+            )
+        )
+
+    df = pandas.DataFrame(data)
+    print(df.tail())
+    df[df["err"] > 0].to_excel("plot_matmul_reverse_engineering.cuda.xlsx")
+    ax = df[["B", "err"]].set_index("B").plot(title="ERR / regularity size")
+    ax.figure.savefig("plot_matmul_reverse_engineering.cuda.png")
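`make_grid` writes `0.1234 * 2**p` into square blocks of side `bucket` (the last ones possibly smaller), with `p = (i + j) % 8 + 2`, before centering and scaling the matrix. The torch-free sketch below mirrors that slicing to show which exponent lands in each cell; `grid_exponents` is a hypothetical helper, and `None` marks cells left at their initial value 1.0.

```python
def grid_exponents(N, bucket, mp=8):
    """Mirror the block assignment of make_grid, recording exponents p.

    Entry [r][c] is the exponent p such that make_grid writes 0.1234 * 2**p
    at (r, c) before normalization, or None if the cell keeps the value 1.0.
    """
    grid = [[None] * N for _ in range(N)]
    n = N // bucket + (1 if N % bucket else 0)  # number of blocks, rounded up
    for i in range(n):
        for j in range(n):
            p = (i + j) % mp + 2
            rows = range(N)[i * bucket : min((i + 1) * bucket, N)]
            # Same column slice as make_grid; slicing a range follows the
            # usual Python rules for negative and empty slices.
            cols = range(N)[(n - j - 2) * bucket : min((n - j - 1) * bucket, N)]
            for r in rows:
                for c in cols:
                    grid[r][c] = p
    return grid


for row in grid_exponents(8, 2)[::2]:  # one row per block row
    print(row[::2])                     # one entry per block column
```

Read literally, `j = n - 1` produces the column slice `[-bucket:0]`, which is empty both for Python sequences and for torch tensors, so the rightmost block column is never assigned. For `N = 8, bucket = 2` the block rows hold the exponents [4, 3, 2, None], [5, 4, 3, None], [6, 5, 4, None] and [7, 6, 5, None].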

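The script uses `torch.nn.functional.linear(a, x.T, b)` as the reference for the ONNX `Gemm` node. `linear` computes `input @ weight.T + bias`, so passing `x.T` as the weight yields `a @ x + b`, which is `Gemm` with its default attributes (`alpha = beta = 1`, `transA = transB = 0`). A pure-Python check of that identity on nested lists; `matmul`, `transpose`, `gemm` and `linear` here are hypothetical helpers, not the torch or onnxruntime implementations.

```python
def matmul(a, b):
    """Plain triple-loop product of two matrices stored as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def gemm(a, x, bias):
    """ONNX Gemm with default attributes: Y = A @ X + B, B broadcast over rows."""
    return [[v + bias[j] for j, v in enumerate(row)] for row in matmul(a, x)]


def linear(inp, weight, bias):
    """Same formula as torch.nn.functional.linear: inp @ weight.T + bias."""
    prod = matmul(inp, transpose(weight))
    return [[v + bias[j] for j, v in enumerate(row)] for row in prod]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [10.0, 20.0]
print(gemm(a, x, b))                   # [[14.0, 25.0], [20.0, 31.0]]
print(linear(a, transpose(x), b))      # same values
```

This only confirms that both expressions compute the same formula; the script measures differences for float16 inputs on CUDA, a setting this sketch does not model.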
_unittests/ut_xrun_doc/test_command_lines_exe.py

Lines changed: 67 additions & 0 deletions
@@ -2,10 +2,12 @@
 import unittest
 from contextlib import redirect_stdout
 from io import StringIO
+import pandas
 import torch
 from onnx_diagnostic.ext_test_case import ExtTestCase, ignore_warnings
 from onnx_diagnostic._command_lines_parser import main
 from onnx_diagnostic.helpers.log_helper import enumerate_csv_files
+from onnx_diagnostic.export.api import to_onnx
 
 
 class TestCommandLines(ExtTestCase):
@@ -88,6 +90,71 @@ def test_g_parser_agg(self):
         self.assertIn("[CubeLogs.to_excel] plots 1 plots", text)
         self.assertExists(output)
 
+    @ignore_warnings(UserWarning)
+    def test_h_parser_sbs(self):
+        import torch
+
+        class Model(torch.nn.Module):
+            def __init__(self):
+                super(Model, self).__init__()
+                self.fc1 = torch.nn.Linear(10, 32)  # input size 10 → hidden size 32
+                self.relu = torch.nn.ReLU()
+                self.fc2 = torch.nn.Linear(32, 1)  # hidden → output
+
+            def forward(self, x):
+                x = self.relu(self.fc1(x))
+                x = self.fc2(x)
+                return x
+
+        inputs = dict(x=torch.randn((5, 10)))
+        ds = dict(x={0: "batch"})
+        input_file = self.get_dump_file("test_h_parser_sbs.inputs.pt")
+        ep_file = self.get_dump_file("test_h_parser_sbs.ep")
+        onnx_file = self.get_dump_file("test_h_parser_sbs.model.onnx")
+        torch.save(inputs, input_file)
+        to_onnx(
+            Model(),
+            kwargs=inputs,
+            dynamic_shapes=ds,
+            exporter="custom",
+            save_ep=(ep_file, 2**30),
+            filename=onnx_file,
+        )
+
+        output = self.get_dump_file("test_h_parser_sbs.xlsx")
+        st = StringIO()
+        with redirect_stdout(st):
+            main(
+                [
+                    "sbs",
+                    "-v",
+                    "1",
+                    "--first",
+                    "-i",
+                    input_file,
+                    "-e",
+                    f"{ep_file}.ep.pt2",
+                    "-o",
+                    output,
+                    "-m",
+                    onnx_file,
+                ]
+            )
+        text = st.getvalue()
+        self.assertIn("[run_aligned", text)
+        self.assertExists(output)
+        df = pandas.read_excel(output).apply(
+            lambda col: col.fillna("") if col.dtype == "object" else col
+        )
+        self.assertLess(df["err_abs"].max(), 1e-5)
+        self.assertEqual(df["err_h01"].max(), 0)
+        self.assertIn("p_fc1_weight", set(df["ep_name"]))
+        self.assertIn("fc1.bias", set(df["onnx_name"]))
+        self.assertNotIn("NaN", set(df["ep_name"]))
+        print(df)
+        print(st.getvalue())
+        self.assertIn("[run_aligned] done", st.getvalue())
+
 
 if __name__ == "__main__":
     unittest.main(verbosity=2)

clean_onnx.sh

Lines changed: 3 additions & 1 deletion
@@ -30,7 +30,8 @@ rm _plot_torch_sklearn_201_knnpy.py
 
 rm _doc/sg_execution_times.rst
 
-rm _doc/examples/plot*.onnx
+rm _doc/examples/_debug*
+rm _doc/examples/plot*.onnx*
 rm _doc/examples/plot*.txt
 rm _doc/examples/ort*.onnx
 rm _doc/examples/*.sarif
@@ -83,6 +84,7 @@ rm _doc/technical/*.dynamo.onnx
 rm _doc/technical/*.script.onnx
 rm _doc/technical/dump_models -rf
 rm _doc/technical/dump_onx_*
+rm _doc/technical/model_*.onnx* -rf
 
 rm _tools/bin -rf
 rm _tools/mambaroot -rf

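The clean_onnx.sh change widens `plot*.onnx` to `plot*.onnx*`, so names that continue past `.onnx`, such as an external-data file stored next to a model, are removed as well (presumably the intent; the file names below are made up). For plain file names, `fnmatch.fnmatchcase` treats `*` the way the shell does, which makes the difference between the two patterns easy to check.

```python
from fnmatch import fnmatchcase


def matched(pattern, names):
    """Return the names a glob such as `rm _doc/examples/<pattern>` would select."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical contents of _doc/examples after building the gallery.
names = [
    "plot_linear.onnx",
    "plot_linear.onnx.data",
    "plot_linear.txt",
    "_debug_linear.onnx",
]
print(matched("plot*.onnx", names))   # old pattern: ['plot_linear.onnx']
print(matched("plot*.onnx*", names))  # new pattern: adds 'plot_linear.onnx.data'
print(matched("_debug*", names))      # new line: ['_debug_linear.onnx']
```

Unlike the shell, `fnmatch` lets `*` match `/` and leading dots, so the comparison only holds for simple names like these.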
onnx_diagnostic/_command_lines_parser.py

Lines changed: 2 additions & 1 deletion
@@ -1151,6 +1151,7 @@ def get_parser_sbs() -> ArgumentParser:
         help="model inputs saved with torch.save",
     )
     parser.add_argument(
+        "-e",
         "--ep",
         type=str,
         required=True,
@@ -1322,7 +1323,7 @@ def _size(name):
     df = pandas.DataFrame(data).apply(
         lambda col: col.fillna("") if col.dtype == "object" else col
     )
-    df.to_excel(args.output)
+    df.to_excel(args.output, index=False)
     print("-- done")
 

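The first parser change only adds `-e` as a short alias of `--ep`. argparse derives the destination from the first long option string, so the value is still read as `args.ep` with either spelling. A reduced standalone parser (`make_parser` is hypothetical and is not the real `get_parser_sbs`, which defines more options) shows it:

```python
from argparse import ArgumentParser


def make_parser() -> ArgumentParser:
    """Reduced stand-in for get_parser_sbs with only the -e/--ep option."""
    parser = ArgumentParser(prog="sbs")
    # The short alias is listed first; dest is still derived from "--ep".
    parser.add_argument("-e", "--ep", type=str, required=True)
    return parser


parser = make_parser()
short_form = parser.parse_args(["-e", "model.ep.pt2"])
long_form = parser.parse_args(["--ep", "model.ep.pt2"])
print(short_form.ep, long_form.ep)
```

The second change, `index=False` in `DataFrame.to_excel`, keeps pandas from writing the row index as an extra first column of the Excel report.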