fix a few things

xadupre · xadupre · commit b4916a876dea · 2025-11-22T13:29:52.000+01:00
diff --git a/CHANGELOGS.rst b/CHANGELOGS.rst
@@ -8,7 +8,7 @@ Change Logs
 * :pr:`311`: use custom and local function to use PackedMultiHeadAttention from onnxruntime
 * :pr:`310`: splits patches into multiple files 
 * :pr:`308`: add option --save_ep to dump the exported program as well as torch input
-* :pr:`304`, :pr:`306`: improves side-by-side comparison, creates command line sbs
+* :pr:`304`, :pr:`306`, :pr:`316`: improves side-by-side comparison, creates command line sbs
 
 0.8.2
 +++++
@@ -112,8 +112,7 @@ Change Logs
 * :pr:`203`: Add option to disable patches for torch in command line validate
 * :pr:`202`: add models DeepseekV3ForCausalLM, Gemma3ForCausalLM, Glm4vMoeForConditionalGeneration
 * :pr:`201`: switch CI to 4.55.4
-* :pr:`200`: fixes patches for 4.55.1+, DynamicCache is no longer registered by default,
-  this code moved to executorch.py in transformers
+* :pr:`200`: fixes patches for 4.55.1+, DynamicCache is no longer registered by default, this code moved to executorch.py in transformers
 * :pr:`199`: delete hidden_size and num_attention_heads modification in a config
 * :pr:`198`: support gpt-oss
 * :pr:`197`: updates CI for torch 2.8
@@ -124,15 +123,13 @@ Change Logs
 
 * :pr:`193`: validates with 4.53.3 
 * :pr:`189`: support for task mask-generation
-* :pr:`192`: add support for Gemma-3, add serialization for HybridCache,
-  changes to support ``transformers>=4.54``
+* :pr:`192`: add support for Gemma-3, add serialization for HybridCache, changes to support ``transformers>=4.54``
 
 0.7.5
 +++++
 
 * :pr:`186`: add parameter --output_names to command line validate to change the output names of the onnx exported model
-* :pr:`185`: remove the use of _seen_tokens in DynamicCache (removed in transformers>4.53),
-  updates dummpy inputs for feature-extraction
+* :pr:`185`: remove the use of _seen_tokens in DynamicCache (removed in ``transformers>4.53``), updates dummy inputs for feature-extraction
 * :pr:`184`: implements side-by-side
 
 0.7.4
@@ -172,12 +169,8 @@ Change Logs
 * :pr:`147`: simplified log processing
 * :pr:`146`: patch for IdeficsAttention, IdeficsEmbedding
 * :pr:`145`: patch for _compute_dynamic_ntk_parameters (Phi3RotaryEmbedding)
-* :pr:`144`: support for second inputs with different dimension,
-  rename test_helper into validate,
-  support ``interpolate_pos_encoding`` for ``VitModel``,
-  update model builder helpers for this PR
-  `Use ONNX IR for model builder
-  <https://github.com/microsoft/onnxruntime-genai/pull/1416>`_
+* :pr:`144`: support for second inputs with different dimension, rename test_helper into validate, support ``interpolate_pos_encoding`` for ``VitModel``, update model builder helpers for this PR
+  `Use ONNX IR for model builder <https://github.com/microsoft/onnxruntime-genai/pull/1416>`_
 * :pr:`143`: compares intermediate results,
 
 0.6.3
@@ -199,8 +192,7 @@ Change Logs
 * :pr:`123`: add subgraphs to TorchOnnxEvaluator
 * :pr:`122`: add local functions to TorchOnnxEvaluator
 * :pr:`120`: enables TorchOnnxEvaluator in command line ``python -m onnx_diagnostic validate ...``
-* :pr:`115`, :pr:`116`, :pr:`117`, :pr:`118`, :pr:`119`, :pr:`127`:
-  first steps for TorchOnnxEvaluator
+* :pr:`115`, :pr:`116`, :pr:`117`, :pr:`118`, :pr:`119`, :pr:`127`: first steps for TorchOnnxEvaluator
 * :pr:`114`: extends the list of known rewritings
 * :pr:`113`: fixes a couple of issues with ModelBuilder
 
@@ -257,10 +249,7 @@ Change Logs
 * :pr:`65`: support SlidingWindowCache
 * :pr:`63`: support option ``--trained``
 * :pr:`61`: improves dynamic shapes for EncoderDecoderCache
-* :pr:`58`: add function use_dyn_not_str to replace string by ``torch.export.Dim.DYNAMIC``,
-  use string instead of ``torch.export.Dim.DYNAMIC`` when returning the dynamic shapes
-  for a specific models, it is a valid definition for ``torch.onnx.export``
-  which can reuse the names
+* :pr:`58`: add function use_dyn_not_str to replace string by ``torch.export.Dim.DYNAMIC``, use string instead of ``torch.export.Dim.DYNAMIC`` when returning the dynamic shapes for a specific models, it is a valid definition for ``torch.onnx.export`` which can reuse the names
 * :pr:`55`: add support for text-classification
 * :pr:`54`: add support for fill-mask, refactoring
 * :pr:`52`: add support for zero-shot-image-classification
@@ -274,28 +263,18 @@ Change Logs
 * :pr:`43`: uses custom patches
 * :pr:`38`: uses the registered serialization functions when it is available
 * :pr:`30`, :pr:`31`: adds command to test a model id, validate the export
-* :pr:`29`: adds helpers to measure the memory peak and run benchmark
-  on different processes
-* :pr:`28`: adds command line to print out the configuration for a model id,
-  support image-text-to-text
-* :pr:`26`: creates a folder ``helpers`` to gather all the functions
-  used in many places
-* :pr:`25`: improve patches for DynamicCache
-  (issue with register_pytree_flatten_spec being deprecated)
-* :pr:`24`: dummy inputs for ``text2text-generation``, add new function
-  ``convert_dynamic_axes_into_dynamic_shapes`` to convert dynamic axes
-  into dynamic shapes, add support for ``T5ForConditionalGeneration``
+* :pr:`29`: adds helpers to measure the memory peak and run benchmark on different processes
+* :pr:`28`: adds command line to print out the configuration for a model id, support image-text-to-text
+* :pr:`26`: creates a folder ``helpers`` to gather all the functions used in many places
+* :pr:`25`: improve patches for DynamicCache (issue with register_pytree_flatten_spec being deprecated)
+* :pr:`24`: dummy inputs for ``text2text-generation``, add new function ``convert_dynamic_axes_into_dynamic_shapes`` to convert dynamic axes into dynamic shapes, add support for ``T5ForConditionalGeneration``
 * :pr:`23`: dummy inputs for ``image-classification``
-* :pr:`22`, :pr:`27`: api to create untrained model copying the architecture
-  of the trained models and dummy inputs for them,
-  support for ``text-generation``
+* :pr:`22`, :pr:`27`: api to create untrained model copying the architecture of the trained models and dummy inputs for them, support for ``text-generation``
 
 0.2.1
 +++++
 
-* :pr:`16`: refactors patches, add model Phi2, implements
-  a tweak to raise an exception with a dynamic dimension
-  becomes static when exporting a model
+* :pr:`16`: refactors patches, add model Phi2, implements a tweak to raise an exception when a dynamic dimension becomes static when exporting a model
 
 0.2.0
 +++++
diff --git a/_doc/cmds/index.rst b/_doc/cmds/index.rst
@@ -9,4 +9,5 @@ Command Lines
     :maxdepth: 1
 
     config
+    sbs
     validate
diff --git a/_doc/cmds/sbs.rst b/_doc/cmds/sbs.rst
@@ -0,0 +1,22 @@
+-m onnx_diagnostic sbs ... runs a side-by-side torch/onnx
+=========================================================
+
+Description
++++++++++++
+
+It compares the intermediate results of an exported program saved with
+:func:`torch.export.save` and of the exported ONNX model, both run on inputs
+saved with :func:`torch.save`. It assumes intermediate results share the same
+names.
+
+.. runpython::
+
+    from onnx_diagnostic._command_lines_parser import get_parser_sbs
+
+    get_parser_sbs().print_help()
+
+CPU, CUDA
++++++++++
+
+Inputs are saved with :func:`torch.save`. The execution runs on CUDA
+if the inputs are on a CUDA device, and on CPU if they are on CPU.
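The "CPU, CUDA" note relies on tensors keeping their device when saved with `torch.save`. A minimal sketch of that round trip, assuming a dictionary of inputs like the one used in the unit test of this commit; the file name is a placeholder, and how the `sbs` command reads the device internally is not shown in this diff.

```python
import os
import tempfile

import torch

# Inputs stored as a dictionary of tensors, here created on CPU.
inputs = dict(x=torch.randn((5, 10)))

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder file name, chosen for this illustration only.
    path = os.path.join(tmp, "inputs.pt")
    torch.save(inputs, path)
    loaded = torch.load(path)

# The device recorded with the tensor is what a consumer can inspect
# to decide where to run.
device = loaded["x"].device
print(device.type)
```

Inputs moved to CUDA before saving would come back as CUDA tensors on a machine where CUDA is available.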
diff --git a/_doc/technical/plot_matmul_reverse_engineering.py b/_doc/technical/plot_matmul_reverse_engineering.py
@@ -0,0 +1,110 @@
+"""
+.. _l-plot-matmul-reverse-engineering:
+
+=================
+More about Linear
+=================
+
+"""
+
+import cpuinfo
+import pandas
+import onnx
+import onnx.helper as oh
+from tqdm import tqdm
+import torch
+from onnx_diagnostic.ext_test_case import unit_test_going
+from onnx_diagnostic.helpers import max_diff
+from onnx_diagnostic.reference import OnnxruntimeEvaluator
+from onnxruntime import __version__ as version_onnxruntime
+
+print(f"onnxruntime version = {version_onnxruntime}")
+print(f"cpu name = {cpuinfo.get_cpu_info()['brand_raw']}")
+if torch.cuda.is_available():
+    print(f"gpu name = {torch.cuda.get_device_name(0)}")
+    print(f"cuda version = {torch.version.cuda}")
+
+# %%
+# The version is important. Numerical differences are observed
+# with onnxruntime<=1.22. Let's see how to make them happen.
+
+
+def make_model_gemm(itype: int) -> onnx.ModelProto:
+    return oh.make_model(
+        oh.make_graph(
+            [oh.make_node("Gemm", ["A", "X", "B"], ["Y"])],
+            "test",
+            [
+                oh.make_tensor_value_info("A", itype, ["a", "b"]),
+                oh.make_tensor_value_info("X", itype, ["b", "c"]),
+                oh.make_tensor_value_info("B", itype, ["c"]),
+            ],
+            [oh.make_tensor_value_info("Y", itype, ["a", "c"])],
+        ),
+        opset_imports=[oh.make_opsetid("", 22)],
+        ir_version=10,
+    )
+
+
+def make_grid(N, bucket):
+    a = torch.ones((N, N), dtype=torch.float32)
+    n = N // bucket + (1 if N % bucket else 0)
+    b = torch.ones((N,), dtype=torch.float32)
+    mp = 8
+    for i in range(n):
+        for j in range(n):
+            p = (i + j) % mp + 2
+            val = float(2**p) * 0.1234
+            a[
+                i * bucket : min((i + 1) * bucket, N),
+                (n - j - 2) * bucket : min((n - j - 1) * bucket, N),
+            ] = val
+        val = float(2 ** (i % mp)) + 0.1234
+        b[i * bucket : min((i + 1) * bucket, N)] = val
+    a -= a.mean()
+    b -= b.mean()
+    a /= a.std()
+    b /= b.std()
+    return a, -a, -b
+
+
+print("N = 8, bucket = 2")
+print(make_grid(8, 2)[0])
+
+# %%
+# We try different grid settings.
+
+if torch.cuda.is_available():
+    itype, dtype, device = onnx.TensorProto.FLOAT16, torch.float16, "cuda"
+    data = []
+    bar = tqdm(list(range(20, 1200, 100 if unit_test_going() else 1)))
+    for i in bar:
+        A, X, B = make_grid(1280, i)
+        a = A.to(dtype).to(device)
+        x = X.to(dtype).to(device)
+        b = B.to(dtype).to(device)
+        feeds = dict(A=a, X=x, B=b)
+        model = make_model_gemm(itype)
+        expected = torch.nn.functional.linear(a, x.T, b)
+        sess = OnnxruntimeEvaluator(model, whole=True)
+        results = sess.run(None, feeds)
+        diff = max_diff(expected, results[0], hist=[0.1, 1.0])
+        e32 = expected.to(torch.double)
+        bar.set_description(f"err={diff['abs']:1.3f}")
+        data.append(
+            dict(
+                M=A.shape[0],
+                N=X.shape[1],
+                K=A.shape[1],
+                B=i,
+                err=diff["abs"],
+                nerr1=diff["rep"][">0.1"],
+                mean=expected.to(torch.float32).mean().item(),
+            )
+        )
+
+    df = pandas.DataFrame(data)
+    print(df.tail())
+    df[df["err"] > 0].to_excel("plot_matmul_reverse_engineering.cuda.xlsx")
+    ax = df[["B", "err"]].set_index("B").plot(title="ERR / regularity size")
+    ax.figure.savefig("plot_matmul_reverse_engineering.cuda.png")
diff --git a/_unittests/ut_xrun_doc/test_command_lines_exe.py b/_unittests/ut_xrun_doc/test_command_lines_exe.py
@@ -2,10 +2,12 @@
 import unittest
 from contextlib import redirect_stdout
 from io import StringIO
+import pandas
 import torch
 from onnx_diagnostic.ext_test_case import ExtTestCase, ignore_warnings
 from onnx_diagnostic._command_lines_parser import main
 from onnx_diagnostic.helpers.log_helper import enumerate_csv_files
+from onnx_diagnostic.export.api import to_onnx
 
 
 class TestCommandLines(ExtTestCase):
@@ -88,6 +90,71 @@ def test_g_parser_agg(self):
         self.assertIn("[CubeLogs.to_excel] plots 1 plots", text)
         self.assertExists(output)
 
+    @ignore_warnings(UserWarning)
+    def test_h_parser_sbs(self):
+        import torch
+
+        class Model(torch.nn.Module):
+            def __init__(self):
+                super(Model, self).__init__()
+                self.fc1 = torch.nn.Linear(10, 32)  # input size 10 → hidden size 32
+                self.relu = torch.nn.ReLU()
+                self.fc2 = torch.nn.Linear(32, 1)  # hidden → output
+
+            def forward(self, x):
+                x = self.relu(self.fc1(x))
+                x = self.fc2(x)
+                return x
+
+        inputs = dict(x=torch.randn((5, 10)))
+        ds = dict(x={0: "batch"})
+        input_file = self.get_dump_file("test_h_parser_sbs.inputs.pt")
+        ep_file = self.get_dump_file("test_h_parser_sbs.ep")
+        onnx_file = self.get_dump_file("test_h_parser_sbs.model.onnx")
+        torch.save(inputs, input_file)
+        to_onnx(
+            Model(),
+            kwargs=inputs,
+            dynamic_shapes=ds,
+            exporter="custom",
+            save_ep=(ep_file, 2**30),
+            filename=onnx_file,
+        )
+
+        output = self.get_dump_file("test_h_parser_sbs.xlsx")
+        st = StringIO()
+        with redirect_stdout(st):
+            main(
+                [
+                    "sbs",
+                    "-v",
+                    "1",
+                    "--first",
+                    "-i",
+                    input_file,
+                    "-e",
+                    f"{ep_file}.ep.pt2",
+                    "-o",
+                    output,
+                    "-m",
+                    onnx_file,
+                ]
+            )
+        text = st.getvalue()
+        self.assertIn("[run_aligned", text)
+        self.assertExists(output)
+        df = pandas.read_excel(output).apply(
+            lambda col: col.fillna("") if col.dtype == "object" else col
+        )
+        self.assertLess(df["err_abs"].max(), 1e-5)
+        self.assertEqual(df["err_h01"].max(), 0)
+        self.assertIn("p_fc1_weight", set(df["ep_name"]))
+        self.assertIn("fc1.bias", set(df["onnx_name"]))
+        self.assertNotIn("NaN", set(df["ep_name"]))
+        print(df)
+        print(st.getvalue())
+        self.assertIn("[run_aligned] done", st.getvalue())
+
 
 if __name__ == "__main__":
     unittest.main(verbosity=2)
diff --git a/clean_onnx.sh b/clean_onnx.sh
@@ -30,7 +30,8 @@ rm _plot_torch_sklearn_201_knnpy.py
 
 rm _doc/sg_execution_times.rst
 
-rm _doc/examples/plot*.onnx
+rm _doc/examples/_debug*
+rm _doc/examples/plot*.onnx*
 rm _doc/examples/plot*.txt
 rm _doc/examples/ort*.onnx
 rm _doc/examples/*.sarif
@@ -83,6 +84,7 @@ rm _doc/technical/*.dynamo.onnx
 rm _doc/technical/*.script.onnx
 rm _doc/technical/dump_models -rf
 rm _doc/technical/dump_onx_*
+rm _doc/technical/model_*.onnx* -rf
 
 rm _tools/bin -rf
 rm _tools/mambaroot -rf
diff --git a/onnx_diagnostic/_command_lines_parser.py b/onnx_diagnostic/_command_lines_parser.py
@@ -1151,6 +1151,7 @@ def get_parser_sbs() -> ArgumentParser:
         help="model inputs saved with torch.save",
     )
     parser.add_argument(
+        "-e",
         "--ep",
         type=str,
         required=True,
@@ -1322,7 +1323,7 @@ def _size(name):
     df = pandas.DataFrame(data).apply(
         lambda col: col.fillna("") if col.dtype == "object" else col
     )
-    df.to_excel(args.output)
+    df.to_excel(args.output, index=False)
     print("-- done")
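The first hunk above adds `-e` as a short alias for `--ep`. A minimal standalone sketch of how `argparse` resolves both spellings to the same attribute; the parser name, the help text, and the file name are placeholders for illustration and are not taken from `get_parser_sbs`.

```python
from argparse import ArgumentParser

# Hypothetical stand-in parser, not the one built by get_parser_sbs.
parser = ArgumentParser(prog="sbs-sketch")
parser.add_argument(
    "-e",
    "--ep",
    type=str,
    required=True,
    help="exported program file (placeholder help text)",
)

# Both spellings fill the same attribute, named after the long option.
short = parser.parse_args(["-e", "model.ep.pt2"])
long_ = parser.parse_args(["--ep", "model.ep.pt2"])
print(short.ep, long_.ep)
```

Because the destination name comes from the long option, existing code that reads `args.ep` keeps working after the alias is added.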
 
 
diff --git a/onnx_diagnostic/torch_onnx/sbs.py b/onnx_diagnostic/torch_onnx/sbs.py
