How to determine if the difference between the outputs from different frameworks is acceptable? #4507

Description

@04633435

Hi team, I compared the outputs of my ONNX model and the TensorRT engine built from it using Polygraphy. Here is the result I got:

[I]     Comparing Output: 'image_embed' (dtype=float32, shape=(1, 256, 64, 64)) with 'image_embed' (dtype=float32, shape=(1, 256, 64, 64))
[I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         trt-runner-N0-07/02/25-14:51:06: image_embed | Stats: mean=0.017427, std-dev=0.41016, var=0.16823, median=0.018033, min=-2.275 at (0, 175, 31, 46), max=2.2183 at (0, 78, 59, 0), avg-magnitude=0.31484, p90=0.51308, p95=0.65992, p99=1.0627
[I]             ---- Histogram ----
                Bin Range          |  Num Elems | Visualization
                (-2.27  , -1.83  ) |       1036 | 
                (-1.83  , -1.38  ) |       2521 | 
                (-1.38  , -0.927 ) |      10504 | 
                (-0.927 , -0.478 ) |      90076 | ########
                (-0.478 , -0.0283) |     368189 | ##################################
                (-0.0283, 0.421  ) |     421553 | ########################################
                (0.421  , 0.87   ) |     135590 | ############
                (0.87   , 1.32   ) |      14542 | #
                (1.32   , 1.77   ) |       4208 | 
                (1.77   , 2.22   ) |        357 | 
[I]         onnxrt-runner-N0-07/02/25-14:51:06: image_embed | Stats: mean=0.017427, std-dev=0.41016, var=0.16823, median=0.018033, min=-2.275 at (0, 175, 31, 46), max=2.2183 at (0, 78, 59, 0), avg-magnitude=0.31484, p90=0.51308, p95=0.65992, p99=1.0627
[I]             ---- Histogram ----
                Bin Range          |  Num Elems | Visualization
                (-2.27  , -1.83  ) |       1036 | 
                (-1.83  , -1.38  ) |       2521 | 
                (-1.38  , -0.927 ) |      10504 | 
                (-0.927 , -0.478 ) |      90077 | ########
                (-0.478 , -0.0283) |     368189 | ##################################
                (-0.0283, 0.421  ) |     421552 | ########################################
                (0.421  , 0.87   ) |     135590 | ############
                (0.87   , 1.32   ) |      14542 | #
                (1.32   , 1.77   ) |       4208 | 
                (1.77   , 2.22   ) |        357 | 
[I]         Error Metrics: image_embed
[I]             Minimum Required Tolerance: elemwise error | [abs=4.4644e-05] OR [rel=0.82618] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=1.1709e-06, std-dev=1.1147e-06, var=1.2425e-12, median=8.9407e-07, min=0 at (0, 0, 0, 0), max=4.4644e-05 at (0, 204, 38, 61), avg-magnitude=1.1709e-06, p90=2.481e-06, p95=3.1292e-06, p99=4.8578e-06
[I]                 ---- Histogram ----
                    Bin Range            |  Num Elems | Visualization
                    (0       , 4.46e-06) |    1033928 | ########################################
                    (4.46e-06, 8.93e-06) |      13333 | 
                    (8.93e-06, 1.34e-05) |        868 | 
                    (1.34e-05, 1.79e-05) |        260 | 
                    (1.79e-05, 2.23e-05) |        126 | 
                    (2.23e-05, 2.68e-05) |         38 | 
                    (2.68e-05, 3.13e-05) |         17 | 
                    (3.13e-05, 3.57e-05) |          5 | 
                    (3.57e-05, 4.02e-05) |          0 | 
                    (4.02e-05, 4.46e-05) |          1 | 
[I]             Relative Difference | Stats: mean=3.0263e-05, std-dev=0.0014917, var=2.2251e-06, median=3.5821e-06, min=0 at (0, 0, 0, 0), max=0.82618 at (0, 42, 8, 21), avg-magnitude=3.0263e-05, p90=2.3614e-05, p95=4.7767e-05, p99=0.00023932
[I]                 ---- Histogram ----
                    Bin Range        |  Num Elems | Visualization
                    (0     , 0.0826) |    1048549 | ########################################
                    (0.0826, 0.165 ) |         15 | 
                    (0.165 , 0.248 ) |          5 | 
                    (0.248 , 0.33  ) |          2 | 
                    (0.33  , 0.413 ) |          1 | 
                    (0.413 , 0.496 ) |          3 | 
                    (0.496 , 0.578 ) |          0 | 
                    (0.578 , 0.661 ) |          0 | 
                    (0.661 , 0.744 ) |          0 | 
                    (0.744 , 0.826 ) |          1 | 
[E]         FAILED | Output: 'image_embed' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E]     FAILED | Mismatched outputs: ['image_embed']
[E] Accuracy Summary | trt-runner-N0-07/02/25-14:51:06 vs. onnxrt-runner-N0-07/02/25-14:51:06 | Passed: 0/1 iterations | Pass Rate: 0.0%

I know there will always be some difference between the outputs of the engine and those of other frameworks, but to what extent is that difference acceptable? In other words, how do we decide that the built engine was correctly converted from the ONNX model?
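For context, my understanding is that Polygraphy's default elementwise check only flags an element as a mismatch when it exceeds both the absolute and the relative tolerance, which is why the log notes that "requirements may be lower if both abs/rel tolerances are set". The sketch below is just how I read those semantics, not Polygraphy's actual implementation; `trt_out` and `onnx_out` are placeholder arrays standing in for the two runners' outputs:

```python
import numpy as np

def outputs_match(trt_out: np.ndarray, onnx_out: np.ndarray,
                  atol: float = 1e-4, rtol: float = 1e-3) -> bool:
    """Rough sketch of an elementwise abs/rel tolerance check.

    An element is treated as a mismatch only if it exceeds BOTH the
    absolute and the relative tolerance (my reading of Polygraphy's
    default comparison, not a verbatim copy of it).
    """
    absdiff = np.abs(trt_out - onnx_out)
    reldiff = absdiff / (np.abs(onnx_out) + 1e-12)  # guard against divide-by-zero
    mismatch = (absdiff > atol) & (reldiff > rtol)
    return not bool(mismatch.any())
```

If that reading is correct, the reported maximum absolute difference (~4.46e-05) would already pass with something like `--atol 1e-4 --rtol 1e-3` on the `polygraphy run` command line, but I am not sure whether loosening the tolerances like that is a sound acceptance criterion.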

Any insight would be appreciated. Thanks in advance!
