
Incorrect CumSum Output since TensorRT 10.8+ #1034

@toothache

Description


The output of the CumSum operator may be incorrect when a model contains multiple CumSum operations. Even though they are configured to operate on different axes, they behave as if they all operated on the same axis.

This issue affects TensorRT version 10.8 (where ICumulativeLayer was first introduced) and above.
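For reference, here is what the two cumulative sums should produce on a `(4, 2)` tensor of ones, computed with NumPy independently of any inference runtime:

```python
import numpy as np

data = np.ones((4, 2), dtype=np.float32)

# Cumulative sum down the rows (axis 0): each row adds the one above it.
print(np.cumsum(data, axis=0))
# [[1. 1.]
#  [2. 2.]
#  [3. 3.]
#  [4. 4.]]

# Cumulative sum across the columns (axis 1): each column adds the one to its left.
print(np.cumsum(data, axis=1))
# [[1. 2.]
#  [1. 2.]
#  [1. 2.]
#  [1. 2.]]
```

In the failing case below, TensorRT returns the axis-0 result for both ops.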

Environment

TensorRT Version: 10.11.0.33
ONNX-TensorRT Version / Branch: built-in
GPU Type: A10
Nvidia Driver Version: 550.144.06
CUDA Version: 12.9
CUDNN Version: 9.10.2
Operating System + Version: Ubuntu 24.04
Python Version (if applicable): 3.12
TensorFlow + TF2ONNX Version (if applicable): N/A
PyTorch Version (if applicable): N/A
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:25.06-py3

Relevant Files

import onnx


def init_model():
    # Create an ONNX model with two CumSum ops that operate on different axes.
    node1 = onnx.helper.make_node(
        "CumSum",
        inputs=["data", "axis_0"],
        outputs=["output_0"],
    )

    node2 = onnx.helper.make_node(
        "CumSum",
        inputs=["data", "axis_1"],
        outputs=["output_1"],
    )

    graph = onnx.helper.make_graph(
        [node1, node2],
        "CumSumGraph",
        [onnx.helper.make_tensor_value_info("data", onnx.TensorProto.FLOAT, [4, 2])],
        [
            onnx.helper.make_tensor_value_info(
                "output_0", onnx.TensorProto.FLOAT, [4, 2]
            ),
            onnx.helper.make_tensor_value_info(
                "output_1", onnx.TensorProto.FLOAT, [4, 2]
            ),
        ],
        initializer=[
            onnx.helper.make_tensor("axis_0", onnx.TensorProto.INT64, [1], [0]),
            onnx.helper.make_tensor("axis_1", onnx.TensorProto.INT64, [1], [1]),
        ],
    )
    model = onnx.helper.make_model(graph)
    return model


def run_cumsum():
    model = init_model()

    print("Running cumsum...")

    import onnxruntime as rt
    import numpy as np

    data = {
        "data": np.ones((4, 2), dtype=np.float32),
    }
    sess_cuda = rt.InferenceSession(
        model.SerializeToString(), providers=["CUDAExecutionProvider"]
    )

    providers = [
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
    ]
    sess_trt = rt.InferenceSession(model.SerializeToString(), providers=providers)

    out_cuda = sess_cuda.run(None, data)
    out_trt = sess_trt.run(None, data)

    for i in range(2):
        print(
            f"Outputs {i} are {'equal' if np.array_equal(out_cuda[i], out_trt[i]) else 'not equal'}"
        )
        print("CUDA EP:")
        print(out_cuda[i])

        print("TensorRT EP:")
        print(out_trt[i])


if __name__ == "__main__":
    run_cumsum()

Steps To Reproduce

Run the above script; it produces the following output:

Running cumsum...
Outputs 0 are equal
CUDA EP:
[[1. 1.]
 [2. 2.]
 [3. 3.]
 [4. 4.]]
TensorRT EP:
[[1. 1.]
 [2. 2.]
 [3. 3.]
 [4. 4.]]
Outputs 1 are not equal
CUDA EP:
[[1. 2.]
 [1. 2.]
 [1. 2.]
 [1. 2.]]
TensorRT EP:
[[1. 1.]
 [2. 2.]
 [3. 3.]
 [4. 4.]]
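Until this is fixed, one possible workaround (a sketch of the idea only, not validated against the TensorRT EP) is to express CumSum as a MatMul with a triangular ones matrix, which avoids the ICumulativeLayer path entirely. The matrix names `L` and `U` below are illustrative, not from the original model:

```python
import numpy as np

data = np.ones((4, 2), dtype=np.float32)

# CumSum along axis 0 == left-multiply by a lower-triangular ones matrix:
# out[i, j] = sum over k <= i of data[k, j]
L = np.tril(np.ones((4, 4), dtype=np.float32))
assert np.array_equal(L @ data, np.cumsum(data, axis=0))

# CumSum along axis 1 == right-multiply by an upper-triangular ones matrix:
# out[i, j] = sum over k <= j of data[i, k]
U = np.triu(np.ones((2, 2), dtype=np.float32))
assert np.array_equal(data @ U, np.cumsum(data, axis=1))
```

This is O(n) extra work per element versus a native scan, so it is only a stopgap for small axes.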
