Description
The output of the CumSum operator may be incorrect when a model contains multiple CumSum operations. Even though the operations are configured to operate on different axes, they behave as if they were all operating on the same axis.
This issue affects TensorRT version 10.8 (where ICumulativeLayer was first introduced) and above.
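For reference, a minimal NumPy sketch of the results expected from two independent cumulative sums over different axes of the repro input (a 4x2 tensor of ones); the CUDA EP output below matches this:

import numpy as np

data = np.ones((4, 2), dtype=np.float32)

# Accumulate down the rows (axis 0): [[1,1],[2,2],[3,3],[4,4]]
print(np.cumsum(data, axis=0))
# Accumulate across the columns (axis 1): [[1,2],[1,2],[1,2],[1,2]]
print(np.cumsum(data, axis=1))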
Environment
TensorRT Version: 10.11.0.33
ONNX-TensorRT Version / Branch: built-in
GPU Type: A10
Nvidia Driver Version: 550.144.06
CUDA Version: 12.9
CUDNN Version: 9.10.2
Operating System + Version: Ubuntu 24.04
Python Version (if applicable): 3.12
TensorFlow + TF2ONNX Version (if applicable): N/A
PyTorch Version (if applicable): N/A
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:25.06-py3
Relevant Files
import onnx


def init_model():
    # Create an ONNX model with two CumSum ops that operate on different axes.
    node1 = onnx.helper.make_node(
        "CumSum",
        inputs=["data", "axis_0"],
        outputs=["output_0"],
    )
    node2 = onnx.helper.make_node(
        "CumSum",
        inputs=["data", "axis_1"],
        outputs=["output_1"],
    )
    graph = onnx.helper.make_graph(
        [node1, node2],
        "CumSumGraph",
        [onnx.helper.make_tensor_value_info("data", onnx.TensorProto.FLOAT, [4, 2])],
        [
            onnx.helper.make_tensor_value_info(
                "output_0", onnx.TensorProto.FLOAT, [4, 2]
            ),
            onnx.helper.make_tensor_value_info(
                "output_1", onnx.TensorProto.FLOAT, [4, 2]
            ),
        ],
        initializer=[
            onnx.helper.make_tensor("axis_0", onnx.TensorProto.INT64, [1], [0]),
            onnx.helper.make_tensor("axis_1", onnx.TensorProto.INT64, [1], [1]),
        ],
    )
    model = onnx.helper.make_model(graph)
    return model


def run_cumsum():
    model = init_model()
    print("Running cumsum...")

    import numpy as np
    import onnxruntime as rt

    data = {
        "data": np.ones((4, 2), dtype=np.float32),
    }
    # Reference session on the CUDA execution provider.
    sess_cuda = rt.InferenceSession(
        model.SerializeToString(), providers=["CUDAExecutionProvider"]
    )
    # Session that prefers the TensorRT execution provider.
    providers = [
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
    ]
    sess_trt = rt.InferenceSession(model.SerializeToString(), providers=providers)
    out_cuda = sess_cuda.run(None, data)
    out_trt = sess_trt.run(None, data)
    for i in range(2):
        print(
            f"Outputs {i} are {'equal' if np.array_equal(out_cuda[i], out_trt[i]) else 'not equal'}"
        )
        print("CUDA EP:")
        print(out_cuda[i])
        print("TensorRT EP:")
        print(out_trt[i])


if __name__ == "__main__":
    run_cumsum()
Steps To Reproduce
Run the above script, which produces the following output:
Running cumsum...
Outputs 0 are equal
CUDA EP:
[[1. 1.]
[2. 2.]
[3. 3.]
[4. 4.]]
TensorRT EP:
[[1. 1.]
[2. 2.]
[3. 3.]
[4. 4.]]
Outputs 1 are not equal
CUDA EP:
[[1. 2.]
[1. 2.]
[1. 2.]
[1. 2.]]
TensorRT EP:
[[1. 1.]
[2. 2.]
[3. 3.]
[4. 4.]]
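Note that the TensorRT EP returns the axis-0 result for output_1 as well, i.e. the second CumSum's axis appears to be ignored. As an optional cross-check outside ONNX Runtime, a sketch (assuming Polygraphy is installed; the file name is arbitrary):

# Save the repro model so it can be fed to standalone tools, e.g. compare
# TensorRT against ONNX Runtime with:
#   polygraphy run cumsum_repro.onnx --trt --onnxrt
import onnx
onnx.save(init_model(), "cumsum_repro.onnx")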