Description
The output of the CumSum operator may be incorrect when a model contains multiple CumSum operations. Even though the operations are configured to operate on different axes, they behave as if they were all operating on the same axis.
This issue affects TensorRT version 10.8 (where ICumulativeLayer was first introduced) and above.
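For reference, a minimal NumPy sketch of the results expected from two independent cumulative sums over different axes of the repro input (a 4x2 tensor of ones); the CUDA EP output below matches this:

import numpy as np

data = np.ones((4, 2), dtype=np.float32)

# Accumulate down the rows (axis 0): [[1,1],[2,2],[3,3],[4,4]]
print(np.cumsum(data, axis=0))
# Accumulate across the columns (axis 1): [[1,2],[1,2],[1,2],[1,2]]
print(np.cumsum(data, axis=1))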
Environment
TensorRT Version: 10.11.0.33
ONNX-TensorRT Version / Branch: built-in
GPU Type: A10
Nvidia Driver Version: 550.144.06
CUDA Version: 12.9
CUDNN Version: 9.10.2
Operating System + Version: Ubuntu 24.04
Python Version (if applicable): 3.12
TensorFlow + TF2ONNX Version (if applicable): N/A
PyTorch Version (if applicable): N/A
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorrt:25.06-py3
Relevant Files
import onnx


def init_model():
    # Create an ONNX model with two CumSum ops that operate on different axes.
    node1 = onnx.helper.make_node(
        "CumSum",
        inputs=["data", "axis_0"],
        outputs=["output_0"],
    )
    node2 = onnx.helper.make_node(
        "CumSum",
        inputs=["data", "axis_1"],
        outputs=["output_1"],
    )
    graph = onnx.helper.make_graph(
        [node1, node2],
        "CumSumGraph",
        [onnx.helper.make_tensor_value_info("data", onnx.TensorProto.FLOAT, [4, 2])],
        [
            onnx.helper.make_tensor_value_info(
                "output_0", onnx.TensorProto.FLOAT, [4, 2]
            ),
            onnx.helper.make_tensor_value_info(
                "output_1", onnx.TensorProto.FLOAT, [4, 2]
            ),
        ],
        initializer=[
            onnx.helper.make_tensor("axis_0", onnx.TensorProto.INT64, [1], [0]),
            onnx.helper.make_tensor("axis_1", onnx.TensorProto.INT64, [1], [1]),
        ],
    )
    model = onnx.helper.make_model(graph)
    return model


def run_cumsum():
    model = init_model()
    print("Running cumsum...")

    import numpy as np
    import onnxruntime as rt

    data = {
        "data": np.ones((4, 2), dtype=np.float32),
    }
    # Reference session on the CUDA execution provider.
    sess_cuda = rt.InferenceSession(
        model.SerializeToString(), providers=["CUDAExecutionProvider"]
    )
    # Session that prefers the TensorRT execution provider.
    providers = [
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
    ]
    sess_trt = rt.InferenceSession(model.SerializeToString(), providers=providers)
    out_cuda = sess_cuda.run(None, data)
    out_trt = sess_trt.run(None, data)
    for i in range(2):
        print(
            f"Outputs {i} are {'equal' if np.array_equal(out_cuda[i], out_trt[i]) else 'not equal'}"
        )
        print("CUDA EP:")
        print(out_cuda[i])
        print("TensorRT EP:")
        print(out_trt[i])


if __name__ == "__main__":
    run_cumsum()
Steps To Reproduce
Run the above script, which produces the following output:
Running cumsum...
Outputs 0 are equal
CUDA EP:
[[1. 1.]
[2. 2.]
[3. 3.]
[4. 4.]]
TensorRT EP:
[[1. 1.]
[2. 2.]
[3. 3.]
[4. 4.]]
Outputs 1 are not equal
CUDA EP:
[[1. 2.]
[1. 2.]
[1. 2.]
[1. 2.]]
TensorRT EP:
[[1. 1.]
[2. 2.]
[3. 3.]
[4. 4.]]
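Note that the TensorRT EP returns the axis-0 result for output_1 as well, i.e. the second CumSum's axis appears to be ignored. As an optional cross-check outside ONNX Runtime, a sketch (assuming Polygraphy is installed; the file name is arbitrary):

# Save the repro model so it can be fed to standalone tools, e.g. compare
# TensorRT against ONNX Runtime with:
#   polygraphy run cumsum_repro.onnx --trt --onnxrt
import onnx
onnx.save(init_model(), "cumsum_repro.onnx")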