Summary
Several operators imported through the Relax ONNX frontend handle NaN differently from ONNX Runtime (ORT):
- Relu(NaN) → 0 (ORT: NaN)
- Sign(NaN) → 0 (ORT: NaN)
- ReduceMax/ReduceMin: position-dependent NaN behavior:
  - ReduceMax([NaN, 1.0]) → 1.0 (ORT: NaN)
  - ReduceMax([2.0, NaN]) → NaN (ORT: 2.0)
Related: #xxx (bug_019, reduce_max/min NaN CPU vs CUDA at Relax IR level)
Reproduction
import numpy as np
import onnx
from onnx import helper, TensorProto, numpy_helper
import onnxruntime as ort
import tvm
from tvm import relax
from tvm.relax.frontend.onnx import from_onnx

def run_tvm(model, inputs):
    # Import the ONNX model into Relax, legalize, build for CPU, and run it.
    model = onnx.shape_inference.infer_shapes(model)
    mod = from_onnx(model)
    pipeline = tvm.ir.transform.Sequential([relax.transform.LegalizeOps()])
    exe = tvm.relax.build(pipeline(mod), target="llvm")
    vm = tvm.relax.VirtualMachine(exe, device=tvm.cpu())
    tvm_ins = [tvm.runtime.tensor(v, device=tvm.cpu()) for v in inputs]
    return vm["main"](*tvm_ins).numpy()

# Relu
x = np.array([np.nan, 1.0, np.nan, -2.0], dtype=np.float32)
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [4])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [4])
node = helper.make_node("Relu", ["X"], ["Y"])
graph = helper.make_graph([node], "test", [X], [Y])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
sess = ort.InferenceSession(model.SerializeToString())
print("ORT Relu:", sess.run(None, {"X": x})[0]) # [nan 1. nan 0.]
print("TVM Relu:", run_tvm(model, [x])) # [0. 1. 0. 0.]
# ReduceMax
x2 = np.array([[np.nan, 1.0], [2.0, np.nan]], dtype=np.float32)
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [2, 2])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, None)
axes_init = numpy_helper.from_array(np.array([1], dtype=np.int64), "axes")
node = helper.make_node("ReduceMax", ["X", "axes"], ["Y"], keepdims=0)
graph = helper.make_graph([node], "test", [X], [Y], initializer=[axes_init])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 18)])
sess = ort.InferenceSession(model.SerializeToString())
print("ORT ReduceMax:", sess.run(None, {"X": x2})[0]) # [nan 2.]
print("TVM ReduceMax:", run_tvm(model, [x2])) # [ 1. nan]
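
For larger outputs, where a printed comparison is harder to read, a small NumPy helper can list the positions where exactly one runtime returns NaN. This is only a sketch: `nan_mismatch` is a hypothetical helper, not part of TVM or ONNX Runtime, and the sample arrays are the Relu outputs reported above.

```python
import numpy as np

def nan_mismatch(ref, got):
    """Return the indices where exactly one of the two arrays holds NaN."""
    ref = np.asarray(ref)
    got = np.asarray(got)
    return np.argwhere(np.isnan(ref) != np.isnan(got))

# Relu outputs copied from the reproduction above.
ort_relu = np.array([np.nan, 1.0, np.nan, 0.0], dtype=np.float32)
tvm_relu = np.array([0.0, 1.0, 0.0, 0.0], dtype=np.float32)

mismatch = nan_mismatch(ort_relu, tvm_relu)
print(mismatch.ravel().tolist())  # [0, 2]
```
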
Root cause
- Relu: lowered to max(x, 0) using fmax semantics, so NaN is treated as a missing value
- Sign: comparison chain (x > 0 → 1, x < 0 → -1, else 0), so NaN falls to the default 0
- ReduceMax/Min: left-fold with fmax/fmin, so NaN propagation depends on position in the fold order
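
The three cases above can be modeled in plain Python, without TVM or ONNX Runtime. The sketch below is an illustration under assumptions, not either runtime's actual lowering: all function names are hypothetical, and the two comparison-based max variants are simply one way a left fold can become position-dependent. Every comparison with NaN is false, so the result depends on which operand the false branch returns.

```python
import math

NAN = math.nan

def relu_select(x):
    # Any comparison with NaN is False, so NaN takes the else branch.
    return x if x > 0 else 0.0

def sign_chain(x):
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0  # NaN fails both comparisons and lands here

def max_falls_to_new(acc, x):
    # A False comparison returns the incoming element.
    return acc if acc > x else x

def max_falls_to_acc(acc, x):
    # A False comparison keeps the accumulator.
    return x if x > acc else acc

def left_fold(op, values):
    acc = values[0]
    for v in values[1:]:
        acc = op(acc, v)
    return acc

rows = [[NAN, 1.0], [2.0, NAN]]
fold_a = [left_fold(max_falls_to_new, r) for r in rows]
fold_b = [left_fold(max_falls_to_acc, r) for r in rows]

print(relu_select(NAN), sign_chain(NAN))  # 0.0 0.0
print(fold_a)  # [1.0, nan]
print(fold_b)  # [nan, 2.0]
```

In this model, `fold_a` has the same NaN pattern as the TVM ReduceMax output in the reproduction and `fold_b` has the same pattern as the ORT output. A NaN-propagating reduction would return NaN for both rows.
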
Note
We acknowledge that the ONNX spec does not normatively require NaN propagation for these operators. However, ONNX Runtime (commonly used as the reference runtime) propagates NaN for Relu and Sign and returns different ReduceMax results than TVM for the same inputs, so models migrated between the two runtimes can diverge numerically without any error or warning. We report this as a behavioral inconsistency.
Environment
- TVM: 0.24.dev0, commit 0b0afd8 (2026-04-24)
- Python: 3.11
- OS: Linux
cc @KJlaccHoeUM9l @junrushao