Description
Describe the issue
We observed performance regressions in the ArgMax and ArgMin operators between ONNXRuntime v1.20.0 and v1.21.0.
Affected Operators
ArgMax
- Opset Version: 13
- Regression: +27.67% slowdown
ArgMin
- Opset Version: 13
- Regression: +30.19% slowdown
Test Case Details
Test Case 1: ArgMax (negative axis, 2D input)
Input:
- Name: X
- Shape: [8, 128] (2D tensor)
- Data type: float32
- Axis: -1 (last dimension)
- keepdims: 0
- select_last_index: 1
Output:
- Name: output
- Shape: [8]
- Data type: int64
Attributes:
{
"axis": -1,
"keepdims": 0,
"select_last_index": 1
}
Performance:
- v1.20.0: 0.003 ms
- v1.21.0: 0.003 ms (slight increase, below the displayed three-decimal precision)
- Regression: +27.67%
Test Case 2: ArgMin (axis=1, 4D input)
Input:
- Name: input
- Shape: [2, 64, 56, 56] (4D tensor)
- Data type: float32
- Axis: 1
- keepdims: 0
- select_last_index: 1
Output:
- Name: output
- Shape: [2, 56, 56]
- Data type: int64
Attributes:
{
"axis": 1,
"keepdims": 0,
"select_last_index": 1
}
Performance:
- v1.20.0: 0.327 ms
- v1.21.0: 0.425 ms
- Regression: +30.19%
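The output shapes listed in both test cases follow from the ONNX shape rule for ArgMax/ArgMin: with keepdims=0 the reduced axis is dropped, and a negative axis counts from the end. A minimal sketch of that rule, where the helper name `arg_reduce_output_shape` is hypothetical and not part of any library:

```python
def arg_reduce_output_shape(input_shape, axis, keepdims):
    """Output shape of ArgMax/ArgMin for a given input shape (illustrative helper)."""
    rank = len(input_shape)
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    axis = axis % rank  # normalize a negative axis, e.g. -1 -> rank - 1
    if keepdims:
        # The reduced dimension is kept with size 1.
        return [1 if i == axis else d for i, d in enumerate(input_shape)]
    # The reduced dimension is removed entirely.
    return [d for i, d in enumerate(input_shape) if i != axis]


print(arg_reduce_output_shape([8, 128], axis=-1, keepdims=0))        # [8]
print(arg_reduce_output_shape([2, 64, 56, 56], axis=1, keepdims=0))  # [2, 56, 56]
```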
Regression Magnitude
- ArgMax: +27.67% slowdown (0.003 ms → 0.003 ms)
- ArgMin: +30.19% slowdown (0.327 ms → 0.425 ms)
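The percentages above cannot be recomputed exactly from the rounded latencies, so it may help to check that they are at least consistent with them. The sketch below assumes the slowdown is defined as (new - old) / old and that the reported latencies were rounded to the nearest 0.001 ms; neither assumption is stated in the report, and both function names are hypothetical.

```python
def slowdown_pct(old_ms, new_ms):
    """Relative slowdown in percent, assuming (new - old) / old."""
    return (new_ms - old_ms) / old_ms * 100.0


def slowdown_bounds(old_ms, new_ms, decimals=3):
    """Range of slowdowns compatible with latencies rounded to `decimals` places."""
    half = 0.5 * 10 ** -decimals  # maximum rounding error per value
    lowest = slowdown_pct(old_ms + half, new_ms - half)
    highest = slowdown_pct(old_ms - half, new_ms + half)
    return lowest, highest


for name, old, new in [("ArgMax", 0.003, 0.003), ("ArgMin", 0.327, 0.425)]:
    lo, hi = slowdown_bounds(old, new)
    print(f"{name}: {slowdown_pct(old, new):+.2f}% from rounded values, "
          f"compatible range {lo:+.2f}% to {hi:+.2f}%")
```

Under these assumptions the rounded ArgMin latencies give roughly +29.97%, with a compatible range of about +29.6% to +30.3%, so the reported +30.19% fits. For ArgMax the rounded values are identical, and the compatible range is wide (roughly -29% to +40%), so the reported +27.67% can only be confirmed from the unrounded measurements.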
Suspected Cause
We observed a significant compiler version change between the two releases:
- v1.20.0: GCC: (GNU) 12.2.1 20221121 (Red Hat 12.2.1-7)
- v1.21.0: GCC: (GNU) 14.2.1 20240801 (Red Hat 14.2.1-1)
This GCC major version upgrade (12.2 → 14.2) may be contributing to the performance regression in ArgMax/ArgMin operators.
ArgMax and ArgMin are particularly sensitive to compiler optimizations because they involve:
- Sequential comparisons over large data ranges
- Index tracking alongside value comparisons
- Conditional branching based on comparison results
- Memory access patterns that depend on compiler optimization
The select_last_index attribute adds branching logic that may interact poorly with certain compiler optimization passes.
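To make the pattern concrete, here is a scalar reference loop for a single row. It is an illustrative sketch only, not ONNX Runtime's actual kernel, and `arg_extreme` is a hypothetical name. It shows the index tracked alongside the running best value, and the extra tie-handling condition that select_last_index=1 introduces.

```python
def arg_extreme(values, find_max=True, select_last_index=False):
    """Index of the max (or min) element of a non-empty sequence."""
    best_idx = 0
    best = values[0]
    for i in range(1, len(values)):
        v = values[i]
        # Strict comparison keeps the first occurrence on ties.
        better = v > best if find_max else v < best
        # With select_last_index, a tie also moves the index forward,
        # which adds a second data-dependent condition to the loop body.
        if better or (select_last_index and v == best):
            best, best_idx = v, i
    return best_idx


row = [1.0, 3.0, 3.0, 2.0]
print(arg_extreme(row))                          # 1 (first maximum)
print(arg_extreme(row, select_last_index=True))  # 2 (last maximum)
```

How a compiler turns the combined condition into branches or conditional moves, and whether it vectorizes the loop, can differ between compiler versions, which is the kind of sensitivity suspected here.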
To reproduce
- Download and unzip test.zip
- Run the benchmark using the provided script:
  python script.py ./argmax 1.20.0 1.21.0 # Expected: +27% regression
  python script.py ./argmin 1.20.0 1.21.0 # Expected: +30% regression
- Compare the reported latencies between the two versions.
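At microsecond-scale latencies such as the 0.003 ms ArgMax case, the measurement method matters as much as the operator. The contents of script.py are not shown here, so the following is only a sketch of one reasonable way to time a call, with warm-up runs and a median over repeated samples; `measure_latency_ms` and its parameters are hypothetical.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=10, runs=100):
    """Time a zero-argument callable; returns latency statistics in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialization before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "median": statistics.median(samples),
        "mean": statistics.fmean(samples),
        "min": min(samples),
    }


# Stand-in workload so the sketch runs without extra packages. With onnxruntime
# installed, the callable would instead wrap something like
# session.run(None, {"X": x}) on an onnxruntime.InferenceSession.
result = measure_latency_ms(lambda: sum(range(1000)))
print({k: round(v, 4) for k, v in result.items()})
```

Reporting the median alongside the minimum makes it easier to tell a consistent slowdown from scheduling noise when comparing the two versions.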
Urgency
No response
Platform
Linux
OS Version
Ubuntu 24.04.3 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
v1.20.0, v1.21.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU