[Performance] Performance regression in ArgMax and ArgMin operators between v1.20.0 and v1.21.0 (suspected compiler version change) #26940

@junghyunpark2001

Describe the issue

Description

We observed performance regressions in the ArgMax and ArgMin operators between ONNXRuntime v1.20.0 and v1.21.0.

Affected Operators

ArgMax

  • Opset Version: 13
  • Regression: +27.67% slowdown

ArgMin

  • Opset Version: 13
  • Regression: +30.19% slowdown

Test Case Details

Test Case 1: ArgMax (negative axis, 2D input)

Input:

  • Name: X
  • Shape: [8, 128] (2D tensor)
  • Data type: float32
  • Axis: -1 (last dimension)
  • keepdims: 0
  • select_last_index: 1

Output:

  • Name: output
  • Shape: [8]
  • Data type: int64

Attributes:

{
  "axis": -1,
  "keepdims": 0,
  "select_last_index": 1
}

Performance:

  • v1.20.0: 0.003 ms
  • v1.21.0: 0.003 ms (higher than v1.20.0, but the difference is lost when rounding to three decimal places)
  • Regression: +27.67%

Test Case 2: ArgMin (axis=1, 4D input)

Input:

  • Name: input
  • Shape: [2, 64, 56, 56] (4D tensor)
  • Data type: float32
  • Axis: 1
  • keepdims: 0
  • select_last_index: 1

Output:

  • Name: output
  • Shape: [2, 56, 56]
  • Data type: int64

Attributes:

{
  "axis": 1,
  "keepdims": 0,
  "select_last_index": 1
}

Performance:

  • v1.20.0: 0.327 ms
  • v1.21.0: 0.425 ms
  • Regression: +30.19%
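
For reference, the output shapes listed in both test cases follow from normalizing the axis and dropping the reduced dimension when keepdims is 0. The snippet below is a minimal sketch of that rule; the helper names normalize_axis and reduced_shape are illustrative and not part of ONNX Runtime. It also prints the input element counts, which show that Test Case 2 scans roughly 400 times more data than Test Case 1.

```python
from math import prod


def normalize_axis(axis, rank):
    """Map a possibly negative axis into the range [0, rank)."""
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    return axis + rank if axis < 0 else axis


def reduced_shape(shape, axis, keepdims):
    """Output shape of an ArgMax/ArgMin reduction along a single axis."""
    axis = normalize_axis(axis, len(shape))
    if keepdims:
        # The reduced axis is kept with length 1.
        return [1 if i == axis else d for i, d in enumerate(shape)]
    # The reduced axis is removed from the shape.
    return [d for i, d in enumerate(shape) if i != axis]


# Test Case 1: ArgMax, axis=-1, keepdims=0 on [8, 128]
case1_out = reduced_shape([8, 128], axis=-1, keepdims=0)
# Test Case 2: ArgMin, axis=1, keepdims=0 on [2, 64, 56, 56]
case2_out = reduced_shape([2, 64, 56, 56], axis=1, keepdims=0)

print(case1_out, prod([8, 128]))            # [8] 1024
print(case2_out, prod([2, 64, 56, 56]))     # [2, 56, 56] 401408
```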

Regression Magnitude

  • ArgMax: +27.67% slowdown (0.003 ms → 0.003 ms; both latencies round to the same value at three decimal places)
  • ArgMin: +30.19% slowdown (0.327 ms → 0.425 ms)
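
The percentages above are relative latency increases. As a sanity check, here is a small sketch of that calculation (the function name slowdown_percent is illustrative). Applied to the rounded ArgMin latencies it gives about +29.97%, close to the reported +30.19%; the small gap presumably comes from the report using unrounded measurements, which is an assumption. The rounded ArgMax latencies are identical, so the reported +27.67% cannot be reproduced from them.

```python
def slowdown_percent(baseline_ms, candidate_ms):
    """Relative latency increase of candidate over baseline, in percent."""
    if baseline_ms <= 0:
        raise ValueError("baseline latency must be positive")
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


# Rounded latencies as listed in this issue.
argmin_pct = slowdown_percent(0.327, 0.425)
argmax_pct = slowdown_percent(0.003, 0.003)

print(f"ArgMin: {argmin_pct:+.2f}%")  # ArgMin: +29.97%
print(f"ArgMax: {argmax_pct:+.2f}%")  # ArgMax: +0.00%
```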

Suspected Cause

We observed a significant compiler version change between the two releases:

  • v1.20.0: GCC: (GNU) 12.2.1 20221121 (Red Hat 12.2.1-7)
  • v1.21.0: GCC: (GNU) 14.2.1 20240801 (Red Hat 14.2.1-1)

This GCC major version upgrade (12.2 → 14.2) may be contributing to the performance regression in ArgMax/ArgMin operators.
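
For anyone who wants to double-check the compiler strings, GCC normally records an identification string of this form in the binary's .comment section. The sketch below scans a file for such strings. It is an assumption that this matches how the versions above were obtained, and the helper name gcc_ident_strings is illustrative; the path to the ONNX Runtime shared library inside the installed package has to be supplied by the reader.

```python
import re

# Matches NUL-terminated identification strings such as
# "GCC: (GNU) 12.2.1 20221121 (Red Hat 12.2.1-7)".
GCC_IDENT = re.compile(rb"GCC: \([^)\x00]*\)[^\x00]*")


def gcc_ident_strings(path):
    """Return the distinct GCC identification strings found in a binary file."""
    with open(path, "rb") as f:
        data = f.read()
    found = {
        m.group(0).decode("ascii", "replace").strip()
        for m in GCC_IDENT.finditer(data)
    }
    return sorted(found)
```

Calling gcc_ident_strings on the shared library from each installed release and comparing the results should show whether the toolchain differs.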

ArgMax and ArgMin operators are particularly sensitive to compiler optimizations because they involve:

  1. Sequential comparisons over large data ranges
  2. Index tracking alongside value comparisons
  3. Conditional branching based on comparison results
  4. Memory access patterns that depend on compiler optimization

The select_last_index attribute adds further branching logic that may interact poorly with certain compiler optimization passes.
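
To make the points above concrete, here is a minimal scalar sketch of the kind of scan loop being described. It is not taken from the ONNX Runtime source; it only illustrates how the value comparison, the index tracking, and the select_last_index tie-breaking sit in the same loop-carried branch. NaN handling is ignored.

```python
def argmax_scan(values, select_last_index=False):
    """Index of the maximum of a non-empty sequence, scanning left to right."""
    best_index = 0
    best_value = values[0]
    for i in range(1, len(values)):
        v = values[i]
        # Strict '>' keeps the first maximum; '>=' lets a later tie win,
        # which corresponds to select_last_index=1.
        take = (v >= best_value) if select_last_index else (v > best_value)
        if take:
            best_value = v
            best_index = i
    return best_index


def argmin_scan(values, select_last_index=False):
    """Index of the minimum of a non-empty sequence, scanning left to right."""
    best_index = 0
    best_value = values[0]
    for i in range(1, len(values)):
        v = values[i]
        take = (v <= best_value) if select_last_index else (v < best_value)
        if take:
            best_value = v
            best_index = i
    return best_index


row = [1.0, 3.0, 2.0, 3.0]
print(argmax_scan(row), argmax_scan(row, select_last_index=True))  # 1 3
```

Each iteration depends on the best value and index from the previous one, so how the compiler turns the conditional update into branches, conditional moves, or vector code can plausibly change the loop's throughput.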

To reproduce

  1. Download and unzip test.zip

  2. Run benchmark using the provided script:

    python script.py ./argmax 1.20.0 1.21.0  # Expected: +27% regression
    python script.py ./argmin 1.20.0 1.21.0  # Expected: +30% regression

  3. Compare the reported latencies between the two versions.
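
The contents of script.py are not shown in this issue. For readers reproducing the numbers independently, the following is a hypothetical timing harness, not the attached script. It uses warm-up runs and the median over many repeats, which matters for the ArgMax case because a latency around 0.003 ms is close to timer overhead. The workload function is a stand-in; in practice it would wrap a single inference call.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=10, repeats=100):
    """Median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        fn()
        samples.append((time.perf_counter_ns() - start) / 1e6)
    return statistics.median(samples)


def workload():
    """Stand-in for one inference call (hypothetical placeholder)."""
    return max(range(1000))


latency = measure_latency_ms(workload)
print(f"median latency: {latency:.4f} ms")
```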

test.zip

Urgency

No response

Platform

Linux

OS Version

Ubuntu 24.04.3 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

v1.20.0, v1.21.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

    Labels

    performance: issues related to performance regressions
