[Performance] Performance regression in ELU operator between v1.20.0 and v1.21.0 (≈220% slowdown, suspected Eigen update)

### Describe the issue


## **Description**

We observed a **significant performance regression** in the ELU operator between **onnxruntime v1.20.0 and v1.21.0**.

---

## **Operator / Test Case Details**

### **Operator**

- **Type**: Elu
- **Opset Version**: 21

### **Input**

- **Name**: X
- **Shape**: [2, 64, 28, 28] (4D tensor)
- **Data type**: float32
- **Value range**:
    - Min: 0.100
    - Max: 9.999
    - Mean: 5.040

### **Output**

- **Name**: output
- **Shape**: [2, 64, 28, 28] (same as input)
- **Data type**: float32

### **Attributes**

- **alpha**: 1.0 (default value per ONNX specification)
- No additional attributes specified
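For reference, the ONNX definition of Elu is `f(x) = x` for `x >= 0` and `f(x) = alpha * (exp(x) - 1)` for `x < 0`. A minimal NumPy sketch of that definition is below; it is not the onnxruntime kernel. The input shape and value range are taken from this report, while the uniform distribution and the seed are assumptions made only for illustration. Since the reported minimum is 0.100, every element is positive and the reference output equals the input, so any `exp` work an implementation does on this input is for a branch that is never selected.

```python
import numpy as np


def elu_reference(x: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Reference Elu: x where x > 0, alpha * (exp(x) - 1) otherwise."""
    # Clamp the exp argument at 0 so the unselected branch cannot overflow.
    negative_branch = alpha * np.expm1(np.minimum(x, 0.0))
    return np.where(x > 0, x, negative_branch).astype(x.dtype)


# Stand-in for the reported input. Shape and range come from the issue;
# the uniform distribution and the seed are assumptions.
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 9.999, size=(2, 64, 28, 28)).astype(np.float32)

y = elu_reference(x, alpha=1.0)

# All inputs are positive, so Elu acts as the identity on this tensor.
identity_on_input = bool(np.array_equal(y, x))
```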

### **Model Information**

- **IR Version**: 10
- **Opset Version**: 21

---

## **Regression Magnitude**

- Approximately **220% slowdown** in **v1.21.0** compared to **v1.20.0**
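The report does not state how the percentage was derived. The sketch below shows two common readings; the latencies in it are hypothetical placeholders chosen only to make the arithmetic visible, not measurements from this issue. If "220% slowdown" is a relative increase, v1.21.0 takes about 3.2x as long as v1.20.0; if it is a plain latency ratio, about 2.2x.

```python
def slowdown_percent(baseline_ms: float, regressed_ms: float) -> float:
    """Relative latency increase of the regressed build, in percent."""
    return (regressed_ms - baseline_ms) / baseline_ms * 100.0


def latency_ratio(baseline_ms: float, regressed_ms: float) -> float:
    """How many times longer the regressed build takes."""
    return regressed_ms / baseline_ms


# Hypothetical latencies (NOT measured values from this issue).
baseline_ms = 1.0
regressed_ms = 3.2

increase = slowdown_percent(baseline_ms, regressed_ms)  # about 220 (percent)
ratio = latency_ratio(baseline_ms, regressed_ms)        # about 3.2 (times)
```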

---

## **Reproducibility**

- Reproducible across **multiple runs** and **different environments**

---

## **Suspected Cause**

Based on commit history analysis, the regression is **likely related to the Eigen update** introduced in the following commit:

- **Suspected commit**: `7c0c6fbe` (Eigen update)

While the ELU kernel implementation itself does not appear to have been directly modified, the Eigen update may have affected:

- vectorization behavior,
- math kernel selection, or
- underlying execution paths for float32 ELU computation.

### To reproduce

1. Download the attached **ELU model** and **benchmark script**.
2. Run the benchmark using the following command:

```
python script.py ./elu 1.20.0 1.21.0
```

- `./elu`: directory containing the ELU model
- `1.20.0`: baseline (good) onnxruntime version
- `1.21.0`: regressed (bad) onnxruntime version

3. Compare the reported latency between the two versions. A significant slowdown (~220%) can be observed in **v1.21.0**.


[reproduce.zip](https://github.com/user-attachments/files/24443238/reproduce.zip)
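For readers who cannot open the archive, the measurement can be approximated with a short timing harness. The following is a sketch and not the attached `script.py`: the function names, warmup and run counts, and the random input are assumptions, and the model path is whatever `.onnx` file sits inside `./elu` (its file name is not given here). The input name `X`, the shape, and the CPU execution provider come from this report.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(fn: Callable[[], object], warmup: int = 10, runs: int = 100) -> float:
    """Call fn repeatedly and return the median wall-clock latency in ms."""
    for _ in range(warmup):
        fn()  # warm caches and lazy initialization before timing
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def make_elu_runner(model_path: str) -> Callable[[], object]:
    """Build a zero-argument callable that runs the ELU model once on CPU."""
    # Imported lazily so the timing helper works without onnxruntime installed.
    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    # Shape and value range follow the issue; the distribution is an assumption.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.1, 9.999, size=(2, 64, 28, 28)).astype(np.float32)
    return lambda: sess.run(None, {"X": x})


def report(model_path: str) -> str:
    """Return 'onnxruntime <version>: <median> ms' for the installed version."""
    import onnxruntime as ort

    latency = median_latency_ms(make_elu_runner(model_path))
    return f"onnxruntime {ort.__version__}: {latency:.4f} ms"
```

Calling `report(...)` with the model path in one environment with `onnxruntime==1.20.0` installed and in another with `onnxruntime==1.21.0` gives two medians to compare. The median is used rather than the mean so that occasional scheduling spikes do not dominate the result.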

### Urgency

_No response_

### Platform

Linux

### OS Version

Ubuntu 24.04.3 LTS

### ONNX Runtime Installation

Released Package

### ONNX Runtime Version or Commit ID

7c0c6fbe

### ONNX Runtime API

Python

### Architecture

X64

### Execution Provider

Default CPU

### Execution Provider Library Version

_No response_

### Model File

_No response_

### Is this a quantized model?

Yes
