Commit f9f9fcf
Enable SVE ACLE implementation for tanH Aten op for FP32 dType. (pytorch#143741)
In deep learning models, the tanh (hyperbolic tangent) function is a widely used activation function, primarily in feedforward networks, recurrent neural networks (RNNs), and various other architectures.
Also, the tanh (hyperbolic tangent) function is commonly used in **Physics-Informed Neural Networks (PINNs).** PINNs are a class of machine learning models designed to solve partial differential equations (PDEs) by incorporating the governing physics directly into the loss function, along with data-driven terms.
In PINNs, activation functions like tanh are used in the neural network architecture to enable the model to learn complex mappings between inputs (such as spatial and temporal coordinates) and outputs (such as field variables).
**Operator: tanh()**
**Current Implementation in OSS in ATen Backend:**
**SVE Flow:** Uses SVE sleef when available else std implementation.
**With this PR :**
**SVE Flow:** Uses SVE ACLE implementation. (Faster Implementation)
**Here are the performance improvements.**
**Single core perf numbers:**

**Metric:** CPU time avg time per iteration (In ms)
As you can see with both gcc and clang compilers, we see a significant performance gain with SVE ACLE implementation over current OSS Implementation (Sleef) and also Neon.
**Hardware:** m7g.8xlarge (Graviton 3 Instance)
**Script used in benchmarking:**
```python
import os
#os.environ["ATEN_CPU_CAPABILITY"] = "default"
os.environ["ATEN_CPU_CAPABILITY"] = "sve256"
import torch
import torch.nn as nn
#Set the random seed for reproducibility
torch.manual_seed(1)
#Create a tensor of shape (8521, 50)
x = torch.randn(8521, 50)
for i in range(10):
output = x.tanh()
#Perform the tanh operation 1000 times and profile the performance
print("### CPU tanh")
with torch.autograd.profiler.profile(record_shapes=True) as prof:
for i in range(1000):
output = x.tanh()
#Print the profiling results sorted by self CPU time
print(prof.key_averages().table(sort_by="self_cpu_time_total"))
#Optionally print the final output (if needed, uncomment the following line)
print(output)
```
Pull Request resolved: pytorch#143741
Approved by: https://github.com/malfet1 parent fa3b05d commit f9f9fcf
File tree
2 files changed
+90
-1
lines changed- aten/src/ATen
- cpu/vec/sve
- test
2 files changed
+90
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
85 | 85 | | |
86 | 86 | | |
87 | 87 | | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
88 | 140 | | |
89 | 141 | | |
90 | 142 | | |
| |||
333 | 385 | | |
334 | 386 | | |
335 | 387 | | |
| 388 | + | |
336 | 389 | | |
337 | | - | |
| 390 | + | |
| 391 | + | |
| 392 | + | |
| 393 | + | |
| 394 | + | |
| 395 | + | |
| 396 | + | |
| 397 | + | |
| 398 | + | |
| 399 | + | |
| 400 | + | |
| 401 | + | |
| 402 | + | |
| 403 | + | |
| 404 | + | |
| 405 | + | |
| 406 | + | |
| 407 | + | |
| 408 | + | |
| 409 | + | |
| 410 | + | |
| 411 | + | |
| 412 | + | |
| 413 | + | |
| 414 | + | |
| 415 | + | |
338 | 416 | | |
339 | 417 | | |
340 | 418 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
371 | 371 | | |
372 | 372 | | |
373 | 373 | | |
| 374 | + | |
| 375 | + | |
| 376 | + | |
| 377 | + | |
| 378 | + | |
| 379 | + | |
| 380 | + | |
| 381 | + | |
| 382 | + | |
| 383 | + | |
374 | 384 | | |
375 | 385 | | |
376 | 386 | | |
377 | 387 | | |
378 | 388 | | |
| 389 | + | |
379 | 390 | | |
380 | 391 | | |
381 | 392 | | |
| |||
0 commit comments