Commit 969d4ab
[NPU] Add fused_linear_cross_entropy operator (#1164)
## Summary
To address the UB overflow observed in the benchmark, this commit adds
an NPU-friendly implementation of the fused linear cross entropy
operator. The fused operator is built on several underlying operations
(e.g., large matrix multiplication, softmax, and cross entropy), so its
benchmark performance is not yet optimal and further optimization may
be needed.
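
For readers unfamiliar with the operator, here is a minimal pure-Python sketch of the computation a fused linear cross entropy performs: a linear projection to logits, a softmax normalizer, and the cross entropy loss, evaluated chunk by chunk so that the full logits matrix never has to exist at once. The function name `linear_cross_entropy_chunked` and the `chunk_size` parameter are hypothetical and chosen for illustration only; this is a reference under those assumptions, not the NPU kernel added in this commit.

```python
import math


def linear_cross_entropy_chunked(x, weight, targets, chunk_size=2):
    """Mean cross entropy of logits = x @ weight^T, computed in row chunks.

    x:       list of N rows, each of length H (hidden states)
    weight:  list of V rows, each of length H (classifier weight)
    targets: list of N class indices in [0, V)
    NOTE: hypothetical reference helper, not the operator from this commit.
    """
    n = len(x)
    total_loss = 0.0
    for start in range(0, n, chunk_size):
        rows = x[start:start + chunk_size]
        labels = targets[start:start + chunk_size]
        # Only the logits of the current chunk are alive at any one time.
        for row, label in zip(rows, labels):
            # Linear projection: one logit per vocabulary entry.
            logits = [sum(h * w for h, w in zip(row, w_row)) for w_row in weight]
            # Numerically stable log-sum-exp (the softmax normalizer).
            m = max(logits)
            lse = m + math.log(sum(math.exp(z - m) for z in logits))
            # Cross entropy for this row: -log softmax(logits)[label].
            total_loss += lse - logits[label]
    return total_loss / n
```

The result does not depend on `chunk_size`; the chunk size only bounds how many logits are held in memory at a time, which is the property a fused implementation exploits.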
## Testing Done
Device: Atlas A3
`python -m pytest ./test/transformers/test_fused_linear_cross_entropy.py`
<img width="3270" height="499" alt="image"
src="https://github.com/user-attachments/assets/7f8a63df-f325-43fe-80b9-6268c5f10e29"
/>
- Hardware Type: <BLANK>
- [ ] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence

Parent commit: 1c013e2
2 files changed, +414 -0 lines changed, in src/liger_kernel/ops/backends/_ascend/ops