Commit b708f79
[NPU]: update the native KLDivLoss implementation used for comparison (e.g. test_jsd.py) (#1032)
## Summary
This PR modifies the NPU test reference for KLDivLoss. The native NPU KLDivLoss operator does not support gradients w.r.t. the target ([#1021](#1021)), which caused failures in test_jsd.py (where input and target are swapped when `beta != 0`).
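
For context, a minimal sketch of the failure mode (illustrative tensors only; the actual shapes and dtypes come from test_jsd.py): when the reference evaluates `torch.nn.functional.kl_div` with the roles of input and target exchanged, the backward pass needs a gradient w.r.t. the target argument, which the native NPU operator does not provide.

```python
import torch
import torch.nn.functional as F

# Illustrative log-probability tensors; both require gradients.
log_q = torch.randn(4, 8).log_softmax(dim=-1).requires_grad_(True)
log_p = torch.randn(4, 8).log_softmax(dim=-1).requires_grad_(True)

# With beta != 0 the JSD reference also calls KLDivLoss with swapped arguments,
# so `log_q` ends up in the target slot and must receive a gradient.
loss = F.kl_div(log_p, log_q, reduction="batchmean", log_target=True)
loss.backward()
print(log_q.grad)  # needs KLDivLoss backward w.r.t. target; unsupported by the native NPU op
```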
To resolve this, I replaced the native operator with a custom implementation built from basic math operations. This allows correct gradient computation w.r.t. the target and aligns the `x1.grad` results with the Triton kernel implementation.
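
As a sketch of the approach (names and the exact reduction handling are illustrative assumptions, not the code merged in this PR), KLDivLoss can be rewritten with elementwise operations so that autograd differentiates through both arguments:

```python
import torch


def kl_div_basic_ops(log_q: torch.Tensor, target: torch.Tensor,
                     log_target: bool = True, reduction: str = "batchmean") -> torch.Tensor:
    # Same pointwise definition as torch.nn.functional.kl_div: p * (log p - log q),
    # but built from exp/log/mul/sub so gradients also flow into `target`.
    if log_target:
        p, log_p = target.exp(), target
    else:
        p, log_p = target, target.log()
    pointwise = p * (log_p - log_q)
    if reduction == "sum":
        return pointwise.sum()
    if reduction == "batchmean":
        return pointwise.sum() / log_q.shape[0]
    return pointwise  # reduction == "none"
```

Because every step is a differentiable elementwise op, autograd produces gradients for both `log_q` and `target`, which is what lets the swapped-argument case in test_jsd.py match the Triton kernel's `x1.grad`.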
## Testing Done
I tested test_jsd and test_fused_linear_jsd with the following commands, and all cases passed:
`pytest -v test/transformers/test_jsd.py`
`pytest -v test/transformers/test_fused_linear_jsd.py`
Hardware Type: Ascend NPU 910B3
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence
1 file changed: +29 −1 lines changed