Commit 89868e2
Do not set block load attribute for transposed A matrices (#2443)
We cannot lower a transposed A matrix to a transposed 2D block load.
Instead, the load is lowered via the LLVM path introduced in #2181.
That path appears to have a performance regression: it is slower than
materializing the block in SLM, reading it into registers, and
computing the dot product from there. Using the work in #2420, I am
able to drop the block load attribute for this case and go down the
non-block-pointer path.
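A minimal sketch of the decision described above, written in Python for illustration only. The real change lives in a C++ pass under `third_party/intel/lib/TritonIntelGPUTransforms`; the `LoadInfo` type, the `choose_block_load_attr` helper, and the `"row_major"` / `"column_major"` attribute values are assumptions made for this example, not the pass's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LoadInfo:
    """Hypothetical summary of a 2D load that may feed a dot operand."""
    dot_operand_index: Optional[int]  # 0 for A, 1 for B, None if not a dot operand
    strides: Tuple[int, int]          # element strides along (dim 0, dim 1)


def choose_block_load_attr(load: LoadInfo) -> Optional[str]:
    """Return the (assumed) block load attribute value, or None to skip it.

    Returning None models "do not set the block load attribute", which
    sends the load down the non-block-pointer path.
    """
    if load.strides[1] == 1:
        # Unit stride along the last dimension: ordinary row-major tile.
        return "row_major"
    if load.strides[0] == 1:
        # Unit stride along the first dimension: transposed tile.
        if load.dot_operand_index == 0:
            # Transposed A cannot be lowered to a transposed 2D block
            # load, so leave the attribute unset for this case.
            return None
        return "column_major"
    # Neither dimension is contiguous: not a block load candidate.
    return None


if __name__ == "__main__":
    print(choose_block_load_attr(LoadInfo(0, (512, 1))))  # row-major A
    print(choose_block_load_attr(LoadInfo(0, (1, 512))))  # transposed A
    print(choose_block_load_attr(LoadInfo(1, (1, 512))))  # transposed B
```

The only case the sketch treats specially is a transposed load feeding operand A; every other case keeps whatever attribute it would otherwise receive.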
Performance on main:
```
Compute A x B
✅ Triton and Torch match
Time for torch: 0.32444801926612854 ms
Time for triton: 0.44371041655540466 ms
Compute A x B.T
✅ Triton and Torch match
Time for torch: 0.32708799839019775 ms
Time for triton: 0.634996771812439 ms
Compute A.T x B
✅ Triton and Torch match
Time for torch: 0.31204161047935486 ms
Time for triton: 3.4140689373016357 ms
Compute A.T x B.T
✅ Triton and Torch match
Time for torch: 0.45701122283935547 ms
Time for triton: 3.7463345527648926 ms
```
Performance on this PR:
```
Compute A x B
✅ Triton and Torch match
Time for torch: 0.3081200122833252 ms
Time for triton: 0.44333598017692566 ms
Compute A x B.T
✅ Triton and Torch match
Time for torch: 0.33799198269844055 ms
Time for triton: 0.6391856074333191 ms
Compute A.T x B
✅ Triton and Torch match
Time for torch: 0.31700319051742554 ms
Time for triton: 1.5733630657196045 ms
Compute A.T x B.T
✅ Triton and Torch match
Time for torch: 0.45083683729171753 ms
Time for triton: 1.8271965980529785 ms
```
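To make the comparison easier to read, the Triton timings from the two runs above can be turned into speedup ratios. The snippet below is a small sketch that uses only the numbers reported in this message; the dictionary and function names are illustrative.

```python
# Triton timings in milliseconds, copied from the two runs above.
main_ms = {
    "A x B": 0.44371041655540466,
    "A x B.T": 0.634996771812439,
    "A.T x B": 3.4140689373016357,
    "A.T x B.T": 3.7463345527648926,
}
pr_ms = {
    "A x B": 0.44333598017692566,
    "A x B.T": 0.6391856074333191,
    "A.T x B": 1.5733630657196045,
    "A.T x B.T": 1.8271965980529785,
}


def speedups(before, after):
    """Return before/after time ratios; values above 1 mean 'after' is faster."""
    return {case: before[case] / after[case] for case in before}


if __name__ == "__main__":
    for case, ratio in speedups(main_ms, pr_ms).items():
        print(f"{case}: {ratio:.2f}x")
```

The two transposed-A cases come out at roughly 2.17x and 2.05x faster with this PR, while the cases where A is not transposed are essentially unchanged.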
Note that the important commit is
`31386ef1132c3f6cf9cb5f1063ecfab705f4c2a1`. Once #2420 is merged I will
rebase this.
Depends on #2420. Links to #1795.
1 parent d66d424 commit 89868e2
File tree
2 files changed, +22 -4 lines changed
- test/TritonIntelGPU (1 addition & 1 deletion)
- third_party/intel/lib/TritonIntelGPUTransforms (21 additions & 3 deletions)