[DT] Fuse encoding ops more aggressively for multi-use, gather, and slices ops. (iree-org#21830)
The multi-use dispatch fusion constraint is only required by the
SetEncoding pass, because that pass has to move consumer dispatches
around. Encoding fusion does not need it, because it only moves a
SetEncoding op into its producer dispatch.
The revision also allows fusion when the dispatch region contains
tensor.extract_slice and iree_linalg_ext.gather ops. This reduces the
number of dispatches in the llama fp8 model to 644, the same as without
data tiling. The latency drops by 25ms, from 378ms to 353ms.
| | No Data Tiling | Data Tiling w/o the revision | Data Tiling w/ the revision |
| ------------- | ------------- | ------------- | ------------- |
| Benchmark latency | 243ms | 378ms | 353ms |
| Memory usage (HIP unpooled) | 15.9GB | 31.14GB | 31.11GB |
| Number of dispatches | 644 | 741 | 644 |
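
As a quick sanity check on the summary numbers above, the deltas can be recomputed with a few lines of Python. This is only an illustrative sketch: the dictionary keys and the `delta` helper are hypothetical names, and the values are copied by hand from the table rather than read from any benchmark output.

```python
# Values copied from the summary table above (llama fp8 model).
# The key names are illustrative and not part of IREE.
summary = {
    "no_data_tiling": {"latency_ms": 243, "dispatches": 644},
    "data_tiling_before": {"latency_ms": 378, "dispatches": 741},
    "data_tiling_after": {"latency_ms": 353, "dispatches": 644},
}


def delta(metric, before, after):
    """Return how much `metric` drops going from `before` to `after`."""
    return summary[before][metric] - summary[after][metric]


# Improvement from this revision, with data tiling enabled in both cases.
latency_saved_ms = delta("latency_ms", "data_tiling_before", "data_tiling_after")
dispatches_removed = delta("dispatches", "data_tiling_before", "data_tiling_after")

# Latency gap that remains relative to running without data tiling.
remaining_gap_ms = delta("latency_ms", "data_tiling_after", "no_data_tiling")

print(latency_saved_ms, dispatches_removed, remaining_gap_ms)
```

The first two values correspond to the 25ms latency drop and the return to 644 dispatches; the third shows that data tiling is still slower than the baseline on this model after the revision.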

| Dispatch | No Data Tiling (ms) | Data Tiling w/o the revision (ms) | Data Tiling w/ the revision (ms) |
| ------------- | ------------- | ------------- | ------------- |
| dispatch_15_attention_4x8x4xDx128xf8 | 62.29 | 55.35 | 59.21 |
| dispatch_20_matmul_like_Dx14336x4096_f8xf8xf32 | 40.13 | 89.14 | 93.72 |
| dispatch_19_matmul_like_Dx14336x4096_f8xf8xf32 | 28.01 | 44.78 | 44.59 |
| dispatch_21_matmul_like_Dx4096x14336_f8xf8xf32 | 27.25 | 40.18 | 39.99 |
| dispatch_643_matmul_like_Dx128256x4096_f16xf16xf32 | 17.1 | 29.76 | 29.21 |
| dispatch_16_matmul_like_Dx4096x4096_f8xf8xf32 | 8.83 | 17.92 | 17.91 |
| dispatch_23_matmul_like_Dx4096x4096_f8xf8xf32 | 9.27 | 16.69 | 16.59 |
| encoding_10_encode_Dx4096xf8_to_Dx4096xf8 | - | 32.15 | - |
| encoding_6_encode_Dx14336xf32_to_Dx14336xf32 | - | 0.318 | - |
---------
Signed-off-by: hanhanW <[email protected]>
Signed-off-by: Ivan Ho <[email protected]>