[Codegen][GPU] Sort intrinsic according to k alignment - Step 2 of 2 - Creating intrinsic sort routine (iree-org#21128)
This is a follow-up of iree-org#21103.
This intrinsic sorting routine sorts the MMA intrinsics by the following
precedence rules:
1) K-alignment. We prefer intrinsics that can evenly divide the K
dimension of the problem.
2) M/N-alignment. We prefer intrinsics that can evenly divide the M and
N dimensions of the problem.
3) Intrinsic with larger gemm size.
4) Intrinsic with larger K size.
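
The precedence rules above can be sketched as a single sort key. This is
an illustration only, not the code added by this PR: the `Intrinsic`
class and `sort_intrinsics` function are hypothetical names, and "gemm
size" is assumed here to mean M * N * K.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsic:
    """Hypothetical stand-in for an MMA intrinsic with an MxNxK shape."""
    name: str
    m: int
    n: int
    k: int


def sort_intrinsics(intrinsics, problem_m, problem_n, problem_k):
    """Return intrinsics ordered from most to least preferred."""
    def key(intr):
        k_aligned = problem_k % intr.k == 0
        mn_aligned = problem_m % intr.m == 0 and problem_n % intr.n == 0
        gemm_size = intr.m * intr.n * intr.k  # assumption: size = M * N * K
        # sorted() is ascending, so invert each criterion to put the
        # preferred intrinsic first: aligned before unaligned, larger first.
        return (not k_aligned, not mn_aligned, -gemm_size, -intr.k)
    return sorted(intrinsics, key=key)


candidates = [
    Intrinsic("16x16x16", 16, 16, 16),
    Intrinsic("32x32x8", 32, 32, 8),
]
# M and N are illustrative values; K = 360 does not divide evenly by 16.
ordered = sort_intrinsics(candidates, problem_m=64, problem_n=64, problem_k=360)
print([i.name for i in ordered])
```

Encoding the rules as a tuple keeps the precedence explicit: a later
criterion only matters when all earlier ones tie.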
This change affects igemm and matmul problems. Other pipelines can
choose to adopt it if the sorting turns out to be useful.
### Motivation:
Consider a convolution configured with:
> BOO_CACHE_ON=0 python boo_driver.py convbfp16 -n 16 -c 40 -H 192 -W
128 -k 40 -y 3 -x 3 -p 1 -q 1 -u 2 -v 2 -l 1 -j 1 -m conv -g 1 -F 1 -t 1
--in_layout NHWC --out_layout NHWC --fil_layout NHWC --iter 50
Since the gemmK size is 360, the best intrinsic is 32x32x8xbf16, not
16x16x16xbf16: 8 divides 360 evenly while 16 does not, so only the
former keeps K aligned. Before this PR: `362.06 us`. After this PR:
`44.74 us`. Note that, due to a limitation of the tuner, this
combination won't be covered by a typical tuning run with 2k trials. So
it is essential that we bridge the gap through the heuristic first,
until the tuner is more capable of searching through different
heuristics.
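
As a quick check of the alignment argument, the snippet below recomputes
gemmK for this configuration. It assumes the usual igemm reduction size
of (input channels / groups) * filter height * filter width, and that
`-c`, `-y`, `-x`, and `-g` in the command above carry those values;
`gemm_k` is a hypothetical helper, not part of the codebase.

```python
def gemm_k(in_channels, filter_h, filter_w, groups=1):
    """Reduction (K) size of the implicit GEMM for a forward convolution."""
    return (in_channels // groups) * filter_h * filter_w


k = gemm_k(in_channels=40, filter_h=3, filter_w=3, groups=1)
print(k)  # 360

# An intrinsic keeps K aligned only if its K size divides gemmK evenly.
for intrinsic_k in (8, 16):
    print(intrinsic_k, k % intrinsic_k == 0)  # 8 -> True, 16 -> False
```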
---------
Signed-off-by: jerryyin <[email protected]>