@@ -1017,52 +1017,103 @@ is shown in a single column in the table below.
 
 ==== Intel XMX Supported Combinations
 This is currently available in devices with the architecture
-`architecture::intel_gpu_pvc`, `architecture::intel_gpu_dg2_g10`,
-`architecture::intel_gpu_dg2_g11`, and
-`architecture::intel_gpu_dg2_g12`.
-In these architectures'
-implementation, the type of the C matrix must be the same as the type
-of the D matrix. Therefore, that common type is shown in a single
-column in the table below.
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`, `architecture::intel_gpu_dg2_g10`,
+`architecture::intel_gpu_dg2_g11`, `architecture::intel_gpu_dg2_g12`,
+and `architecture::intel_gpu_arl_h`.
 
 [frame="none",options="header"]
 |======================
-| A type | B type | C and  D type | M | N | K | device
+| A type | B type | C type |  D type | M | N | K | device
 .2+| `matrix_type::uint8`  .2+| `matrix_type::uint8` .2+| 
-`matrix_type::sint32`  .2+|  +<=+ 8 |  16 .2+|  32
-|`architecture::intel_gpu_pvc`
+`matrix_type::sint32` .2+| `matrix_type::sint32`  .2+|  +<=+ 8 |  16 .2+|  32
+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
 |8|`architecture::intel_gpu_dg2_g10,
-architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+`architecture::intel_gpu_arl_h`
 .2+| `matrix_type::uint8`  .2+| `matrix_type::sint8` .2+| 
-`matrix_type::sint32`  .2+|  +<=+ 8 |  16 .2+|  32 |
-`architecture::intel_gpu_pvc`
+`matrix_type::sint32` .2+|`matrix_type::sint32`  .2+|  +<=+ 8 |  16 .2+|  32 |
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
 |8|`architecture::intel_gpu_dg2_g10,
-architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+`architecture::intel_gpu_arl_h`
 .2+| `matrix_type::sint8`  .2+| `matrix_type::uint8` .2+| 
-`matrix_type::sint32`  .2+|  +<=+ 8 |  16 .2+|  32 |
-`architecture::intel_gpu_pvc`
+`matrix_type::sint32` .2+|`matrix_type::sint32`  .2+|  +<=+ 8 |  16 .2+|  32 |
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
 |8|`architecture::intel_gpu_dg2_g10,
-architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+`architecture::intel_gpu_arl_h`
 .2+| `matrix_type::sint8`  .2+| `matrix_type::sint8` .2+| 
-`matrix_type::sint32`  .2+|  +<=+ 8 |  16 .2+|  32 |
-`architecture::intel_gpu_pvc`
+`matrix_type::sint32` .2+| `matrix_type::sint32`  .2+|  +<=+ 8 |  16 .2+|  32 |
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
 |8|`architecture::intel_gpu_dg2_g10,
-architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
-.2+|`matrix_type::fp16`       .2+|  `matrix_type::fp16`   .2+| 
-`matrix_type::fp32`   .2+|  +<=+ 8 |  16   .2+|  16 |
-`architecture::intel_gpu_pvc`
-|8| `architecture::intel_gpu_dg2_g10,
-architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
-.6+|  `matrix_type::bf16`       .6+|  `matrix_type::bf16`   .6+| 
-`matrix_type::fp32`  | 16 | 16 | 16 .4+|`architecture::intel_gpu_pvc`
-| 1 | 64 | 16 | 32 | 64 | 16
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+`architecture::intel_gpu_arl_h`
+.8+|`matrix_type::fp16`       .8+|  `matrix_type::fp16`   .8+| 
+`matrix_type::fp32` .8+|`matrix_type::fp32`  .1+| 16 .1+| 16 | 16
+.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`, 
+`architecture::intel_gpu_lnl_m`
+.2+| 1 .2+| 64 | 16 |32 
+.2+| 32 .2+| 64 | 16 |32 
+.2+|  +<=+ 8 |  16   .2+|  16 
+|8 .2+| `architecture::intel_gpu_dg2_g10,
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+`architecture::intel_gpu_arl_h`
+.1+| 32 .1+| 32 .1+| 16 
+.6+|`matrix_type::fp16`       .6+|  `matrix_type::fp16`   .6+| 
+`matrix_type::fp16` .6+|`matrix_type::fp32`  .1+|  +<=+ 8 |  16   .1+|  16
+.6+| `architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`, 
+`architecture::intel_gpu_lnl_m`
+| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
+.2+| 32 .2+| 64 | 16 | 32 
+.6+|`matrix_type::fp16`       .6+|  `matrix_type::fp16`   .6+| 
+`matrix_type::fp32` .6+|`matrix_type::fp16`  .1+|  +<=+ 8 |  16    .1+|  16
+.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`, 
+`architecture::intel_gpu_lnl_m`
+| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
+.2+| 32 .2+| 64 |16 | 32 
+.6+|`matrix_type::fp16`       .6+|  `matrix_type::fp16`   .6+| 
+`matrix_type::fp16` .6+|`matrix_type::fp16`  .1+|  +<=+ 8 |  16   .1+|  16
+.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`, 
+`architecture::intel_gpu_lnl_m`
+| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 |32 .2+| 32 .2+| 64 | 16 | 32
+.8+|  `matrix_type::bf16`       .8+|  `matrix_type::bf16`   .8+| 
+`matrix_type::fp32` .8+| `matrix_type::fp32`  | 16 | 16 | 16
+.6+|`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`, 
+`architecture::intel_gpu_lnl_m`
+.2+| 1 .2+| 64 | 16 | 32 
+.2+| 32 .2+| 64 | 16 |32 
 .2+|  +<=+ 8 |  16   .2+|  16 
 |8 .2+| `architecture::intel_gpu_dg2_g10,
-architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`
+architecture::intel_gpu_dg2_g11, architecture::intel_gpu_dg2_g12`,
+`architecture::intel_gpu_arl_h`
 .1+| 32 .1+| 32 .1+| 16 
+.6+|`matrix_type::bf16`       .6+|  `matrix_type::bf16`   .6+| 
+`matrix_type::bf16` .6+|`matrix_type::fp32`  .1+|  +<=+ 8 |  16   .1+|  16 .6+|
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
+| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
+.2+| 32 .2+| 64 |16 | 32 
+.6+|`matrix_type::bf16`       .6+|  `matrix_type::bf16`   .6+| 
+`matrix_type::fp32` .6+|`matrix_type::bf16`  .1+|  +<=+ 8 |  16   .1+|  16 .6+|
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
+| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
+.2+| 32 .2+| 64 |16 | 32 
+.6+|`matrix_type::bf16`       .6+|  `matrix_type::bf16`   .6+| 
+`matrix_type::bf16` .6+|`matrix_type::bf16`  .1+|  +<=+ 8 |  16   .1+|  16 .6+|
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
+| 16 | 16 | 16 .2+| 1 .2+| 64 | 16 | 32
+.2+| 32 .2+| 64 |16 | 32 
 |  `matrix_type::tf32`       |  `matrix_type::tf32`   |
-`matrix_type::fp32`   |  +<=+ 8 |  16   |  8 |
-`architecture::intel_gpu_pvc`
+`matrix_type::fp32` .2+| `matrix_type::fp32`   |  +<=+ 8 |  16   |  8 |
+`architecture::intel_gpu_pvc`, `architecture::intel_gpu_bmg_g21`,
+`architecture::intel_gpu_lnl_m`
 |======================
 
 ==== Nvidia Tensor Cores Supported Combinations
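
The updated Intel XMX table in the hunk above can be read as a lookup keyed on the A, B, C, and D types, the M/N/K sizes, and the device architecture. The following is a minimal sketch of that reading in plain C++, not the extension's API. The names `matrix_type`, `arch`, `xmx_row`, `xmx_rows`, `in_n16_group`, and `is_supported` are hypothetical stand-ins introduced for illustration, and only five rows of the table are transcribed. It assumes, as the table suggests, that the devices fall into two groups: `intel_gpu_pvc`, `intel_gpu_bmg_g21`, and `intel_gpu_lnl_m` in one, and the `intel_gpu_dg2_*` devices plus `intel_gpu_arl_h` in the other.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins for the extension's enumerations; real code would
// use the types provided by the SYCL headers instead.
enum class matrix_type { uint8, sint8, sint32, fp16, bf16, tf32, fp32 };
enum class arch { pvc, bmg_g21, lnl_m, dg2_g10, dg2_g11, dg2_g12, arl_h };

// One row of the table. When m_is_upper_bound is true the row reads
// "M <= m"; otherwise M must equal m exactly.
struct xmx_row {
  matrix_type a, b, c, d;
  std::size_t m;
  bool m_is_upper_bound;
  std::size_t n, k;
  bool n16_group;  // true: pvc/bmg_g21/lnl_m, false: dg2_*/arl_h
};

using mt = matrix_type;

// Partial transcription of the table (assumption: five sample rows only).
constexpr std::array<xmx_row, 5> xmx_rows{{
    {mt::uint8, mt::uint8, mt::sint32, mt::sint32, 8, true, 16, 32, true},
    {mt::uint8, mt::uint8, mt::sint32, mt::sint32, 8, true, 8, 32, false},
    {mt::bf16, mt::bf16, mt::fp32, mt::fp32, 16, false, 16, 16, true},
    {mt::bf16, mt::bf16, mt::fp32, mt::fp32, 32, false, 32, 16, false},
    {mt::tf32, mt::tf32, mt::fp32, mt::fp32, 8, true, 16, 8, true},
}};

// Device grouping as it appears in the table's device column.
constexpr bool in_n16_group(arch dev) {
  return dev == arch::pvc || dev == arch::bmg_g21 || dev == arch::lnl_m;
}

// Returns true when some transcribed row matches the requested combination.
bool is_supported(arch dev, matrix_type a, matrix_type b, matrix_type c,
                  matrix_type d, std::size_t m, std::size_t n, std::size_t k) {
  for (const xmx_row& r : xmx_rows) {
    const bool m_ok = r.m_is_upper_bound ? (m >= 1 && m <= r.m) : (m == r.m);
    if (r.a == a && r.b == b && r.c == c && r.d == d && m_ok && r.n == n &&
        r.k == k && r.n16_group == in_n16_group(dev)) {
      return true;
    }
  }
  return false;
}
```

With these sample rows, a `uint8` x `uint8` multiply with M = 4, N = 16, K = 32 matches on `arch::pvc` but not on `arch::dg2_g10`, where the table lists N = 8 instead. Keeping C and D as separate fields mirrors the column split this change introduces.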