Commit 4aee929
[Builder] Add support for Olive quantized models (#1647)
- Support new `"olive"` quant type
- Weight and zero-point packings are the same as GPTQ; there is no `g_idx`.
- Similar to the `k_quant` mixed-precision `int4_algo`, select matmuls
can be quantized to 8 bits.
- Currently, we ensure that the `q_proj`, `k_proj`, and `v_proj` matmuls use
the same configuration (bits and group_size) so that they can be merged
without issue.
- The modules are generalized to remove the requirement that all matmuls
in a layer must have the same bits and group_size.
- `quant_weight` and `dequant_weight` support the absence of `g_idx` by using
`repeat_interleave`; otherwise we would have to create a trivial `g_idx` the
way the quark model does.
- `pack_ort_format` supports 8-bit packing.

1 parent 3bac249
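The no-`g_idx` path can be sketched as follows. This is an illustrative NumPy sketch, not the builder's actual code; the function names, argument shapes, and the symmetric `(q - zero) * scale` formula are assumptions. It shows why `repeat_interleave` (NumPy's `np.repeat`) is enough when rows map to groups contiguously, and that a trivial `g_idx` produces identical results.

```python
import numpy as np

def dequant_no_gidx(qweight, scales, zeros, group_size):
    # qweight: (K, N) unpacked integer weights.
    # scales, zeros: (K // group_size, N) per-group parameters.
    # Without g_idx, row r belongs to group r // group_size, so the
    # per-group parameters can simply be repeated along the row axis
    # (the equivalent of torch.repeat_interleave).
    scales_full = np.repeat(scales, group_size, axis=0)
    zeros_full = np.repeat(zeros, group_size, axis=0)
    return (qweight - zeros_full) * scales_full

def dequant_with_gidx(qweight, scales, zeros, g_idx):
    # With an explicit g_idx mapping each row to its group,
    # gather the per-group parameters by index instead.
    return (qweight - zeros[g_idx]) * scales[g_idx]
```

With a trivial `g_idx = np.arange(K) // group_size`, both paths agree, which is why creating one (as the quark model does) is only needed when the gather-based path is the sole implementation.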
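For reference, GPTQ-style 4-bit packing stores 8 unsigned nibbles per `int32` word along the row axis. The sketch below is a hedged illustration of that common layout and is not claimed to match `pack_ort_format`'s exact output; function names are hypothetical.

```python
import numpy as np

def pack_int4(unpacked):
    # unpacked: (K, N) values in [0, 15]. Rows 8w..8w+7 are packed into
    # word w, with row 8w+i occupying nibble i (little-endian nibbles).
    K, N = unpacked.shape
    assert K % 8 == 0, "row count must be a multiple of 8"
    packed = np.zeros((K // 8, N), dtype=np.uint32)
    for i in range(8):
        packed |= (unpacked[i::8].astype(np.uint32) & 0xF) << (4 * i)
    return packed.astype(np.int32)  # GPTQ conventionally stores int32

def unpack_int4(packed):
    # Inverse of pack_int4: recover the (K, N) array of 4-bit values.
    K8, N = packed.shape
    unpacked = np.zeros((K8 * 8, N), dtype=np.uint8)
    as_u32 = packed.astype(np.uint32)
    for i in range(8):
        unpacked[i::8] = (as_u32 >> (4 * i)) & 0xF
    return unpacked
```

An 8-bit variant would pack 4 bytes per word instead of 8 nibbles, which is the kind of extension the `pack_ort_format` change implies.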
2 files changed: +193, −116 lines