@Nexesenex (Contributor) commented Oct 12, 2024

Intermediary FTYPE mixed between IQ3_M and IQ4_XS at 4bpw.

Faithfully transposed from @ikawrakow's new FTYPE IQ3_KL on ik_llama.cpp, so it can be trusted, except for attn_k.weight, which I chose to bump to IQ4_XS when GQA is present: in that case it makes no sense whatsoever to quantize the key heads smaller than attn_output.weight or half of the FFNs. A sketch of the override is shown below.
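For illustration only, here is a minimal sketch of what such an override can look like inside llama_tensor_get_type in src/llama.cpp. It is not the literal diff: the branch placement, and the assumption that hparams.n_gqa() > 1 is the right GQA test here, are mine.

```cpp
// Sketch, not the literal diff: when the model uses grouped-query attention
// (n_head > n_head_kv, i.e. n_gqa() > 1), the K projection is already small
// relative to attn_output.weight, so keep it at 4.25 bpw rather than a 3-bit type.
else if (name.find("attn_k.weight") != std::string::npos) {
    if (ftype == LLAMA_FTYPE_MOSTLY_IQ3_XL && qs.model.hparams.n_gqa() > 1) {
        new_type = GGML_TYPE_IQ4_XS;
    }
}
```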

The XL suffix is chosen in case an "IQ3_M/L" or "IQ4_XXS" GGML_TYPE close to 4bpw emerges someday, with a related FTYPE; the proposed mixed FTYPE could then be replaced by it.

This FTYPE answers the demand of many users, myself included, for an intermediary between IQ3_M and IQ4_XS: the 0.5bpw gap between them prevents users from getting the best fully offloadable quality in many cases (e.g. a 70b model on 36GiB of VRAM, or a 123b model on 64GiB of VRAM). A rough size estimate is worked out below.
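As a back-of-the-envelope illustration of the gap (weights only, assuming 4.25 bpw for IQ4_XS, ~4.0 bpw for the proposed mix, ~3.7 bpw for IQ3_M, and ignoring KV cache and compute buffers):

```
70e9 weights × 4.25 bpw / 8 ≈ 37.2 GB ≈ 34.6 GiB   (IQ4_XS: too tight on 36 GiB once context is added)
70e9 weights × 4.00 bpw / 8 ≈ 35.0 GB ≈ 32.6 GiB   (IQ3_XL: fits, with room for the KV cache)
70e9 weights × 3.70 bpw / 8 ≈ 32.4 GB ≈ 30.1 GiB   (IQ3_M: fits, but leaves quality on the table)
```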

In a later PR that I can provide, and in order not to multiply the FTYPEs, IQ2_M could be eliminated and IQ2_S elevated to replace it, which would make the nomenclature consistent again (currently, the IQ2_S FTYPE is an IQ2_XS+, while the IQ2_M FTYPE uses the IQ2_S GGML_TYPE, as shown below), with IQ2_XS getting a little boost to compensate for the disappearance of the old IQ2_S FTYPE, which is honestly first in line to be sacrificed. Among other FTYPE elimination candidates, either IQ3_S or IQ3_M could also be retired.
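To make the naming drift concrete, this is the current FTYPE-to-base-type mapping for these quants, abridged from the default_type selection in llama.cpp's quantize code (comments mine):

```cpp
// Abridged: which base GGML_TYPE each FTYPE starts from before per-tensor mixing.
case LLAMA_FTYPE_MOSTLY_IQ2_XS: default_type = GGML_TYPE_IQ2_XS; break; // name matches
case LLAMA_FTYPE_MOSTLY_IQ2_S:  default_type = GGML_TYPE_IQ2_XS; break; // really an "IQ2_XS+" mix
case LLAMA_FTYPE_MOSTLY_IQ2_M:  default_type = GGML_TYPE_IQ2_S;  break; // name != type
```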

Note: I allow myself to PR this because I have amused myself quite extensively with the quant strategies, and have already revamped LCCP's quant strategies for my own use, some of which is already PRed on this repo as a demo.

In IK's graph below, IQ3_XL's dot should look like IQ3_KL's, with a little more weight and a little more perplexity. It's clearly viable, although a bit different from my own custom quant strategies (I use different use_more_bits formulas for myself, and bump attn_v and attn_k further, to Q6_K and Q5_K respectively); the stock helper is quoted after the graph for reference.

[Image: ikawrakow's perplexity vs. model size graph comparing the quant mixes, each plotted as a dot]
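For reference, this is the stock use_more_bits helper from src/llama.cpp (as of this PR's era), which decides which layers receive a per-layer bump; my custom strategies swap in different formulas here:

```cpp
// Give more bits to the first 1/8 and last 1/8 of the layers,
// plus every third layer in the middle section.
static bool use_more_bits(int i_layer, int n_layers) {
    return i_layer < n_layers/8 || i_layer >= 7*n_layers/8 || (i_layer - n_layers/8)%3 == 2;
}
```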

Ref : ikawrakow/ik_llama.cpp@b30c9e1

@github-actions bot added the examples and python labels Oct 12, 2024
@Nexesenex changed the title from "New quand strategy / FTYPE IQ3_XL 4bpw" to "New quant strategy / FTYPE IQ3_XL 4bpw" Oct 12, 2024