@Nexesenex (Contributor) commented Oct 12, 2024

Intermediary FTYPE mixed between IQ3_M and IQ4_XS at 4bpw.

Faithfully transposed from @ikawrakow's new FTYPE IQ3_KL on ik_llama.cpp, so it can be trusted, except for attn_k.weight, which I chose to bump to IQ4_XS when GQA is present: in that case it makes no sense whatsoever to quantize the key heads smaller than attn_output.weight or half of the FFNs. A sketch of the override is shown below.
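For illustration only, here is a minimal sketch of what such an override can look like inside llama_tensor_get_type in src/llama.cpp. It is not the literal diff: the branch placement, and the assumption that hparams.n_gqa() > 1 is the right GQA test here, are mine.

```cpp
// Sketch, not the literal diff: when the model uses grouped-query attention
// (n_head > n_head_kv, i.e. n_gqa() > 1), the K projection is already small
// relative to attn_output.weight, so keep it at 4.25 bpw rather than a 3-bit type.
else if (name.find("attn_k.weight") != std::string::npos) {
    if (ftype == LLAMA_FTYPE_MOSTLY_IQ3_XL && qs.model.hparams.n_gqa() > 1) {
        new_type = GGML_TYPE_IQ4_XS;
    }
}
```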

The XL suffix is chosen in case an "IQ3_M/L" or "IQ4_XXS" GGML_TYPE close to 4bpw emerges someday, with a related FTYPE; the proposed mixed FTYPE could then be replaced by it.

This FTYPE answers the demand of many users, myself included, for an intermediary between IQ3_M and IQ4_XS: the 0.5bpw gap between them prevents users from getting the best fully offloadable quality in many cases (e.g. a 70b model on 36GiB of VRAM, or a 123b model on 64GiB of VRAM). A rough size estimate is worked out below.
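As a back-of-the-envelope illustration of the gap (weights only, assuming 4.25 bpw for IQ4_XS, ~4.0 bpw for the proposed mix, ~3.7 bpw for IQ3_M, and ignoring KV cache and compute buffers):

```
70e9 weights × 4.25 bpw / 8 ≈ 37.2 GB ≈ 34.6 GiB   (IQ4_XS: too tight on 36 GiB once context is added)
70e9 weights × 4.00 bpw / 8 ≈ 35.0 GB ≈ 32.6 GiB   (IQ3_XL: fits, with room for the KV cache)
70e9 weights × 3.70 bpw / 8 ≈ 32.4 GB ≈ 30.1 GiB   (IQ3_M: fits, but leaves quality on the table)
```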

In a later PR that I can provide, and in order not to multiply the FTYPEs, IQ2_M could be eliminated and IQ2_S elevated to replace it, which would make the nomenclature consistent again (currently, the IQ2_S FTYPE is an IQ2_XS+, while the IQ2_M FTYPE uses the IQ2_S GGML_TYPE, as shown below), with IQ2_XS getting a little boost to compensate for the disappearance of the old IQ2_S FTYPE, which is honestly first in line to be sacrificed. Among other FTYPE elimination candidates, either IQ3_S or IQ3_M could also be retired.
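To make the naming drift concrete, this is the current FTYPE-to-base-type mapping for these quants, abridged from the default_type selection in llama.cpp's quantize code (comments mine):

```cpp
// Abridged: which base GGML_TYPE each FTYPE starts from before per-tensor mixing.
case LLAMA_FTYPE_MOSTLY_IQ2_XS: default_type = GGML_TYPE_IQ2_XS; break; // name matches
case LLAMA_FTYPE_MOSTLY_IQ2_S:  default_type = GGML_TYPE_IQ2_XS; break; // really an "IQ2_XS+" mix
case LLAMA_FTYPE_MOSTLY_IQ2_M:  default_type = GGML_TYPE_IQ2_S;  break; // name != type
```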

Note: I allow myself to PR this because I have amused myself quite extensively with the quant strategies, and have already revamped LCCP's quant strategies for my own use, some of which is already PRed on this repo as a demo.

In IK's graph below, IQ3_XL's dot should look like IQ3_KL's, with a little more weight and a little more perplexity. It's clearly viable, although a bit different from my own custom quant strategies (I use different use_more_bits formulas for myself, and bump attn_v and attn_k further, to Q6_K and Q5_K respectively); the stock helper is quoted after the graph for reference.

[Image: ikawrakow's perplexity vs. model size graph comparing the quant mixes, each plotted as a dot]
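For reference, this is the stock use_more_bits helper from src/llama.cpp (as of this PR's era), which decides which layers receive a per-layer bump; my custom strategies swap in different formulas here:

```cpp
// Give more bits to the first 1/8 and last 1/8 of the layers,
// plus every third layer in the middle section.
static bool use_more_bits(int i_layer, int n_layers) {
    return i_layer < n_layers/8 || i_layer >= 7*n_layers/8 || (i_layer - n_layers/8)%3 == 2;
}
```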

Ref : ikawrakow/ik_llama.cpp@b30c9e1

@github-actions bot added the examples and python labels Oct 12, 2024
@Nexesenex changed the title from "New quand strategy / FTYPE IQ3_XL 4bpw" to "New quant strategy / FTYPE IQ3_XL 4bpw" Oct 12, 2024