Thanks to the llm-compressor team for their work on model compression. I have a few questions about K/V cache quantization in llm-compressor:

Does llm-compressor currently support K/V cache quantization? It does not appear to be supported at the moment. If not, is K/V cache quantization planned for a future release?

vLLM appears to support K/V cache quantization. Is it supported only in vLLM, and not in llm-compressor?
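For context, this is the kind of K/V cache quantization I mean on the vLLM side: a sketch using vLLM's `--kv-cache-dtype` flag (the model name below is a placeholder):

```shell
# Serve a model with the K/V cache stored in FP8 instead of FP16/BF16.
# "my-org/my-model" is a placeholder; substitute any model vLLM can load.
vllm serve my-org/my-model --kv-cache-dtype fp8
```

What I am asking is whether llm-compressor can produce (or calibrate) the scales for a quantized K/V cache like this, or whether the quantization happens entirely inside vLLM at serving time.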