[question] Does llm-compressor support K/V cache quantization? #1711

@liye0626

Description

Thanks for llm-compressor's help with model compression. I have some questions about K/V cache quantization in llm-compressor:

  • Does llm-compressor currently support K/V cache quantization? It does not seem to be supported at the moment. If not, is K/V cache quantization planned for a future release?
  • vLLM appears to support K/V cache quantization. Is K/V cache quantization supported only in vLLM and not in llm-compressor?
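For context, here is a sketch of the kind of recipe I was hoping to write. The `kv_cache_scheme` key and its fields are my guess at how such an option might look, modeled on the usual QuantizationModifier recipe shape; they are not a confirmed llm-compressor API:

```yaml
# Hypothetical recipe sketch (not confirmed API): quantize the K/V cache
# to 8-bit float alongside the usual quantization pass. The kv_cache_scheme
# block is an assumption about how such a feature might be configured.
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      ignore: ["lm_head"]
      kv_cache_scheme:
        num_bits: 8
        type: float
        strategy: tensor
        dynamic: false
        symmetric: true
```

If something along these lines already exists or is planned, a pointer to the relevant example or docs would be much appreciated.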
