AWQ + KV cache quantization #2488

@saranya-vp-15149

Description
Currently, FP8 KV cache quantization is available only with W8A8 model quantization. It would be great if FP8 KV cache quantization (using dataset-calibrated scales) were possible with all available weight quantization schemes, such as AWQ. It would also be great if you introduced KV cache pruning/compression, like kvpress, KV compress, etc.
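
To make the request concrete, here is a minimal sketch of what I mean by a dataset-calibrated scale. It is only an illustration under my own assumptions: a single per-tensor scale, the FP8 E4M3 format (largest finite value 448), and hypothetical helper names (`calibrate_scale`, `fake_quantize`, `dequantize`) that are not part of any library API. Rounding to the actual FP8 grid is left out; only scaling and clamping are shown.

```python
# Hypothetical sketch: per-tensor FP8 (E4M3) scale calibration for a KV cache.
# The function names are illustrative only, not an existing library API.

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def calibrate_scale(calibration_batches):
    """Return a scale that maps the largest observed |value| to FP8_E4M3_MAX."""
    amax = 0.0
    for batch in calibration_batches:
        for value in batch:
            amax = max(amax, abs(value))
    # Guard against an all-zero calibration set.
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def fake_quantize(values, scale):
    """Divide by the scale and clamp to the FP8 range (no FP8 rounding here)."""
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]


def dequantize(quantized, scale):
    """Map clamped values back to the original range."""
    return [q * scale for q in quantized]
```

The scale would be computed once from key/value activations collected on the calibration dataset, stored with the checkpoint, and then reused at inference time. Values outside the calibrated range are clamped, which is why the choice of calibration data matters. Nothing in this step depends on how the weights are quantized, which is why I think it could be combined with AWQ.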

Labels

enhancement (New feature or request)