Open
Labels
enhancement: New feature or request
Description
Currently, FP8 KV cache quantization is available only with W8A8 model quantization. It would be great if FP8 KV cache quantization (using dataset-calibrated scales) were possible with all available weight quantization schemes. It would also be great if you introduced KV cache pruning/compression, like kvpress / kv compress etc.
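To make the request concrete, here is a minimal sketch of what "dataset-calibrated scale" could mean for an FP8 (E4M3) KV cache. It is not this project's API: the function names `calibrate_scale`, `round_to_e4m3`, `quantize_kv` and `dequantize_kv` are hypothetical, and the per-tensor absmax calibration shown is just one common choice I am assuming for illustration. The point is that nothing here depends on how the weights are quantized.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 (fn) format


def calibrate_scale(calibration_batches):
    """Hypothetical helper: per-tensor scale from the largest |value| seen in calibration data."""
    amax = max(abs(v) for batch in calibration_batches for v in batch)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def round_to_e4m3(x):
    """Round a float to the nearest E4M3 value (3 mantissa bits), saturating at +/-448."""
    if x == 0.0:
        return 0.0
    mag = min(abs(x), FP8_E4M3_MAX)
    exp = max(math.floor(math.log2(mag)), -6)  # -6 is the smallest normal exponent
    step = 2.0 ** (exp - 3)                    # spacing between representable values here
    return math.copysign(round(mag / step) * step, x)


def quantize_kv(values, scale):
    """Map key/value entries into FP8 range using the calibrated scale."""
    return [round_to_e4m3(v / scale) for v in values]


def dequantize_kv(quantized, scale):
    """Recover approximate key/value entries at attention time."""
    return [q * scale for q in quantized]


# Toy calibration data standing in for K or V activations from a calibration dataset.
calibration = [[0.5, -2.0], [1.0, 4.48]]
scale = calibrate_scale(calibration)
restored = dequantize_kv(quantize_kv([0.5, -2.0, 10.0], scale), scale)
```

Values inside the calibrated range come back within the E4M3 rounding error, while a value larger than anything seen during calibration (10.0 above) saturates at the calibrated maximum. That saturation is why the scale should come from representative data rather than a fixed default.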