W4fp8 AWQ #1657

@LugerW-A

Description

W4-FP8 (INT4 weights, FP8 activations) AWQ achieves good quantization accuracy and inference acceleration. Can LLMCompressor implement this quantization method and adapt it for vLLM?

Ref:
https://nvidia.github.io/TensorRT-Model-Optimizer/guides/_choosing_quant_methods.html#:~:text=Ampere%20and%20later.-,INT4%2DFP8%20AWQ%20(W4A8),-High
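For illustration, here is a sketch of how such a W4-FP8 scheme might be expressed with the compressed-tensors quantization primitives that LLMCompressor builds on. No W4A8-FP8 preset exists today; the scheme composition and every field value below are assumptions, not an existing API surface for this feature:

```python
# Hypothetical sketch of a W4-FP8 quantization scheme using
# compressed-tensors primitives. All values are assumptions for
# illustration; this is not an existing preset in LLMCompressor.
from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationScheme,
    QuantizationStrategy,
    QuantizationType,
)

# INT4 weights, group-wise, symmetric (AWQ-style smoothing scales
# would be applied to the weights before this quantization step)
w4_weights = QuantizationArgs(
    num_bits=4,
    type=QuantizationType.INT,
    strategy=QuantizationStrategy.GROUP,
    group_size=128,
    symmetric=True,
)

# FP8 activations, quantized dynamically per token at runtime
fp8_activations = QuantizationArgs(
    num_bits=8,
    type=QuantizationType.FLOAT,
    strategy=QuantizationStrategy.TOKEN,
    dynamic=True,
    symmetric=True,
)

# Apply the scheme to all Linear layers
w4_fp8_scheme = QuantizationScheme(
    targets=["Linear"],
    weights=w4_weights,
    input_activations=fp8_activations,
)
```

In this framing, supporting the method would mean wiring such a scheme into the AWQ flow (so the smoothing scales are computed before INT4 weight quantization) and adding a matching W4A8-FP8 kernel path in vLLM.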

Labels: enhancement (New feature or request)
