Labels: awq (AWQ support), compressed-tensors, enhancement, good first issue, good follow-up issue, wNa16 (weight-only int-quantized support)
Description
Is your feature request related to a problem? Please describe.
- Add a conversion tool to compressed-tensors that, given an AutoAWQ checkpoint, converts the model to the compressed-tensors format using the `pack_quantized` compressor.
- The converted checkpoint should also contain an updated `quantization_config` in its `config.json` with metadata about the format. See this example: https://huggingface.co/nm-testing/Qwen3-Coder-30B-A3B-Instruct-W4A16-awq/blob/main/config.json
- This can be done by representing the quantization parameters of the AutoAWQ model with `QuantizationArgs` and then applying the `ModelCompressor` to the model, as sketched below.
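A minimal sketch of that flow, assuming the compressed-tensors APIs (`QuantizationArgs`, `QuantizationScheme`, `QuantizationConfig`, `ModelCompressor`); exact signatures may differ across versions, and `dequantize_awq_checkpoint` is a hypothetical helper standing in for the real work of unpacking AutoAWQ's `qweight`/`qzeros`/`scales` tensors:

```python
# Sketch only: signatures may differ across compressed-tensors versions.
from transformers import AutoModelForCausalLM
from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationConfig,
    QuantizationScheme,
)
from compressed_tensors.compressors import ModelCompressor

model = AutoModelForCausalLM.from_pretrained("path/to/awq-checkpoint")

# Hypothetical helper: unpack AutoAWQ's qweight/qzeros/scales and attach the
# weight / weight_scale / weight_zero_point tensors the compressor expects.
dequantize_awq_checkpoint(model)

# AWQ defaults: asymmetric 4-bit, group-wise, weight-only quantization
weight_args = QuantizationArgs(
    num_bits=4,
    type="int",
    symmetric=False,
    strategy="group",
    group_size=128,  # read group_size from the AutoAWQ config, don't hard-code
)
scheme = QuantizationScheme(targets=["Linear"], weights=weight_args)
quant_config = QuantizationConfig(
    config_groups={"group_0": scheme},
    format="pack-quantized",  # selects the pack_quantized compressor
)

compressor = ModelCompressor(quantization_config=quant_config)
compressed_state_dict = compressor.compress(model)
model.save_pretrained("path/to/output", state_dict=compressed_state_dict)
compressor.update_config("path/to/output")  # writes quantization_config to config.json
```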
Describe the solution you'd like
- A tool that, given an AutoAWQ checkpoint, produces a compressed-tensors formatted model.
- The converted model should run in vLLM without any drop in accuracy.
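As a quick smoke test (not a substitute for an accuracy evaluation), the converted model should load and generate in vLLM, which reads the quantization format from `quantization_config` in `config.json`:

```python
from vllm import LLM, SamplingParams

# Load the converted compressed-tensors checkpoint and generate a few tokens.
llm = LLM(model="path/to/output")
outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```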