Skip to content

Does tensorRT-LLM support serving 4bit quantised unsloth Llama model #2472

@jayakommuru

Description

@jayakommuru

We want to deploy https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-bnb-4bit which is 4-bit quantized version of llama-3.2-1B model. It is quantized using bitsandbytes. Can we deploy this using tensor RT-LLM backend ? If so, is there any documentation to refer?

Metadata

Metadata

Assignees

Labels

Low PrecisionLower-precision formats (INT8/INT4/FP8) for TRTLLM quantization (AWQ, GPTQ).questionFurther information is requestedtriagedIssue has been triaged by maintainers

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions