autoround
awq
big_models_with_sequential_onloading
compressed_inference
disk_offloading
model_free_ptq
multimodal_audio
multimodal_vision
quantization_kv_cache
quantization_non_uniform
quantization_w4a16
quantization_w4a16_fp4
quantization_w4a4_fp4
quantization_w4a8
quantization_w4a8_fp8
quantization_w8a8_fp8
quantization_w8a8_int8
quantizing_moe
sparse_2of4_quantization_fp8
transform
README.md


LLM Compressor Examples

The LLM Compressor examples are organized primarily by quantization scheme. Each folder contains model-specific examples showing how to apply that quantization scheme to a particular model.

Some examples are additionally grouped by model type, such as:

multimodal_audio
multimodal_vision
quantizing_moe

Other examples are grouped by algorithm, such as:

awq
autoround

How to find the right example

If you are interested in quantizing a specific model, start by browsing the model-type folders (for example, multimodal_audio, multimodal_vision, or quantizing_moe).
If you don’t see your model there, decide which quantization scheme you want to use (e.g., FP8, FP4, INT4, INT8, or KV cache / attention quantization) and look in the corresponding quantization_* folder.
Each quantization scheme folder contains at least one Llama 3 example, which can serve as a general reference for other models.
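
The lookup order above can be sketched in a few lines of Python. This is only an illustration: the folder names are copied from the examples directory listing, while `candidate_folders` and its `model_type` / `scheme` parameters are hypothetical helpers, not part of LLM Compressor. Matching is a plain substring match on the folder name, so a label that does not appear in any folder name (for example `int4`) returns nothing.

```python
from fnmatch import fnmatchcase

# Folder names copied from the examples directory listing.
EXAMPLE_FOLDERS = [
    "autoround", "awq", "big_models_with_sequential_onloading",
    "compressed_inference", "disk_offloading", "model_free_ptq",
    "multimodal_audio", "multimodal_vision", "quantization_kv_cache",
    "quantization_non_uniform", "quantization_w4a16",
    "quantization_w4a16_fp4", "quantization_w4a4_fp4", "quantization_w4a8",
    "quantization_w4a8_fp8", "quantization_w8a8_fp8",
    "quantization_w8a8_int8", "quantizing_moe",
    "sparse_2of4_quantization_fp8", "transform",
]

# Folders the README describes as grouped by model type.
MODEL_TYPE_FOLDERS = ["multimodal_audio", "multimodal_vision", "quantizing_moe"]


def candidate_folders(model_type=None, scheme=None):
    """Return folders to browse: the model-type folder first, then scheme folders.

    Hypothetical helper for illustration only.
    """
    candidates = []
    if model_type in MODEL_TYPE_FOLDERS:
        candidates.append(model_type)
    if scheme:
        # Plain substring match against the quantization_* folder names.
        pattern = f"quantization_*{scheme.lower()}*"
        candidates += [f for f in EXAMPLE_FOLDERS if fnmatchcase(f, pattern)]
    return candidates


print(candidate_folders(scheme="FP8"))
print(candidate_folders(model_type="quantizing_moe", scheme="kv_cache"))
```

Whichever folder this points to, its Llama 3 example remains the general reference to adapt.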

Where to start if you’re unsure

If you’re unsure which quantization scheme to use, a good starting point is a data-free pathway, such as w8a8_fp8, found under quantization_w8a8_fp8. For more details on available schemes and when to use them, see the Compression Schemes guide.
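
When comparing schemes, it can help to read the folder names mechanically. The sketch below assumes the common convention that `w<N>` is the weight bit width and `a<M>` is the activation bit width; the trailing tag (such as `fp8` or `int8`) is returned as-is, without claiming which tensors it applies to. `decode_scheme` is a hypothetical helper, not an LLM Compressor function, and it needs Python 3.9 or later for `str.removeprefix`.

```python
import re

# Matches names such as "w8a8_fp8" or "w4a16"; the format tag is optional.
SCHEME_RE = re.compile(r"^w(?P<w>\d+)a(?P<a>\d+)(?:_(?P<fmt>[a-z0-9]+))?$")


def decode_scheme(name):
    """Split a scheme or folder name into bit widths and an optional format tag.

    Returns None when the name does not follow the w<N>a<M> pattern.
    """
    match = SCHEME_RE.match(name.removeprefix("quantization_"))
    if match is None:
        return None
    return {
        "weight_bits": int(match["w"]),
        "activation_bits": int(match["a"]),
        "format": match["fmt"],  # None when the name has no trailing tag
    }


print(decode_scheme("w8a8_fp8"))
print(decode_scheme("quantization_w4a16"))
print(decode_scheme("quantization_kv_cache"))
```

Names that do not follow the pattern, such as quantization_kv_cache or quantization_non_uniform, return `None` and are best understood from the Compression Schemes guide.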

Need help?

If you don’t see your model or aren’t sure which quantization scheme applies, feel free to open an issue and someone from the community will be happy to help.

!!! note
    We are currently updating and improving our documentation and examples structure. Feedback is very welcome during this transition.
