
Commit 31c791e

Update docs/getting-started/faq.md
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Parent: 7340667

File tree: 1 file changed (+1, -1)


docs/getting-started/faq.md

Lines changed: 1 addition & 1 deletion
@@ -26,4 +26,4 @@ Refer to [https://docs.vllm.ai/projects/llm-compressor/en/latest/getting-started
All linear layers go through basic quantization except the `lm_head` layer. This is because the `lm_head` layer is the last layer of the model and sensitive to quantization, which will impact the model's accuracy. For example, [https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_w8a8_fp8/llama3_example.py#L18](here) is a code snippet of how to ignore the lm_layer.

-Mixture of Expert (MoE) models, due to their advanced architecture and some components such as gate and routing layers, are sensitive to quantization as well. For example, [https://github.com/vllm-project/llm-compressor/blob/main/examples/quantizing_moe/qwen_example.py#L60](here) is a code snippet of how to ignore the gates.
+Mixture of Expert (MoE) models, due to their advanced architecture and some components such as gate and routing layers, are sensitive to quantization as well. For example, [this code snippet shows how to ignore the gates](https://github.com/vllm-project/llm-compressor/blob/main/examples/quantizing_moe/qwen_example.py#L60).
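For reference, both FAQ paragraphs come down to passing an `ignore` list to the quantization recipe. The sketch below is a minimal illustration of that pattern, not the contents of the linked examples: the model names, the `FP8_DYNAMIC` scheme, and the gate-layer regexes are assumptions, so check `llama3_example.py` and `qwen_example.py` for the exact recipes.

```python
# Minimal sketch of the "ignore" pattern described in the FAQ. Model names,
# the scheme, and the MoE gate regexes are illustrative assumptions, not the
# exact contents of the linked examples.
from transformers import AutoModelForCausalLM

from llmcompressor import oneshot  # older releases: from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Dense model: quantize every Linear layer except the sensitive lm_head.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype="auto"
)
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)
oneshot(model=model, recipe=recipe)

# MoE model: additionally skip the gate/routing modules by name pattern.
moe_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen1.5-MoE-A2.7B-Chat", torch_dtype="auto"
)
moe_recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    # Regexes below are assumed module names for Qwen-style MoE gates.
    ignore=["lm_head", "re:.*mlp.gate$", "re:.*mlp.shared_expert_gate$"],
)
oneshot(model=moe_model, recipe=moe_recipe)
```

Keeping the gate and routing layers in higher precision avoids perturbing expert routing, which is the reason the MoE examples list them under `ignore` alongside `lm_head`.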
