
Commit 2f97db1

[Doc] Update QA relevant to quantization (#1257)
Signed-off-by: lishunyang <lishunyang12@163.com>
1 parent accf334 commit 2f97db1

File tree

1 file changed: 1 addition, 1 deletion


docs/usage/faq.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ A: At first, you can check current [issues](https://github.com/vllm-project/vllm

 > Q: Does vLLM-Omni support AWQ or any other quantization?

-A: vLLM-Omni partitions model into several stages. For AR stages, it will reuse main logic of LLMEngine in vLLM. So current quantization supported in vLLM should be also supported in vLLM-Omni for them. But systematic verification is ongoing. For quantization for DiffusionEngine, we are working on it. Please stay tuned and welcome contribution!
+A: We plan to introduce GGUF FP8 prequantized models and online FP8 quantization in version 0.16.0. Support for other quantization types will follow in future releases. For details, please see our [Q1 quantization roadmap](https://github.com/vllm-project/vllm-omni/issues/1057).

 > Q: Does vLLM-Omni support multimodal streaming input and output?
