Commit 5dd06ff
[Qwen3VLMoe] Add linearized definition and FP8 Quantization Example (vllm-project#1874)
SUMMARY:
- Updates the MoE layer to use a linearized definition such that we can
quantize and run the model in vLLM
- Wraps the gate layer so that it is properly ignored - this is a hack for
now. We will need to do this properly in ct
- Not adding a forward pass for now; it will be added as a
follow-up, but we would like this change in for the release to enable FP8
quantization
- Note: requires the latest transformers
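The first bullet can be illustrated with a small, dependency-free sketch. It assumes the original MoE block keeps all expert weights in one fused 3-D tensor, and that "linearized" means splitting that tensor into one 2-D linear layer per expert so that tooling which targets linear layers can find them. The names `Linear`, `fused_forward`, and `linearize` are hypothetical, and plain Python lists stand in for tensors; this is not the code from this commit.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]


@dataclass
class Linear:
    """Stand-in for a 2-D linear layer; weight is [out_features][in_features]."""

    weight: Matrix

    def __call__(self, x: List[float]) -> List[float]:
        # y = W @ x, one dot product per output row
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]


def fused_forward(fused: List[Matrix], expert_idx: int, x: List[float]) -> List[float]:
    """Apply one expert from a fused [num_experts][in][out] tensor (y = x @ W)."""
    w = fused[expert_idx]
    out_features = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(out_features)]


def linearize(fused: List[Matrix]) -> List[Linear]:
    """Split the fused tensor into one Linear per expert.

    Each [in][out] slice is transposed into the [out][in] layout used by Linear.
    """
    return [Linear([list(col) for col in zip(*w)]) for w in fused]


# Toy example (assumed shapes): 2 experts, in_features=3, out_features=2.
fused = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[0.5, -1.0], [0.0, 2.0], [1.5, 0.25]],
]
experts = linearize(fused)
x = [1.0, -2.0, 0.5]

# The per-expert layers reproduce the fused computation.
for idx in range(len(fused)):
    assert experts[idx](x) == fused_forward(fused, idx, x)
```

After the split, each expert is an ordinary named layer, which is the property a per-layer quantization pass or an ignore list would presumably rely on.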
TEST PLAN:
- Produces
`/proving-grounds/engine/hub_cache/Qwen3-VL-235B-A22B-Instruct-FP8_DYNAMIC`
which yields coherent generations:
```python
from vllm import LLM, SamplingParams

if __name__ == "__main__":
    prompts = [
        "The Swiss Alps are",
        "Brad Marchand is",
        "The Toronto Maple Leafs are",
    ]
    # Sampling params for stochastic (non-greedy) sampling
    sampling_params = SamplingParams(temperature=0.80, top_p=0.95, max_tokens=40, min_tokens=10)
    # Load the FP8-quantized checkpoint across 2 GPUs, in eager mode
    llm = LLM(
        "/proving-grounds/engine/hub_cache/Qwen3-VL-235B-A22B-Instruct-FP8_DYNAMIC",
        tensor_parallel_size=2,
        max_model_len=4096,
        enforce_eager=True,
    )
    outputs = llm.generate(prompts, sampling_params)
    for out in outputs:
        print(out.outputs[0].text)
```
Generations:
```text
a true paradise for nature lovers and outdoor enthusiasts. With their snow-capped peaks, lush green valleys, and crystal-clear lakes, the Alps offer a stunning backdrop for a wide range of activities. Whether
a prominent figure in the NHL, known for his exceptional performance and leadership. He has won the Art Ross Trophy as the NHL's leading scorer, with 110 points (32 goals and
a professional ice hockey team based in Toronto, Ontario, Canada. They are members of the Atlantic Division in the Eastern Conference of the National Hockey League (NHL). The team was established in 1
```
Signed-off-by: Cassie Jeon <cajeon@redhat.com>
1 parent 34a4602 commit 5dd06ff
File tree
4 files changed, +105 -17 lines changed
- examples/quantization_w8a8_fp8
- src/llmcompressor/modeling
Lines changed: 40 additions & 0 deletions