SUMMARY:
AutoRound quantization example: Qwen3-VL NVFP4
TEST PLAN:
python qwen3_vl_example.py
Output:
```
Hello my name is Mihai, I am a 30 year old male, and I am currently a software engineer working in a company that develops software for the financial sector. I am a very passionate person, and I am always eager to learn new things. I have a strong interest in AI, machine learning, and data science. I am also very interested in the intersection of these fields with finance. I am currently working on a project that involves building a machine learning model to predict stock prices. I am
```
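For context, NVFP4 stores weights as 4-bit E2M1 floating-point values with per-block scales. The sketch below (not from this PR; the grid constant and function name are illustrative assumptions) shows only the element-wise round-to-nearest onto the 16-value E2M1 grid for a single scaled block — block sizing and the FP8 scale encoding are omitted:

```python
# Illustrative sketch: element-wise FP4 (E2M1) round-to-nearest,
# the value grid NVFP4 quantizes onto. NVFP4 additionally applies
# per-block scales, which are reduced here to a single `scale` arg.

# Positive E2M1-representable magnitudes (sign bit handled separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_e2m1(x: float, scale: float = 1.0) -> float:
    """Round x/scale to the nearest E2M1 magnitude, then rescale."""
    v = abs(x) / scale
    # Nearest grid point; ties resolve to the smaller value, which is
    # fine for a sketch (real kernels define tie-breaking explicitly).
    q = min(E2M1_GRID, key=lambda g: abs(g - v))
    return (q if x >= 0 else -q) * scale

if __name__ == "__main__":
    for x in [0.2, 0.9, -2.4, 5.1, 8.0]:
        print(x, "->", quantize_e2m1(x))
```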
---------
Signed-off-by: Xin He <xin3.he@intel.com>
Co-authored-by: HDCharles <39544797+HDCharles@users.noreply.github.com>
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
examples/autoround/quantization_w4a4_fp4/README.md (+61 −7)
````diff
@@ -16,15 +16,17 @@ pip install -e .
 
 ## Quickstart
 
-The example includes an end-to-end script for applying the AutoRound quantization algorithm.
+The example includes end-to-end scripts for applying the AutoRound quantization algorithm.
+
+### Llama 3.1 Example
 
 ```bash
 python3 llama3.1_example.py
 ```
 
 The resulting model `Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound` is ready to be loaded into vLLM.
 
-### Evaluate Accuracy
+#### Evaluate Accuracy
 
 With the model created, we can now load and run in vLLM (after installing).
 
@@ -33,7 +35,6 @@ from vllm import LLM
 model = LLM("./Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound")
 ```
 
-We can evaluate accuracy with `lm_eval` (`pip install lm-eval==0.4.9.1`):
 > Note: quantized models can be sensitive to the presence of the `bos` token. `lm_eval` does not add a `bos` token by default, so make sure to include the `add_bos_token=True` argument when running your evaluations.
 > Note: quantized model accuracy may vary slightly due to nondeterminism.
@@ ... @@
+
 ### Questions or Feature Request?
 
 Please open up an issue on [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor) or [intel/auto-round](https://github.com/intel/auto-round).
````
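For reference, an `lm_eval` invocation matching the README's `bos` note might look like the sketch below. This is an assumption, not part of the PR: the task choice and batch size are illustrative, and `add_bos_token=True` is passed through `--model_args` as the note advises.

```shell
# Sketch only: evaluate the quantized checkpoint with lm_eval on vLLM.
# Task and batch size are illustrative; add_bos_token=True follows the
# README note about quantized models being sensitive to the bos token.
lm_eval --model vllm \
  --model_args pretrained=./Meta-Llama-3.1-8B-Instruct-NVFP4-AutoRound,add_bos_token=True \
  --tasks gsm8k \
  --batch_size auto
```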