Commit 608baca

Update README.md

1 parent 77043c3 commit 608baca

1 file changed: +29 -11 lines changed

README.md

Lines changed: 29 additions & 11 deletions
@@ -2,9 +2,36 @@

Open-source FP8 quantization project for producing compressed checkpoints for running in vLLM - see https://github.com/vllm-project/vllm/pull/4332 for implementation.

-# How to run quantized models
+## How to quantize a model

-Install vLLM: `pip install vllm>=0.4.2`
+Install this repo's requirements:
+```bash
+pip install -r requirements.txt
+```
+
+Command to produce a `Meta-Llama-3-8B-Instruct-FP8` quantized LLM:
+```bash
+python quantize.py --model-id meta-llama/Meta-Llama-3-8B-Instruct --save-dir Meta-Llama-3-8B-Instruct-FP8
+```
+
+Example model checkpoint with FP8 static scales for activations and weights: https://huggingface.co/nm-testing/Meta-Llama-3-8B-Instruct-FP8
+
+All arguments available for `quantize.py`:
+```
+usage: quantize.py [-h] [--model-id MODEL_ID] [--save-dir SAVE_DIR] [--activation-scheme {static,dynamic}] [--num-samples NUM_SAMPLES] [--max-seq-len MAX_SEQ_LEN]
+
+options:
+  -h, --help            show this help message and exit
+  --model-id MODEL_ID
+  --save-dir SAVE_DIR
+  --activation-scheme {static,dynamic}
+  --num-samples NUM_SAMPLES
+  --max-seq-len MAX_SEQ_LEN
+```
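The `--activation-scheme` flag selects between static activation scales (calibrated offline, as in the example checkpoint above) and dynamic scales computed at runtime; `--num-samples` and `--max-seq-len` presumably control the calibration data used for the static scheme. As a rough sketch only, not taken from this commit, a dynamic-scheme run might combine the flags listed above like so (the output directory name is illustrative):
```bash
# Sketch: same quantize.py flags as documented in the usage string above;
# the save-dir value here is made up for illustration.
python quantize.py \
  --model-id meta-llama/Meta-Llama-3-8B-Instruct \
  --save-dir Meta-Llama-3-8B-Instruct-FP8-Dynamic \
  --activation-scheme dynamic
```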
+
+## How to run FP8 quantized models
+
+[vLLM](https://github.com/vllm-project/vllm) has full support for FP8 models quantized with this package. Install vLLM with: `pip install vllm>=0.4.2`

Then simply pass the quantized checkpoint directly to vLLM's entrypoints! It will detect the checkpoint format using the `quantization_config` in the `config.json`.
```python
@@ -17,15 +44,6 @@ print(outputs[0].outputs[0].text)
# ' there was a beautiful princess who lived in a far-off kingdom. She was kind'
```
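The Python snippet above uses vLLM's offline `LLM` entrypoint; the same checkpoint should also work with its other entrypoints. A minimal sketch, assuming vLLM >= 0.4.2 and the `Meta-Llama-3-8B-Instruct-FP8` directory produced earlier (this command is not part of the commit), of serving it through the OpenAI-compatible server:
```bash
# Sketch: serve the FP8 checkpoint; vLLM picks up quantization_config from config.json.
python -m vllm.entrypoints.openai.api_server --model Meta-Llama-3-8B-Instruct-FP8
```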

-## How to quantize a model
-
-Example model with static scales for activations and weights: https://huggingface.co/nm-testing/Meta-Llama-3-8B-Instruct-FP8
-
-Command to produce:
-```bash
-python quantize.py --model-id meta-llama/Meta-Llama-3-8B-Instruct --save-dir Meta-Llama-3-8B-Instruct-FP8
-```
-
## Checkpoint structure explanation

Here we detail the experimental structure for the fp8 checkpoints.
