@@ -19,28 +19,50 @@ State-of-the-art INT4 quantization for LLMs. ParoQuant uses learned pairwise rot
 
 ## Quick Start
 
-**NVIDIA GPU:**
+### Interactive Chat
 
 ```bash
+# NVIDIA GPU
 pip install "paroquant[vllm]"
 python -m paroquant.cli.chat --model z-lab/Qwen3-8B-PARO
 
-# or with Docker
-docker run --pull=always --rm -it --gpus all --ipc=host \
-    ghcr.io/z-lab/paroquant:chat --model z-lab/Qwen3-8B-PARO
+# Apple Silicon
+pip install "paroquant[mlx]"
+python -m paroquant.cli.chat --model z-lab/Qwen3-8B-PARO
 ```
 
-**Apple Silicon:**
+### OpenAI-Compatible API Server
 
 ```bash
-pip install "paroquant[mlx]"
-python -m paroquant.cli.chat --model z-lab/Qwen3-8B-PARO
+pip install "paroquant[vllm]"
+python -m paroquant.cli.serve --model z-lab/Qwen3-8B-PARO
+```
+
+### Docker
+
+```bash
+# Interactive chat
+docker run --pull=always --rm -it --gpus all --ipc=host \
+    ghcr.io/z-lab/paroquant:chat --model z-lab/Qwen3-8B-PARO
+
+# API server (port 8000)
+docker run --pull=always --rm -it --gpus all --ipc=host -p 8000:8000 \
+    ghcr.io/z-lab/paroquant:serve --model z-lab/Qwen3-8B-PARO
 ```
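Since the serve entry point added above exposes an OpenAI-compatible API, any OpenAI-style client can talk to it. A minimal sketch, assuming the server from the commands above is running on the default port 8000 and serves the standard `/v1/chat/completions` route (the route and the `max_tokens` parameter are OpenAI-convention assumptions, not confirmed from this diff):

```python
import json

# Request payload for the OpenAI-compatible server started with
# `python -m paroquant.cli.serve --model z-lab/Qwen3-8B-PARO`.
url = "http://localhost:8000/v1/chat/completions"
payload = {
    "model": "z-lab/Qwen3-8B-PARO",
    "messages": [
        {"role": "user", "content": "Summarize INT4 quantization in one sentence."}
    ],
    "max_tokens": 128,
}

# With the server running, send it using only the standard library:
#   import urllib.request
#   req = urllib.request.Request(
#       url,
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   reply = json.load(urllib.request.urlopen(req))
#   print(reply["choices"][0]["message"]["content"])
print(json.dumps(payload, indent=2))
```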
 
 ## Models
 
 All models are available on [Hugging Face](https://huggingface.co/collections/z-lab/paroquant). Swap the model name in the commands above to try any of them.
 
+**Qwen3.5**
+
+| Model | Checkpoint |
+|---|---|
+| Qwen3.5-0.8B | [`z-lab/Qwen3.5-0.8B-PARO`](https://huggingface.co/z-lab/Qwen3.5-0.8B-PARO) |
+| Qwen3.5-2B | [`z-lab/Qwen3.5-2B-PARO`](https://huggingface.co/z-lab/Qwen3.5-2B-PARO) |
+| Qwen3.5-4B | [`z-lab/Qwen3.5-4B-PARO`](https://huggingface.co/z-lab/Qwen3.5-4B-PARO) |
+| Qwen3.5-9B | [`z-lab/Qwen3.5-9B-PARO`](https://huggingface.co/z-lab/Qwen3.5-9B-PARO) |
+
 **Qwen3**
 
 | Model | Checkpoint |
@@ -51,15 +73,6 @@ All models are available on [Hugging Face](https://huggingface.co/collections/z-
 | Qwen3-8B | [`z-lab/Qwen3-8B-PARO`](https://huggingface.co/z-lab/Qwen3-8B-PARO) |
 | Qwen3-14B | [`z-lab/Qwen3-14B-PARO`](https://huggingface.co/z-lab/Qwen3-14B-PARO) |
 
-**Qwen3.5**
-
-| Model | Checkpoint |
-|---|---|
-| Qwen3.5-0.8B | [`z-lab/Qwen3.5-0.8B-PARO`](https://huggingface.co/z-lab/Qwen3.5-0.8B-PARO) |
-| Qwen3.5-2B | [`z-lab/Qwen3.5-2B-PARO`](https://huggingface.co/z-lab/Qwen3.5-2B-PARO) |
-| Qwen3.5-4B | [`z-lab/Qwen3.5-4B-PARO`](https://huggingface.co/z-lab/Qwen3.5-4B-PARO) |
-| Qwen3.5-9B | [`z-lab/Qwen3.5-9B-PARO`](https://huggingface.co/z-lab/Qwen3.5-9B-PARO) |
-
 **Llama**
 
 | Model | Checkpoint |
@@ -106,10 +119,10 @@ python -m paroquant.cli.convert \
 
 | Image | Purpose |
 |---|---|
-| `ghcr.io/z-lab/paroquant:latest` | Optimization & evaluation |
-| `ghcr.io/z-lab/paroquant:chat` | Interactive chat (CUDA 13.0) |
-| `ghcr.io/z-lab/paroquant:serve` | OpenAI-compatible API server |
+| `ghcr.io/z-lab/paroquant:chat` | Interactive chat |
 | `ghcr.io/z-lab/paroquant:chat-cu129` | Interactive chat (CUDA 12.9) |
+| `ghcr.io/z-lab/paroquant:serve` | OpenAI-compatible API server |
+| `ghcr.io/z-lab/paroquant:latest` | Optimization & evaluation |
 | `ghcr.io/z-lab/paroquant:eval` | Reasoning task evaluation |
 
 ## Citation