
Commit 5ffcc30

update readme and upload models

1 parent 2654253 commit 5ffcc30

File tree

4 files changed: +33 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ pure C++ implementation based on [@ggerganov](https://github.com/ggerganov)'s [g
 
 **What's New:**
 
+* 2025-06-03: Kimi-VL
 * 2025-05-28: Gemma3 fully supported
 * 2025-05-23: [I can see](./docs/multimodal.md): Fuyu
 * 2025-05-21: Re-quantization when loading (e.g. `--re-quantize q4_k`)

docs/gpu.md

Lines changed: 9 additions & 0 deletions
@@ -51,6 +51,15 @@ The full format of `-ngl` is `-ngl [id:]layer_specs[;id:layer_specs]..`. `id` is
 Suppose device 0 is the GPU and device 1 is the CPU: `-ngl 1:5;0:10` will put the first 5 layers on the CPU, the next 10 layers on the GPU,
 and all remaining layers on the CPU by default.
 
+You can use `-mgl` (`--model_gpu_layers`) to specify the number of layers of a specific model to deploy to different backend devices.
+The syntax is `-mgl MODEL N`, in which `N` shares the same syntax as `-ngl` and `MODEL` can be one of:
+
+* `main`: the main model.
+* `vis`: the vision accessory model (which typically projects images/videos into the LLM).
+* `any`: any model.
+
+`-ngl N` is equivalent to `-mgl any N`.
+
 Tip: Use `--show_devices` to list all available devices and `--show` to check the basic hyperparameters of a model.
 
 ## Known issues
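The `-ngl` spec format documented above (`[id:]layer_specs[;id:layer_specs]..`) can be sketched as a tiny parser. This is an illustration only, not chatllm.cpp's actual implementation; the function name `parse_ngl` and the fall-back to device 0 when no `id:` prefix is given are assumptions:

```python
def parse_ngl(spec: str) -> list[tuple[int, int]]:
    """Split an `-ngl` spec such as "1:5;0:10" into (device_id, layer_count)
    pairs. Hypothetical helper: a part without an explicit "id:" prefix is
    assumed to target device 0."""
    pairs = []
    for part in spec.split(";"):
        if ":" in part:
            dev, layers = part.split(":", 1)
            pairs.append((int(dev), int(layers)))
        else:
            pairs.append((0, int(part)))
    return pairs

# `-ngl 1:5;0:10`: first 5 layers to device 1, next 10 layers to device 0.
print(parse_ngl("1:5;0:10"))  # → [(1, 5), (0, 10)]
```

Under this reading, any layers not covered by the spec stay on the default (CPU) device, matching the example in the diff above.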

docs/models.md

Lines changed: 3 additions & 0 deletions
@@ -297,6 +297,9 @@ Please use `--format completion` for these models.
 
 Note: Only download `tokenizer.model` and DO NOT download `tokenizer.json` when converting.
 
+* Kimi (`KimiVLForConditionalGeneration`)
+    * [x] VL: [A3B-Instruct](https://huggingface.co/moonshotai/Kimi-VL-A3B-Instruct/tree/7a3c132a7b0f1f1677f5a72f258bd3afded7d357), [A3B-Thinking](https://huggingface.co/moonshotai/Kimi-VL-A3B-Thinking/commit/16681d8ac24e505088698e4e34ea494dd6e24400)
+
 ## RAG Models
 
 * Text Embedding (`XLMRobertaModel`)

scripts/models.json

Lines changed: 20 additions & 0 deletions
@@ -2776,5 +2776,25 @@
         }
       }
     }
+  },
+  "kimi-vl": {
+    "brief": "Kimi-VL, an efficient open-source Mixture-of-Experts (MoE) vision-language model (VLM) that offers advanced multimodal reasoning, long-context understanding, and strong agent capabilities.",
+    "default": "a3b-instruct",
+    "license": "MIT",
+    "variants": {
+      "a3b-instruct": {
+        "default": "q8",
+        "quantized": {
+          "q8": {
+            "size": 17566398608,
+            "url": "chatllm_quantized_kimi-vl/kimi-vl.bin"
+          },
+          "q4_1": {
+            "size": 10447149072,
+            "url": "chatllm_quantized_kimi-vl/kimi-vl-q4_1.bin"
+          }
+        }
+      }
+    }
   }
 }
