Feature request
Similar to #231, but useful
Hey my dear bros, we're building a RAG application (specifically for one of our products) using MiniCPM3. Below is our stack:
| Type | Component |
| --- | --- |
| LLM | MiniCPM3 |
| Web server | Shuttle + Axum |
| OpenAI-compatible API server | llama.cpp |
| Vector database | Qdrant |
It's almost done.
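To make the wiring concrete, here is a rough sketch of the request our web server sends to llama.cpp's OpenAI-compatible endpoint (Python for brevity; the real server is Rust with Shuttle/Axum), assuming llama.cpp is listening on port 8080 as in the command below:

```python
import requests

# Assumptions: the llama.cpp server below is reachable on localhost:8080 and
# the context chunks have already been retrieved from Qdrant.
payload = {
    "model": "minicpm3-4b",  # illustrative; llama.cpp serves whatever model it loaded
    "messages": [
        {"role": "system", "content": "Answer strictly from the provided context."},
        {"role": "user", "content": "Context:\n<retrieved chunks>\n\nQuestion: <user question>"},
    ],
    "temperature": 0.2,
}
resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=60)
print(resp.json()["choices"][0]["message"]["content"])
```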
As MiniCPM3 comes with a RAG suite, we'd like to use the LoRA adapter for better performance, like this:
```bash
# Suppose the MiniCPM3-4B-GGUF and MiniCPM3-RAG-LoRA-GGUF models have already been downloaded into the current directory
docker run --rm -it -p 8080:8080 \
  -v $PWD/MiniCPM3-4B-GGUF:/models -v $PWD/MiniCPM3-RAG-LoRA-GGUF:/lora \
  --gpus all ghcr.io/ggerganov/llama.cpp:server-cuda \
  -m /models/minicpm3-4b-q4_k_m.gguf --host 0.0.0.0 --port 8080 \
  --n-gpu-layers 99 -v -ub 1024 -b 4096 \
  --lora /lora/lora-adapter-fp16.gguf
```
However, the LoRA model cannot be converted to .gguf format yet, as ggml-org/llama.cpp#9396 hasn't been merged:
```bash
# Again, suppose the original MiniCPM3-4B and MiniCPM3-RAG-LoRA models have already been downloaded into the current directory
docker run -it --rm --entrypoint /app/convert_lora_to_gguf.py \
  -v $PWD/MiniCPM3-4B:/models -v $PWD/MiniCPM3-RAG-LoRA:/lora \
  ghcr.io/ggerganov/llama.cpp:full \
  --outtype q8_0 --base /models /lora
```
It said:
```
The repository for /models contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//models.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
```
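As far as we can tell, the prompt comes from transformers refusing to run MiniCPM3's custom modeling code non-interactively inside the container. For reference, this is roughly what passing that flag looks like when loading the base model directly with transformers; we haven't found a way to forward it through convert_lora_to_gguf.py:

```python
from transformers import AutoConfig, AutoTokenizer

# Illustration only: these are the transformers calls that accept the flag the
# prompt asks for; "./MiniCPM3-4B" is our locally downloaded base model.
config = AutoConfig.from_pretrained("./MiniCPM3-4B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("./MiniCPM3-4B", trust_remote_code=True)
```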
Or could you give us some tips for converting? Thanks a lot!
MiniCPM3 is, de facto, an ideal edge-side LLM for small companies.