
[Feature Request]: Need LoRA model in .gguf format #243

@bioinformatist

Description


Feature request

Similar to #231, but useful

Hey my dear bros, we're building a RAG application (specifically for one of our products) using MiniCPM3. Below is our stack:

| Type | Component |
| --- | --- |
| LLM | MiniCPM3 |
| Web server | Shuttle \| Axum |
| OpenAI-compatible API server | llama.cpp |
| Vector database | qdrant |

It's almost done.

As MiniCPM3 comes with a RAG suite, we'd like to use the LoRA adapter for better performance, like so:

# Suppose we have already downloaded the MiniCPM3-4B-GGUF and MiniCPM3-RAG-LoRA-GGUF models into the current directory
docker run --rm -it -p 8080:8080 -v $PWD/MiniCPM3-4B-GGUF:/models -v $PWD/MiniCPM3-RAG-LoRA-GGUF:/lora --gpus all ghcr.io/ggerganov/llama.cpp:server-cuda -m models/minicpm3-4b-q4_k_m.gguf --host 0.0.0.0 --port 8080 --n-gpu-layers 99 -v -ub 1024 -b 4096 --lora lora/lora-adapter-fp16.gguf
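
Once that server is up, the rest of the stack just talks to it through the OpenAI-compatible API; for illustration, a minimal Python sketch against the server started above (the model name, API key, and prompt are placeholders):

```python
# Minimal sketch: call the llama.cpp server started above via its
# OpenAI-compatible endpoint. Model name, key, and prompt are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

resp = client.chat.completions.create(
    model="minicpm3-4b",  # llama.cpp serves whatever -m points at; the name is informational
    messages=[{"role": "user", "content": "Summarize the retrieved passages: ..."}],
)
print(resp.choices[0].message.content)
```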

And the LoRA adapter cannot be converted to .gguf format yet, as ggml-org/llama.cpp#9396 hasn't been merged:

# Likewise, with the original MiniCPM3-4B and MiniCPM3-RAG-LoRA checkpoints downloaded into the current directory
docker run -it --rm --entrypoint /app/convert_lora_to_gguf.py -v $PWD/MiniCPM3-4B:/models -v $PWD/MiniCPM3-RAG-LoRA:/lora ghcr.io/ggerganov/llama.cpp:full --outtype q8_0 --base /models /lora

It fails with:

The repository for /models contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//models.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Or could you give us some tips for converting? Thanks a lot!
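
One possible workaround (untested on our side, and assuming PEFT can load the adapter on top of MiniCPM3's custom modeling code) would be to merge the LoRA weights into the base model and then convert the merged checkpoint with convert_hf_to_gguf.py instead of convert_lora_to_gguf.py; a rough sketch:

```python
# Rough sketch: fold the RAG LoRA into the base MiniCPM3-4B weights with PEFT,
# so the result can be converted like a plain HF checkpoint. The paths are the
# local directories from the docker commands above; the dtype choice is ours.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "MiniCPM3-4B",
    trust_remote_code=True,   # MiniCPM3 ships custom modeling code
    torch_dtype=torch.float16,
)
lora = PeftModel.from_pretrained(base, "MiniCPM3-RAG-LoRA")
merged = lora.merge_and_unload()  # bake the adapter into the base tensors
merged.save_pretrained("MiniCPM3-4B-RAG-merged")

tok = AutoTokenizer.from_pretrained("MiniCPM3-4B", trust_remote_code=True)
tok.save_pretrained("MiniCPM3-4B-RAG-merged")
```

The merged directory should then be convertible with convert_hf_to_gguf.py like a plain MiniCPM3 checkpoint, at the price of shipping a full merged model instead of a small adapter, which is exactly why we'd still prefer an official LoRA .gguf.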

MiniCPM3 is, de facto, an ideal edge-side LLM for small companies.
