Feature request
Similar to #231, but useful
Hey my dear bros, we're building a RAG application (specifically for one of our products) using MiniCPM3. Below is our stack:
| Type | Component |
| --- | --- |
| LLM | MiniCPM3 |
| Web server | Shuttle + Axum |
| OpenAI-compatible API server | llama.cpp |
| Vector database | Qdrant |
It's almost done.
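To make the wiring concrete, here is a rough sketch of the request our web server sends to llama.cpp's OpenAI-compatible endpoint (Python for brevity; the real server is Rust with Shuttle/Axum), assuming llama.cpp is listening on port 8080 as in the command below:

```python
import requests

# Assumptions: the llama.cpp server below is reachable on localhost:8080 and
# the context chunks have already been retrieved from Qdrant.
payload = {
    "model": "minicpm3-4b",  # illustrative; llama.cpp serves whatever model it loaded
    "messages": [
        {"role": "system", "content": "Answer strictly from the provided context."},
        {"role": "user", "content": "Context:\n<retrieved chunks>\n\nQuestion: <user question>"},
    ],
    "temperature": 0.2,
}
resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=60)
print(resp.json()["choices"][0]["message"]["content"])
```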
As MiniCPM3 comes with a RAG suite, we'd like to use the LoRA adapter for better performance, like this:
```bash
# Suppose the MiniCPM3-4B-GGUF and MiniCPM3-RAG-LoRA-GGUF models have already been downloaded into the current directory
docker run --rm -it -p 8080:8080 \
  -v $PWD/MiniCPM3-4B-GGUF:/models -v $PWD/MiniCPM3-RAG-LoRA-GGUF:/lora \
  --gpus all ghcr.io/ggerganov/llama.cpp:server-cuda \
  -m /models/minicpm3-4b-q4_k_m.gguf --host 0.0.0.0 --port 8080 \
  --n-gpu-layers 99 -v -ub 1024 -b 4096 \
  --lora /lora/lora-adapter-fp16.gguf
```
However, the LoRA model cannot be converted to .gguf format yet, as ggml-org/llama.cpp#9396 hasn't been merged:
```bash
# Again, suppose the original MiniCPM3-4B and MiniCPM3-RAG-LoRA models have already been downloaded into the current directory
docker run -it --rm --entrypoint /app/convert_lora_to_gguf.py \
  -v $PWD/MiniCPM3-4B:/models -v $PWD/MiniCPM3-RAG-LoRA:/lora \
  ghcr.io/ggerganov/llama.cpp:full \
  --outtype q8_0 --base /models /lora
```
It said:
```
The repository for /models contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//models.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
```
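As far as we can tell, the prompt comes from transformers refusing to run MiniCPM3's custom modeling code non-interactively inside the container. For reference, this is roughly what passing that flag looks like when loading the base model directly with transformers; we haven't found a way to forward it through convert_lora_to_gguf.py:

```python
from transformers import AutoConfig, AutoTokenizer

# Illustration only: these are the transformers calls that accept the flag the
# prompt asks for; "./MiniCPM3-4B" is our locally downloaded base model.
config = AutoConfig.from_pretrained("./MiniCPM3-4B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("./MiniCPM3-4B", trust_remote_code=True)
```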
Or could you give us some tips for converting? Thanks a lot!
MiniCPM3 is, de facto, an ideal edge-side LLM for small companies.