🏠 Main Repository | 📚 Full Documentation
Cook up amazing multimodal AI applications effortlessly with MiniCPM-o, bringing vision, speech, and live-streaming capabilities right to your fingertips!
Our comprehensive documentation website presents every recipe in a clear, well-organized manner. All features are displayed at a glance, making it easy for you to quickly find exactly what you need.
We support a wide range of users, from individuals to enterprises and researchers.
- Individuals: Enjoy effortless inference using Ollama and Llama.cpp with minimal setup.
- Enterprises: Achieve high-throughput, scalable performance with vLLM and SGLang.
- Researchers: Leverage advanced frameworks, including Transformers, LLaMA-Factory, SWIFT, and Align-anything, for flexible model development and cutting-edge experimentation (see the Transformers sketch below).
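
To give a taste of the Transformers path, here is a minimal single-image QA sketch. It assumes the `openbmb/MiniCPM-o-2_6` checkpoint and the `chat` helper its remote code exposes; the image path is a placeholder, and the recipes document the full set of supported arguments.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "openbmb/MiniCPM-o-2_6"  # assumed checkpoint; swap in the one you use

model = AutoModel.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# Single-image QA: the remote-code chat helper takes a list of chat turns
# whose content mixes PIL images and text.
image = Image.open("example.jpg").convert("RGB")  # placeholder input
msgs = [{"role": "user", "content": [image, "What is in this image?"]}]
answer = model.chat(msgs=msgs, tokenizer=tokenizer)
print(answer)
```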
Our ecosystem delivers optimized solutions for a variety of hardware environments and deployment demands.
- Web demo: Launch an interactive multimodal AI web demo with FastAPI (see the sketch after this list).
- Quantized deployment: Maximize efficiency and minimize resource consumption using GGUF, BNB, and AWQ.
- Edge devices: Bring powerful AI experiences to iPhone and iPad, supporting offline and privacy-sensitive applications.
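
The cookbook ships a full Omni Streaming demo; as a much smaller illustration of the web-demo idea, here is a hedged FastAPI sketch that wraps the chat interface shown above. The `/chat` route and request shape are illustrative assumptions, not the cookbook's API.

```python
import io

import torch
from fastapi import FastAPI, File, Form, UploadFile
from PIL import Image
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "openbmb/MiniCPM-o-2_6"  # assumed checkpoint
model = AutoModel.from_pretrained(
    MODEL_ID, trust_remote_code=True, torch_dtype=torch.bfloat16
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

app = FastAPI()

@app.post("/chat")
async def chat(question: str = Form(...), image: UploadFile = File(...)):
    # Decode the uploaded image and run one chat turn.
    img = Image.open(io.BytesIO(await image.read())).convert("RGB")
    msgs = [{"role": "user", "content": [img, question]}]
    return {"answer": model.chat(msgs=msgs, tokenizer=tokenizer)}
```

Run it with, e.g., `uvicorn demo:app` (module name assumed); the FastAPI recipe covers the full interactive streaming demo.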
Explore real-world examples of MiniCPM-V deployed on edge devices using our curated recipes. These demos highlight the model’s high efficiency and robust performance in practical scenarios.
- Run locally on iPhone with the iOS demo.
- Run locally on iPad with the iOS demo and watch the model draw a rabbit step by step.
(Video: ipad_case.mp4)
## Ready-to-run examples
| Recipe | Description |
|---|---|
| **Vision Capabilities** | |
| 🖼️ Single-image QA | Question answering on a single image |
| 🧩 Multi-image QA | Question answering with multiple images (see the sketch below) |
| 🎬 Video QA | Video-based question answering |
| 📄 Document Parser | Parse and extract content from PDFs and webpages |
| 📝 Text Recognition | Reliable OCR for photos and screenshots |
| **Audio Capabilities** | |
| 🎤 Speech-to-Text | Multilingual speech recognition |
| 🗣️ Text-to-Speech | Instruction-following speech synthesis |
| 🎭 Voice Cloning | Realistic voice cloning and role-play |
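
As a flavor of the multi-image recipe above: the same `chat` interface sketched earlier accepts several images in a single turn. The file names are placeholders, and options such as image slicing vary per recipe, so treat this as a sketch.

```python
from PIL import Image

# Reuses `model` and `tokenizer` from the Transformers sketch above.
img1 = Image.open("receipt_1.jpg").convert("RGB")  # placeholder inputs
img2 = Image.open("receipt_2.jpg").convert("RGB")
msgs = [{"role": "user",
         "content": [img1, img2, "Which receipt shows the higher total?"]}]
print(model.chat(msgs=msgs, tokenizer=tokenizer))
```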
## Customize your model with your own ingredients

### Data preparation
Follow the data preparation guide to set up your training datasets.
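
For orientation only, conversation-style fine-tuning data is commonly stored as records like the hypothetical sketch below; the field names and the `<image>` placeholder are assumptions here, so follow the guide for the authoritative schema.

```python
import json

# Hypothetical record layout -- the data preparation guide defines the real schema.
sample = {
    "id": "0",
    "image": "images/0001.jpg",
    "conversations": [
        {"role": "user", "content": "<image>\nWhat dish is shown here?"},
        {"role": "assistant", "content": "A bowl of ramen topped with a soft-boiled egg."},
    ],
}
with open("train.json", "w", encoding="utf-8") as f:
    json.dump([sample], f, ensure_ascii=False, indent=2)
```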
### Training

We provide training approaches to serve different needs:
| Framework | Description |
|---|---|
| Transformers | Most flexible for customization |
| LLaMA-Factory | Modular fine-tuning toolkit |
| SWIFT | Lightweight and fast parameter-efficient tuning |
| Align-anything | Visual instruction alignment for multimodal models |
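
To illustrate what the parameter-efficient options above do under the hood, here is a minimal LoRA sketch using Hugging Face `peft`. The target module names are assumptions that differ by architecture; the frameworks in the table handle these details for you.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-o-2_6",  # assumed checkpoint
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed module names
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the small LoRA adapters are trainable
```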
## Deploy your model efficiently
| Method | Description |
|---|---|
| vLLM | High-throughput GPU inference (see the sketch below) |
| SGLang | High-throughput GPU inference with RadixAttention prefix caching |
| Llama.cpp | Fast CPU inference on PC, iPhone, and iPad |
| Ollama | User-friendly setup |
| OpenWebUI | Interactive web demo with Open WebUI |
| FastAPI | Interactive Omni Streaming demo with FastAPI |
| iOS | Interactive iOS demo with llama.cpp |
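
As one concrete serving path, vLLM exposes an OpenAI-compatible API. The sketch below assumes a server started with something like `vllm serve openbmb/MiniCPM-o-2_6 --trust-remote-code` on the default port 8000; consult the vLLM recipe for the exact flags.

```python
import base64

from openai import OpenAI

# vLLM's OpenAI-compatible endpoint; the api_key is a placeholder.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("example.jpg", "rb") as f:  # placeholder input
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="openbmb/MiniCPM-o-2_6",  # must match the served model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }],
)
print(resp.choices[0].message.content)
```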
## Compress your model to improve efficiency
| Format | Key Feature |
|---|---|
| GGUF | Simplest and most portable format |
| BNB | Simple, easy-to-use quantization via bitsandbytes (see the sketch below) |
| AWQ | High-performance quantization for efficient inference |
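
Of the formats above, BNB quantization happens on the fly at load time through Transformers. A minimal sketch, assuming the checkpoint quantizes cleanly with `bitsandbytes` (check the BNB recipe for model-specific caveats):

```python
import torch
from transformers import AutoModel, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights to cut memory use
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute still runs in bf16
)
model = AutoModel.from_pretrained(
    "openbmb/MiniCPM-o-2_6",  # assumed checkpoint
    trust_remote_code=True,
    quantization_config=bnb_config,
)
```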
## Community projects

- text-extract-api: Document extraction API using OCR and Ollama-supported models
- comfyui_LLM_party: Build LLM workflows and integrate them into existing image workflows
- Ollama-OCR: OCR package that uses VLMs through Ollama to extract text from images and PDFs
- comfyui-mixlab-nodes: ComfyUI node suite supporting Workflow-to-APP, GPT & 3D, and more
- OpenAvatarChat: Interactive digital-human conversation implementation that runs on a single PC
- pensieve: A privacy-focused passive recording project that captures screen content
- paperless-gpt: Uses LLMs to enhance paperless-ngx with AI-powered titles, tags, and OCR
- Neuro: A recreation of Neuro-Sama, running on local models on consumer hardware
## Contributing

We love new recipes! Please share your creative dishes:

1. Fork the repository
2. Create your recipe
3. Submit a pull request
- Found a bug? Open an issue
- Need help? Join our Discord
This cookbook is developed by OpenBMB and OpenSQZ.
This cookbook is served under the Apache-2.0 License - cook freely, share generously! 🍳