Releases: NotPunchnox/rkllama
RKLLAMA 0.0.3
🚀 Rkllama Version: 0.0.3
Key Updates
🌐 Extended Compatibility
- Full Model Support: RKLLAMA now fully supports a wide range of models, including DeepSeek, Qwen, Llama, and many others. This ensures broader applicability and flexibility for users. 🤝
⚡ Enhanced Performance
- Tokenized Inputs: Inputs are now tokenized before being sent to the model, replacing raw prompts. This optimization significantly boosts response speed and efficiency. 🏃‍♂️
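The change above means the server hands token IDs to the runtime instead of the raw prompt string. A minimal sketch of the idea (the toy word-level tokenizer below is purely illustrative; RKLLAMA uses the model's actual Hugging Face tokenizer):

```python
# Illustrative only: a toy word-level tokenizer standing in for the
# model's real Hugging Face tokenizer, to show the raw-prompt ->
# token-ID step that now happens before the request reaches the model.
def build_vocab(corpus: str) -> dict[str, int]:
    """Assign an integer ID to each unique whitespace-separated word."""
    vocab: dict[str, int] = {}
    for word in corpus.split():
        vocab.setdefault(word, len(vocab))
    return vocab

def tokenize(prompt: str, vocab: dict[str, int]) -> list[int]:
    """Map a prompt to token IDs, using -1 for out-of-vocabulary words."""
    return [vocab.get(word, -1) for word in prompt.split()]

vocab = build_vocab("hello world how are you")
token_ids = tokenize("hello you", vocab)
print(token_ids)  # the model now receives IDs like these, not the string
```

The real pipeline also applies the model's chat template before tokenizing, but the performance win comes from the same principle: the string-to-ID work is done once, up front.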
📂 Modelfile System
- Automated Initialization: Inspired by Ollama, the new Modelfile system automatically initializes both the tokenizer and the chat template when you simply provide the Hugging Face path. 🔧
- Customizable Parameters: Easily adjust model parameters such as temperature, location, and system prompt for tailored performance. 🎛️
🗂️ Simplified Organization
- Automatic Folder Creation: Models are now organized into dedicated folders, automatically created when you run the `rkllama list` command. This streamlines model management. 📁
- Effortless Model Launch: Launch models using just the model name, as `.rkllm` files are directly referenced in the Modelfile. 🛫
🔄 Automatic Modelfile Creation
- Seamless Integration: When using the `pull` command, the Modelfile is generated automatically. For previously downloaded models, run a one-time command (e.g., `rkllama run modelname file.rkllm huggingface_path`) to create the Modelfile. 🔄
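The release does not spell out the Modelfile schema. A hypothetical example, loosely following Ollama's Modelfile style and the parameters mentioned above (all field names here are assumptions, not the documented format):

```
# Hypothetical Modelfile sketch — field names are illustrative only;
# consult the project README for the actual schema.
FROM model.rkllm
HUGGINGFACE_PATH = "Qwen/Qwen2.5-1.5B-Instruct"
SYSTEM = "You are a helpful assistant."
TEMPERATURE = 0.7
```

Whatever the exact syntax, the point of the system is that the `.rkllm` file, the tokenizer source, and the generation parameters live together in one place, so a model can be launched by name alone.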
🌟 Future Enhancements
- Customization and Optimization: Upcoming updates will introduce further customization options for the chat template and additional hyperparameters (e.g., top_k) to enhance the user experience. 🎨
📚 Upgrade Guide
- Rebuild Architecture: If you have already downloaded models and wish to avoid reinstalling everything, follow our guide: Rebuild Architecture. 📖
RKLLAMA 0.0.1
Full Changelog: https://github.com/NotPunchnox/rkllama/commits/Release
RKLLAMA 0.0.4
Version 0.0.4 (Current)
Major Features
- Ollama API Compatibility: Added support for the Ollama API interface, allowing RKLLAMA to work with Ollama clients and tools.
- Enhanced Streaming Responses: Improved reliability of streaming responses with better handling of completion signals.
- Optional Debug Mode: Added detailed debugging tools that can be enabled with the `--debug` flag.
- CPU Model Auto-detection: Automatic detection of the RK3588 or RK3576 platform, with fallback to interactive selection.
New API Endpoints
- `/api/tags` - List all available models (Ollama-compatible)
- `/api/show` - Show model information
- `/api/create` - Create a new model from a Modelfile
- `/api/pull` - Pull a model from Hugging Face
- `/api/delete` - Delete a model
- `/api/generate` - Generate a completion for a prompt
- `/api/chat` - Generate a chat completion
- `/api/embeddings` - (Placeholder) Generate embeddings
- `/api/debug` - Diagnostic endpoint (available only in debug mode)
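Since the endpoints mirror the Ollama API, a client can consume the `/api/chat` stream the same way it would Ollama's: each line is a JSON chunk, and the final chunk carries `"done": true`. A minimal sketch of the client-side parsing, assuming Ollama's NDJSON chunk shape (the `sample` lines below are fabricated for illustration, not actual server output):

```python
import json

# Sketch of consuming an Ollama-style streaming chat response.
# Field names ("message", "content", "done") follow the Ollama API
# convention that this release says RKLLAMA mirrors.
def collect_stream(lines):
    """Accumulate message content from NDJSON chunks until done=true."""
    text = []
    for raw in lines:
        chunk = json.loads(raw)
        text.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# Fabricated sample chunks illustrating the stream format.
sample = [
    '{"message": {"content": "Hel"}, "done": false}',
    '{"message": {"content": "lo!"}, "done": true}',
]
print(collect_stream(sample))  # Hello!
```

The improved "done" signaling in this release matters precisely because clients like this one stop reading only when they see that final flag.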
Improvements
- More reliable "done" signaling for streaming responses
- Auto-detection of CPU model (RK3588 or RK3576) with fallback to user selection
- Better error handling and error messages
- Fixed threading issues in request processing
- Automatic content formatting for various response types
- Improved stream handling with token tracking
- Optional debugging mode with detailed logs
Technical Changes
- Added new utility modules for debugging and API handling
- Improved thread management for streaming responses
- Added CPU model detection and selection
- Updated server configuration options
- Made debugging tools optional through environment variable and command line flag