Releases: NotPunchnox/rkllama
RKLLAMA 0.0.3
🚀 Rkllama Version: 0.0.3
Key Updates
🌐 Extended Compatibility
- Full Model Support: RKLLAMA now fully supports a wide range of models, including DeepSeek, Qwen, Llama, and many others. This ensures broader applicability and flexibility for users. 🤝
⚡ Enhanced Performance
- Tokenized Inputs: Inputs are now tokenized before being sent to the model, replacing raw prompts. This optimization significantly boosts response speed and efficiency. 🏃‍♂️
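The change above means the server hands token IDs to the runtime instead of the raw prompt string. A minimal sketch of the idea (the toy word-level tokenizer below is purely illustrative; RKLLAMA uses the model's actual Hugging Face tokenizer):

```python
# Illustrative only: a toy word-level tokenizer standing in for the
# model's real Hugging Face tokenizer, to show the raw-prompt ->
# token-ID step that now happens before the request reaches the model.
def build_vocab(corpus: str) -> dict[str, int]:
    """Assign an integer ID to each unique whitespace-separated word."""
    vocab: dict[str, int] = {}
    for word in corpus.split():
        vocab.setdefault(word, len(vocab))
    return vocab

def tokenize(prompt: str, vocab: dict[str, int]) -> list[int]:
    """Map a prompt to token IDs, using -1 for out-of-vocabulary words."""
    return [vocab.get(word, -1) for word in prompt.split()]

vocab = build_vocab("hello world how are you")
token_ids = tokenize("hello you", vocab)
print(token_ids)  # the model now receives IDs like these, not the string
```

The real pipeline also applies the model's chat template before tokenizing, but the performance win comes from the same principle: the string-to-ID work is done once, up front.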
📂 Modelfile System
- Automated Initialization: Inspired by Ollama, the new Modelfile system automatically initializes both the tokenizer and the chat template when you simply provide the Hugging Face path. 🔧
- Customizable Parameters: Easily adjust model parameters such as temperature, location, and system prompt for tailored performance. 🎛️
🗂️ Simplified Organization
- Automatic Folder Creation: Models are now organized into dedicated folders, automatically created when you run the `rkllama list` command. This streamlines model management. 📁
- Effortless Model Launch: Launch models using just the model name, as `.rkllm` files are directly referenced in the Modelfile. 🛫
🔄 Automatic Modelfile Creation
- Seamless Integration: When using the `pull` command, the Modelfile is generated automatically. For previously downloaded models, run a one-time command (e.g., `rkllama run modelname file.rkllm huggingface_path`) to create the Modelfile. 🔄
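The release does not spell out the Modelfile schema. A hypothetical example, loosely following Ollama's Modelfile style and the parameters mentioned above (all field names here are assumptions, not the documented format):

```
# Hypothetical Modelfile sketch — field names are illustrative only;
# consult the project README for the actual schema.
FROM model.rkllm
HUGGINGFACE_PATH = "Qwen/Qwen2.5-1.5B-Instruct"
SYSTEM = "You are a helpful assistant."
TEMPERATURE = 0.7
```

Whatever the exact syntax, the point of the system is that the `.rkllm` file, the tokenizer source, and the generation parameters live together in one place, so a model can be launched by name alone.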
🌟 Future Enhancements
- Customization and Optimization: Upcoming updates will introduce further customization options for the chat template and additional hyperparameters (e.g., top_k) to enhance the user experience. 🎨
📚 Upgrade Guide
- Rebuild Architecture: If you have already downloaded models and wish to avoid reinstalling everything, follow our guide: Rebuild Architecture. 📖
RKLLAMA 0.0.1
Full Changelog: https://github.com/NotPunchnox/rkllama/commits/Release
RKLLAMA 0.0.4
Version 0.0.4 (Current)
Major Features
- Ollama API Compatibility: Added support for the Ollama API interface, allowing RKLLAMA to work with Ollama clients and tools.
- Enhanced Streaming Responses: Improved reliability of streaming responses with better handling of completion signals.
- Optional Debug Mode: Added detailed debugging tools that can be enabled with the `--debug` flag.
- CPU Model Auto-detection: Automatic detection of the RK3588 or RK3576 platform, with fallback to interactive selection.
New API Endpoints
- `/api/tags` - List all available models (Ollama-compatible)
- `/api/show` - Show model information
- `/api/create` - Create a new model from a Modelfile
- `/api/pull` - Pull a model from Hugging Face
- `/api/delete` - Delete a model
- `/api/generate` - Generate a completion for a prompt
- `/api/chat` - Generate a chat completion
- `/api/embeddings` - (Placeholder) Generate embeddings
- `/api/debug` - Diagnostic endpoint (available only in debug mode)
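Since the endpoints mirror the Ollama API, a client can consume the `/api/chat` stream the same way it would Ollama's: each line is a JSON chunk, and the final chunk carries `"done": true`. A minimal sketch of the client-side parsing, assuming Ollama's NDJSON chunk shape (the `sample` lines below are fabricated for illustration, not actual server output):

```python
import json

# Sketch of consuming an Ollama-style streaming chat response.
# Field names ("message", "content", "done") follow the Ollama API
# convention that this release says RKLLAMA mirrors.
def collect_stream(lines):
    """Accumulate message content from NDJSON chunks until done=true."""
    text = []
    for raw in lines:
        chunk = json.loads(raw)
        text.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(text)

# Fabricated sample chunks illustrating the stream format.
sample = [
    '{"message": {"content": "Hel"}, "done": false}',
    '{"message": {"content": "lo!"}, "done": true}',
]
print(collect_stream(sample))  # Hello!
```

The improved "done" signaling in this release matters precisely because clients like this one stop reading only when they see that final flag.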
Improvements
- More reliable "done" signaling for streaming responses
- Auto-detection of CPU model (RK3588 or RK3576) with fallback to user selection
- Better error handling and error messages
- Fixed threading issues in request processing
- Automatic content formatting for various response types
- Improved stream handling with token tracking
- Optional debugging mode with detailed logs
Technical Changes
- Added new utility modules for debugging and API handling
- Improved thread management for streaming responses
- Added CPU model detection and selection
- Updated server configuration options
- Made debugging tools optional through environment variable and command line flag