Releases: NotPunchnox/rkllama

RKLLAMA 0.0.3

12 Feb 00:38

🚀 RKLLAMA Version: 0.0.3

Key Updates

🌐 Extended Compatibility

  • Full Model Support: RKLLAMA now supports a wide range of models, including DeepSeek, Qwen, and Llama, ensuring broader applicability and flexibility for users. 🤝

⚡ Enhanced Performance

  • Tokenized Inputs: Inputs are now tokenized before being sent to the model, replacing raw prompts. This optimization significantly boosts response speed and efficiency. 🏃‍♂️
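
The raw-prompt-to-token-IDs step can be sketched as follows. This is purely illustrative: RKLLAMA loads a real tokenizer from the HuggingFace path given in the Modelfile, and the toy vocabulary below is invented for the example.

```python
# Illustrative sketch of pre-tokenizing a prompt before inference.
# The toy vocabulary is invented; RKLLAMA uses a real HuggingFace tokenizer.
TOY_VOCAB = {"hello": 1, "world": 2, "<unk>": 0}

def tokenize(prompt: str) -> list[int]:
    """Map each whitespace-separated word to a token ID,
    falling back to the <unk> ID for unknown words."""
    return [TOY_VOCAB.get(word, TOY_VOCAB["<unk>"]) for word in prompt.lower().split()]

# The model now receives token IDs instead of a raw string:
ids = tokenize("Hello world")
print(ids)  # [1, 2]
```

Doing this conversion once on the server side, rather than passing raw text through, is what avoids redundant work per request.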

📂 Modelfile System

  • Automated Initialization: Inspired by Ollama, the new Modelfile system automatically initializes both the tokenizer and the chat template when you simply provide the HuggingFace path. 🔧
  • Customizable Parameters: Easily adjust model parameters such as temperature, model file location, and the system prompt for tailored performance. 🎛️
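
A Modelfile might look something like the sketch below. The exact directive names are assumptions modeled on Ollama's Modelfile syntax (and the model path is a placeholder); check the repository for the format RKLLAMA actually expects.

```
FROM model.rkllm
HUGGINGFACE_PATH="Qwen/Qwen2.5-1.5B-Instruct"
TEMPERATURE=0.8
SYSTEM="You are a helpful assistant."
```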

🗂️ Simplified Organization

  • Automatic Folder Creation: Models are now organized into dedicated folders, automatically created when you run the rkllama list command. This streamlines model management. 📁
  • Effortless Model Launch: Launch models using just the model name, as .rkllm files are directly referenced in the Modelfile. 🛫

🔄 Automatic Modelfile Creation

  • Seamless Integration: When using the pull command, the Modelfile is generated automatically. For previously downloaded models, run a one-time command (e.g., rkllama run modelname file.rkllm huggingface_path) to create the Modelfile. 🔄

🌟 Future Enhancements

  • Customization and Optimization: Upcoming updates will introduce further customization options for the chat template and additional hyperparameters (e.g., top_k) to enhance the user experience. 🎨

📚 Upgrade Guide

  • Rebuild Architecture: If you have already downloaded models and wish to avoid reinstalling everything, follow our guide: Rebuild Architecture. 📖

RKLLAMA 0.0.1

15 Jan 22:14
0fc3291

RKLLAMA 0.0.4

24 Mar 18:46
0e35e79

Version 0.0.4 (Current)

Major Features

  • Ollama API Compatibility: Added support for the Ollama API interface, allowing RKLLAMA to work with Ollama clients and tools.
  • Enhanced Streaming Responses: Improved reliability of streaming responses with better handling of completion signals.
  • Optional Debug Mode: Added detailed debugging tools that can be enabled with --debug flag.
  • CPU Model Auto-detection: Automatic detection of RK3588 or RK3576 platform with fallback to interactive selection.

New API Endpoints

  • /api/tags - List all available models (Ollama-compatible)
  • /api/show - Show model information
  • /api/create - Create a new model from a Modelfile
  • /api/pull - Pull a model from Hugging Face
  • /api/delete - Delete a model
  • /api/generate - Generate a completion for a prompt
  • /api/chat - Generate a chat completion
  • /api/embeddings - (Placeholder) Generate embeddings
  • /api/debug - Diagnostic endpoint (available only in debug mode)
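
Since these endpoints aim for Ollama compatibility, a streaming /api/generate reply should arrive as newline-delimited JSON chunks carrying "response" fragments and a final "done": true. The sketch below parses that format from canned chunks; the field names follow Ollama's documented streaming format, and any server URL or port you pair it with is an assumption, not something this release specifies.

```python
import json

def collect_stream(lines):
    """Accumulate the 'response' fragments from an Ollama-style
    streaming reply until a chunk with 'done': true arrives."""
    parts = []
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# Canned chunks, as a server would emit them line by line:
sample = [
    '{"response": "Hel", "done": false}',
    '{"response": "lo!", "done": true}',
]
print(collect_stream(sample))  # Hello!
```

Against a live server, the same function would consume `response.iter_lines()` from a `requests.post(..., stream=True)` call to the /api/generate endpoint.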

Improvements

  • More reliable "done" signaling for streaming responses
  • Auto-detection of CPU model (RK3588 or RK3576) with fallback to user selection
  • Better error handling and error messages
  • Fixed threading issues in request processing
  • Automatic content formatting for various response types
  • Improved stream handling with token tracking
  • Optional debugging mode with detailed logs

Technical Changes

  • Added new utility modules for debugging and API handling
  • Improved thread management for streaming responses
  • Added CPU model detection and selection
  • Updated server configuration options
  • Made debugging tools optional through environment variable and command line flag