Release 25.01 Alpha
Pre-release
- Supported models: Llama series and their AWQ-quantized versions
- Supported applications: CLI chatbot, API server/client, Gradio
- Inference engine: static and dynamic tree speculative decoding (both GPU-only and offloading modes)