Unified management and routing for llama.cpp, MLX, and vLLM models, with a web dashboard.
Configs, launchers, benchmarks, and tooling for running Qwen3.5 GGUF models locally with llama.cpp on a 16GB NVIDIA GPU
A lightweight terminal chat interface for the llama.cpp server, written in C++, with many features and Windows/Linux support.
Local LLM proxy, DevOps friendly
A robust, production-ready Python toolkit to automate the synchronization between a directory of .gguf model files and a llama-swap config.yaml
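A minimal sketch of the core idea behind such a sync tool: scan a directory for .gguf files and regenerate the models section of a llama-swap config.yaml. The directory path, output path, and simplified YAML schema here are illustrative assumptions, not the toolkit's actual output.

```python
"""Sketch: sync a directory of .gguf files into a llama-swap style config.
The schema below is a simplified assumption, not llama-swap's full format."""
from pathlib import Path
import yaml  # pip install pyyaml

MODELS_DIR = Path("/models")       # hypothetical model directory
CONFIG_PATH = Path("config.yaml")  # hypothetical output path

def build_config(models_dir: Path) -> dict:
    models = {}
    for gguf in sorted(models_dir.glob("*.gguf")):
        # Use the filename (minus extension) as the model alias;
        # ${PORT} is the placeholder llama-swap substitutes at launch.
        models[gguf.stem] = {
            "cmd": f"llama-server --port ${{PORT}} -m {gguf}"
        }
    return {"models": models}

if __name__ == "__main__":
    CONFIG_PATH.write_text(yaml.safe_dump(build_config(MODELS_DIR)))
```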
FastAPI proxy that strips volatile fields from OpenClaw requests to dramatically improve llama-server KV cache hit rates (~22× faster prompt eval)
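The principle is that llama-server reuses its KV cache only while the token prefix of a request is byte-identical to a previous one, so any per-request text (timestamps, request IDs) busts the cache. A rough sketch of that idea, assuming a hypothetical "Current time:" line as the volatile field; the real proxy targets OpenClaw-specific fields:

```python
"""Sketch: normalize volatile text so repeated requests share a byte-identical
prompt prefix and llama-server's KV cache can be reused. The upstream URL and
the volatile pattern are assumptions for illustration."""
import re
import httpx
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
UPSTREAM = "http://localhost:8080/v1/chat/completions"  # assumed llama-server URL
TIMESTAMP = re.compile(r"Current time: .*")             # hypothetical volatile line

@app.post("/v1/chat/completions")
async def proxy(request: Request):
    body = await request.json()
    for msg in body.get("messages", []):
        if msg.get("role") == "system" and isinstance(msg.get("content"), str):
            # Strip per-request text that would change the token prefix
            msg["content"] = TIMESTAMP.sub("", msg["content"])
    async with httpx.AsyncClient(timeout=120.0) as client:
        upstream = await client.post(UPSTREAM, json=body)
    return JSONResponse(upstream.json(), status_code=upstream.status_code)
```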
CLI wrapper for llama.cpp providing an ollama-like experience
A simple web application for real-time AI vision analysis using SmolVLM-500M-Instruct with live camera feed processing and text-to-speech.
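For context, sending a frame to a vision model served by llama-server uses the standard OpenAI-compatible multimodal message format. A sketch under assumed defaults (local URL, a captured frame.jpg, server started with the model's multimodal projector):

```python
"""Sketch: send one camera frame to a llama-server running SmolVLM via the
OpenAI-compatible chat endpoint. URL and file name are assumptions."""
import base64, json, urllib.request

with open("frame.jpg", "rb") as f:  # hypothetical captured frame
    b64 = base64.b64encode(f.read()).decode()

payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what you see."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```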
A Bash script that automatically launches llama-server, detects available .gguf models, and selects the number of GPU layers based on your free VRAM.
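The layer-selection heuristic such a script might use, sketched in Python rather than Bash for consistency with the other examples; the bytes-per-layer estimate, headroom, and model path are assumptions:

```python
"""Sketch: query free VRAM via nvidia-smi, estimate how many layers fit,
and pass that as -ngl to llama-server."""
import subprocess

def free_vram_mib() -> int:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free",
         "--format=csv,noheader,nounits"], text=True)
    return int(out.splitlines()[0])

def pick_gpu_layers(mib_per_layer: int = 300) -> int:
    # 300 MiB/layer is a placeholder estimate; it varies by model and
    # quantization, and a real script would also cap at the model's layer count.
    budget = free_vram_mib() - 1024  # keep ~1 GiB headroom for the KV cache
    return max(budget // mib_per_layer, 0)

if __name__ == "__main__":
    model = "/models/example.gguf"   # hypothetical model path
    ngl = pick_gpu_layers()
    subprocess.run(["llama-server", "-m", model, "-ngl", str(ngl)])
```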
A production-grade Python SDK for llama-server that streamlines authentication, token rotation, observability, and PII masking—helping AI architects ship secure, traceable LLM systems with enterprise-ready guardrails.
OpenServ is a simple Bash-based CLI tool for managing LLMs in llama.cpp server.
Create a code completion model & tool for IDEs that can run locally on consumer hardware and rival the performance of commercial products like Cursor.
Lightweight nginx+Lua reverse proxy that routes OpenAI-compatible API requests to multiple llama-server instances by model name. No request body mangling.
Hikma is a minimal GTK4 chat client in Vala for OpenAI‑compatible APIs. It renders messages as plain text, stores settings securely via libsecret, and builds with Meson/Ninja plus a simple Debian packaging flow.
A Python-based CLI utility for managing llama-server on Windows 10/11. It launches and switches between LLMs using only a configuration file (config.ini), with the Strategy pattern implemented for model selection. Designed primarily for AMD GPUs (RX 6600 - 6700) that lack ROCm HIP SDK support.
Provide tested tools and configs to run Qwen 3.5 GGUF models efficiently on a single 16GB NVIDIA GPU using llama.cpp locally.
FIMpad is a fill-in-the-middle (FIM) focused local LLM interface in the form of a tabbed GUI text editor.
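A FIM request against llama-server's /infill endpoint, which an editor like this could build on. The endpoint and input_prefix/input_suffix fields follow the llama.cpp server documentation; the URL, snippet, and n_predict value are assumptions:

```python
"""Sketch: fill-in-the-middle completion via llama-server's /infill endpoint,
assuming a FIM-capable model is loaded at the default local address."""
import json
import urllib.request

payload = {
    "input_prefix": "def fib(n):\n    ",  # text before the cursor
    "input_suffix": "\n    return a",     # text after the cursor
    "n_predict": 64,
}
req = urllib.request.Request(
    "http://localhost:8080/infill",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["content"])     # the model's middle completion
```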
Local-first review terminal for deterministic, inspectable, and constrained repository analysis with local LLM backends.