
Awesome local LLM

A curated list of awesome platforms, tools, practices and resources that help run LLMs locally

Table of Contents

Inference platforms

  • LM Studio - discover, download and run local LLMs
  • jan - an open source alternative to ChatGPT that runs 100% offline on your computer
  • LocalAI - the free, open-source alternative to OpenAI, Claude and others
  • ChatBox - user-friendly desktop client app for AI models/LLMs
  • lemonade - a local LLM server with GPU and NPU Acceleration

Back to Table of Contents

Inference engines

  • ollama - get up and running with LLMs
  • llama.cpp - LLM inference in C/C++
  • vllm - a high-throughput and memory-efficient inference and serving engine for LLMs
  • exo - run your own AI cluster at home with everyday devices
  • BitNet - official inference framework for 1-bit LLMs
  • sglang - a fast serving framework for large language models and vision language models
  • Nano-vLLM - a lightweight vLLM implementation built from scratch
  • koboldcpp - run GGUF models easily with a KoboldAI UI
  • flashinfer - kernel library for LLM serving
  • gpustack - simple, scalable AI model deployment on GPU clusters
  • mlx-lm - generate text and fine-tune large language models on Apple silicon with MLX
  • distributed-llama - connect home devices into a powerful cluster to accelerate LLM inference
  • ik_llama.cpp - llama.cpp fork with additional SOTA quants and improved performance
  • mini-sglang - a lightweight yet high-performance inference framework for Large Language Models
  • FastFlowLM - run LLMs on AMD Ryzen™ AI NPUs
  • vllm-gfx906 - vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60
  • llm-scaler - run LLMs on Intel Arc™ Pro B60 GPUs
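
Many of the engines above (ollama, llama.cpp's `llama-server`, vLLM, sglang) expose an OpenAI-compatible HTTP API, so one small client can work across all of them. A minimal sketch using only the Python standard library; the base URL and model name are assumptions (ollama's defaults shown), so adjust them to your own server:

```python
import json
import urllib.request

# Any OpenAI-compatible local server works; ollama's default port is shown here.
BASE_URL = "http://localhost:11434/v1"

def build_chat_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a chat-completions request for an OpenAI-compatible endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt: str, model: str = "llama3.2") -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, model)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same client should work against llama.cpp's `llama-server` (default `http://localhost:8080/v1`) or vLLM (default `http://localhost:8000/v1`) by changing only `BASE_URL`.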

Back to Table of Contents

User Interfaces

  • Open WebUI - User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
  • Lobe Chat - an open-source, modern design AI chat framework
  • Text generation web UI - LLM UI with advanced features, easy setup, and multiple backend support
  • SillyTavern - LLM Frontend for Power Users
  • Page Assist - Use your locally running AI models to assist you in your web browsing

Back to Table of Contents

Large Language Models

Explorers, Benchmarks, Leaderboards

Back to Table of Contents

Model providers

  • Qwen - powered by Alibaba Cloud
  • Mistral AI - a pioneering French artificial intelligence startup
  • Tencent - a Chinese multinational technology conglomerate and holding company
  • Unsloth AI - focusing on making AI more accessible to everyone (GGUFs etc.)
  • bartowski - providing GGUF versions of popular LLMs
  • Beijing Academy of Artificial Intelligence - a private non-profit organization engaged in AI research and development
  • Open Thoughts - a team of researchers and engineers curating the best open reasoning datasets

Back to Table of Contents

Specific models

General purpose

  • Qwen3.5 - a collection of the latest generation Qwen LLMs
  • Gemma 3 - a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models
  • gpt-oss - a collection of open-weight models from OpenAI, designed for powerful reasoning, agentic tasks, and versatile developer use cases
  • Ministral 3 - a collection of edge models, with base, instruct and reasoning variants, in 3 different sizes: 3B, 8B and 14B, all with vision capabilities
  • Hunyuan - a collection of Tencent's open-source efficient LLMs designed for versatile deployment across diverse computational environments
  • Phi-4 - a family of small language, multi-modal and reasoning models from Microsoft
  • NVIDIA Nemotron v3 - a family of open models from NVIDIA with open weights, training data and recipes, delivering leading efficiency and accuracy for building specialized AI agents
  • Llama Nemotron - a collection of open, production-ready enterprise models from NVIDIA
  • OpenReasoning-Nemotron - a collection of models from NVIDIA, trained on 5M reasoning traces for math, code and science
  • GLM-5 - a model targeting complex systems engineering and long-horizon agentic tasks
  • Granite 4.0 - a collection of lightweight, state-of-the-art open foundation models from IBM that natively support multilingual capabilities, a wide range of coding tasks—including fill-in-the-middle (FIM) code completion—retrieval-augmented generation (RAG), tool usage and structured JSON output
  • EXAONE-4.0 - a collection of LLMs from LG AI Research, integrating non-reasoning and reasoning modes
  • ERNIE 4.5 - a collection of large-scale multimodal models from Baidu
  • Seed-OSS - a collection of LLMs developed by ByteDance's Seed Team, designed for powerful long-context, reasoning, agent and general capabilities, and versatile developer-friendly features
  • Step-3.5-Flash - a highly capable open-source foundation model, engineered to deliver frontier reasoning and agentic capabilities with exceptional efficiency

Back to Table of Contents

Coding

  • Qwen3-Coder-Next - a collection of Qwen's open-weight language models designed specifically for coding agents and local development
  • Devstral 2 - a couple of agentic LLMs for software engineering tasks, excelling at using tools to explore codebases, edit multiple files, and power SWE Agents
  • GLM-4.7 - a collection of agentic, reasoning and coding (ARC) foundation models
  • MiniMax-M2 - a collection of SOTA models for real-world dev & agents
  • NousCoder-14B - a competitive programming model post-trained on Qwen3-14B via reinforcement learning
  • FrogBoss-32B-2510 & FrogMini-14B-2510 - coding agents specialized in fixing bugs, obtained by fine-tuning Qwen3-32B and Qwen3-14B, respectively, on debugging trajectories generated by Claude Sonnet 4 within the BugPilot framework
  • Mellum-4b-base - an LLM from JetBrains, optimized for code-related tasks
  • Stable-DiffCoder - a strong code diffusion large language model

Back to Table of Contents

Multimodal

  • Qwen3-Omni - a collection of the natively end-to-end multilingual omni-modal foundation models from Qwen
  • GLM-4.6V - a collection of open source multimodal models with native tool use from Zhipu AI

Back to Table of Contents

Image

  • Qwen-Image - a collection of models for image generation, edit and decomposition from Qwen
  • Qwen3-VL - a collection of the most powerful vision-language models in the Qwen series to date
  • GLM-Image - an image generation model
  • HunyuanImage - a collection of image generation models from Tencent
  • HunyuanVideo - a collection of video generation models from Tencent
  • Vidi - a collection of models for multimodal video understanding and creation
  • FastVLM - a collection of VLMs with efficient vision encoding from Apple
  • MiniCPM-V-4_5 - a GPT-4o Level MLLM for single image, multi image and high-FPS video understanding on your phone
  • LFM2-VL - a collection of vision-language models, designed for on-device deployment
  • ClipTagger-12b - a vision-language model (VLM) designed for video understanding at massive scale

Back to Table of Contents

Audio

  • Nemotron Speech - a collection of open, state-of-the-art, production‑ready enterprise speech models from the NVIDIA Speech research team for ASR, TTS, Speaker Diarization and S2S
  • Qwen3-ASR - a collection of models that support language identification and ASR for 52 languages and dialects
  • Qwen3-TTS - a collection of TTS models that cover 10 major languages as well as multiple dialectal voice profiles to meet global application needs
  • Voxtral-Small-24B-2507 - an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance
  • Voxtral-Mini-4B-Realtime-2602 - a multilingual, realtime speech-transcription model and among the first open-source solutions to achieve accuracy comparable to offline systems with a delay of <500ms
  • chatterbox - the first production-grade open-source TTS model
  • VibeVoice - a collection of frontier text-to-speech models from Microsoft
  • Kitten TTS - a collection of open-source realistic text-to-speech models designed for lightweight deployment and high-quality voice synthesis

Back to Table of Contents

Safeguards

  • gpt-oss-safeguard - a collection of safety reasoning models built upon gpt-oss
  • Granite Guardian Models - a collection of models created by IBM for safeguarding language models
  • Qwen3Guard - a collection of safety moderation models built upon Qwen3
  • NemoGuard - a collection of models from NVIDIA for content safety, topic-following and security guardrails
  • AprielGuard - a safeguard model designed to detect and mitigate both safety risks and security threats in LLM interactions

Back to Table of Contents

Miscellaneous

  • Jan-v1-4B - the first release in the Jan Family, designed for agentic reasoning and problem-solving within the Jan App
  • Jan-nano - a compact 4-billion parameter language model specifically designed and trained for deep research tasks
  • Jan-nano-128k - an enhanced version of Jan-nano featuring a native 128k context window that enables deeper, more comprehensive research without the performance degradation typically associated with context-extension methods
  • Nemotron RAG - a set of tools to build retrieval-augmented generation (RAG) systems, improve search and ranking accuracy, and extract structured data from complex documents
  • Nemotron-Orchestrator-8B - a state-of-the-art 8B orchestration model designed to solve complex, multi-turn agentic tasks by coordinating a diverse set of expert models and tools
  • Arch-Router-1.5B - the fastest LLM router model that aligns with subjective usage preferences
  • Waypoint-1 - a collection of control-and-text-conditioned causal diffusion models that can generate worlds in realtime on high-end consumer hardware
  • Hunyuan3D - a collection of everything related (models, datasets etc.) to 3D assets generation from Tencent
  • Hunyuan-GameCraft-1.0 - a novel framework for high-dynamic interactive video generation in game environments

Back to Table of Contents

Tools

Models

  • unsloth - fine-tuning & reinforcement learning for LLMs
  • outlines - structured outputs for LLMs
  • heretic - fully automatic censorship removal for language models
  • llama-swap - reliable model swapping for any local OpenAI compatible server - llama.cpp, vllm, etc.

Back to Table of Contents

Agent Frameworks

  • AutoGPT - a powerful platform that allows you to create, deploy, and manage continuous AI agents that automate complex workflows
  • langflow - a powerful tool for building and deploying AI-powered agents and workflows
  • langchain - build context-aware reasoning applications
  • autogen - a programming framework for agentic AI
  • anything-llm - the all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more
  • Flowise - build AI agents, visually
  • llama_index - the leading framework for building LLM-powered agents over your data
  • crewAI - a framework for orchestrating role-playing, autonomous AI agents
  • agno - a full-stack framework for building Multi-Agent Systems with memory, knowledge and reasoning
  • sim - open-source platform to build and deploy AI agent workflows
  • openai-agents-python - a lightweight, powerful framework for multi-agent workflows
  • SuperAGI - an open-source framework to build, manage and run useful Autonomous AI Agents
  • camel - the first and the best multi-agent framework
  • pydantic-ai - a Python agent framework designed to help you quickly, confidently, and painlessly build production grade applications and workflows with Generative AI
  • txtai - all-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows
  • agent-framework - a framework for building, orchestrating and deploying AI agents and multi-agent workflows with support for Python and .NET
  • archgw - a high-performance proxy server that handles the low-level work in building agents: like applying guardrails, routing prompts to the right agent, and unifying access to LLMs, etc.
  • ClaraVerse - privacy-first, fully local AI workspace with Ollama LLM chat, tool calling, agent builder, Stable Diffusion, and embedded n8n-style automation
  • ragbits - building blocks for rapid development of GenAI applications

Back to Table of Contents

Model Context Protocol

  • mindsdb - federated query engine for AI - the only MCP Server you'll ever need
  • github-mcp-server - GitHub's official MCP Server
  • playwright-mcp - Playwright MCP server
  • chrome-devtools-mcp - Chrome DevTools for coding agents
  • n8n-mcp - an MCP server for Claude Desktop / Claude Code / Windsurf / Cursor that builds n8n workflows for you
  • awslabs/mcp - AWS MCP Servers — helping you get the most out of AWS, wherever you use MCP
  • mcp-atlassian - MCP server for Atlassian tools (Confluence, Jira)
  • dbhub - zero-dependency, token-efficient database MCP server for Postgres, MySQL, SQL Server, MariaDB, SQLite
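
Clients typically wire these servers in through a JSON config file. A sketch in the Claude Desktop `mcpServers` style, using playwright-mcp as the example; the exact command for each server is documented in its README, so treat this invocation as illustrative:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```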

Back to Table of Contents

Retrieval-Augmented Generation

  • pathway - Python ETL framework for stream processing, real-time analytics, LLM pipelines and RAG
  • graphrag - a modular graph-based RAG system
  • LightRAG - simple and fast RAG
  • haystack - AI orchestration framework to build customizable, production-ready LLM applications, best suited for building RAG, question answering, semantic search or conversational agent chatbots
  • vanna - an open-source Python RAG framework for SQL generation and related functionality
  • graphiti - build real-time knowledge graphs for AI Agents
  • onyx - the AI platform connected to your company's docs, apps, and people
  • claude-context - make your entire codebase the context for any coding agent
  • pipeshub-ai - a fully extensible and explainable workplace AI platform for enterprise search and workflow automation
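
Under the hood, the frameworks above share the same core loop: embed documents, retrieve the ones nearest the query, and prepend them to the prompt. A dependency-free sketch of that loop, with a bag-of-words counter standing in for a real embedding model (all names here are illustrative):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words term-count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Augment the user question with the retrieved context.
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Real systems swap `embed` for a learned embedding model and the linear scan for a vector index, but the retrieve-then-generate shape stays the same.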

Back to Table of Contents

Coding Agents

  • opencode - an AI coding agent built for the terminal
  • zed - a next-generation code editor designed for high-performance collaboration with humans and AI
  • OpenHands - a platform for software development agents powered by AI
  • cline - autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way
  • aider - AI pair programming in your terminal
  • tabby - an open-source GitHub Copilot alternative, set up your own LLM-powered code completion server
  • continue - create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks
  • void - an open-source Cursor alternative, use AI agents on your codebase, checkpoint and visualize changes, and bring any model or host locally
  • goose - an open-source, extensible AI agent that goes beyond code suggestions
  • Roo-Code - a whole dev team of AI agents in your code editor
  • crush - the glamourous AI coding agent for your favourite terminal
  • kilocode - open source AI coding assistant for planning, building, and fixing code
  • humanlayer - the best way to get AI coding agents to solve hard problems in complex codebases
  • ProxyAI - the leading open-source AI copilot for JetBrains

Back to Table of Contents

Computer Use

  • open-interpreter - a natural language interface for computers
  • OmniParser - a simple screen parsing tool towards pure vision based GUI agent
  • cua - the Docker Container for Computer-Use AI Agents
  • self-operating-computer - a framework to enable multimodal models to operate a computer
  • Agent-S - an open agentic framework that uses computers like a human
  • openwork - an open-source alternative to Claude Cowork, powered by OpenCode

Back to Table of Contents

Browser Automation

  • puppeteer - a JavaScript API for Chrome and Firefox
  • playwright - a framework for Web Testing and Automation
  • browser-use - make websites accessible for AI agents
  • firecrawl - turn entire websites into LLM-ready markdown or structured data
  • stagehand - the AI Browser Automation Framework
  • nanobrowser - open-source Chrome extension for AI-powered web automation

Back to Table of Contents

Memory Management

  • mem0 - universal memory layer for AI Agents
  • letta - the stateful agents framework with memory, reasoning, and context management
  • supermemory - an extremely fast, scalable memory engine and app
  • cognee - memory for AI Agents in 5 lines of code
  • LMCache - supercharge your LLM with the fastest KV Cache Layer
  • memU - an open-source memory framework for AI companions

Back to Table of Contents

Testing, Evaluation, and Observability

  • langfuse - an open-source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more
  • opik - debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards
  • openllmetry - open-source observability for your LLM application, based on OpenTelemetry
  • garak - the LLM vulnerability scanner from NVIDIA
  • giskard - open-source evaluation & testing for AI & LLM systems
  • agenta - an open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place

Back to Table of Contents

Research

  • Perplexica - an open-source alternative to Perplexity AI, the AI-powered search engine
  • gpt-researcher - an LLM based autonomous agent that conducts deep local and web research on any topic and generates a long report with citations
  • SurfSense - an open-source alternative to NotebookLM / Perplexity / Glean
  • open-notebook - an open-source implementation of Notebook LM with more flexibility and features
  • RD-Agent - automate the most critical and valuable aspects of the industrial R&D process
  • local-deep-researcher - fully local web research and report writing assistant
  • local-deep-research - an AI-powered research assistant for deep, iterative research
  • maestro - an AI-powered research application designed to streamline complex research tasks

Back to Table of Contents

Training and Fine-tuning

  • OpenRLHF - an easy-to-use, high-performance open-source RLHF framework built on Ray, vLLM, ZeRO-3 and HuggingFace Transformers, designed to make RLHF training simple and accessible
  • Kiln - the easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets
  • augmentoolkit - train an open-source LLM on new facts

Back to Table of Contents

Miscellaneous

  • context7 - up-to-date code documentation for LLMs and AI code editors
  • deepwiki-open - open source DeepWiki: AI-powered wiki generator for GitHub/Gitlab/Bitbucket repositories
  • cai - Cybersecurity AI (CAI), the framework for AI Security
  • speakr - a personal, self-hosted web application designed for transcribing audio recordings
  • presenton - an open-source AI presentation generator and API
  • OmniGen2 - an exploration of advanced multimodal generation
  • 4o-ghibli-at-home - a powerful, self-hosted AI photo stylizer built for performance and privacy
  • Observer - local open-source micro-agents that observe, log and react, all while keeping your data private and secure
  • mobile-use - a powerful, open-source AI agent that controls your Android or iOS device using natural language
  • gabber - build AI applications that can see, hear, and speak using your screens, microphones, and cameras as inputs
  • promptcat - a zero-dependency prompt manager/catalog/library in a single HTML file

Back to Table of Contents

Hardware

Back to Table of Contents

Tutorials

Models

Back to Table of Contents

Prompt Engineering

Back to Table of Contents

Context Engineering

  • Context-Engineering - a frontier, first-principles handbook inspired by Karpathy and 3Blue1Brown for moving beyond prompt engineering to the wider discipline of context design, orchestration, and optimization
  • Awesome-Context-Engineering - a comprehensive survey on Context Engineering: from prompt engineering to production-grade AI systems

Back to Table of Contents

Inference

  • vLLM Production Stack - vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Back to Table of Contents

Agents

  • superpowers - an agentic skills framework & software development methodology that works
  • GenAI Agents - tutorials and implementations for various Generative AI Agent techniques
  • 500+ AI Agent Projects - a curated collection of AI agent use cases across various industries
  • 12-Factor Agents - principles for building reliable LLM applications
  • Agents towards production - end-to-end, code-first tutorials covering every layer of production-grade GenAI agents, guiding you from spark to scale with proven patterns and reusable blueprints for real-world launches
  • agents.md - a simple, open format for guiding coding agents
  • Agent Skills - a simple, open format for giving agents new capabilities and expertise
  • skills - Hugging Face Skills are definitions for AI/ML tasks like dataset creation, model training and evaluation
  • LLM Agents & Ecosystem Handbook - one-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation tools
  • 601 real-world gen AI use cases - 601 real-world gen AI use cases from the world's leading organizations by Google
  • A practical guide to building agents - a practical guide to building agents by OpenAI

Back to Table of Contents

Retrieval-Augmented Generation

  • Pathway AI Pipelines - ready-to-run cloud templates for RAG, AI pipelines, and enterprise search with live data
  • RAG Techniques - various advanced techniques for Retrieval-Augmented Generation (RAG) systems
  • Controllable RAG Agent - an advanced Retrieval-Augmented Generation (RAG) solution for complex question answering that uses a sophisticated graph-based algorithm to handle the tasks
  • LangChain RAG Cookbook - a collection of modular RAG techniques, implemented in LangChain + Python

Back to Table of Contents

Miscellaneous

Back to Table of Contents

Communities

Back to Table of Contents

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to get started.
