
Awesome local LLM

A curated list of awesome platforms, tools, practices and resources that help run LLMs locally

Table of Contents

Inference platforms

  • LM Studio - discover, download and run local LLMs
  • jan - an open source alternative to ChatGPT that runs 100% offline on your computer
  • LocalAI - the free, open-source alternative to OpenAI, Claude and others
  • ChatBox - user-friendly desktop client app for AI models/LLMs
  • lemonade - a local LLM server with GPU and NPU Acceleration

Back to Table of Contents

Inference engines

  • ollama - get up and running with LLMs
  • llama.cpp - LLM inference in C/C++
  • vllm - a high-throughput and memory-efficient inference and serving engine for LLMs
  • exo - run your own AI cluster at home with everyday devices
  • BitNet - official inference framework for 1-bit LLMs
  • sglang - a fast serving framework for large language models and vision language models
  • Nano-vLLM - a lightweight vLLM implementation built from scratch
  • koboldcpp - run GGUF models easily with a KoboldAI UI
  • flashinfer - kernel library for LLM serving
  • gpustack - simple, scalable AI model deployment on GPU clusters
  • mlx-lm - generate text and fine-tune large language models on Apple silicon with MLX
  • distributed-llama - connect home devices into a powerful cluster to accelerate LLM inference
  • ik_llama.cpp - llama.cpp fork with additional SOTA quants and improved performance
  • mini-sglang - a lightweight yet high-performance inference framework for Large Language Models
  • FastFlowLM - run LLMs on AMD Ryzen™ AI NPUs
  • vllm-gfx906 - vLLM for AMD gfx906 GPUs, e.g. Radeon VII / MI50 / MI60
  • llm-scaler - run LLMs on Intel Arc™ Pro B60 GPUs
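
Many of the engines above (ollama, llama.cpp's `llama-server`, vLLM, sglang) expose an OpenAI-compatible HTTP API, so one small client can work across all of them. A minimal sketch using only the Python standard library; the base URL and model name are assumptions (ollama's defaults shown), so adjust them to your own server:

```python
import json
import urllib.request

# Any OpenAI-compatible local server works; ollama's default port is shown here.
BASE_URL = "http://localhost:11434/v1"

def build_chat_request(prompt: str, model: str = "llama3.2") -> urllib.request.Request:
    """Build a chat-completions request for an OpenAI-compatible endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt: str, model: str = "llama3.2") -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, model)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same client should work against llama.cpp's `llama-server` (default `http://localhost:8080/v1`) or vLLM (default `http://localhost:8000/v1`) by changing only `BASE_URL`.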

Back to Table of Contents

User Interfaces

  • Open WebUI - User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
  • Lobe Chat - an open-source, modern design AI chat framework
  • Text generation web UI - LLM UI with advanced features, easy setup, and multiple backend support
  • SillyTavern - LLM Frontend for Power Users
  • Page Assist - Use your locally running AI models to assist you in your web browsing

Back to Table of Contents

Large Language Models

Explorers, Benchmarks, Leaderboards

Back to Table of Contents

Model providers

  • Qwen - powered by Alibaba Cloud
  • Mistral AI - a pioneering French artificial intelligence startup
  • Tencent - a Chinese multinational technology conglomerate and holding company
  • Unsloth AI - focusing on making AI more accessible to everyone (GGUFs etc.)
  • bartowski - providing GGUF versions of popular LLMs
  • Beijing Academy of Artificial Intelligence - a private non-profit organization engaged in AI research and development
  • Open Thoughts - a team of researchers and engineers curating the best open reasoning datasets

Back to Table of Contents

Specific models

General purpose

  • Qwen3.5 - a collection of the latest generation Qwen LLMs
  • Gemma 3 - a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models
  • gpt-oss - a collection of open-weight models from OpenAI, designed for powerful reasoning, agentic tasks, and versatile developer use cases
  • Ministral 3 - a collection of edge models, with base, instruct and reasoning variants, in 3 different sizes: 3B, 8B and 14B, all with vision capabilities
  • Hunyuan - a collection of Tencent's open-source efficient LLMs designed for versatile deployment across diverse computational environments
  • Phi-4 - a family of small language, multi-modal and reasoning models from Microsoft
  • NVIDIA Nemotron v3 - a family of open models from NVIDIA with open weights, training data and recipes, delivering leading efficiency and accuracy for building specialized AI agents
  • Llama Nemotron - a collection of open, production-ready enterprise models from NVIDIA
  • OpenReasoning-Nemotron - a collection of models from NVIDIA, trained on 5M reasoning traces for math, code and science
  • GLM-5 - a model targeting complex systems engineering and long-horizon agentic tasks
  • Granite 4.0 - a collection of lightweight, state-of-the-art open foundation models from IBM that natively support multilingual capabilities, a wide range of coding tasks—including fill-in-the-middle (FIM) code completion—retrieval-augmented generation (RAG), tool usage and structured JSON output
  • EXAONE-4.0 - a collection of LLMs from LG AI Research, integrating non-reasoning and reasoning modes
  • ERNIE 4.5 - a collection of large-scale multimodal models from Baidu
  • Seed-OSS - a collection of LLMs developed by ByteDance's Seed Team, designed for powerful long-context, reasoning, agent and general capabilities, and versatile developer-friendly features
  • Step-3.5-Flash - a highly capable open-source foundation model, engineered to deliver frontier reasoning and agentic capabilities with exceptional efficiency

Back to Table of Contents

Coding

  • Qwen3-Coder-Next - a collection of Qwen's open-weight language models designed specifically for coding agents and local development
  • Devstral 2 - a couple of agentic LLMs for software engineering tasks, excelling at using tools to explore codebases, edit multiple files, and power SWE Agents
  • GLM-4.7 - a collection of agentic, reasoning and coding (ARC) foundation models
  • MiniMax-M2 - a collection of SOTA models for real-world dev & agents
  • NousCoder-14B - a competitive programming model post-trained on Qwen3-14B via reinforcement learning
  • FrogBoss-32B-2510 & FrogMini-14B-2510 - coding agents specialized in fixing bugs, obtained by fine-tuning Qwen3-32B and Qwen3-14B, respectively, on debugging trajectories generated by Claude Sonnet 4 within the BugPilot framework
  • Mellum-4b-base - an LLM from JetBrains, optimized for code-related tasks
  • Stable-DiffCoder - a strong code diffusion large language model

Back to Table of Contents

Multimodal

  • Qwen3-Omni - a collection of the natively end-to-end multilingual omni-modal foundation models from Qwen
  • GLM-4.6V - a collection of open source multimodal models with native tool use from Zhipu AI

Back to Table of Contents

Image

  • Qwen-Image - a collection of models for image generation, edit and decomposition from Qwen
  • Qwen3-VL - a collection of the most powerful vision-language models in the Qwen series to date
  • GLM-Image - an image generation model
  • HunyuanImage - a collection of image generation models from Tencent
  • HunyuanVideo - a collection of video generation models from Tencent
  • Vidi - a collection of models for multimodal video understanding and creation
  • FastVLM - a collection of VLMs with efficient vision encoding from Apple
  • MiniCPM-V-4_5 - a GPT-4o Level MLLM for single image, multi image and high-FPS video understanding on your phone
  • LFM2-VL - a collection of vision-language models, designed for on-device deployment
  • ClipTagger-12b - a vision-language model (VLM) designed for video understanding at massive scale

Back to Table of Contents

Audio

  • Nemotron Speech - a collection of open, state-of-the-art, production‑ready enterprise speech models from the NVIDIA Speech research team for ASR, TTS, Speaker Diarization and S2S
  • Qwen3-ASR - a collection of models that support language identification and ASR for 52 languages and dialects
  • Qwen3-TTS - a collection of TTS models that cover 10 major languages as well as multiple dialectal voice profiles to meet global application needs
  • Voxtral-Small-24B-2507 - an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance
  • Voxtral-Mini-4B-Realtime-2602 - a multilingual, realtime speech-transcription model and among the first open-source solutions to achieve accuracy comparable to offline systems with a delay of <500ms
  • chatterbox - the first production-grade open-source TTS model
  • VibeVoice - a collection of frontier text-to-speech models from Microsoft
  • Kitten TTS - a collection of open-source realistic text-to-speech models designed for lightweight deployment and high-quality voice synthesis

Back to Table of Contents

Safeguards

  • gpt-oss-safeguard - a collection of safety reasoning models built upon gpt-oss
  • Granite Guardian Models - a collection of models created by IBM for safeguarding language models
  • Qwen3Guard - a collection of safety moderation models built upon Qwen3
  • NemoGuard - a collection of models from NVIDIA for content safety, topic-following and security guardrails
  • AprielGuard - a safeguard model designed to detect and mitigate both safety risks and security threats in LLM interactions

Back to Table of Contents

Miscellaneous

  • Jan-v1-4B - the first release in the Jan Family, designed for agentic reasoning and problem-solving within the Jan App
  • Jan-nano - a compact 4-billion parameter language model specifically designed and trained for deep research tasks
  • Jan-nano-128k - an enhanced version of Jan-nano featuring a native 128k context window that enables deeper, more comprehensive research without the performance degradation typically associated with context-extension methods
  • Nemotron RAG - a set of tools to build retrieval-augmented generation (RAG) systems, improve search and ranking accuracy, and extract structured data from complex documents
  • Nemotron-Orchestrator-8B - a state-of-the-art 8B orchestration model designed to solve complex, multi-turn agentic tasks by coordinating a diverse set of expert models and tools
  • Arch-Router-1.5B - the fastest LLM router model that aligns with subjective usage preferences
  • Waypoint-1 - a collection of control-and-text-conditioned causal diffusion models that can generate worlds in realtime on high-end consumer hardware
  • Hunyuan3D - a collection of everything related (models, datasets etc.) to 3D assets generation from Tencent
  • Hunyuan-GameCraft-1.0 - a novel framework for high-dynamic interactive video generation in game environments

Back to Table of Contents

Tools

Models

  • unsloth - fine-tuning & reinforcement learning for LLMs
  • outlines - structured outputs for LLMs
  • heretic - fully automatic censorship removal for language models
  • llama-swap - reliable model swapping for any local OpenAI compatible server - llama.cpp, vllm, etc.

Back to Table of Contents

Agent Frameworks

  • AutoGPT - a powerful platform that allows you to create, deploy, and manage continuous AI agents that automate complex workflows
  • langflow - a powerful tool for building and deploying AI-powered agents and workflows
  • langchain - build context-aware reasoning applications
  • autogen - a programming framework for agentic AI
  • anything-llm - the all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more
  • Flowise - build AI agents, visually
  • llama_index - the leading framework for building LLM-powered agents over your data
  • crewAI - a framework for orchestrating role-playing, autonomous AI agents
  • agno - a full-stack framework for building Multi-Agent Systems with memory, knowledge and reasoning
  • sim - open-source platform to build and deploy AI agent workflows
  • openai-agents-python - a lightweight, powerful framework for multi-agent workflows
  • SuperAGI - an open-source framework to build, manage and run useful Autonomous AI Agents
  • camel - the first and the best multi-agent framework
  • pydantic-ai - a Python agent framework designed to help you quickly, confidently, and painlessly build production grade applications and workflows with Generative AI
  • txtai - all-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows
  • agent-framework - a framework for building, orchestrating and deploying AI agents and multi-agent workflows with support for Python and .NET
  • archgw - a high-performance proxy server that handles the low-level work in building agents: like applying guardrails, routing prompts to the right agent, and unifying access to LLMs, etc.
  • ClaraVerse - privacy-first, fully local AI workspace with Ollama LLM chat, tool calling, agent builder, Stable Diffusion, and embedded n8n-style automation
  • ragbits - building blocks for rapid development of GenAI applications

Back to Table of Contents

Model Context Protocol

  • mindsdb - federated query engine for AI - the only MCP Server you'll ever need
  • github-mcp-server - GitHub's official MCP Server
  • playwright-mcp - Playwright MCP server
  • chrome-devtools-mcp - Chrome DevTools for coding agents
  • n8n-mcp - an MCP server for Claude Desktop / Claude Code / Windsurf / Cursor that builds n8n workflows for you
  • awslabs/mcp - AWS MCP Servers — helping you get the most out of AWS, wherever you use MCP
  • mcp-atlassian - MCP server for Atlassian tools (Confluence, Jira)
  • dbhub - zero-dependency, token-efficient database MCP server for Postgres, MySQL, SQL Server, MariaDB, SQLite
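
Clients typically wire these servers in through a JSON config file. A sketch in the Claude Desktop `mcpServers` style, using playwright-mcp as the example; the exact command for each server is documented in its README, so treat this invocation as illustrative:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```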

Back to Table of Contents

Retrieval-Augmented Generation

  • pathway - Python ETL framework for stream processing, real-time analytics, LLM pipelines and RAG
  • graphrag - a modular graph-based RAG system
  • LightRAG - simple and fast RAG
  • haystack - AI orchestration framework to build customizable, production-ready LLM applications, best suited for building RAG, question answering, semantic search or conversational agent chatbots
  • vanna - an open-source Python RAG framework for SQL generation and related functionality
  • graphiti - build real-time knowledge graphs for AI Agents
  • onyx - the AI platform connected to your company's docs, apps, and people
  • claude-context - make your entire codebase the context for any coding agent
  • pipeshub-ai - a fully extensible and explainable workplace AI platform for enterprise search and workflow automation
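
Under the hood, the frameworks above share the same core loop: embed documents, retrieve the ones nearest the query, and prepend them to the prompt. A dependency-free sketch of that loop, with a bag-of-words counter standing in for a real embedding model (all names here are illustrative):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words term-count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Augment the user question with the retrieved context.
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Real systems swap `embed` for a learned embedding model and the linear scan for a vector index, but the retrieve-then-generate shape stays the same.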

Back to Table of Contents

Coding Agents

  • opencode - an AI coding agent built for the terminal
  • zed - a next-generation code editor designed for high-performance collaboration with humans and AI
  • OpenHands - a platform for software development agents powered by AI
  • cline - autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way
  • aider - AI pair programming in your terminal
  • tabby - an open-source GitHub Copilot alternative, set up your own LLM-powered code completion server
  • continue - create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks
  • void - an open-source Cursor alternative, use AI agents on your codebase, checkpoint and visualize changes, and bring any model or host locally
  • goose - an open-source, extensible AI agent that goes beyond code suggestions
  • Roo-Code - a whole dev team of AI agents in your code editor
  • crush - the glamourous AI coding agent for your favourite terminal
  • kilocode - open source AI coding assistant for planning, building, and fixing code
  • humanlayer - the best way to get AI coding agents to solve hard problems in complex codebases
  • ProxyAI - the leading open-source AI copilot for JetBrains

Back to Table of Contents

Computer Use

  • open-interpreter - a natural language interface for computers
  • OmniParser - a simple screen parsing tool towards pure vision based GUI agent
  • cua - the Docker Container for Computer-Use AI Agents
  • self-operating-computer - a framework to enable multimodal models to operate a computer
  • Agent-S - an open agentic framework that uses computers like a human
  • openwork - an open-source alternative to Claude Cowork, powered by OpenCode

Back to Table of Contents

Browser Automation

  • puppeteer - a JavaScript API for Chrome and Firefox
  • playwright - a framework for Web Testing and Automation
  • browser-use - make websites accessible for AI agents
  • firecrawl - turn entire websites into LLM-ready markdown or structured data
  • stagehand - the AI Browser Automation Framework
  • nanobrowser - open-source Chrome extension for AI-powered web automation

Back to Table of Contents

Memory Management

  • mem0 - universal memory layer for AI Agents
  • letta - the stateful agents framework with memory, reasoning, and context management
  • supermemory - an extremely fast, scalable memory engine and app
  • cognee - memory for AI Agents in 5 lines of code
  • LMCache - supercharge your LLM with the fastest KV Cache Layer
  • memU - an open-source memory framework for AI companions

Back to Table of Contents

Testing, Evaluation, and Observability

  • langfuse - an open-source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more
  • opik - debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards
  • openllmetry - open-source observability for your LLM application, based on OpenTelemetry
  • garak - the LLM vulnerability scanner from NVIDIA
  • giskard - open-source evaluation & testing for AI & LLM systems
  • agenta - an open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place

Back to Table of Contents

Research

  • Perplexica - an open-source alternative to Perplexity AI, the AI-powered search engine
  • gpt-researcher - an LLM based autonomous agent that conducts deep local and web research on any topic and generates a long report with citations
  • SurfSense - an open-source alternative to NotebookLM / Perplexity / Glean
  • open-notebook - an open-source implementation of Notebook LM with more flexibility and features
  • RD-Agent - automate the most critical and valuable aspects of the industrial R&D process
  • local-deep-researcher - fully local web research and report writing assistant
  • local-deep-research - an AI-powered research assistant for deep, iterative research
  • maestro - an AI-powered research application designed to streamline complex research tasks

Back to Table of Contents

Training and Fine-tuning

  • OpenRLHF - an easy-to-use, high-performance open-source RLHF framework built on Ray, vLLM, ZeRO-3 and HuggingFace Transformers, designed to make RLHF training simple and accessible
  • Kiln - the easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets
  • augmentoolkit - train an open-source LLM on new facts

Back to Table of Contents

Miscellaneous

  • context7 - up-to-date code documentation for LLMs and AI code editors
  • deepwiki-open - open source DeepWiki: AI-powered wiki generator for GitHub/Gitlab/Bitbucket repositories
  • cai - Cybersecurity AI (CAI), the framework for AI Security
  • speakr - a personal, self-hosted web application designed for transcribing audio recordings
  • presenton - an open-source AI presentation generator and API
  • OmniGen2 - an exploration of advanced multimodal generation
  • 4o-ghibli-at-home - a powerful, self-hosted AI photo stylizer built for performance and privacy
  • Observer - local open-source micro-agents that observe, log and react, all while keeping your data private and secure
  • mobile-use - a powerful, open-source AI agent that controls your Android or iOS device using natural language
  • gabber - build AI applications that can see, hear, and speak using your screens, microphones, and cameras as inputs
  • promptcat - a zero-dependency prompt manager/catalog/library in a single HTML file

Back to Table of Contents

Hardware

Back to Table of Contents

Tutorials

Models

Back to Table of Contents

Prompt Engineering

Back to Table of Contents

Context Engineering

  • Context-Engineering - a frontier, first-principles handbook inspired by Karpathy and 3Blue1Brown for moving beyond prompt engineering to the wider discipline of context design, orchestration, and optimization
  • Awesome-Context-Engineering - a comprehensive survey on Context Engineering: from prompt engineering to production-grade AI systems

Back to Table of Contents

Inference

  • vLLM Production Stack - vLLM’s reference system for K8S-native cluster-wide deployment with community-driven performance optimization

Back to Table of Contents

Agents

  • superpowers - an agentic skills framework & software development methodology that works
  • GenAI Agents - tutorials and implementations for various Generative AI Agent techniques
  • 500+ AI Agent Projects - a curated collection of AI agent use cases across various industries
  • 12-Factor Agents - principles for building reliable LLM applications
  • Agents towards production - end-to-end, code-first tutorials covering every layer of production-grade GenAI agents, guiding you from spark to scale with proven patterns and reusable blueprints for real-world launches
  • agents.md - a simple, open format for guiding coding agents
  • Agent Skills - a simple, open format for giving agents new capabilities and expertise
  • skills - Hugging Face Skills are definitions for AI/ML tasks like dataset creation, model training and evaluation
  • LLM Agents & Ecosystem Handbook - one-stop handbook for building, deploying, and understanding LLM agents with 60+ skeletons, tutorials, ecosystem guides, and evaluation tools
  • 601 real-world gen AI use cases - 601 real-world gen AI use cases from the world's leading organizations by Google
  • A practical guide to building agents - a practical guide to building agents by OpenAI

Back to Table of Contents

Retrieval-Augmented Generation

  • Pathway AI Pipelines - ready-to-run cloud templates for RAG, AI pipelines, and enterprise search with live data
  • RAG Techniques - various advanced techniques for Retrieval-Augmented Generation (RAG) systems
  • Controllable RAG Agent - an advanced Retrieval-Augmented Generation (RAG) solution for complex question answering that uses a sophisticated graph-based algorithm to handle the tasks
  • LangChain RAG Cookbook - a collection of modular RAG techniques, implemented in LangChain + Python

Back to Table of Contents

Miscellaneous

Back to Table of Contents

Communities

Back to Table of Contents

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to get started.
