Awesome-LLMOps Awesome

🎉 An awesome & curated list of the best LLMOps tools.

You are more than welcome to add a new project; simply open an issue.

Table of Contents

Inference

Inference Engine

  • Cortex.cpp: Local AI API Platform. Stars Contributors LastCommit
  • DeepSpeed-MII: MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. Stars Contributors LastCommit
  • llama-box: LM inference server implementation based on *.cpp. Stars Contributors LastCommit
  • Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework. Stars Contributors LastCommit
  • ipex-llm: Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc. Stars Contributors LastCommit
  • LMDeploy: LMDeploy is a toolkit for compressing, deploying, and serving LLMs. Stars Contributors LastCommit
  • LoRAX: Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs. Stars Contributors LastCommit Tag
  • llama.cpp: LLM inference in C/C++. Stars Contributors LastCommit
  • Llumnix: Efficient and easy multi-instance LLM serving. Stars Contributors LastCommit
  • MInference: [NeurIPS'24 Spotlight, ICLR'25] Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy. Stars Contributors LastCommit Tag
  • MLC LLM: Universal LLM Deployment Engine with ML Compilation. Stars Contributors LastCommit
  • MLServer: An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more. Stars Contributors LastCommit
  • Ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models. Stars Contributors LastCommit
  • OpenLLM: Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud. Stars Contributors LastCommit
  • OpenVINO: OpenVINO™ is an open source toolkit for optimizing and deploying AI inference. Stars Contributors LastCommit
  • Petals: 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading. Stars Contributors LastCommit
  • Ratchet: A cross-platform browser ML framework. Stars Contributors LastCommit Tag
  • SGLang: SGLang is a fast serving framework for large language models and vision language models. Stars Contributors LastCommit
  • TinyGrad: You like pytorch? You like micrograd? You love tinygrad! ❤️ Stars Contributors LastCommit
  • transformers.js: State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server! Stars Contributors LastCommit Tag
  • Triton Inference Server: The Triton Inference Server provides an optimized cloud and edge inferencing solution. Stars Contributors LastCommit
  • Text Generation Inference: Large Language Model Text Generation Inference. Stars Contributors LastCommit
  • vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs. Stars Contributors LastCommit
  • web-llm: High-performance In-browser LLM Inference Engine. Stars Contributors LastCommit Tag
  • Xinference: Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop. Stars Contributors LastCommit
  • zml: Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild. Stars Contributors LastCommit
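
Many of the engines above (vLLM, SGLang, LMDeploy, OpenLLM, Ollama, and others) expose an OpenAI-compatible HTTP API once a model is served. Below is a minimal client-side sketch assuming such a server is already running locally; the base URL, API key, and model id are placeholders to adjust for your own deployment.

```python
# Minimal sketch: query an OpenAI-compatible endpoint served by an engine such
# as vLLM or SGLang. The base_url, api_key, and model id are assumptions about
# a local deployment, not fixed values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder local server address
    api_key="EMPTY",                      # many local servers accept any key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "What does an inference engine do?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```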

Inference Platform

  • AIBrix: Cost-efficient and pluggable Infrastructure components for GenAI inference. Stars Contributors LastCommit
  • BentoML: The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more! Stars Contributors LastCommit
  • Kaito: Kubernetes operator for large-model inference and fine-tuning, with GPU auto-provisioning, container-based hosting, and CRD-based orchestration. Stars Contributors LastCommit
  • Kserve: Standardized Serverless ML Inference Platform on Kubernetes. Stars Contributors LastCommit
  • KubeAI: AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text. Stars Contributors LastCommit
  • llm-d: llm-d is a Kubernetes-native high-performance distributed LLM inference framework Stars Contributors LastCommit
  • llmaz: ☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work! Stars Contributors LastCommit
  • Mooncake: Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. Stars Contributors LastCommit
  • OME: OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs) Stars Contributors LastCommit

Middleware

  • Checkpoint Engine: Checkpoint-engine is a simple middleware to update model weights in LLM inference engines Stars Contributors LastCommit
  • LMCache: 10x Faster Long-Context LLM By Smart KV Cache Optimizations. Stars Contributors LastCommit Tag

LLM Router

  • AI Gateway: A blazing fast AI Gateway with integrated guardrails. Route to 200+ LLMs, 50+ AI Guardrails with 1 fast & friendly API. Stars Contributors LastCommit
  • LiteLLM: Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]. Stars Contributors LastCommit
  • RouteLLM: A framework for serving and evaluating LLM routers - save LLM costs without compromising quality. Stars Contributors LastCommit
  • vLLM Semantic Router: Intelligent Mixture-of-Models Router for Efficient LLM Inference Stars Contributors LastCommit
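
Routers and gateways in this space typically normalize many provider APIs behind one calling convention. A minimal sketch with LiteLLM from the list above; the model names are placeholders, and the relevant provider API keys are assumed to be set in the environment.

```python
# Minimal sketch: one call shape across providers via LiteLLM.
# Model names are placeholders; provider keys (e.g. OPENAI_API_KEY,
# ANTHROPIC_API_KEY) are assumed to be configured in the environment.
from litellm import completion

messages = [{"role": "user", "content": "Reply with one word: ping"}]

openai_resp = completion(model="gpt-4o-mini", messages=messages)
claude_resp = completion(model="claude-3-5-haiku-20241022", messages=messages)

# LiteLLM returns responses in the OpenAI format regardless of backend.
print(openai_resp.choices[0].message.content)
print(claude_resp.choices[0].message.content)
```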

AI Gateway

  • agentgateway: Next Generation Agentic Proxy for AI Agents and MCP servers Stars Contributors LastCommit
  • APISIX: The Cloud-Native API Gateway and AI Gateway with extensive plugin system and AI capabilities. Stars Contributors LastCommit
  • Envoy AI Gateway: Envoy AI Gateway is an open source project for using Envoy Gateway to handle request traffic from application clients to Generative AI services. Stars Contributors LastCommit
  • Higress: 🤖 AI Gateway | AI Native API Gateway. Stars Contributors LastCommit
  • kgateway: The Cloud-Native API Gateway and AI Gateway. Stars Contributors LastCommit
  • Kong: 🦍 The Cloud-Native API Gateway and AI Gateway. Stars Contributors LastCommit
  • gateway-api-inference-extension: Gateway API Inference Extension. Stars Contributors LastCommit

Output

Simulator

  • Vidur: A large-scale simulation framework for LLM inference Stars Contributors LastCommit

Benchmark

  • genai-bench: Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems. Stars Contributors LastCommit
  • Inference Benchmark: A model-server-agnostic inference benchmarking tool that can be used to benchmark LLMs running on different infrastructure like GPU and TPU. It can also be run on a GKE cluster as a container. Stars Contributors LastCommit
  • Inference Perf: GenAI inference performance benchmarking tool Stars Contributors LastCommit

Orchestration

Workflow

  • Dify: Production-ready platform for agentic workflow development. Stars Contributors LastCommit
  • FastGPT: FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities, such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without extensive setup or configuration. Stars Contributors LastCommit
  • Haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots. Stars Contributors LastCommit
  • Inference: Turn any computer or edge device into a command center for your computer vision projects. Stars Contributors LastCommit Tag
  • LangChain: 🦜🔗 Build context-aware reasoning applications. Stars Contributors LastCommit
  • LlamaIndex: LlamaIndex is the leading framework for building LLM-powered agents over your data. Stars Contributors LastCommit
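
As a sense of what these frameworks do, a minimal LangChain-style pipeline composes a prompt, a model, and an output parser. The sketch below assumes the langchain-core and langchain-openai packages are installed and that OPENAI_API_KEY (or a compatible endpoint) is configured; the model name is a placeholder.

```python
# Minimal sketch of a prompt -> model -> parser pipeline in the LangChain style.
# Assumes langchain-core and langchain-openai are installed; model name is a
# placeholder.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Answer in one sentence: {question}")
llm = ChatOpenAI(model="gpt-4o-mini")     # placeholder model name
chain = prompt | llm | StrOutputParser()  # compose the pipeline

print(chain.invoke({"question": "What does an orchestration framework do?"}))
```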

Agent Framework

  • Agent Development Kit (ADK): An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control. Stars Contributors LastCommit
  • Agno: Open-source framework for building multi-agent systems with memory, knowledge and reasoning. Stars Contributors LastCommit
  • autogen: A programming framework for agentic AI Stars Contributors LastCommit
  • AutoGPT: AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters. Stars Contributors LastCommit
  • CAMEL: 🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. Stars Contributors LastCommit
  • crewAI: Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks. Stars Contributors LastCommit
  • fast-agent: Define, Prompt and Test MCP enabled Agents and Workflows Stars Contributors LastCommit
  • Flowise: Build AI Agents, Visually Stars Contributors LastCommit
  • kagent: kagent is a Kubernetes-native framework for building AI agents. Stars Contributors LastCommit Tag
  • LangGraph: Build resilient language agents as graphs. Stars Contributors LastCommit
  • MetaGPT: 🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming. Stars Contributors LastCommit
  • OpenAI Agents SDK: A lightweight, powerful framework for multi-agent workflows. Stars Contributors LastCommit
  • PydanticAI: GenAI Agent Framework, the Pydantic way Stars Contributors LastCommit
  • Qwen-Agent: Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. Stars Contributors LastCommit
  • Semantic Kernel: Integrate cutting-edge LLM technology quickly and easily into your apps. Stars Contributors LastCommit
  • Suna: Suna - Open Source Generalist AI Agent Stars Contributors LastCommit
  • Swarm: Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team. Stars Contributors LastCommit Tag
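
Most of these frameworks model an agent as a graph or loop of steps over shared state. A rough sketch of that idea in the LangGraph style, assuming a recent langgraph release; the single node is a stub standing in for a real LLM or tool call.

```python
# Rough sketch of an agent as a graph over shared state, in the LangGraph style.
# Assumes a recent langgraph release; the node is a stub, not a real model call.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    answer: str

def answer_node(state: State) -> dict:
    # A real agent would call a model or tool here.
    return {"answer": f"(stub) answering: {state['question']}"}

builder = StateGraph(State)
builder.add_node("answer", answer_node)
builder.add_edge(START, "answer")
builder.add_edge("answer", END)
graph = builder.compile()

print(graph.invoke({"question": "What is an agent graph?"}))
```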

RAG

  • GraphRAG: A modular graph-based Retrieval-Augmented Generation (RAG) system. Stars Contributors LastCommit
  • LightRAG: "LightRAG: Simple and Fast Retrieval-Augmented Generation" Stars Contributors LastCommit
  • quivr: Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration into existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: PGVector, Faiss. Any Files. Any way you want. Stars Contributors LastCommit
  • RAGFlow: RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. Stars Contributors LastCommit
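
These engines differ in indexing and retrieval strategy, but all follow the same retrieve-then-generate pattern. Below is a deliberately naive, library-free sketch of that pattern; the toy corpus, lexical scoring, and stubbed generation are illustrative stand-ins, not how any listed engine works internally.

```python
# Library-free sketch of retrieve-then-generate. The corpus, scoring, and
# "generation" are toy stand-ins for embeddings, vector search, and an LLM call.
corpus = [
    "GraphRAG builds a knowledge graph over documents before retrieval.",
    "RAGFlow focuses on deep document understanding for retrieval.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Toy lexical-overlap scoring; real systems use embedding similarity search.
    def overlap(text: str) -> int:
        return len(set(query.lower().split()) & set(text.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def generate(query: str, context: list[str]) -> str:
    # Stand-in for an LLM call conditioned on the retrieved context.
    return f"Answer to '{query}', grounded in: {context[0]}"

query = "How does GraphRAG retrieve documents?"
print(generate(query, retrieve(query)))
```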

Application Framework

  • Evidently: Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics. Stars Contributors LastCommit
  • Langfuse: 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23 Stars Contributors LastCommit
  • Helicone: 🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓 Stars Contributors LastCommit
  • Lunary: The production toolkit for LLMs. Observability, prompt management and evaluations. Stars Contributors LastCommit
  • OpenLIT: Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Management, Vault, Playground. 🚀💻 Integrates with 50+ LLM Providers, VectorDBs, Agent Frameworks and GPUs. Stars Contributors LastCommit
  • phoenix: AI Observability & Evaluation. Stars Contributors LastCommit
  • PostHog: 🦔 PostHog provides open-source web & product analytics, session recording, feature flagging and A/B testing that you can self-host. Get started - free. Stars Contributors LastCommit
  • Weave: Weave is a toolkit for developing AI-powered applications, built by Weights & Biases. Stars Contributors LastCommit

Runtime

AI Terminal

  • Gemini CLI: An open-source AI agent that brings the power of Gemini directly into your terminal. Stars Contributors LastCommit
  • kubectl-ai: AI powered Kubernetes Assistant Stars Contributors LastCommit

AI Agent

  • goose: an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM Stars Contributors LastCommit
  • Magentic-UI: A research prototype of a human-centered web agent Stars Contributors LastCommit
  • OpenManus: No fortress, purely open ground. OpenManus is Coming. Stars Contributors LastCommit

Code Agent

  • aider: aider is AI pair programming in your terminal Stars Contributors LastCommit
  • Codex: Lightweight coding agent that runs in your terminal Stars Contributors LastCommit Tag
  • Continue: ⏩ Create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks. Stars Contributors LastCommit
  • Open SWE: An Open-Source Asynchronous Coding Agent Stars Contributors LastCommit
  • SWE-agent: SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024] Stars Contributors LastCommit
  • Tabby: Self-hosted AI coding assistant. Stars Contributors LastCommit

Tool

  • Browser Use: Make websites accessible for AI agents. Stars Contributors LastCommit
  • Graphiti: Build Real-Time Knowledge Graphs for AI Agents. Stars Contributors LastCommit
  • Mem0: The Memory layer for AI Agents. Stars Contributors LastCommit
  • OpenAI CUA: Computer Using Agent Sample App. Stars Contributors LastCommit

Chatbot

  • 5ire: 5ire is a cross-platform desktop AI assistant and MCP client. It is compatible with major service providers and supports local knowledge bases and tools via Model Context Protocol servers. Stars Contributors LastCommit
  • AnythingLLM: The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more. Stars Contributors LastCommit
  • Chat SDK: A full-featured, hackable Next.js AI chatbot built by Vercel Stars Contributors LastCommit
  • Chatbot UI: AI chat for any model. Stars Contributors LastCommit
  • Cherry Studio: 🍒 Cherry Studio is a desktop client that supports multiple LLM providers, including DeepSeek-R1. Stars Contributors LastCommit
  • FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. Stars Contributors LastCommit
  • Gradio: Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work! Stars Contributors LastCommit
  • Jan: Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Stars Contributors LastCommit
  • LLM: Access large language models from the command-line Stars Contributors LastCommit
  • Lobe Chat: 🤯 Lobe Chat - an open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / DeepSeek / Qwen), knowledge base (file upload / knowledge management / RAG), multi-modals (Plugins/Artifacts) and thinking. One-click FREE deployment of your private ChatGPT / Claude / DeepSeek application. Stars Contributors LastCommit
  • NextChat: ✨ Light and fast AI assistant. Supports: Web | iOS | MacOS | Android | Linux | Windows. Stars Contributors LastCommit
  • opcode: A powerful GUI app and Toolkit for Claude Code - Create custom agents, manage interactive Claude Code sessions, run secure background agents, and more. Stars Contributors LastCommit
  • Open WebUI: User-friendly AI Interface (Supports Ollama, OpenAI API, ...). Stars Contributors LastCommit
  • PrivateGPT: Interact with your documents using the power of GPT, 100% privately, no data leaks. Stars Contributors LastCommit
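
For a quick local front end, Gradio from the list above can wrap any model call in a chat UI. A minimal sketch; the echo function is a placeholder for a real call to a local or hosted model.

```python
# Minimal chat UI sketch with Gradio. The respond() function is a placeholder
# echo; a real app would call a model or an inference server here.
import gradio as gr

def respond(message, history):
    return f"You said: {message}"

gr.ChatInterface(fn=respond, title="Demo chatbot").launch()
```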

Database

  • chroma: the AI-native open-source embedding database. Stars Contributors LastCommit
  • deeplake: Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. Stars Contributors LastCommit
  • Faiss: A library for efficient similarity search and clustering of dense vectors. Stars Contributors LastCommit
  • milvus: Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search. Stars Contributors LastCommit
  • weaviate: Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database. Stars Contributors LastCommit
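
A typical embedding-database workflow is: create a collection, add documents (embeddings are computed for you or supplied), then query by similarity. A minimal sketch with Chroma from the list above; the collection name, documents, and query are made up for illustration.

```python
# Minimal vector-store sketch with Chroma: in-memory client, a tiny collection,
# and one similarity query. Names and documents are illustrative placeholders.
import chromadb

client = chromadb.Client()                     # in-memory; persistent clients also exist
collection = client.create_collection("docs")  # placeholder collection name

collection.add(
    ids=["1", "2"],
    documents=[
        "LLMOps covers serving, routing, and observability for LLMs.",
        "Vector databases store embeddings for similarity search.",
    ],
)

results = collection.query(query_texts=["How are embeddings searched?"], n_results=1)
print(results["documents"][0])
```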

Sandbox

  • Daytona: Daytona is a Secure and Elastic Infrastructure for Running AI-Generated Code. Stars Contributors LastCommit
  • E2B: Secure open source cloud runtime for AI apps & AI agents. Stars Contributors LastCommit

Evaluation

  • DeepEval: The LLM Evaluation Framework Stars Contributors LastCommit
  • ragas: Supercharge Your LLM Application Evaluations 🚀 Stars Contributors LastCommit

Observation

  • OpenLLMetry: Open-source observability for your LLM application, based on OpenTelemetry. Stars Contributors LastCommit
  • wandb: The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production. Stars Contributors LastCommit
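
A minimal tracking sketch with wandb from the list above; the project name and logged values are placeholders, and credentials (wandb login or WANDB_API_KEY) are assumed to be configured.

```python
# Minimal experiment-tracking sketch with Weights & Biases. Project name and
# metrics are placeholders; credentials are assumed to be configured.
import wandb

run = wandb.init(project="llmops-demo", config={"lr": 1e-4, "epochs": 3})
for epoch in range(run.config.epochs):
    wandb.log({"epoch": epoch, "loss": 1.0 / (epoch + 1)})  # dummy metric
run.finish()
```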

Training

Framework

  • AXLearn: An Extensible Deep Learning Library Stars Contributors LastCommit
  • Candle: Minimalist ML framework for Rust. Stars Contributors LastCommit
  • ColossalAI: Making large AI models cheaper, faster and more accessible. Stars Contributors LastCommit
  • DLRover: An Automatic Distributed Deep Learning System Stars Contributors LastCommit
  • Ludwig: Low-code framework for building custom LLMs, neural networks, and other AI models. Stars Contributors LastCommit
  • MaxText: A simple, performant and scalable Jax LLM! Stars Contributors LastCommit
  • MLX: An array framework for Apple silicon. Stars Contributors LastCommit

FineTune

  • Axolotl: Go ahead and axolotl questions. Stars Contributors LastCommit
  • EasyLM: Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax. Stars Contributors LastCommit
  • LLaMa-Factory: Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024). Stars Contributors LastCommit
  • LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All. Stars Contributors LastCommit
  • maestro: Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL. Stars Contributors LastCommit
  • MLX-VLM: MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. Stars Contributors LastCommit
  • Swift: Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...). Stars Contributors LastCommit
  • torchtune: PyTorch native post-training library. Stars Contributors LastCommit
  • Transformer Lab: Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer. Stars Contributors LastCommit
  • unsloth: Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥 Stars Contributors LastCommit

Alignment

  • OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT). Stars Contributors LastCommit
  • Self-RLHF: Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback. Stars Contributors LastCommit

Evaluation

  • AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24). Stars Contributors LastCommit
  • LiveBench: LiveBench: A Challenging, Contamination-Free LLM Benchmark Stars Contributors LastCommit
  • lm-evaluation-harness: A framework for few-shot evaluation of language models. Stars Contributors LastCommit
  • LongBench: LongBench v2 and LongBench (ACL 2024). Stars Contributors LastCommit
  • MLE-bench: MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering Stars Contributors LastCommit
  • OpenCompass: OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets. Stars Contributors LastCommit
  • opik: Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards. Stars Contributors LastCommit
  • terminal-bench: A benchmark for LLMs on complicated tasks in the terminal Stars Contributors LastCommit

Workflow

  • Flyte: Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks. Stars Contributors LastCommit
  • Kubeflow: Machine Learning Toolkit for Kubernetes. Stars Contributors LastCommit
  • Metaflow: Build, Deploy and Manage AI/ML Systems. Stars Contributors LastCommit
  • MLflow: Open source platform for the machine learning lifecycle. Stars Contributors LastCommit
  • Polyaxon: MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle. Stars Contributors LastCommit
  • Ray: Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. Stars Contributors LastCommit
  • Seldon-Core: An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models. Stars Contributors LastCommit
  • ZenML: 🐙 The bridge between ML and Ops. https://zenml.io. Stars Contributors LastCommit
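
Most of these platforms track runs, parameters, and metrics as their basic unit of work. A minimal sketch with MLflow from the list above; the experiment name and logged values are placeholders, and a local ./mlruns store is assumed unless MLFLOW_TRACKING_URI points elsewhere.

```python
# Minimal run-tracking sketch with MLflow. Experiment name and values are
# placeholders; results land in a local ./mlruns store by default.
import mlflow

mlflow.set_experiment("llmops-demo")
with mlflow.start_run():
    mlflow.log_param("base_model", "llama-3.1-8b")  # placeholder parameter
    mlflow.log_metric("eval_accuracy", 0.87)        # placeholder metric
```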
