Awesome-LLMOps Awesome

🎉 An awesome & curated list of the best LLMOps tools.

You are more than welcome to add a new project; simply open an issue.

Table of Contents

Inference

Inference Engine

  • Cortex.cpp: Local AI API Platform. Stars Contributors LastCommit
  • DeepSpeed-MII: MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. Stars Contributors LastCommit
  • llama-box: LM inference server implementation based on *.cpp. Stars Contributors LastCommit
  • Nvidia Dynamo: A Datacenter Scale Distributed Inference Serving Framework. Stars Contributors LastCommit
  • ipex-llm: Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc. Stars Contributors LastCommit
  • LMDeploy: LMDeploy is a toolkit for compressing, deploying, and serving LLMs. Stars Contributors LastCommit
  • LoRAX: Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs. Stars Contributors LastCommit Tag
  • llama.cpp: LLM inference in C/C++. Stars Contributors LastCommit
  • Llumnix: Efficient and easy multi-instance LLM serving. Stars Contributors LastCommit
  • MInference: [NeurIPS'24 Spotlight, ICLR'25] Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy. Stars Contributors LastCommit Tag
  • MLC LLM: Universal LLM Deployment Engine with ML Compilation. Stars Contributors LastCommit
  • MLServer: An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more. Stars Contributors LastCommit
  • Ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models. Stars Contributors LastCommit
  • OpenLLM: Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud. Stars Contributors LastCommit
  • OpenVINO: OpenVINO™ is an open source toolkit for optimizing and deploying AI inference. Stars Contributors LastCommit
  • Petals: 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading. Stars Contributors LastCommit
  • Ratchet: A cross-platform browser ML framework. Stars Contributors LastCommit Tag
  • SGLang: SGLang is a fast serving framework for large language models and vision language models. Stars Contributors LastCommit
  • TinyGrad: You like pytorch? You like micrograd? You love tinygrad! ❤️ Stars Contributors LastCommit
  • transformers.js: State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server! Stars Contributors LastCommit Tag
  • Triton Inference Server: The Triton Inference Server provides an optimized cloud and edge inferencing solution. Stars Contributors LastCommit
  • Text Generation Inference: Large Language Model Text Generation Inference. Stars Contributors LastCommit
  • vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs. Stars Contributors LastCommit
  • web-llm: High-performance In-browser LLM Inference Engine. Stars Contributors LastCommit Tag
  • Xinference: Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop. Stars Contributors LastCommit
  • zml: Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild. Stars Contributors LastCommit
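
Many of the engines above (vLLM, SGLang, LMDeploy, OpenLLM, Ollama, and others) expose an OpenAI-compatible HTTP API once a model is served. Below is a minimal client-side sketch assuming such a server is already running locally; the base URL, API key, and model id are placeholders to adjust for your own deployment.

```python
# Minimal sketch: query an OpenAI-compatible endpoint served by an engine such
# as vLLM or SGLang. The base_url, api_key, and model id are assumptions about
# a local deployment, not fixed values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # placeholder local server address
    api_key="EMPTY",                      # many local servers accept any key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "What does an inference engine do?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```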

Inference Platform

  • AIBrix: Cost-efficient and pluggable Infrastructure components for GenAI inference. Stars Contributors LastCommit
  • BentoML: The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more! Stars Contributors LastCommit
  • Kaito: Kubernetes operator for large-model inference and fine-tuning, with GPU auto-provisioning, container-based hosting, and CRD-based orchestration. Stars Contributors LastCommit
  • Kserve: Standardized Serverless ML Inference Platform on Kubernetes. Stars Contributors LastCommit
  • KubeAI: AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text. Stars Contributors LastCommit
  • llm-d: llm-d is a Kubernetes-native high-performance distributed LLM inference framework Stars Contributors LastCommit
  • llmaz: ☸️ Easy, advanced inference platform for large language models on Kubernetes. 🌟 Star to support our work! Stars Contributors LastCommit
  • Mooncake: Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. Stars Contributors LastCommit
  • OME: OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs) Stars Contributors LastCommit

Middleware

  • Checkpoint Engine: Checkpoint-engine is a simple middleware to update model weights in LLM inference engines Stars Contributors LastCommit
  • LMCache: 10x Faster Long-Context LLM By Smart KV Cache Optimizations. Stars Contributors LastCommit Tag

LLM Router

  • AI Gateway: A blazing fast AI Gateway with integrated guardrails. Route to 200+ LLMs, 50+ AI Guardrails with 1 fast & friendly API. Stars Contributors LastCommit
  • LiteLLM: Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]. Stars Contributors LastCommit
  • RouteLLM: A framework for serving and evaluating LLM routers - save LLM costs without compromising quality. Stars Contributors LastCommit
  • vLLM Semantic Router: Intelligent Mixture-of-Models Router for Efficient LLM Inference Stars Contributors LastCommit
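
Routers and gateways in this space typically normalize many provider APIs behind one calling convention. A minimal sketch with LiteLLM from the list above; the model names are placeholders, and the relevant provider API keys are assumed to be set in the environment.

```python
# Minimal sketch: one call shape across providers via LiteLLM.
# Model names are placeholders; provider keys (e.g. OPENAI_API_KEY,
# ANTHROPIC_API_KEY) are assumed to be configured in the environment.
from litellm import completion

messages = [{"role": "user", "content": "Reply with one word: ping"}]

openai_resp = completion(model="gpt-4o-mini", messages=messages)
claude_resp = completion(model="claude-3-5-haiku-20241022", messages=messages)

# LiteLLM returns responses in the OpenAI format regardless of backend.
print(openai_resp.choices[0].message.content)
print(claude_resp.choices[0].message.content)
```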

AI Gateway

  • agentgateway: Next Generation Agentic Proxy for AI Agents and MCP servers Stars Contributors LastCommit
  • APISIX: The Cloud-Native API Gateway and AI Gateway with extensive plugin system and AI capabilities. Stars Contributors LastCommit
  • Envoy AI Gateway: Envoy AI Gateway is an open source project for using Envoy Gateway to handle request traffic from application clients to Generative AI services. Stars Contributors LastCommit
  • Higress: 🤖 AI Gateway | AI Native API Gateway. Stars Contributors LastCommit
  • kgateway: The Cloud-Native API Gateway and AI Gateway. Stars Contributors LastCommit
  • Kong: 🦍 The Cloud-Native API Gateway and AI Gateway. Stars Contributors LastCommit
  • gateway-api-inference-extension: Gateway API Inference Extension. Stars Contributors LastCommit

Output

Simulator

  • Vidur: A large-scale simulation framework for LLM inference Stars Contributors LastCommit

Benchmark

  • genai-bench: Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems. Stars Contributors LastCommit
  • Inference Benchmark: A model-server-agnostic inference benchmarking tool that can be used to benchmark LLMs running on different infrastructure like GPU and TPU. It can also be run on a GKE cluster as a container. Stars Contributors LastCommit
  • Inference Perf: GenAI inference performance benchmarking tool Stars Contributors LastCommit

Orchestration

Workflow

  • Dify: Production-ready platform for agentic workflow development. Stars Contributors LastCommit
  • FastGPT: FastGPT is a knowledge-based platform built on LLMs that offers a comprehensive suite of out-of-the-box capabilities, such as data processing, RAG retrieval, and visual AI workflow orchestration, letting you easily develop and deploy complex question-answering systems without extensive setup or configuration. Stars Contributors LastCommit
  • Haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots. Stars Contributors LastCommit
  • Inference: Turn any computer or edge device into a command center for your computer vision projects. Stars Contributors LastCommit Tag
  • LangChain: 🦜🔗 Build context-aware reasoning applications. Stars Contributors LastCommit
  • LlamaIndex: LlamaIndex is the leading framework for building LLM-powered agents over your data. Stars Contributors LastCommit
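
As a sense of what these frameworks do, a minimal LangChain-style pipeline composes a prompt, a model, and an output parser. The sketch below assumes the langchain-core and langchain-openai packages are installed and that OPENAI_API_KEY (or a compatible endpoint) is configured; the model name is a placeholder.

```python
# Minimal sketch of a prompt -> model -> parser pipeline in the LangChain style.
# Assumes langchain-core and langchain-openai are installed; model name is a
# placeholder.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template("Answer in one sentence: {question}")
llm = ChatOpenAI(model="gpt-4o-mini")     # placeholder model name
chain = prompt | llm | StrOutputParser()  # compose the pipeline

print(chain.invoke({"question": "What does an orchestration framework do?"}))
```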

Agent Framework

  • Agent Development Kit (ADK): An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control. Stars Contributors LastCommit
  • Agno: Open-source framework for building multi-agent systems with memory, knowledge and reasoning. Stars Contributors LastCommit
  • autogen: A programming framework for agentic AI Stars Contributors LastCommit
  • AutoGPT: AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters. Stars Contributors LastCommit
  • CAMEL: 🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. Stars Contributors LastCommit
  • crewAI: Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks. Stars Contributors LastCommit
  • fast-agent: Define, Prompt and Test MCP enabled Agents and Workflows Stars Contributors LastCommit
  • Flowise: Build AI Agents, Visually Stars Contributors LastCommit
  • kagent: kagent is a Kubernetes-native framework for building AI agents. Stars Contributors LastCommit Tag
  • LangGraph: Build resilient language agents as graphs. Stars Contributors LastCommit
  • MetaGPT: 🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming. Stars Contributors LastCommit
  • OpenAI Agents SDK: A lightweight, powerful framework for multi-agent workflows. Stars Contributors LastCommit
  • PydanticAI: GenAI Agent Framework, the Pydantic way Stars Contributors LastCommit
  • Qwen-Agent: Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. Stars Contributors LastCommit
  • Semantic Kernel: Integrate cutting-edge LLM technology quickly and easily into your apps. Stars Contributors LastCommit
  • Suna: Suna - Open Source Generalist AI Agent Stars Contributors LastCommit
  • Swarm: Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team. Stars Contributors LastCommit Tag
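
Most of these frameworks model an agent as a graph or loop of steps over shared state. A rough sketch of that idea in the LangGraph style, assuming a recent langgraph release; the single node is a stub standing in for a real LLM or tool call.

```python
# Rough sketch of an agent as a graph over shared state, in the LangGraph style.
# Assumes a recent langgraph release; the node is a stub, not a real model call.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    answer: str

def answer_node(state: State) -> dict:
    # A real agent would call a model or tool here.
    return {"answer": f"(stub) answering: {state['question']}"}

builder = StateGraph(State)
builder.add_node("answer", answer_node)
builder.add_edge(START, "answer")
builder.add_edge("answer", END)
graph = builder.compile()

print(graph.invoke({"question": "What is an agent graph?"}))
```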

RAG

  • GraphRAG: A modular graph-based Retrieval-Augmented Generation (RAG) system. Stars Contributors LastCommit
  • LightRAG: "LightRAG: Simple and Fast Retrieval-Augmented Generation" Stars Contributors LastCommit
  • quivr: Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration into existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: PGVector, Faiss. Any Files. Any way you want. Stars Contributors LastCommit
  • RAGFlow: RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. Stars Contributors LastCommit
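
These engines differ in indexing and retrieval strategy, but all follow the same retrieve-then-generate pattern. Below is a deliberately naive, library-free sketch of that pattern; the toy corpus, lexical scoring, and stubbed generation are illustrative stand-ins, not how any listed engine works internally.

```python
# Library-free sketch of retrieve-then-generate. The corpus, scoring, and
# "generation" are toy stand-ins for embeddings, vector search, and an LLM call.
corpus = [
    "GraphRAG builds a knowledge graph over documents before retrieval.",
    "RAGFlow focuses on deep document understanding for retrieval.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Toy lexical-overlap scoring; real systems use embedding similarity search.
    def overlap(text: str) -> int:
        return len(set(query.lower().split()) & set(text.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def generate(query: str, context: list[str]) -> str:
    # Stand-in for an LLM call conditioned on the retrieved context.
    return f"Answer to '{query}', grounded in: {context[0]}"

query = "How does GraphRAG retrieve documents?"
print(generate(query, retrieve(query)))
```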

Application Framework

  • Evidently: Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics. Stars Contributors LastCommit
  • Langfuse: 🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23 Stars Contributors LastCommit
  • Helicone: 🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓 Stars Contributors LastCommit
  • Lunary: The production toolkit for LLMs. Observability, prompt management and evaluations. Stars Contributors LastCommit
  • OpenLIT: Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Management, Vault, Playground. 🚀💻 Integrates with 50+ LLM Providers, VectorDBs, Agent Frameworks and GPUs. Stars Contributors LastCommit
  • phoenix: AI Observability & Evaluation. Stars Contributors LastCommit
  • PostHog: 🦔 PostHog provides open-source web & product analytics, session recording, feature flagging and A/B testing that you can self-host. Get started - free. Stars Contributors LastCommit
  • Weave: Weave is a toolkit for developing AI-powered applications, built by Weights & Biases. Stars Contributors LastCommit

Runtime

AI Terminal

  • Gemini CLI: An open-source AI agent that brings the power of Gemini directly into your terminal. Stars Contributors LastCommit
  • kubectl-ai: AI powered Kubernetes Assistant Stars Contributors LastCommit

AI Agent

  • goose: an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM Stars Contributors LastCommit
  • Magentic-UI: A research prototype of a human-centered web agent Stars Contributors LastCommit
  • OpenManus: No fortress, purely open ground. OpenManus is Coming. Stars Contributors LastCommit

Code Agent

  • aider: aider is AI pair programming in your terminal Stars Contributors LastCommit
  • Codex: Lightweight coding agent that runs in your terminal Stars Contributors LastCommit Tag
  • Continue: ⏩ Create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks. Stars Contributors LastCommit
  • Open SWE: An Open-Source Asynchronous Coding Agent Stars Contributors LastCommit
  • SWE-agent: SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024] Stars Contributors LastCommit
  • Tabby: Self-hosted AI coding assistant. Stars Contributors LastCommit

Tool

  • Browser Use: Make websites accessible for AI agents. Stars Contributors LastCommit
  • Graphiti: Build Real-Time Knowledge Graphs for AI Agents. Stars Contributors LastCommit
  • Mem0: The Memory layer for AI Agents. Stars Contributors LastCommit
  • OpenAI CUA: Computer Using Agent Sample App. Stars Contributors LastCommit

Chatbot

  • 5ire: 5ire is a cross-platform desktop AI assistant and MCP client. It is compatible with major service providers and supports local knowledge bases and tools via Model Context Protocol servers. Stars Contributors LastCommit
  • AnythingLLM: The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more. Stars Contributors LastCommit
  • Chat SDK: A full-featured, hackable Next.js AI chatbot built by Vercel Stars Contributors LastCommit
  • Chatbot UI: AI chat for any model. Stars Contributors LastCommit
  • Cherry Studio: 🍒 Cherry Studio is a desktop client that supports multiple LLM providers, including DeepSeek-R1. Stars Contributors LastCommit
  • FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. Stars Contributors LastCommit
  • Gradio: Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work! Stars Contributors LastCommit
  • Jan: Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Stars Contributors LastCommit
  • LLM: Access large language models from the command-line Stars Contributors LastCommit
  • Lobe Chat: 🤯 Lobe Chat - an open-source, modern-design AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / DeepSeek / Qwen), knowledge base (file upload / knowledge management / RAG), multi-modals (Plugins/Artifacts) and thinking. One-click FREE deployment of your private ChatGPT / Claude / DeepSeek application. Stars Contributors LastCommit
  • NextChat: ✨ Light and fast AI assistant. Supports: Web | iOS | MacOS | Android | Linux | Windows. Stars Contributors LastCommit
  • opcode: A powerful GUI app and Toolkit for Claude Code - Create custom agents, manage interactive Claude Code sessions, run secure background agents, and more. Stars Contributors LastCommit
  • Open WebUI: User-friendly AI Interface (Supports Ollama, OpenAI API, ...). Stars Contributors LastCommit
  • PrivateGPT: Interact with your documents using the power of GPT, 100% privately, no data leaks. Stars Contributors LastCommit
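
For a quick local front end, Gradio from the list above can wrap any model call in a chat UI. A minimal sketch; the echo function is a placeholder for a real call to a local or hosted model.

```python
# Minimal chat UI sketch with Gradio. The respond() function is a placeholder
# echo; a real app would call a model or an inference server here.
import gradio as gr

def respond(message, history):
    return f"You said: {message}"

gr.ChatInterface(fn=respond, title="Demo chatbot").launch()
```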

Database

  • chroma: the AI-native open-source embedding database. Stars Contributors LastCommit
  • deeplake: Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. Stars Contributors LastCommit
  • Faiss: A library for efficient similarity search and clustering of dense vectors. Stars Contributors LastCommit
  • milvus: Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search. Stars Contributors LastCommit
  • weaviate: Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database. Stars Contributors LastCommit
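
A typical embedding-database workflow is: create a collection, add documents (embeddings are computed for you or supplied), then query by similarity. A minimal sketch with Chroma from the list above; the collection name, documents, and query are made up for illustration.

```python
# Minimal vector-store sketch with Chroma: in-memory client, a tiny collection,
# and one similarity query. Names and documents are illustrative placeholders.
import chromadb

client = chromadb.Client()                     # in-memory; persistent clients also exist
collection = client.create_collection("docs")  # placeholder collection name

collection.add(
    ids=["1", "2"],
    documents=[
        "LLMOps covers serving, routing, and observability for LLMs.",
        "Vector databases store embeddings for similarity search.",
    ],
)

results = collection.query(query_texts=["How are embeddings searched?"], n_results=1)
print(results["documents"][0])
```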

Sandbox

  • Daytona: Daytona is a Secure and Elastic Infrastructure for Running AI-Generated Code. Stars Contributors LastCommit
  • E2B: Secure open source cloud runtime for AI apps & AI agents. Stars Contributors LastCommit

Evaluation

  • DeepEval: The LLM Evaluation Framework Stars Contributors LastCommit
  • ragas: Supercharge Your LLM Application Evaluations 🚀 Stars Contributors LastCommit

Observation

  • OpenLLMetry: Open-source observability for your LLM application, based on OpenTelemetry. Stars Contributors LastCommit
  • wandb: The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production. Stars Contributors LastCommit
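
A minimal tracking sketch with wandb from the list above; the project name and logged values are placeholders, and credentials (wandb login or WANDB_API_KEY) are assumed to be configured.

```python
# Minimal experiment-tracking sketch with Weights & Biases. Project name and
# metrics are placeholders; credentials are assumed to be configured.
import wandb

run = wandb.init(project="llmops-demo", config={"lr": 1e-4, "epochs": 3})
for epoch in range(run.config.epochs):
    wandb.log({"epoch": epoch, "loss": 1.0 / (epoch + 1)})  # dummy metric
run.finish()
```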

Training

Framework

  • AXLearn: An Extensible Deep Learning Library Stars Contributors LastCommit
  • Candle: Minimalist ML framework for Rust. Stars Contributors LastCommit
  • ColossalAI: Making large AI models cheaper, faster and more accessible. Stars Contributors LastCommit
  • DLRover: An Automatic Distributed Deep Learning System Stars Contributors LastCommit
  • Ludwig: Low-code framework for building custom LLMs, neural networks, and other AI models. Stars Contributors LastCommit
  • MaxText: A simple, performant and scalable Jax LLM! Stars Contributors LastCommit
  • MLX: An array framework for Apple silicon. Stars Contributors LastCommit

FineTune

  • Axolotl: Go ahead and axolotl questions. Stars Contributors LastCommit
  • EasyLM: Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Flax. Stars Contributors LastCommit
  • LLaMa-Factory: Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024). Stars Contributors LastCommit
  • LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All. Stars Contributors LastCommit
  • maestro: Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL. Stars Contributors LastCommit
  • MLX-VLM: MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. Stars Contributors LastCommit
  • Swift: Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...). Stars Contributors LastCommit
  • torchtune: PyTorch native post-training library. Stars Contributors LastCommit
  • Transformer Lab: Open Source Application for Advanced LLM Engineering: interact, train, fine-tune, and evaluate large language models on your own computer. Stars Contributors LastCommit
  • unsloth: Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥 Stars Contributors LastCommit

Alignment

  • OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT). Stars Contributors LastCommit
  • Self-RLHF: Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback. Stars Contributors LastCommit

Evaluation

  • AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24). Stars Contributors LastCommit
  • LiveBench: LiveBench: A Challenging, Contamination-Free LLM Benchmark Stars Contributors LastCommit
  • lm-evaluation-harness: A framework for few-shot evaluation of language models. Stars Contributors LastCommit
  • LongBench: LongBench v2 and LongBench (ACL 2024). Stars Contributors LastCommit
  • MLE-bench: MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering Stars Contributors LastCommit
  • OpenCompass: OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) over 100+ datasets. Stars Contributors LastCommit
  • opik: Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards. Stars Contributors LastCommit
  • terminal-bench: A benchmark for LLMs on complicated tasks in the terminal Stars Contributors LastCommit

Workflow

  • Flyte: Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks. Stars Contributors LastCommit
  • Kubeflow: Machine Learning Toolkit for Kubernetes. Stars Contributors LastCommit
  • Metaflow: Build, Deploy and Manage AI/ML Systems. Stars Contributors LastCommit
  • MLflow: Open source platform for the machine learning lifecycle. Stars Contributors LastCommit
  • Polyaxon: MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle. Stars Contributors LastCommit
  • Ray: Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. Stars Contributors LastCommit
  • Seldon-Core: An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models. Stars Contributors LastCommit
  • ZenML: 🐙 The bridge between ML and Ops. https://zenml.io. Stars Contributors LastCommit
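
Most of these platforms track runs, parameters, and metrics as their basic unit of work. A minimal sketch with MLflow from the list above; the experiment name and logged values are placeholders, and a local ./mlruns store is assumed unless MLFLOW_TRACKING_URI points elsewhere.

```python
# Minimal run-tracking sketch with MLflow. Experiment name and values are
# placeholders; results land in a local ./mlruns store by default.
import mlflow

mlflow.set_experiment("llmops-demo")
with mlflow.start_run():
    mlflow.log_param("base_model", "llama-3.1-8b")  # placeholder parameter
    mlflow.log_metric("eval_accuracy", 0.87)        # placeholder metric
```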
