✅ Custom inference logic ✅ 2× faster than FastAPI ✅ Agents, RAG, pipelines, more
✅ Custom logic + control ✅ Any PyTorch model ✅ Self-host or managed
✅ Multi-GPU autoscaling ✅ Batching + streaming ✅ BYO model or vLLM
✅ No MLOps glue code ✅ Easy setup in Python ✅ Serverless support
# Why LitServe?
Most serving tools (vLLM, for example) are built for a single model type and enforce rigid abstractions. They work well until you need custom logic, multiple models, agents, or non-standard pipelines. LitServe lets you write your own inference engine in Python: you define how requests are handled, how models are loaded, how batching and routing work, and how outputs are produced, while LitServe handles performance, concurrency, scaling, and deployment. Use LitServe to build inference APIs, agents, chatbots, RAG systems, MCP servers, or multi-model pipelines.
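The lifecycle described above (decode the request, run the model, encode the response) can be sketched in plain Python. The hook names mirror LitServe's `LitAPI` (`setup`, `decode_request`, `predict`, `encode_response`), but the class, the stand-in model, and the `handle` driver here are illustrative, not LitServe's actual server wiring:

```python
# Conceptual sketch of the inference lifecycle you define with LitServe.
# EchoSquareAPI and handle() are illustrative stand-ins, not LitServe code.

class EchoSquareAPI:
    def setup(self):
        # Load the model once at startup; here a toy callable stands in.
        self.model = lambda x: x ** 2

    def decode_request(self, request: dict):
        # Map the raw request payload to model input.
        return request["input"]

    def predict(self, x):
        # Run inference; batching and routing hooks would plug in here.
        return self.model(x)

    def encode_response(self, output) -> dict:
        # Map model output back to a response payload.
        return {"output": output}


def handle(api, request: dict) -> dict:
    # The server drives every request through the same pipeline.
    return api.encode_response(api.predict(api.decode_request(request)))


api = EchoSquareAPI()
api.setup()
print(handle(api, {"input": 4}))  # {'output': 16}
```

Because each stage is an ordinary method, swapping in a real model, custom batching, or multi-model routing is a matter of overriding the relevant hook.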
Run it locally, self-host anywhere, or deploy with one click on [Lightning AI](https://lightning.ai/litserve?utm_source=litserve_readme&utm_medium=referral&utm_campaign=litserve_readme).
# Want the easiest way to host inference?
Over 380,000 developers use [Lightning Cloud](https://lightning.ai/?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme), the simplest way to run LitServe without managing infrastructure. Deploy with one command and get autoscaling GPUs, monitoring, and a free tier. No cloud setup required, or self-host anywhere.
# Quick start
Install LitServe via pip ([more options](https://lightning.ai/docs/litserve/home/install)):
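The standard command, assuming the package name on PyPI is `litserve`:

```shell
pip install litserve
```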