</div>
Most tools serve one model behind rigid abstractions. LitServe runs full AI systems - agents, chatbots, RAG, pipelines - with full control, custom logic, multi-model support, and zero YAML. Self-host, or deploy in one click to [Lightning AI](https://lightning.ai/).
<div align='center'>
<pre>
✅ 2x+ faster serving      ✅ Easy to use          ✅ LLMs, non-LLMs and more
✅ Bring your own model    ✅ PyTorch/JAX/TF/...   ✅ Built on FastAPI
Deploy for free to [Lightning cloud](#hosting-options) (or self-host anywhere):

```bash
# Deploy for free with autoscaling, monitoring, etc.
lightning deploy server.py --cloud

# Or run locally (self-host anywhere)
lightning deploy server.py
# python server.py
```

Test the server by simulating an HTTP request (run this in any terminal):

```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"input": 4.0}'
```
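For reference, the same request can be sent from Python using only the standard library. This is a minimal sketch: the `build_request` and `predict` helper names are hypothetical, and it assumes the server above is running on port 8000.

```python
import json
import urllib.request

PREDICT_URL = "http://127.0.0.1:8000/predict"  # assumes the server above is running

def build_request(value, url=PREDICT_URL):
    # Build the same JSON POST the curl command sends.
    data = json.dumps({"input": value}).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )

def predict(value, url=PREDICT_URL):
    # Send the request and parse the JSON response.
    with urllib.request.urlopen(build_request(value, url)) as resp:
        return json.loads(resp.read())

# Example (with the server running): predict(4.0)
```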
### Agentic example
```python
if __name__ == "__main__":
    server = ls.LitServer(NewsAgent())
    server.run(port=8000)
```
Test it:
```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"website_url": "https://text.npr.org/"}'
```
# Key benefits
A few key benefits:
- **Deploy any pipeline or model:** Agents, pipelines, RAG, chatbots, image models, video, speech, text, etc.
- **No MLOps glue:** LitAPI lets you build full AI systems (multi-model, agent, RAG) in one place ([more](https://lightning.ai/docs/litserve/api-reference/litapi)).
- **Instant setup:** Connect models, DBs, and data in a few lines with `setup()` ([more](https://lightning.ai/docs/litserve/api-reference/litapi#setup)).
- **Optimized:** Autoscaling, GPU support, and fast inference included ([more](https://lightning.ai/docs/litserve/api-reference/litserver)).
- **Deploy anywhere:** Self-host or one-click deploy with Lightning ([more](https://lightning.ai/docs/litserve/features/deploy-on-cloud)).
- **FastAPI for AI/ML:** Built on FastAPI but optimized for AI - 2x faster with AI-specific multi-worker handling ([more](#performance)).
- **Expert-friendly:** Use vLLM, or build your own with full control over batching, caching, and logic ([more](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api)).
> ⚠️ Not a vLLM or Ollama alternative out of the box. LitServe gives you lower-level flexibility to build what they do (and more) if you need it.
Here are examples of inference pipelines for common model types and use cases.
# Host anywhere
Self-host with full control, or deploy with [Lightning AI](https://lightning.ai/) in seconds with autoscaling, security, and 99.995% uptime.

**Free tier included. No setup required. Run on your cloud.**
To host on [Lightning AI](https://lightning.ai/deploy), simply run the deploy command, log in, and choose the cloud of your choice.