
Commit 0fee014

Update README.md
1 parent b4fe625 commit 0fee014

File tree

1 file changed (+42, −46 lines)

README.md

Lines changed: 42 additions & 46 deletions
@@ -11,20 +11,18 @@
 
 </div>
 
-**Most AI inference tools are built around single-model APIs with rigid abstractions**. They lock you into serving one model per server, with no way to customize internals like batching, caching, or kernels. This makes it hard to build full systems like RAG or agents without stitching together multiple services. The result is complex MLOps orchestration, slower iteration, and bloated infrastructure.
+Most tools serve one model with rigid abstractions. LitServe runs full AI systems - agents, chatbots, RAG, pipelines - with full control, custom logic, multi-model support, and zero YAML. Self-host or deploy in one click to [Lightning AI](https://lightning.ai/).
 
-**LitServe flips this paradigm**: Write full AI pipelines, not just models, in clean, extensible Python. Built on FastAPI but optimized for AI workloads, LitServe supports multi-model serving, streaming, batching, and custom logic - all from a single server. Deploy in one click with autoscaling, monitoring, and zero infrastructure overhead. Or run it self-hosted with full control and no lock-in.
-
-LitServe is at least [2x faster](#performance) than plain FastAPI due to AI-specific multi-worker handling.
+&nbsp;
 
 <div align='center'>
 
 <pre>
-✅ (2x)+ faster serving  ✅ Easy to use               ✅ LLMs, non-LLMs and more
-✅ Bring your own model  ✅ PyTorch/JAX/TF/...        ✅ Built on FastAPI
-✅ GPU autoscaling       ✅ Batching, Streaming       ✅ Self-host or ⚡️ managed
-✅ Inference pipeline    ✅ Integrate with vLLM, etc  ✅ Serverless
-
+✅ Build full AI systems   ✅ 2× faster than FastAPI  ✅ Agents, RAG, pipelines, more
+✅ Custom logic + control  ✅ Any PyTorch model       ✅ Self-host or managed
+✅ GPU autoscaling         ✅ Batching + streaming    ✅ BYO model or vLLM
+✅ No MLOps glue code      ✅ Easy setup in Python    ✅ Serverless support
+
 </pre>
 
 <div align='center'>
@@ -43,7 +41,7 @@ LitServe is at least [2x faster](#performance) than plain FastAPI due to AI-spec
 <a target="_blank" href="#featured-examples" style="margin: 0 10px;">Examples</a> •
 <a target="_blank" href="#features" style="margin: 0 10px;">Features</a> •
 <a target="_blank" href="#performance" style="margin: 0 10px;">Performance</a> •
-<a target="_blank" href="#hosting-options" style="margin: 0 10px;">Hosting</a> •
+<a target="_blank" href="#host-anywhere" style="margin: 0 10px;">Hosting</a> •
 <a target="_blank" href="https://lightning.ai/docs/litserve" style="margin: 0 10px;">Docs</a>
 </div>
 </div>
@@ -99,6 +97,22 @@ if __name__ == "__main__":
     server.run(port=8000)
 ```
 
+Deploy for free to [Lightning cloud](#host-anywhere) (or self-host anywhere):
+
+```bash
+# Deploy for free with autoscaling, monitoring, etc.
+lightning deploy server.py --cloud
+
+# Or run locally (self-host anywhere)
+lightning deploy server.py
+# python server.py
+```
+
+Test the server: simulate an HTTP request (run this in any terminal):
+```bash
+curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"input": 4.0}'
+```
+
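The curl request above posts `{"input": 4.0}` to `/predict`. As a plain-Python sketch (hypothetical function names, not the LitServe API itself, and assuming a toy model that squares its input), the server-side lifecycle is roughly: decode the request, run the model, encode the response.

```python
# Sketch of the per-request lifecycle a LitServe-style server runs.
# Hypothetical names; the "model" is a stand-in that squares its input.

def decode_request(request: dict) -> float:
    # Pull the model input out of the JSON payload, e.g. {"input": 4.0}
    return request["input"]

def predict(x: float) -> float:
    # Run the (toy) model
    return x ** 2

def encode_response(output: float) -> dict:
    # Wrap the result back into JSON for the client
    return {"output": output}

# What the server does for the curl request shown above:
response = encode_response(predict(decode_request({"input": 4.0})))
print(response)  # {'output': 16.0}
```

Splitting decode/predict/encode keeps model logic separate from request parsing, which is what lets batching and streaming be slotted in without touching the model code.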
 ### Agentic example
 
 ```python
@@ -127,39 +141,26 @@ if __name__ == "__main__":
     server = ls.LitServer(NewsAgent())
     server.run(port=8000)
 ```
-
-Now deploy for free to [Lightning cloud](#hosting-options) (or self host anywhere):
-
+Test it:
 ```bash
-# Deploy for free with autoscaling, monitoring, etc...
-lightning deploy server.py --cloud
-
-# Or run locally (self host anywhere)
-lightning deploy server.py
-# python server.py
+curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"website_url": "https://text.npr.org/"}'
 ```
 
-### Test the server
-Simulate an http request (run this on any terminal):
-```bash
-curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"input": 4.0}'
-```
+&nbsp;
 
-### LLM serving
-LitServe isn’t *just* for LLMs like vLLM or Ollama; it serves any AI model with full control over internals ([learn more](https://lightning.ai/docs/litserve/features/serve-llms)).
-For easy LLM serving, integrate [vLLM with LitServe](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api), or use [LitGPT](https://github.com/Lightning-AI/litgpt?tab=readme-ov-file#deploy-an-llm) (built on LitServe).
+# Key benefits
 
-```
-litgpt serve microsoft/phi-2
-```
+A few key benefits:
 
-### Summary
-- LitAPI lets you easily build complex AI systems with one or more models ([docs](https://lightning.ai/docs/litserve/api-reference/litapi)).
-- Use the setup method for one-time tasks like connecting models, DBs, and loading data ([docs](https://lightning.ai/docs/litserve/api-reference/litapi#setup)).
-- LitServer handles optimizations like batching, GPU autoscaling, streaming, etc... ([docs](https://lightning.ai/docs/litserve/api-reference/litserver)).
-- Self host on your machines or create a fully managed deployment with Lightning ([learn more](https://lightning.ai/docs/litserve/features/deploy-on-cloud)).
+- **Deploy any pipeline or model:** agents, pipelines, RAG, chatbots, image, video, speech, and text models, etc.
+- **No MLOps glue:** LitAPI lets you build full AI systems (multi-model, agent, RAG) in one place ([more](https://lightning.ai/docs/litserve/api-reference/litapi)).
+- **Instant setup:** connect models, DBs, and data in a few lines with `setup()` ([more](https://lightning.ai/docs/litserve/api-reference/litapi#setup)).
+- **Optimized:** autoscaling, GPU support, and fast inference included ([more](https://lightning.ai/docs/litserve/api-reference/litserver)).
+- **Deploy anywhere:** self-host or one-click deploy with Lightning ([more](https://lightning.ai/docs/litserve/features/deploy-on-cloud)).
+- **FastAPI for AI/ML:** built on FastAPI but optimized for AI - 2× faster with AI-specific multi-worker handling ([more](#performance)).
+- **Expert-friendly:** use vLLM, or build your own with full control over batching, caching, and logic ([more](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api)).
 
-[Learn how to make this server 200x faster](https://lightning.ai/docs/litserve/home/speed-up-serving-by-200x).
+> ⚠️ Not a vLLM or Ollama alternative out of the box. LitServe gives you lower-level flexibility to build what they do (and more) if you need it.
 
 
 &nbsp;
 
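The "Instant setup" benefit above rests on a setup-once / predict-per-request split. A minimal plain-Python sketch of that lifecycle (hypothetical names, not the LitServe API itself): expensive work like loading weights or opening DB connections happens once at startup, while `predict` runs on every request.

```python
class ServerSketch:
    """Toy illustration of the setup-once / predict-per-request lifecycle."""

    def setup(self):
        # One-time startup work: load weights, connect DBs, warm caches.
        # Here, a stand-in "model" that doubles its input.
        self.model = lambda x: 2 * x
        self.calls = 0

    def predict(self, x):
        # Per-request work: just run the already-loaded model.
        self.calls += 1
        return self.model(x)

server = ServerSketch()
server.setup()  # runs once, before any requests arrive
results = [server.predict(x) for x in (1, 2, 3)]
print(results)  # [2, 4, 6]
```

Keeping initialization out of the request path is what makes per-request latency stay low even when model loading takes seconds or minutes.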
@@ -185,24 +186,19 @@ Here are examples of inference pipelines for common model types and use cases.
 
 &nbsp;
 
+# Host anywhere
 
-# Hosting options
-Self host LitServe anywhere or deploy to your favorite cloud via [Lightning AI](http://lightning.ai/deploy).
-
-https://github.com/user-attachments/assets/ff83dab9-0c9f-4453-8dcb-fb9526726344
-
-Self-hosting is ideal for hackers, students, and DIY developers while fully managed hosting is ideal for enterprise developers needing easy autoscaling, security, release management, and 99.995% uptime and observability.
-
-*Note:* Lightning offers a generous free tier for developers.
+Self-host with full control, or deploy with [Lightning AI](https://lightning.ai/) in seconds with autoscaling, security, and 99.995% uptime.
+**Free tier included. No setup required. Run on your cloud.**
 
-To host on [Lightning AI](https://lightning.ai/deploy), simply run the command, login and choose the cloud of your choice.
 ```bash
 lightning deploy server.py --cloud
 ```
+[Learn more](https://lightning.ai/).
 
 &nbsp;
 
-## Features
+# Features
 
 <div align='center'>
 