</div>
Most tools serve one model behind rigid abstractions. LitServe runs full AI systems - agents, chatbots, RAG, pipelines - with full control, custom logic, multi-model support, and zero YAML. Self-host, or deploy in one click to [Lightning AI](https://lightning.ai/).
<div align='center'>
<pre>
✅ 2x+ faster serving      ✅ Easy to use          ✅ LLMs, non-LLMs and more
✅ Bring your own model    ✅ PyTorch/JAX/TF/...   ✅ Built on FastAPI
Deploy for free to [Lightning cloud](#hosting-options) (or self-host anywhere):

```bash
# Deploy for free with autoscaling, monitoring, etc.
lightning deploy server.py --cloud

# Or run locally (self-host anywhere)
lightning deploy server.py
# python server.py
```

Test the server by simulating an HTTP request (run this in any terminal):

```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"input": 4.0}'
```
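For reference, the same request can be sent from Python using only the standard library. This is a minimal sketch: the `build_request` and `predict` helper names are hypothetical, and it assumes the server above is running on port 8000.

```python
import json
import urllib.request

PREDICT_URL = "http://127.0.0.1:8000/predict"  # assumes the server above is running

def build_request(value, url=PREDICT_URL):
    # Build the same JSON POST the curl command sends.
    data = json.dumps({"input": value}).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )

def predict(value, url=PREDICT_URL):
    # Send the request and parse the JSON response.
    with urllib.request.urlopen(build_request(value, url)) as resp:
        return json.loads(resp.read())

# Example (with the server running): predict(4.0)
```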
### Agentic example
```python
if __name__ == "__main__":
    server = ls.LitServer(NewsAgent())
    server.run(port=8000)
```
Test it:
```bash
curl -X POST http://127.0.0.1:8000/predict -H "Content-Type: application/json" -d '{"website_url": "https://text.npr.org/"}'
```
# Key benefits
A few key benefits:
- **Deploy any pipeline or model:** Agents, pipelines, RAG, chatbots, image models, video, speech, text, etc.
- **No MLOps glue:** LitAPI lets you build full AI systems (multi-model, agent, RAG) in one place ([more](https://lightning.ai/docs/litserve/api-reference/litapi)).
- **Instant setup:** Connect models, DBs, and data in a few lines with `setup()` ([more](https://lightning.ai/docs/litserve/api-reference/litapi#setup)).
- **Optimized:** Autoscaling, GPU support, and fast inference included ([more](https://lightning.ai/docs/litserve/api-reference/litserver)).
- **Deploy anywhere:** Self-host or one-click deploy with Lightning ([more](https://lightning.ai/docs/litserve/features/deploy-on-cloud)).
- **FastAPI for AI/ML:** Built on FastAPI but optimized for AI - 2x faster with AI-specific multi-worker handling ([more](#performance)).
- **Expert-friendly:** Use vLLM, or build your own with full control over batching, caching, and logic ([more](https://lightning.ai/lightning-ai/studios/deploy-a-private-llama-3-2-rag-api)).
> ⚠️ Not a vLLM or Ollama alternative out of the box. LitServe gives you lower-level flexibility to build what they do (and more) if you need it.
Here are examples of inference pipelines for common model types and use cases.
# Host anywhere
Self-host with full control, or deploy with [Lightning AI](https://lightning.ai/) in seconds with autoscaling, security, and 99.995% uptime.

**Free tier included. No setup required. Run on your cloud.**
To host on [Lightning AI](https://lightning.ai/deploy), simply run the deploy command, log in, and choose the cloud of your choice.