
Commit c1e5984

Merge branch 'main' into chore/extend-ci-matrix
2 parents: 37fd47a + 6a11dc5

File tree: 8 files changed (+194, −29 lines)

README.md

Lines changed: 16 additions & 20 deletions
@@ -1,12 +1,14 @@
 <div align='center'>

-<h2>
-The framework to build custom inference engines with expert control.
+<h1>
+Build custom inference servers in pure Python
 <br/>
-Engines for models, agents, MCP, multi-modal, RAG, and pipelines.
+</h1>
+<h4>
+Define exactly how inference works for models, agents, RAG, or pipelines.
 <br/>
-No MLOps. No YAML.
-</h2>
+Control batching, routing, streaming, and orchestration without MLOps glue or config files.
+</h4>

 <img alt="Lightning" src="https://pl-bolts-doc-images.s3.us-east-2.amazonaws.com/app-2/ls_banner2.png" width="800px" style="max-width: 100%;">


@@ -16,10 +18,10 @@
 <div align='center'>

 <pre>
-Build your own inference engine ✅ 2× faster than FastAPI ✅ Agents, RAG, pipelines, more
-✅ Custom logic + control ✅ Any PyTorch model ✅ Self-host or managed
-✅ Multi-GPU autoscaling ✅ Batching + streaming ✅ BYO model or vLLM
-✅ No MLOps glue code ✅ Easy setup in Python ✅ Serverless support
+Custom inference logic ✅ 2× faster than FastAPI ✅ Agents, RAG, pipelines, more
+✅ Custom logic + control ✅ Any PyTorch model ✅ Self-host or managed
+✅ Multi-GPU autoscaling ✅ Batching + streaming ✅ BYO model or vLLM
+✅ No MLOps glue code ✅ Easy setup in Python ✅ Serverless support

 </pre>


@@ -54,22 +56,16 @@

 &nbsp;

-# Looking for GPUs and an inference platform?
-Over 340,000 developers use [Lightning Cloud](https://lightning.ai/?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) - purpose-built for PyTorch and PyTorch Lightning.
-- [GPUs](https://lightning.ai/pricing?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme) from $0.19.
-- [Clusters](https://lightning.ai/clusters?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme): frontier-grade training/inference clusters.
-- [AI Studio (vibe train)](https://lightning.ai/studios?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme): workspaces where AI helps you debug, tune and vibe train.
-- [AI Studio (vibe deploy)](https://lightning.ai/studios?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme): workspaces where AI helps you optimize, and deploy models.
-- [Notebooks](https://lightning.ai/notebooks?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme): Persistent GPU workspaces where AI helps you code and analyze.
-- [Inference](https://lightning.ai/deploy?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme): Deploy models as inference APIs.
-
 # Why LitServe?
-LitServe lets you build your own inference engine. Serving engines such as vLLM serve specific model types (LLMs) with rigid abstractions. LitServe gives you the low-level control to serve any model (vision, audio, text, multi-modal), and define exactly how inference works - from batching, caching, streaming, and routing, to multi-model orchestration and custom logic. LitServe is perfect for building inference APIs, agents, chatbots, MCP servers, RAG, pipelines and more.
+Most serving tools (vLLM, etc..) are built for a single model type and enforce rigid abstractions. They work well until you need custom logic, multiple models, agents, or non standard pipelines. LitServe lets you write your own inference engine in Python. You define how requests are handled, how models are loaded, how batching and routing work, and how outputs are produced. LitServe handles performance, concurrency, scaling, and deployment. Use LitServe to build inference APIs, agents, chatbots, RAG systems, MCP servers, or multi model pipelines.

-Self host LitServe or deploy in one-click to [Lightning AI](https://lightning.ai/litserve?utm_source=litserve_readme&utm_medium=referral&utm_campaign=litserve_readme).
+Run it locally, self host anywhere, or deploy with one click on [Lightning AI](https://lightning.ai/litserve?utm_source=litserve_readme&utm_medium=referral&utm_campaign=litserve_readme).

 &nbsp;

+# Want the easiest way to host inference?
+Over 380,000 developers use [Lightning Cloud](https://lightning.ai/?utm_source=ptl_readme&utm_medium=referral&utm_campaign=ptl_readme), the simplest way to run LitServe without managing infrastructure. Deploy with one command, get autoscaling GPUs, monitoring, and a free tier. No cloud setup required. Or self host anywhere.
+
 # Quick start

 Install LitServe via pip ([more options](https://lightning.ai/docs/litserve/home/install)):

src/litserve/cli.py

Lines changed: 23 additions & 3 deletions
@@ -1,13 +1,33 @@
+import importlib.util
+import shutil
 import subprocess
 import sys

 from litserve.utils import is_package_installed


 def _ensure_lightning_installed():
-    if not is_package_installed("lightning_sdk"):
-        print("Lightning CLI not found. Installing...")
-        subprocess.check_call([sys.executable, "-m", "pip", "install", "-U", "lightning-sdk"])
+    """Ensure lightning-sdk is installed, attempting auto-installation if needed."""
+    if is_package_installed("lightning_sdk"):
+        return
+
+    print("Lightning CLI not found. Installing lightning-sdk...")
+
+    # Build list of available installers (pip first as it respects the active environment)
+    installers = []
+    if importlib.util.find_spec("pip"):
+        installers.append([sys.executable, "-m", "pip"])
+    if shutil.which("uv"):
+        installers.append(["uv", "pip"])
+
+    for installer in installers:
+        try:
+            subprocess.run([*installer, "install", "-U", "lightning-sdk"], check=True)
+            return
+        except (subprocess.CalledProcessError, FileNotFoundError):
+            continue
+
+    sys.exit("Failed to install lightning-sdk. Run: pip install lightning-sdk")


 def main():
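The probe-then-fallback pattern in the new `_ensure_lightning_installed` can be tried in isolation. Below is a minimal sketch of just the probing step; the `available_installers` helper name is illustrative, not part of the LitServe API:

```python
import importlib.util
import shutil
import sys


def available_installers():
    """Probe for usable package installers; pip first, since it targets the active environment."""
    installers = []
    if importlib.util.find_spec("pip") is not None:
        installers.append([sys.executable, "-m", "pip"])
    if shutil.which("uv") is not None:
        installers.append(["uv", "pip"])
    return installers


for cmd in available_installers():
    print(" ".join(cmd))
```

Probing with `find_spec`/`shutil.which` avoids spawning a subprocess just to discover a missing tool, and the ordered list makes the fallback loop trivial.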

src/litserve/server.py

Lines changed: 2 additions & 0 deletions
@@ -1071,6 +1071,8 @@ def _register_spec_endpoints(self, lit_api: LitAPI):
         specs = [lit_api.spec] if lit_api.spec else []
         for spec in specs:
             spec: LitSpec
+            # Set the server reference for callback triggering in spec endpoints
+            spec._server = self
             # TODO check that path is not clashing
             for path, endpoint, methods in spec.endpoints:
                 self.app.add_api_route(
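The `spec._server = self` wiring is what lets spec endpoints reach the server's callback runner. A simplified sketch of the pattern, with toy stand-ins for the real `LitServer`, callback runner, and spec classes (names here are illustrative, not the LitServe API):

```python
class CallbackRunner:
    """Simplified stand-in for LitServe's internal callback runner."""

    def __init__(self):
        self.fired = []

    def trigger_event(self, name, **kwargs):
        self.fired.append(name)


class Server:
    """Toy server exposing the attributes a spec endpoint reads."""

    def __init__(self):
        self._callback_runner = CallbackRunner()
        self.active_requests = 0


class Spec:
    """Toy spec: _server is injected later, as _register_spec_endpoints now does."""

    _server = None

    def handle_request(self):
        # With the back-reference in place, the endpoint can notify callbacks
        self._server._callback_runner.trigger_event(
            "on_request", active_requests=self._server.active_requests
        )


server = Server()
spec = Spec()
spec._server = server  # the wiring added in this commit
spec.handle_request()
print(server._callback_runner.fired)  # -> ['on_request']
```

The back-reference keeps specs decoupled from the server class: a spec only needs `_server` set before its endpoints run.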

src/litserve/specs/base.py

Lines changed: 6 additions & 0 deletions
@@ -31,6 +31,12 @@ def __init__(self):
         self.request_queue = None
         self.response_queue_id = None

+    def __getstate__(self):
+        """Exclude _server from pickling as it contains unpickleable objects."""
+        state = self.__dict__.copy()
+        state["_server"] = None
+        return state
+
     @property
     def stream(self):
         return False
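Why `__getstate__` is needed: the injected `_server` holds live, unpicklable objects, and specs may cross process boundaries via pickle. A minimal sketch of the same trick, using a toy class (`SpecLike` is hypothetical) and a `threading.Lock` as a stand-in for the unpicklable server:

```python
import pickle
import threading


class SpecLike:
    """Toy spec holding an unpicklable back-reference, mirroring LitSpec."""

    def __init__(self):
        self._server = threading.Lock()  # a lock, like a live server, cannot be pickled

    def __getstate__(self):
        # Drop the unpicklable reference before serialization; it can be re-injected later
        state = self.__dict__.copy()
        state["_server"] = None
        return state


restored = pickle.loads(pickle.dumps(SpecLike()))
print(restored._server)  # -> None
```

Without `__getstate__`, `pickle.dumps` would raise a `TypeError` on the lock; with it, the copy deserializes cleanly with `_server` cleared.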

src/litserve/specs/openai.py

Lines changed: 9 additions & 0 deletions
@@ -28,6 +28,7 @@
 from fastapi.responses import StreamingResponse
 from pydantic import BaseModel, Field

+from litserve.callbacks.base import EventTypes
 from litserve.constants import _DEFAULT_LIT_API_PATH
 from litserve.specs.base import LitSpec, _AsyncSpecWrapper
 from litserve.utils import LitAPIStatus, ResponseBufferItem, azip
@@ -502,6 +503,14 @@ async def chat_completion(self, request: ChatCompletionRequest, background_tasks
         uids = [uuid.uuid4() for _ in range(request.n)]
         self.queues = []
         self.events = []
+
+        # Trigger callback
+        self._server._callback_runner.trigger_event(
+            EventTypes.ON_REQUEST.value,
+            active_requests=self._server.active_requests,
+            litserver=self._server,
+        )
+
         for uid in uids:
             request_el = request.model_copy()
             request_el.n = 1

src/litserve/specs/openai_embedding.py

Lines changed: 8 additions & 0 deletions
@@ -24,6 +24,7 @@
 from fastapi import status as status_code
 from pydantic import BaseModel

+from litserve.callbacks.base import EventTypes
 from litserve.constants import _DEFAULT_LIT_API_PATH
 from litserve.specs.base import LitSpec
 from litserve.utils import LitAPIStatus, ResponseBufferItem
@@ -261,6 +262,13 @@ async def embeddings_endpoint(self, request: EmbeddingRequest) -> EmbeddingRespo
         event = asyncio.Event()
         self.response_buffer[uid] = ResponseBufferItem(event=event)

+        # Trigger callback
+        self._server._callback_runner.trigger_event(
+            EventTypes.ON_REQUEST.value,
+            active_requests=self._server.active_requests,
+            litserver=self._server,
+        )
+
         self.request_queue.put_nowait((response_queue_id, uid, time.monotonic(), request.model_copy()))
         await event.wait()

tests/unit/test_callbacks.py

Lines changed: 42 additions & 0 deletions
@@ -80,3 +80,45 @@ async def test_request_tracker(capfd):
     await run_simple_request(server, 4)
     captured = capfd.readouterr()
     assert "Active requests: 4" in captured.out, f"Expected pattern not found in output: {captured.out}"
+
+
+@pytest.mark.asyncio
+async def test_request_tracker_with_spec(capfd):
+    from litserve.specs.openai_embedding import OpenAIEmbeddingSpec
+    from litserve.test_examples.openai_embedding_spec_example import TestEmbedAPI
+
+    lit_api = TestEmbedAPI(spec=OpenAIEmbeddingSpec())
+    server = ls.LitServer(lit_api, track_requests=True, callbacks=[RequestTracker()])
+
+    with wrap_litserve_start(server) as server:
+        async with (
+            LifespanManager(server.app) as manager,
+            AsyncClient(transport=ASGITransport(app=manager.app), base_url="http://test") as ac,
+        ):
+            resp = await ac.post("/v1/embeddings", json={"input": "test", "model": "test"})
+            assert resp.status_code == 200
+
+    captured = capfd.readouterr()
+    assert "Active requests: 1" in captured.out, f"Expected pattern not found in output: {captured.out}"
+
+
+@pytest.mark.asyncio
+async def test_request_tracker_with_openai_spec(capfd):
+    from litserve.specs.openai import OpenAISpec
+    from litserve.test_examples.openai_spec_example import TestAPI
+
+    lit_api = TestAPI(spec=OpenAISpec())
+    server = ls.LitServer(lit_api, track_requests=True, callbacks=[RequestTracker()])
+
+    with wrap_litserve_start(server) as server:
+        async with (
+            LifespanManager(server.app) as manager,
+            AsyncClient(transport=ASGITransport(app=manager.app), base_url="http://test") as ac,
+        ):
+            resp = await ac.post(
+                "/v1/chat/completions", json={"messages": [{"role": "user", "content": "test"}], "model": "test"}
+            )
+            assert resp.status_code == 200
+
+    captured = capfd.readouterr()
+    assert "Active requests: 1" in captured.out, f"Expected pattern not found in output: {captured.out}"

tests/unit/test_cli.py

Lines changed: 88 additions & 6 deletions
@@ -1,4 +1,5 @@
 import os
+import subprocess
 import sys
 from unittest.mock import MagicMock, patch

@@ -34,19 +35,98 @@ def test_dockerize_command(monkeypatch, capsys):


 @patch("litserve.cli.is_package_installed")
-@patch("subprocess.check_call")
-def test_ensure_lightning_installed(mock_check_call, mock_is_package_installed):
+@patch("litserve.cli.importlib.util.find_spec")
+@patch("litserve.cli.shutil.which")
+@patch("subprocess.run")
+def test_ensure_lightning_installed_with_pip(mock_run, mock_which, mock_find_spec, mock_is_package_installed):
     mock_is_package_installed.return_value = False
+    mock_find_spec.return_value = True  # pip available
+    mock_which.return_value = None  # uv not available
     _ensure_lightning_installed()
-    mock_check_call.assert_called_once_with([sys.executable, "-m", "pip", "install", "-U", "lightning-sdk"])
+    mock_run.assert_called_once_with([sys.executable, "-m", "pip", "install", "-U", "lightning-sdk"], check=True)
+
+
+@patch("litserve.cli.is_package_installed")
+@patch("litserve.cli.importlib.util.find_spec")
+@patch("litserve.cli.shutil.which")
+@patch("subprocess.run")
+def test_ensure_lightning_installed_pip_preferred(mock_run, mock_which, mock_find_spec, mock_is_package_installed):
+    """When both pip and uv are available, pip should be used first."""
+    mock_is_package_installed.return_value = False
+    mock_find_spec.return_value = True  # pip available
+    mock_which.return_value = "/usr/bin/uv"  # uv also available
+    _ensure_lightning_installed()
+    mock_run.assert_called_once_with([sys.executable, "-m", "pip", "install", "-U", "lightning-sdk"], check=True)
+
+
+@patch("litserve.cli.is_package_installed")
+@patch("litserve.cli.importlib.util.find_spec")
+@patch("litserve.cli.shutil.which")
+@patch("subprocess.run")
+def test_ensure_lightning_installed_with_uv(mock_run, mock_which, mock_find_spec, mock_is_package_installed):
+    mock_is_package_installed.return_value = False
+    mock_find_spec.return_value = None  # pip not available
+    mock_which.return_value = "/usr/bin/uv"  # uv available
+    _ensure_lightning_installed()
+    mock_run.assert_called_once_with(["uv", "pip", "install", "-U", "lightning-sdk"], check=True)
+
+
+@patch("litserve.cli.is_package_installed")
+@patch("litserve.cli.importlib.util.find_spec")
+@patch("litserve.cli.shutil.which")
+@patch("subprocess.run")
+def test_ensure_lightning_installed_fallback_to_uv(mock_run, mock_which, mock_find_spec, mock_is_package_installed):
+    """When pip fails, should fall back to uv."""
+    mock_is_package_installed.return_value = False
+    mock_find_spec.return_value = True  # pip available
+    mock_which.return_value = "/usr/bin/uv"  # uv also available
+    mock_run.side_effect = [subprocess.CalledProcessError(1, "pip"), None]  # pip fails, uv succeeds
+    _ensure_lightning_installed()
+    assert mock_run.call_count == 2
+    mock_run.assert_called_with(["uv", "pip", "install", "-U", "lightning-sdk"], check=True)
+
+
+@patch("litserve.cli.is_package_installed")
+@patch("litserve.cli.importlib.util.find_spec")
+@patch("litserve.cli.shutil.which")
+@patch("subprocess.run")
+def test_ensure_lightning_installed_failure(mock_run, mock_which, mock_find_spec, mock_is_package_installed):
+    """When all available installers fail, should exit with error."""
+    mock_is_package_installed.return_value = False
+    mock_find_spec.return_value = True  # pip available
+    mock_which.return_value = "/usr/bin/uv"  # uv also available
+    mock_run.side_effect = subprocess.CalledProcessError(1, "install")  # both fail
+
+    with pytest.raises(SystemExit, match="Failed to install lightning-sdk"):
+        _ensure_lightning_installed()
+    assert mock_run.call_count == 2  # tried both pip and uv
+
+
+@patch("litserve.cli.is_package_installed")
+@patch("litserve.cli.importlib.util.find_spec")
+@patch("litserve.cli.shutil.which")
+@patch("subprocess.run")
+def test_ensure_lightning_installed_no_installer_available(
+    mock_run, mock_which, mock_find_spec, mock_is_package_installed
+):
+    """When neither pip nor uv is available, should exit with error."""
+    mock_is_package_installed.return_value = False
+    mock_find_spec.return_value = None  # pip not available
+    mock_which.return_value = None  # uv not available
+
+    with pytest.raises(SystemExit, match="Failed to install lightning-sdk"):
+        _ensure_lightning_installed()
+    mock_run.assert_not_called()  # no installer was tried


 # TODO: Remove this once we have a fix for Python 3.10
 @pytest.mark.skipif(sys.version_info[:2] in [(3, 10)], reason="Test fails on Python 3.10")
 @patch("litserve.cli.is_package_installed")
-@patch("subprocess.check_call")
+@patch("litserve.cli.importlib.util.find_spec")
+@patch("litserve.cli.shutil.which")
+@patch("subprocess.run")
 @patch("builtins.__import__")
-def test_cli_main_lightning_not_installed(mock_import, mock_check_call, mock_is_package_installed):
+def test_cli_main_lightning_not_installed(mock_import, mock_run, mock_which, mock_find_spec, mock_is_package_installed):
     # Create a mock for the lightning_sdk module and its components
     mock_lightning_sdk = MagicMock()
     mock_lightning_sdk.cli.entrypoint.main_cli = MagicMock()
@@ -58,6 +138,8 @@ def side_effect(name, *args, **kwargs):
             return __import__(name, *args, **kwargs)

     mock_import.side_effect = side_effect
+    mock_find_spec.return_value = True  # pip available
+    mock_which.return_value = None  # uv not available

     # Test when lightning_sdk is not installed but gets installed dynamically
     mock_is_package_installed.side_effect = [False, True]  # First call returns False, second call returns True
@@ -66,7 +148,7 @@ def side_effect(name, *args, **kwargs):
     with patch.object(sys, "argv", test_args):
         cli_main()

-    mock_check_call.assert_called_once_with([sys.executable, "-m", "pip", "install", "-U", "lightning-sdk"])
+    mock_run.assert_called_once_with([sys.executable, "-m", "pip", "install", "-U", "lightning-sdk"], check=True)


 @pytest.mark.skipif(sys.version_info[:2] in [(3, 10)], reason="Test fails on Python 3.10")
