
Commit ae2b85c

docs: Add documentation for OpenAI-compatible API in LitGPT deployment (#2082)

1 parent d4107ea · commit ae2b85c

2 files changed: +57 −1 lines changed

litgpt/deploy/serve.py

Lines changed: 3 additions & 1 deletion

@@ -233,7 +233,9 @@ def run_server(
             The "auto" setting (default) chooses a GPU if available, and otherwise uses a CPU.
         port: The network port number on which the model is configured to be served.
         stream: Whether to stream the responses.
-        openai_spec: Whether to use the OpenAISpec.
+        openai_spec: Whether to use the OpenAISpec and enable OpenAI-compatible API endpoints. When True, the server will provide
+            `/v1/chat/completions` endpoints that work with the OpenAI SDK and other OpenAI-compatible clients,
+            making it easy to integrate with existing applications that use the OpenAI API.
         access_token: Optional API token to access models with restrictions.
         """
     checkpoint_dir = auto_download_checkpoint(model_name=checkpoint_dir, access_token=access_token)
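For reference, the same option can be toggled from Python rather than the CLI. The following is a minimal sketch, assuming `run_server` is importable from `litgpt.deploy.serve` (the file shown above) and using only parameters named in this docstring:

```python
# Hypothetical programmatic equivalent of `litgpt serve ... --openai_spec true`.
# Import path inferred from the file location above; parameter names are taken
# from the run_server docstring in this diff.
from litgpt.deploy.serve import run_server

run_server(
    checkpoint_dir="HuggingFaceTB/SmolLM2-135M-Instruct",  # auto-downloaded if not present (see auto_download_checkpoint above)
    port=8000,
    openai_spec=True,  # serve /v1/chat/completions instead of the default /predict
)
```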

tutorials/deploy.md

Lines changed: 54 additions & 0 deletions

@@ -80,6 +80,60 @@ Sure, here is the corrected sentence:
 Example input
 ```
 
+
+## Serve an LLM with OpenAI-compatible API
+
+LitGPT provides OpenAI-compatible endpoints that allow you to use the OpenAI SDK or any OpenAI-compatible client to interact with your models. This is useful for integrating LitGPT into existing applications that use the OpenAI API.
+
+
+### Step 1: Start the server with OpenAI specification
+
+```bash
+# 1) Download a pretrained model (alternatively, use your own finetuned model)
+litgpt download HuggingFaceTB/SmolLM2-135M-Instruct
+
+# 2) Start the server with OpenAI-compatible endpoints
+litgpt serve HuggingFaceTB/SmolLM2-135M-Instruct --openai_spec true
+```
+
+> [!TIP]
+> The `--openai_spec true` flag enables OpenAI-compatible endpoints at `/v1/chat/completions` instead of the default `/predict` endpoint.
+
+
+### Step 2: Query using OpenAI-compatible endpoints
+
+You can now send requests to the OpenAI-compatible endpoint using curl:
+
+```bash
+curl -X POST http://127.0.0.1:8000/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "SmolLM2-135M-Instruct",
+    "messages": [{"role": "user", "content": "Hello! How are you?"}]
+  }'
+```
+
+Or use the OpenAI Python SDK:
+
+```python
+from openai import OpenAI
+
+# Configure the client to use your local LitGPT server
+client = OpenAI(
+    base_url="http://127.0.0.1:8000/v1",
+    api_key="not-needed"  # LitGPT doesn't require authentication by default
+)
+
+response = client.chat.completions.create(
+    model="SmolLM2-135M-Instruct",
+    messages=[
+        {"role": "user", "content": "Hello! How are you?"}
+    ]
+)
+
+print(response.choices[0].message.content)
+```
+
 
 ## Serve an LLM UI with Chainlit
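The SDK can also consume streamed output. A minimal sketch, assuming the server honors the request-level `stream` field under `--openai_spec true` (the `serve` command documents a `stream` option above, but streaming behavior under the OpenAI spec is an assumption here, not something this commit shows):

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed")

# stream=True requests incremental chunks; whether LitGPT's OpenAI spec
# supports this is assumed here rather than confirmed by this commit.
stream = client.chat.completions.create(
    model="SmolLM2-135M-Instruct",
    messages=[{"role": "user", "content": "Hello! How are you?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # the final chunk may carry no content
        print(delta, end="", flush=True)
print()
```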