
Commit 9f98e5a (parent 4aa1a8b)

refactor: Migration from httpx to aiohttp for improved concurrency

Replaced httpx with aiohttp for better asynchronous performance and resource utilization. Fixed a JSON syntax error in error response handling.

4 files changed: +129 −19 lines

pyproject.toml (0 additions, 1 deletion)

@@ -26,7 +26,6 @@ dependencies = [
     "xxhash==3.5.0",
     "psutil==7.0.0",
     "pyyaml>=6.0.2",
-    "httpx==0.28.1",
 ]

 [project.scripts]

src/vllm_router/requirements.txt (0 additions, 1 deletion)

@@ -1,7 +1,6 @@
 aiofiles==24.1.0
 aiohttp==3.9.5
 fastapi==0.115.8
-httpx==0.28.1
 kubernetes==32.0.0
 numpy==1.26.4
 prometheus_client==0.21.1

src/vllm_router/services/request_service/request.py (30 additions, 17 deletions)

@@ -20,7 +20,6 @@
 from typing import Optional

 import aiohttp
-import httpx
 from fastapi import BackgroundTasks, HTTPException, Request, UploadFile
 from fastapi.responses import JSONResponse, StreamingResponse
 from requests import JSONDecodeError

@@ -627,14 +626,26 @@ async def route_general_transcriptions(
     logger.debug("==== data payload keys ====")

     try:
-        async with request.app.state.httpx_client_wrapper() as client:
-            backend_response = await client.post(
-                f"{chosen_url}{endpoint}", data=data, files=files, timeout=300.0
-            )
-            backend_response.raise_for_status()
+        client = request.app.state.aiohttp_client_wrapper()
+
+        form_data = aiohttp.FormData()
+
+        # Add the file parts
+        for key, (filename, content, content_type) in files.items():
+            form_data.add_field(key, content, filename=filename, content_type=content_type)
+
+        # Add the plain form fields
+        for key, value in data.items():
+            form_data.add_field(key, value)
+
+        backend_response = await client.post(
+            f"{chosen_url}{endpoint}",
+            data=form_data,
+            timeout=aiohttp.ClientTimeout(total=300)
+        )

         # --- 4. Return the response ---
-        response_content = backend_response.json()
+        response_content = await backend_response.json()
         headers = {
             k: v
             for k, v in backend_response.headers.items()

@@ -645,17 +656,19 @@ async def route_general_transcriptions(

         return JSONResponse(
             content=response_content,
-            status_code=backend_response.status_code,
+            status_code=backend_response.status,
             headers=headers,
         )
-    except httpx.HTTPStatusError as e:
-        error_content = (
-            e.response.json()
-            if "json" in e.response.headers.get("content-type", "")
-            else e.response.text
-        )
-        return JSONResponse(status_code=e.response.status_code, content=error_content)
-    except httpx.RequestError as e:
+    except aiohttp.ClientResponseError as e:
+        if hasattr(e, "response") and e.response is not None:
+            try:
+                error_content = await e.response.json()
+            except Exception:
+                error_content = await e.response.text()
+        else:
+            error_content = {"error": f"HTTP {e.status}: {e.message}"}
+        return JSONResponse(status_code=e.status, content=error_content)
+    except aiohttp.ClientError as e:
         return JSONResponse(
-            status_code=503, content={"error": f"Failed to connect to backend: {e}"}
+            status_code=503, content={"error": f"Failed to connect to backend: {str(e)}"}
         )
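
The diff above obtains a shared client from `request.app.state.aiohttp_client_wrapper()`. A minimal sketch of what such a wrapper can look like, assuming a single `aiohttp.ClientSession` created on FastAPI startup and closed on shutdown (the actual wrapper in `vllm_router` may be implemented differently):

```python
# Sketch only: one way to share an aiohttp.ClientSession across requests;
# the real vllm_router wrapper may differ.
import aiohttp


class AiohttpClientWrapper:
    """Holds one ClientSession for the app's whole lifetime."""

    def __init__(self) -> None:
        self._session: aiohttp.ClientSession | None = None

    async def start(self) -> None:
        # Create the session inside the running event loop
        # (e.g., in a FastAPI startup handler).
        self._session = aiohttp.ClientSession()

    async def stop(self) -> None:
        # Close pooled connections cleanly on shutdown.
        if self._session is not None:
            await self._session.close()

    def __call__(self) -> aiohttp.ClientSession:
        assert self._session is not None, "wrapper used before startup"
        return self._session
```

Reusing one session is what buys the concurrency win over per-request clients: connections to each backend stay pooled and TCP/TLS setup is amortized across requests.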
Lines changed: 99 additions & 0 deletions (new file)

# Tutorial: Whisper Transcription API in vLLM Production Stack

## Overview

This tutorial introduces the newly added `/v1/audio/transcriptions` endpoint in the `vllm-router`, which lets users transcribe `.wav` audio files using OpenAI's `whisper-small` model.

## Prerequisites

* Access to a machine with a GPU (e.g., via [RunPod](https://runpod.io/))
* A Python 3.12 environment (`uv` is recommended)
* `vllm` and `production-stack` cloned and installed
* `vllm` installed with audio support:

```bash
pip install "vllm[audio]"
```

## 1. Serving the Whisper Model

Start a vLLM backend with the `whisper-small` model:

```bash
vllm serve \
  --task transcription openai/whisper-small \
  --host 0.0.0.0 --port 8002
```
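
Before wiring up the router, it can help to confirm the backend is serving the model. A quick check against vLLM's OpenAI-compatible model listing endpoint (the port and expected output assume the serve command above; `requests` is not installed by this tutorial):

```python
# Sanity check for the backend started in step 1.
import requests

resp = requests.get("http://localhost:8002/v1/models", timeout=10)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])
# Expected, given the command above: ['openai/whisper-small']
```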
## 2. Running the Router

Create and run a router connected to the Whisper backend:

```bash
#!/bin/bash
if [[ $# -ne 2 ]]; then
  echo "Usage: $0 <router_port> <backend_url>"
  exit 1
fi

uv run python3 -m vllm_router.app \
  --host 0.0.0.0 --port "$1" \
  --service-discovery static \
  --static-backends "$2" \
  --static-models "openai/whisper-small" \
  --static-model-types "transcription" \
  --routing-logic roundrobin \
  --log-stats \
  --engine-stats-interval 10 \
  --request-stats-window 10
```

Example usage:

```bash
./run-router.sh 8000 http://localhost:8002
```

## 3. Sending a Transcription Request

Use `curl` to send a `.wav` file to the transcription endpoint:

* You can test with any `.wav` audio file of your choice.

```bash
curl -v http://localhost:8000/v1/audio/transcriptions \
  -F 'file=@/path/to/audio.wav;type=audio/wav' \
  -F 'model=openai/whisper-small' \
  -F 'response_format=json' \
  -F 'language=en'
```

### Supported Parameters

| Parameter | Description |
| ----------------- | ------------------------------------------------------ |
| `file` | Path to a `.wav` audio file |
| `model` | Whisper model to use (e.g., `openai/whisper-small`) |
| `prompt` | *(Optional)* Text prompt to guide the transcription |
| `response_format` | One of `json`, `text`, `srt`, `verbose_json`, or `vtt` |
| `temperature` | *(Optional)* Sampling temperature as a float |
| `language` | ISO 639-1 language code (e.g., `en`, `fr`, `zh`) |
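
For programmatic use, the same multipart request can be sent from Python. A sketch using the `requests` library (the file path is a placeholder, and `requests` is not a dependency this tutorial installs):

```python
# Python equivalent of the curl call above.
import requests

with open("/path/to/audio.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/v1/audio/transcriptions",
        files={"file": ("audio.wav", f, "audio/wav")},
        data={
            "model": "openai/whisper-small",
            "response_format": "json",
            "language": "en",
        },
        timeout=300,  # transcription can be slow; mirror the router's budget
    )
resp.raise_for_status()
print(resp.json()["text"])
```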
## 4. Sample Output

```json
{
  "text": "Testing testing testing the whisper small model testing testing testing the audio transcription function testing testing testing the whisper small model"
}
```

## 5. Notes

* The router uses an extended aiohttp timeout (300 seconds total) to support long transcription jobs; see the sketch after this list.
* This implementation dynamically discovers valid transcription backends and routes requests accordingly.
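
The 300-second budget comes from `aiohttp.ClientTimeout(total=300)` in the router change above. If long audio files still hit the limit, the timeout can be tuned with finer granularity; the values below are illustrative, not the router's defaults:

```python
import aiohttp

# Illustrative values only, not the router's defaults.
timeout = aiohttp.ClientTimeout(
    total=300,     # cap on the whole request, as in the router
    connect=10,    # time allowed to establish a connection
    sock_read=60,  # max gap between response chunks
)
# Passed per request: session.post(url, data=form_data, timeout=timeout)
```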
## 6. Resources

* [PR #469 – Add Whisper Transcription API](https://github.com/vllm-project/production-stack/pull/469)
* [OpenAI Whisper GitHub](https://github.com/openai/whisper)
* [Blog: vLLM Whisper Transcription Walkthrough](https://davidgao7.github.io/posts/vllm-v1-whisper-transcription/)
