Commit ee06c34

Merge remote-tracking branch 'upstream/release'
2 parents: fa12712 + 65c7b70

23 files changed: +953 −34 lines

.buildkite/test-pipeline.yaml
Lines changed: 9 additions & 1 deletion

@@ -113,7 +113,7 @@ steps:
   - pytest -v -s entrypoints/llm/test_generate.py # it needs a clean process
   - pytest -v -s entrypoints/llm/test_generate_multiple_loras.py # it needs a clean process
   - pytest -v -s entrypoints/llm/test_guided_generate.py # it needs a clean process
-  - pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_oot_registration.py
+  - pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_oot_registration.py --ignore=entrypoints/openai/correctness/
   - pytest -v -s entrypoints/test_chat_utils.py
   - pytest -v -s entrypoints/offline_mode # Needs to avoid interference with other tests

@@ -328,6 +328,14 @@ steps:
   - export VLLM_WORKER_MULTIPROC_METHOD=spawn
   - bash ./run-tests.sh -c configs/models-small.txt -t 1

+- label: OpenAI API correctness
+  source_file_dependencies:
+  - csrc/
+  - vllm/entrypoints/openai/
+  - vllm/model_executor/models/whisper.py
+  commands: # LMEval+Transcription WER check
+  - pytest -s entrypoints/openai/correctness/
+
 - label: Encoder Decoder tests # 5min
   source_file_dependencies:
   - vllm/
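The new pipeline step gates transcription quality on word error rate (WER). As a rough, self-contained sketch of the metric (the suite itself relies on the jiwer package added below in requirements-test.in; all names here are illustrative), WER is the word-level edit distance between hypothesis and reference, divided by the number of reference words:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words (single-row dynamic program).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # delete ref word
                           cur[j - 1] + 1,            # insert hyp word
                           prev[j - 1] + (r != h)))   # substitute
        prev = cur
    return prev[-1] / len(ref)

# One substitution out of nine reference words -> WER of 1/9.
score = wer("mary had a little lamb whose fleece was white",
            "mary had a little lamb whose fleece is white")
print(f"WER = {score:.3f}")
```

A CI check of this shape would typically assert that the corpus-level score stays below some fixed threshold.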

Dockerfile.rocm.ubi
Lines changed: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 ## Global Args ##################################################################
-ARG BASE_UBI_IMAGE_TAG=9.5-1738816775
+ARG BASE_UBI_IMAGE_TAG=9.5-1739420147
 ARG PYTHON_VERSION=3.12
 # Default ROCm ARCHes to build vLLM for.
 ARG PYTORCH_ROCM_ARCH="gfx908;gfx90a;gfx942;gfx1100"

Dockerfile.ubi
Lines changed: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 ## Global Args #################################################################
-ARG BASE_UBI_IMAGE_TAG=9.5-1738816775
+ARG BASE_UBI_IMAGE_TAG=9.5-1739420147
 ARG PYTHON_VERSION=3.12

 ARG TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0+PTX"

docs/source/serving/openai_compatible_server.md
Lines changed: 13 additions & 0 deletions

@@ -41,6 +41,8 @@ We currently support the following OpenAI APIs:
 - *Note: `parallel_tool_calls` and `user` parameters are ignored.*
 - [Embeddings API](#embeddings-api) (`/v1/embeddings`)
   - Only applicable to [embedding models](../models/pooling_models.md) (`--task embed`).
+- [Transcriptions API](#transcriptions-api) (`/v1/audio/transcriptions`)
+  - Only applicable to Automatic Speech Recognition (ASR) models (OpenAI Whisper) (`--task generate`).

 In addition, we have the following custom APIs:

@@ -296,6 +298,17 @@ For chat-like input (i.e. if `messages` is passed), these extra parameters are s
 :end-before: end-chat-embedding-extra-params
 :::

+(transcriptions-api)=
+
+### Transcriptions API
+
+Our Transcriptions API is compatible with [OpenAI's Transcriptions API](https://platform.openai.com/docs/api-reference/audio/createTranscription);
+you can use the [official OpenAI Python client](https://github.com/openai/openai-python) to interact with it.
+
+<!-- TODO: api enforced limits + uploading audios -->
+
+Code example: <gh-file:examples/online_serving/openai_transcription_client.py>
+
 (tokenizer-api)=

 ### Tokenizer API
examples/online_serving/openai_transcription_client.py (new file, per the docs reference above)
Lines changed: 23 additions & 0 deletions

@@ -0,0 +1,23 @@
+# SPDX-License-Identifier: Apache-2.0
+from openai import OpenAI
+
+from vllm.assets.audio import AudioAsset
+
+mary_had_lamb = AudioAsset('mary_had_lamb').get_local_path()
+winning_call = AudioAsset('winning_call').get_local_path()
+
+# Modify OpenAI's API key and API base to use vLLM's API server.
+openai_api_key = "EMPTY"
+openai_api_base = "http://localhost:8000/v1"
+client = OpenAI(
+    api_key=openai_api_key,
+    base_url=openai_api_base,
+)
+with open(str(mary_had_lamb), "rb") as f:
+    transcription = client.audio.transcriptions.create(
+        file=f,
+        model="openai/whisper-large-v3",
+        language="en",
+        response_format="text",
+        temperature=0.0)
+print("transcription result:", transcription)
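The example above assumes an OpenAI-compatible vLLM server is already listening on localhost:8000; one plausible way to start one (same model name as in the example, hardware permitting) is:

```shell
# Launch an OpenAI-compatible vLLM server hosting Whisper on port 8000.
vllm serve openai/whisper-large-v3
```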

requirements-common.txt
Lines changed: 3 additions & 4 deletions

@@ -8,12 +8,11 @@ py-cpuinfo
 transformers >= 4.48.2 # Required for Bamba model and Transformers backend.
 tokenizers >= 0.19.1 # Required for Llama 3.
 protobuf # Required by LlamaTokenizer.
-fastapi >= 0.107.0, < 0.113.0; python_version < '3.9'
-fastapi >= 0.107.0, != 0.113.*, != 0.114.0; python_version >= '3.9'
+fastapi[standard] >= 0.107.0, < 0.113.0; python_version < '3.9'
+fastapi[standard] >= 0.107.0, != 0.113.*, != 0.114.0; python_version >= '3.9'
 aiohttp
 openai >= 1.52.0 # Ensure modern openai package (ensure types module present and max_completion_tokens field support)
-uvicorn[standard]
-pydantic >= 2.9 # Required for fastapi >= 0.113.0
+pydantic >= 2.9
 prometheus_client >= 0.18.0
 pillow # Required for image processing
 prometheus-fastapi-instrumentator >= 7.0.0

requirements-test.in
Lines changed: 1 addition & 0 deletions

@@ -19,6 +19,7 @@ pqdm
 ray[adag]==2.40.0
 sentence-transformers # required for embedding tests
 soundfile # required for audio tests
+jiwer # required for audio tests
 timm # required for internvl test
 torch==2.5.1
 torchaudio==2.5.1

requirements-test.txt
Lines changed: 5 additions & 0 deletions

@@ -66,6 +66,7 @@ charset-normalizer==3.4.0
 click==8.1.7
     # via
     #   black
+    #   jiwer
     #   nltk
     #   ray
 colorama==0.4.6

@@ -187,6 +188,8 @@ jinja2==3.1.4
     # via
     #   datamodel-code-generator
     #   torch
+jiwer==3.0.5
+    # via -r requirements-test.in
 jmespath==1.0.1
     # via
     #   boto3

@@ -470,6 +473,8 @@ pyyaml==6.0.2
     #   timm
     #   transformers
     #   vocos
+rapidfuzz==3.12.1
+    # via jiwer
 ray[adag]==2.40.0
     # via -r requirements-test.in
 redis==5.2.0

tests/entrypoints/openai/correctness/__init__.py

Whitespace-only changes.

tests/entrypoints/openai/test_accuracy.py renamed to tests/entrypoints/openai/correctness/test_lmeval.py
Lines changed: 1 addition & 1 deletion

@@ -13,7 +13,7 @@

 from vllm.platforms import current_platform

-from ...utils import RemoteOpenAIServer
+from ....utils import RemoteOpenAIServer

 MODEL_NAME = "Qwen/Qwen2-1.5B-Instruct"
 NUM_CONCURRENT = 500
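The extra leading dot reflects the rename: the test moved one directory deeper, and each directory level between a module and the package it imports from adds one `.` to a relative import. A small self-contained sketch (hypothetical package names, built in a temp dir purely for illustration):

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()

def write(relpath: str, body: str = "") -> None:
    """Create a file (and parent dirs) under the temp package root."""
    path = os.path.join(root, relpath)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(body)

write("demo_pkg/__init__.py")
write("demo_pkg/utils.py", "MARKER = 'utils'\n")
write("demo_pkg/a/__init__.py")
write("demo_pkg/a/b/__init__.py")
# A module in demo_pkg/a/b needs three dots to reach demo_pkg.utils ...
write("demo_pkg/a/b/mod.py", "from ...utils import MARKER\n")
write("demo_pkg/a/b/c/__init__.py")
# ... and one directory deeper, four dots (the change in this commit).
write("demo_pkg/a/b/c/mod.py", "from ....utils import MARKER\n")

sys.path.insert(0, root)
shallow = importlib.import_module("demo_pkg.a.b.mod")
deep = importlib.import_module("demo_pkg.a.b.c.mod")
print(shallow.MARKER, deep.MARKER)  # both resolve to demo_pkg.utils
```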
