Conversation

@yinggeh (Contributor) commented Oct 30, 2025

Enable vLLM to load embedding models and execute embedding requests.
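
For context, an embedding request to the new endpoint would follow the usual OpenAI /v1/embeddings contract. A minimal sketch, assuming a locally running frontend; the address and model name are placeholders:

import requests

response = requests.post(
    "http://localhost:9000/v1/embeddings",  # assumed frontend address
    json={
        "model": "my_embedding_model",  # hypothetical model name
        "input": "The quick brown fox",
        "dimensions": 256,  # optional; presumably maps to PoolingParams(dimensions=...)
    },
)
print(response.json()["data"][0]["embedding"][:4])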

@yinggeh force-pushed the yinggeh/tri-49-request-for-openai-compatible-api-endpoints-for-triton branch from 0acb76f to 7d06043 on October 30, 2025 01:12
@yinggeh force-pushed the yinggeh/tri-49-request-for-openai-compatible-api-endpoints-for-triton branch from 7d06043 to 2c3e148 on October 30, 2025 01:14
…end into yinggeh/tri-49-request-for-openai-compatible-api-endpoints-for-triton

@whoisj left a comment:

Left a few comments.

        pooling_params = PoolingParams(dimensions=dims, task="embed")
        return pooling_params

    def create_response(self, request_output):

@whoisj: Would be nice to have a type hint on request_output.

@yinggeh (author): Fixed.
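
A plausible shape of that fix, as a sketch; the exact vLLM output class is an assumption (it has been named EmbeddingRequestOutput or PoolingRequestOutput depending on the vLLM version):

# Sketch only; the output type name is an assumption.
from vllm.outputs import EmbeddingRequestOutput

def create_response(self, request_output: EmbeddingRequestOutput):
    ...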

        async for response in response_iterator:
            yield response

    def create_response(self, request_output_state, request_output, prepend_input):

@whoisj: Would be nice to have type hints on request_output_state, request_output, and prepend_input.

@yinggeh (author): Fixed.
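
Same idea for the streaming path, as a sketch; the concrete types are guesses from how the parameters appear to be used:

# Hinted signature sketch (all types are assumptions).
from typing import Any, Dict
from vllm.outputs import RequestOutput

def create_response(
    self,
    request_output_state: Dict[str, Any],  # per-request state carried across deltas
    request_output: RequestOutput,
    prepend_input: bool,
):
    ...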



class RequestBase:
    def __init__(self, request, executor_callback, output_dtype):

@whoisj: Would be nice to have type hints on request, executor_callback, and output_dtype.

@yinggeh (author): Fixed.
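
For the base class, a sketch along these lines; pb_utils.InferenceRequest is the request type Triton's Python backend normally hands to models, but every hint here is an assumption:

# Hinted constructor sketch (types are assumptions).
from typing import Callable

import numpy as np
import triton_python_backend_utils as pb_utils

class RequestBase:
    def __init__(
        self,
        request: "pb_utils.InferenceRequest",
        executor_callback: Callable,
        output_dtype: np.dtype,
    ):
        ...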

from abc import abstractmethod
from io import BytesIO

import numpy as np

@whoisj: Is using numpy (CPU) good enough? Do we want to leverage cupy (GPU)?

@yinggeh (author): My understanding is that the vLLM engine takes care of it.
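
That is, the GPU work (prompt to embedding) happens inside the vLLM engine; numpy only enters when the finished embedding is packed into a Triton output tensor. A rough sketch of that hand-off, with the attribute names assumed from vLLM's embedding output types:

import numpy as np

def embedding_to_array(request_output):
    # By the time the backend sees it, the embedding is already host-side
    # data (a list of floats in vLLM's embedding output types), so the
    # GPU-to-CPU copy has happened inside the engine.
    return np.asarray(request_output.outputs.embedding, dtype=np.float32)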

@yinggeh changed the base branch from main to r25.10 on October 30, 2025 22:46
"optional": True,
},
# Tentative input reserved for embedding requests in OpenAI-compatible frontend. Subject to change in the future.
# WARN: Triton client should never set this input. It is reserved for embedding requests in OpenAI-compatible frontend.

@pskiran1 (Member) commented Nov 3, 2025:

Why limit support to only the OpenAI frontend? Maybe we should also allow deploying embedding models using only the vLLM backend?

@yinggeh (author): That's a separate issue. The input prompt format for chat/completions differs from the embeddings format, so we need two different sets of configuration inputs for generate and embed.
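
To make the distinction concrete, the two paths would need separate auto-completed inputs, roughly as below. This is a hypothetical illustration; the input names are assumptions, not the backend's actual configuration:

GENERATE_INPUTS = [
    {"name": "text_input", "data_type": "TYPE_STRING", "dims": [1]},
    {"name": "sampling_parameters", "data_type": "TYPE_STRING", "dims": [1], "optional": True},
]
EMBED_INPUTS = [
    {"name": "text_input", "data_type": "TYPE_STRING", "dims": [1]},
    # Embeddings take pooling options (e.g. an output size) instead of sampling parameters.
    {"name": "embedding_dimensions", "data_type": "TYPE_INT32", "dims": [1], "optional": True},
]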

…d into yinggeh/tri-49-request-for-openai-compatible-api-endpoints-for-triton
@yinggeh changed the base branch from r25.10 to main on November 3, 2025 18:19

Labels

enhancement (New feature or request)


5 participants