Conversation


@xenoscopic commented Jul 10, 2025

This PR implements our first passthrough backend, in this case going out to OpenAI. This is mostly an exercise in ensuring that these types of backends will fit into our architecture. A few adjustments had to be made, but otherwise things worked pretty well. A few notes:

  • The Backend interface had to be tweaked slightly
  • I've extended the BackendMode concept with a BackendModePassthrough type (see the sketch after this list)
  • I've implemented upstream model listing if using the passthrough backend, so a few handler registrations had to be relocated
  • Passthrough backends get loaded with model "passthrough" and mode "passthrough"; ps and unload work
    • These don't take any VRAM, so they can be loaded in parallel with local models with no issues
  • I haven't implemented all OpenAI endpoints, only the text-based ones (so that we don't record binary responses)
    • But the code can support all of them
  • Model configuration is ignored for this backend: context_size isn't configurable for OpenAI models, and runtime flags don't map to any API concept. Other configuration would be easy to add later via the reverse proxy director.
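
For concreteness, here's a rough sketch of the shape of these changes; aside from Passthrough() and BackendModePassthrough, the identifiers below are assumptions rather than the actual code:

```go
// Rough sketch only: beyond Passthrough() and BackendModePassthrough,
// the names here are assumptions, not the actual model-runner code.
package backend

// BackendMode identifies the class of API traffic a backend serves.
type BackendMode int

const (
	// BackendModeCompletion stands in for the existing text-based mode;
	// the real constant name may differ.
	BackendModeCompletion BackendMode = iota
	// BackendModePassthrough marks requests that are proxied verbatim
	// to externally managed inference infrastructure.
	BackendModePassthrough
)

// Backend shows only the new method; the real interface has more.
type Backend interface {
	// Passthrough reports whether the backend proxies requests to
	// inference infrastructure managed outside of the model runner.
	Passthrough() bool
}
```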

This is still in draft pending tests.

This commit adds an OpenAI passthrough backend. In order to do this, a
few minor tweaks to the Backend interface were needed. More
significantly, the OpenAI API handling had to be tweaked to allow some
additional methods. The backend operates as a standard backend, but uses
a placeholder model name ("passthrough") to avoid allocating one runner
per OpenAI model.

I've added a few more methods (most notably the rest of the chat
completions API and the responses API), but not all methods yet because
many of the multimodal APIs return responses that we can't record.

Signed-off-by: Jacob Howard <[email protected]>
@xenoscopic force-pushed the openai-passthrough branch from df6d3c0 to e9f3b2f on July 10, 2025 15:06
@xenoscopic (Contributor, Author)

Some example commands to test:

curl "http://localhost:12436/engines/openai/v1/responses" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4.1",
        "input": "Write a one-sentence bedtime story about a unicorn."
    }'
curl "http://localhost:12436/engines/openai/v1/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY"

Model CLI support is pending.

@xenoscopic (Contributor, Author)

@doringeman I'd like to add support for other API endpoints (e.g. audio and images). The code here can handle it, but I don't think we want to record those responses (since they could be big), so I've intentionally avoided registering those endpoints. I'm thinking we only record responses if in a text-based mode. WDYT?
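
As a minimal sketch of that gate (reusing the hypothetical BackendMode names from the sketch in the PR description; this helper is illustrative, not the actual recorder API):

```go
// shouldRecord is a hypothetical helper: only text-based modes have
// responses small enough to be worth recording, so binary payloads from
// audio/image endpoints are never captured.
func shouldRecord(mode BackendMode) bool {
	switch mode {
	case BackendModeCompletion:
		return true
	default:
		return false
	}
}
```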

@xenoscopic (Contributor, Author)

@ilopezluna just reminded me that you can already do audio via the completions endpoint (and I'd assume images too), so maybe we should adjust the recorder to avoid capturing that.

// Passthrough indicates whether this is a backend that acts as a proxy
// for inference infrastructure that's managed outside of the model
// runner. This also implies that the backend uses external model
// management.
Passthrough() bool
A contributor commented:

We currently use this function to determine whether the backend is "passthrough," and based on that, we perform either A or B. This works perfectly fine for now. However, if we introduce a new type of backend in the future, we might need to add a new isWhatever() function that returns false for all cases except the new one.
I was wondering if it might make sense to use a function that returns a backend type instead, something like Type() BackendType. No need to change anything right now; I'm just sharing the thought in case we end up adding another backend, since it might make future refactoring easier.
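
A minimal sketch of that alternative (BackendType and its values are hypothetical):

```go
// BackendType is a hypothetical enum replacing the Passthrough() bool
// predicate, so new backend kinds don't require new isWhatever() methods.
type BackendType int

const (
	BackendTypeLocal BackendType = iota
	BackendTypePassthrough
	// Future backend kinds become new constants rather than new methods.
)

type Backend interface {
	// Type replaces Passthrough() bool in this sketch.
	Type() BackendType
}
```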

@xenoscopic (Author) replied:

It's a good thought. I'm not sure how many "types" we'll end up with, so I agree, let's wait. I was also thinking maybe backends should support some sort of SupportsMode(mode BackendMode) method to control which APIs get routed to them; maybe that could be done simultaneously.
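
As a rough illustration of that idea (the receiver type and routing rule are assumptions):

```go
// SupportsMode would let the router ask each backend which API modes it
// serves instead of hard-coding per-backend routing knowledge.
// openAIPassthroughBackend is a hypothetical receiver type.
func (b *openAIPassthroughBackend) SupportsMode(mode BackendMode) bool {
	// The OpenAI passthrough backend only serves passthrough-mode APIs.
	return mode == BackendModePassthrough
}
```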

@doringeman (Contributor) left a comment:

LGTM!

I missed the initial discussion about this, but shouldn't we allow configuring an upstream URL other than https://api.openai.com/v1/?
E.g., https://generativelanguage.googleapis.com/v1beta/openai
(https://ai.google.dev/gemini-api/docs/openai#rest)
Perhaps via a custom X-Upstream-URL HTTP header.
Of course this would be for later, not for this initial PR.
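
A minimal sketch of what honoring such a header could look like in the reverse proxy director (the header name comes from this suggestion; the function and everything else are assumptions):

```go
package backend

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

// newPassthroughProxy returns a reverse proxy that targets
// defaultUpstream unless the request carries an X-Upstream-URL override.
func newPassthroughProxy(defaultUpstream *url.URL) *httputil.ReverseProxy {
	return &httputil.ReverseProxy{
		Director: func(r *http.Request) {
			upstream := defaultUpstream
			if h := r.Header.Get("X-Upstream-URL"); h != "" {
				// Ignore malformed overrides and keep the default.
				if u, err := url.Parse(h); err == nil {
					upstream = u
				}
			}
			r.URL.Scheme = upstream.Scheme
			r.URL.Host = upstream.Host
			r.Host = upstream.Host
			// Note: joining upstream.Path with the request path is
			// omitted here, but would matter for upstreams like
			// https://generativelanguage.googleapis.com/v1beta/openai.
		},
	}
}
```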

@p1-0tr left a comment:

LGTM

@xenoscopic (Contributor, Author)

@doringeman it's a good question re: other URLs. I had thought about maybe making this a more general passthrough backend (since there's very little here specific to OpenAI). It should be an easy lift - the most critical part is maybe the Bearer token, but I assume almost all of the implementations out there use bearer tokens these days. We can consider it before shipping, definitely. In that case, maybe we don't even need a Passthrough() bool method and we could just do a type assertion to see if it's a type Passthrough struct {url string} backend.
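
A rough sketch of that type-assertion approach (the struct shape is quoted from the comment above; the helper is hypothetical):

```go
// Passthrough is the backend shape floated in the comment above.
type Passthrough struct {
	url string
}

// upstreamURL is a hypothetical helper: a type assertion both detects a
// passthrough backend and exposes its upstream URL, replacing the
// Passthrough() bool method on the Backend interface.
func upstreamURL(b Backend) (string, bool) {
	if p, ok := b.(*Passthrough); ok {
		return p.url, true
	}
	return "", false
}
```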

@doringeman (Contributor) left a comment:

Thanks!

@xenoscopic (Contributor, Author)

Closing since we're going to take a slightly different approach. I'll leave the branch intact for now.

@xenoscopic closed this Jul 30, 2025
doringeman pushed a commit to doringeman/model-runner that referenced this pull request Oct 2, 2025