Conversation


@xenoscopic commented Jul 10, 2025

This PR implements our first passthrough backend, in this case going out to OpenAI. This is mostly an exercise in ensuring that these types of backends will fit into our architecture. A few adjustments had to be made, but otherwise things worked pretty well. A few notes:

  • The Backend interface had to be tweaked slightly
  • I've extended the BackendMode concept with a BackendModePassthrough type (see the sketch after this list)
  • I've implemented upstream model listing if using the passthrough backend, so a few handler registrations had to be relocated
  • Passthrough backends get loaded with model "passthrough" and mode "passthrough"; ps and unload work
    • These don't take any VRAM, so they can be loaded in parallel with local models with no issues
  • I haven't implemented all OpenAI endpoints, only the text-based ones (so that we don't record binary responses)
    • But the code can support all of them
  • Model configuration is ignored for this backend: context_size isn't configurable for OpenAI models, and runtime flags don't map to any API concept. Other configuration would be easy to add later via the reverse proxy director.
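
For concreteness, here's a rough sketch of the shape of these changes; aside from Passthrough() and BackendModePassthrough, the identifiers below are assumptions rather than the actual code:

```go
// Rough sketch only: beyond Passthrough() and BackendModePassthrough,
// the names here are assumptions, not the actual model-runner code.
package backend

// BackendMode identifies the class of API traffic a backend serves.
type BackendMode int

const (
	// BackendModeCompletion stands in for the existing text-based mode;
	// the real constant name may differ.
	BackendModeCompletion BackendMode = iota
	// BackendModePassthrough marks requests that are proxied verbatim
	// to externally managed inference infrastructure.
	BackendModePassthrough
)

// Backend shows only the new method; the real interface has more.
type Backend interface {
	// Passthrough reports whether the backend proxies requests to
	// inference infrastructure managed outside of the model runner.
	Passthrough() bool
}
```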

This is still in draft pending tests.

This commit adds an OpenAI passthrough backend. In order to do this, a
few minor tweaks to the Backend interface were needed. More
significantly, the OpenAI API handling had to be tweaked to allow some
additional methods. The backend operates as a standard backend, but uses
a placeholder model name ("passthrough") to avoid allocating one runner
per OpenAI model.

I've added a few more methods (most notably the rest of the chat
completions API and the responses API), but not all methods yet because
many of the multimodal APIs return responses that we can't record.

Signed-off-by: Jacob Howard <[email protected]>
@xenoscopic force-pushed the openai-passthrough branch from df6d3c0 to e9f3b2f on July 10, 2025 15:06
@xenoscopic (Contributor, Author)

Some example commands to test:

curl "http://localhost:12436/engines/openai/v1/responses" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -d '{
        "model": "gpt-4.1",
        "input": "Write a one-sentence bedtime story about a unicorn."
    }'
curl "http://localhost:12436/engines/openai/v1/models" \
  -H "Authorization: Bearer $OPENAI_API_KEY"

Model CLI support is pending.

@xenoscopic (Contributor, Author)

@doringeman I'd like to add support for other API endpoints (e.g. audio and images). The code here can handle it, but I don't think we want to record those responses (since they could be big), so I've intentionally avoided registering those endpoints. I'm thinking we only record responses if in a text-based mode. WDYT?
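
As a minimal sketch of that gate (reusing the hypothetical BackendMode names from the sketch in the PR description; this helper is illustrative, not the actual recorder API):

```go
// shouldRecord is a hypothetical helper: only text-based modes have
// responses small enough to be worth recording, so binary payloads from
// audio/image endpoints are never captured.
func shouldRecord(mode BackendMode) bool {
	switch mode {
	case BackendModeCompletion:
		return true
	default:
		return false
	}
}
```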

@xenoscopic (Contributor, Author)

@ilopezluna just reminded me that you can already do audio via the completions endpoint (and I'd assume images too), so maybe we should adjust the recorder to avoid capturing that.

// Passthrough indicates whether this is a backend that acts as a proxy
// for inference infrastructure that's managed outside of the model
// runner. This also implies that the backend uses external model
// management.
Passthrough() bool
A contributor commented:

We currently use this function to determine whether the backend is "passthrough," and based on that, we perform either A or B. This works perfectly fine for now. However, if we introduce a new type of backend in the future, we might need to add a new isWhatever() function that returns false for all cases except the new one.
I was wondering if it might make sense to use a function that returns a backend type instead, something like Type() BackendType. No need to change anything right now; I'm just sharing the thought in case we end up adding another backend, since it might make future refactoring easier.
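
A minimal sketch of that alternative (BackendType and its values are hypothetical):

```go
// BackendType is a hypothetical enum replacing the Passthrough() bool
// predicate, so new backend kinds don't require new isWhatever() methods.
type BackendType int

const (
	BackendTypeLocal BackendType = iota
	BackendTypePassthrough
	// Future backend kinds become new constants rather than new methods.
)

type Backend interface {
	// Type replaces Passthrough() bool in this sketch.
	Type() BackendType
}
```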

@xenoscopic (Author) replied:

It's a good thought. I'm not sure how many "types" we'll end up with, so I agree, let's wait. I was also thinking maybe backends should support some sort of SupportsMode(mode BackendMode) method to control which APIs get routed to them; maybe that could be done simultaneously.
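
As a rough illustration of that idea (the receiver type and routing rule are assumptions):

```go
// SupportsMode would let the router ask each backend which API modes it
// serves instead of hard-coding per-backend routing knowledge.
// openAIPassthroughBackend is a hypothetical receiver type.
func (b *openAIPassthroughBackend) SupportsMode(mode BackendMode) bool {
	// The OpenAI passthrough backend only serves passthrough-mode APIs.
	return mode == BackendModePassthrough
}
```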

@doringeman (Contributor) left a comment:

LGTM!

I missed the initial discussion about this, but shouldn't we allow configuring an upstream URL other than https://api.openai.com/v1/?
E.g., https://generativelanguage.googleapis.com/v1beta/openai
(https://ai.google.dev/gemini-api/docs/openai#rest)
Perhaps via a custom X-Upstream-URL HTTP header.
Of course this would be for later, not for this initial PR.
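
A minimal sketch of what honoring such a header could look like in the reverse proxy director (the header name comes from this suggestion; the function and everything else are assumptions):

```go
package backend

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

// newPassthroughProxy returns a reverse proxy that targets
// defaultUpstream unless the request carries an X-Upstream-URL override.
func newPassthroughProxy(defaultUpstream *url.URL) *httputil.ReverseProxy {
	return &httputil.ReverseProxy{
		Director: func(r *http.Request) {
			upstream := defaultUpstream
			if h := r.Header.Get("X-Upstream-URL"); h != "" {
				// Ignore malformed overrides and keep the default.
				if u, err := url.Parse(h); err == nil {
					upstream = u
				}
			}
			r.URL.Scheme = upstream.Scheme
			r.URL.Host = upstream.Host
			r.Host = upstream.Host
			// Note: joining upstream.Path with the request path is
			// omitted here, but would matter for upstreams like
			// https://generativelanguage.googleapis.com/v1beta/openai.
		},
	}
}
```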

@p1-0tr left a comment:

LGTM

@xenoscopic (Contributor, Author)

@doringeman it's a good question re: other URLs. I had thought about maybe making this a more general passthrough backend (since there's very little here specific to OpenAI). It should be an easy lift - the most critical part is maybe the Bearer token, but I assume almost all of the implementations out there use bearer tokens these days. We can consider it before shipping, definitely. In that case, maybe we don't even need a Passthrough() bool method and we could just do a type assertion to see if it's a type Passthrough struct {url string} backend.
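
A rough sketch of that type-assertion approach (the struct shape is quoted from the comment above; the helper is hypothetical):

```go
// Passthrough is the backend shape floated in the comment above.
type Passthrough struct {
	url string
}

// upstreamURL is a hypothetical helper: a type assertion both detects a
// passthrough backend and exposes its upstream URL, replacing the
// Passthrough() bool method on the Backend interface.
func upstreamURL(b Backend) (string, bool) {
	if p, ok := b.(*Passthrough); ok {
		return p.url, true
	}
	return "", false
}
```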

@doringeman (Contributor) left a comment:

Thanks!

@xenoscopic (Contributor, Author)

Closing since we're going to take a slightly different approach. I'll leave the branch intact for now.

@xenoscopic closed this Jul 30, 2025
doringeman pushed a commit to doringeman/model-runner that referenced this pull request Oct 2, 2025