ollama_oss: default WireApi to Responses, fall back to Chat #8227
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
@drifkin, thanks for the PR. There were some merge conflicts. I attempted to address them in the GitHub merge UI, but I apparently missed something because I'm seeing build errors. Please pull the latest changes and fix them up on your branch (or have codex do it) and push the change.
In Ollama v0.13.4, `/v1/responses` support was added. When Ollama is used via `--oss`, it now defaults to Responses. At runtime we do a version check, and if the version is older than when Responses support was added, we fall back to `/v1/chat/completions`.
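For illustration, a minimal standalone sketch of that selection logic, assuming a plain version tuple and a hypothetical `choose_wire_api` helper (the real codex-rs code differs; see the diffs below):

```rust
/// Standalone sketch, not codex-rs code: pick a wire API from a probed
/// Ollama server version, falling back to chat completions for servers
/// that predate /v1/responses support.
#[derive(Debug, PartialEq)]
enum WireApi {
    Responses,
    Chat,
}

/// First Ollama release with /v1/responses, per the PR description.
const MIN_RESPONSES_VERSION: (u64, u64, u64) = (0, 13, 4);

/// `probe` stands in for whatever fetches the server version over HTTP.
fn choose_wire_api(probe: impl Fn() -> Option<(u64, u64, u64)>) -> WireApi {
    match probe() {
        // Older server: downgrade to /v1/chat/completions.
        Some(version) if version < MIN_RESPONSES_VERSION => WireApi::Chat,
        // New enough, or version unknown (assumption): keep Responses.
        _ => WireApi::Responses,
    }
}

fn main() {
    assert_eq!(choose_wire_api(|| Some((0, 12, 6))), WireApi::Chat);
    assert_eq!(choose_wire_api(|| Some((0, 13, 4))), WireApi::Responses);
    assert_eq!(choose_wire_api(|| None), WireApi::Responses);
}
```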
Force-pushed from 4c088dd to 21c37a1.
No worries! I think when I was originally making this change I just happened to be 30m away from another config change that conflicted a bit. I had codex rebase it against main, and have just pushed up a new version.
Previously we were only downgrading to chat for older `ollama`s when using `--oss`, but you can also use the built-in ollama provider via `model_provider = "ollama"` in your config
Force-pushed from 21c37a1 to d6d0347.
@drifkin, looks like there are formatting issues. A quick
Oops, had those formatting changes staged, but not committed. Just pushed them up as well.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
| ("openai", P::create_openai_provider()), | ||
| ( | ||
| OLLAMA_OSS_PROVIDER_ID, | ||
| create_oss_provider(DEFAULT_OLLAMA_PORT, WireApi::Chat), | ||
| create_oss_provider(DEFAULT_OLLAMA_PORT, WireApi::Responses), | ||
| ), |
Ensure Ollama fallback runs in all entrypoints
Switching the built-in Ollama provider default to `WireApi::Responses` means any binary that loads a `Config` but does not call `detect_ollama_wire_api_if_needed` will now send `/v1/responses` to older Ollama servers (<0.13.4) and fail instead of falling back to `/v1/chat/completions`. I checked `codex-rs/app-server/src/lib.rs` and `codex-rs/mcp-server/src/lib.rs` (and several CLI subcommands) and they load config without the new detection call, so those entrypoints regress while exec/tui do not. Consider moving the detection into a shared config loading path or wiring it into other entrypoints so all Ollama uses get the same compatibility fallback.
`codex-rs/exec/src/lib.rs` (outdated)
```rust
Config::load_with_cli_overrides_and_harness_overrides(cli_kv_overrides, overrides).await?;

if !oss {
    detect_ollama_wire_api_if_needed(&mut config).await;
```
I'll be honest: I don't love this idea that we have to know to mutate `Config` after it is constructed in these cases. Ideas:
- If there is an issue where we detect a runtime mismatch that the user can fix by modifying their `config.toml`, could we return an error that asks them to do that?
- Should we move this logic inside `Config`'s constructor so we don't have to post-process this here?
Yeah, I don't super love this either; I'll look into doing the constructor change (though is it surprising to have the constructor make a network call?). I think the main motivation behind doing this kind of involved change is that I don't know how people can edit any of the built-in ollama flows to change the wire format in `config.toml` (either via `--oss` or the `model_provider = "ollama"` path, the latter of which I learned from codex's review above). Do you know if that's even currently possible?
And the intent is to remove all of this code once the chat completions wire format is hard deprecated.
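A hedged sketch of the constructor-style alternative being discussed: resolve the wire API while building the config so nothing has to mutate `Config` afterwards, and surface a mismatch as an error pointing at `config.toml`. All names and signatures below are illustrative, not the codex-rs ones:

```rust
/// Illustrative types only; not the real codex-rs Config.
#[derive(Debug, Clone, Copy)]
enum WireApi {
    Responses,
    Chat,
}

#[derive(Debug)]
struct Config {
    wire_api: WireApi,
}

/// Build the config up front: either the requested wire API works, or we
/// return an actionable error instead of post-processing a finished Config.
fn load_config(requested: WireApi, server_has_responses: bool) -> Result<Config, String> {
    match (requested, server_has_responses) {
        (WireApi::Responses, false) => Err(
            "this Ollama server lacks /v1/responses; switch the provider's wire API in config.toml"
                .to_string(),
        ),
        (wire_api, _) => Ok(Config { wire_api }),
    }
}

fn main() {
    println!("{:?}", load_config(WireApi::Responses, false)); // Err(..)
    println!("{:?}", load_config(WireApi::Chat, false)); // Ok(Config { wire_api: Chat })
}
```

Whether the constructor should make a network call, as raised above, is the open question; the probe result is passed in here to sidestep it.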
@bolinfest: thanks for the review, I can make those code changes tomorrow. I think the main point of conversation is around configuring built-in providers (I might just not understand how they interact with the
It appears that the failing check had an error that was fixed in the latest release. Might be worth re-running the checks.
bolinfest left a comment:
@drifkin OK, so we still haven't addressed the fundamental issue, which is that now `Config` is mut in a bunch of places where it really should not be.
One of my original suggestions was:
> If there is an issue where we detect a runtime mismatch that the user can fix by modifying their `config.toml`, could we return an error that asks them to do that?

and you rightfully pointed out that this is a bit tricky given how the built-in ollama provider created by `built_in_model_providers()` works today.
What if we took a different approach where:
- The built-in `ollama` is updated to assume the Responses API by default.
- We introduce an additional built-in provider, `ollama-chat`, that is the same as `ollama` except it uses the chat wire format.
- At runtime, if someone tries using `ollama` and the server doesn't support the Responses API, we error and tell them to set `model_provider = "ollama-chat"` in `config.toml`, but maybe also point them to #7782 that chat API support is going away in Codex in February 2026.
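A rough sketch of the two-provider registration this proposal implies; the map shape and function name below are assumptions, not the actual `built_in_model_providers()` API:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy)]
enum WireApi {
    Responses,
    Chat,
}

/// Hypothetical registry: two built-in Ollama providers, differing only in
/// which wire API they assume.
fn built_in_providers() -> HashMap<&'static str, WireApi> {
    HashMap::from([
        // Default provider assumes the Responses API.
        ("ollama", WireApi::Responses),
        // Temporary escape hatch for servers that only speak chat completions;
        // goes away once the chat wire format is removed (#7782).
        ("ollama-chat", WireApi::Chat),
    ])
}

fn main() {
    for (id, api) in built_in_providers() {
        println!("{id}: {api:?}");
    }
}
```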
```rust
if *version == Version::new(0, 0, 0) {
    return WireApi::Responses;
}
if *version >= min_responses_version() {
```
Suggested change:
```diff
-if *version == Version::new(0, 0, 0) {
-    return WireApi::Responses;
-}
-if *version >= min_responses_version() {
+if *version == Version::new(0, 0, 0) {
+    WireApi::Responses
+} else if *version >= min_responses_version() {
```
Though this should probably be combined with `||`?
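For reference, the combined form that comment is pointing at, as a standalone snippet (the version tuple and `min_responses_version` parameter are stand-ins for the crate's real types):

```rust
#[derive(Debug, PartialEq)]
enum WireApi {
    Responses,
    Chat,
}

/// Same decision as the suggested change above, folded into one `||` predicate.
fn pick(version: (u64, u64, u64), min_responses_version: (u64, u64, u64)) -> WireApi {
    // An unknown version (0.0.0) is treated the same as a new-enough one.
    if version == (0, 0, 0) || version >= min_responses_version {
        WireApi::Responses
    } else {
        WireApi::Chat
    }
}

fn main() {
    assert_eq!(pick((0, 0, 0), (0, 13, 4)), WireApi::Responses);
    assert_eq!(pick((0, 13, 2), (0, 13, 4)), WireApi::Chat);
}
```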
I like this! I'll try out a new PR with that approach.
This is an alternate PR solving the same problem as <openai#8227>. In this PR, when Ollama is used via `--oss` (or via `model_provider = "ollama"`), we default it to use the Responses format. At runtime, we do an Ollama version check, and if the version is older than when Responses support was added to Ollama, we print out a warning. Because there's no way of configuring the wire api for a built-in provider, we temporarily add a new `oss_provider`/`model_provider` called `"ollama-chat"` that will force the chat format. Once the `"chat"` format is fully removed (see <openai#7782>), `ollama-chat` can be removed as well.
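A small sketch of the warn-instead-of-fallback behavior that description outlines (stand-in names; not the code from the follow-up PR):

```rust
/// Assumed cutoff: first Ollama release with /v1/responses.
const MIN_RESPONSES_VERSION: (u64, u64, u64) = (0, 13, 4);

/// Keep using Responses, but warn when the server looks too old and point the
/// user at the temporary ollama-chat provider.
fn warn_if_responses_unsupported(server_version: (u64, u64, u64)) {
    if server_version < MIN_RESPONSES_VERSION {
        eprintln!(
            "warning: Ollama {}.{}.{} predates /v1/responses; consider model_provider = \"ollama-chat\"",
            server_version.0, server_version.1, server_version.2
        );
    }
}

fn main() {
    warn_if_responses_unsupported((0, 12, 9)); // prints the warning
    warn_if_responses_unsupported((0, 14, 0)); // silent
}
```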
@bolinfest: here's a new PR that implements the
Closing this PR in favor of #8798.