Conversation

drifkin commented Dec 18, 2025

Ollama v0.13.4 added /v1/responses support. When Ollama is used via --oss, we now default to the Responses API. At runtime we do a version check, and if the detected Ollama version is older than the one that added Responses support, we fall back to /v1/chat/completions.
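For illustration, the fallback boils down to something like this (a rough sketch only, not the PR's exact code; WireApi comes from the codex codebase and the semver crate is an assumption):

use semver::Version;

// Sketch: Ollama 0.13.4 is the first release that serves /v1/responses,
// so anything older is downgraded to /v1/chat/completions.
fn choose_wire_api(detected: &Version) -> WireApi {
    if *detected >= Version::new(0, 13, 4) {
        WireApi::Responses
    } else {
        WireApi::Chat
    }
}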

chatgpt-codex-connector bot (Contributor) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


etraut-openai (Collaborator) commented

@drifkin, thanks for the PR. There were some merge conflicts. I attempted to address them in the github merge UI, but I apparently missed something because I'm seeing build errors. Please pull the latest changes and fix them up on your branch (or have codex do it) and push the change.

etraut-openai added the needs-response label Dec 18, 2025
In Ollama v0.13.4 `/v1/responses` support was added. When Ollama is used
via `--oss`, it now defaults to Responses. At runtime we do a version
check and if the version is older than when Responses support was added,
we fall back to `/v1/chat/completions`.
drifkin force-pushed the drifkin/ollama-oss-responses branch from 4c088dd to 21c37a1 on December 19, 2025 00:16
drifkin (Author) commented Dec 19, 2025

No worries! I think when I was originally making this change, it just happened to land within about 30 minutes of another config change that conflicted a bit. I had codex rebase it against main and have just pushed up a new version.

Previously we were only downgrading to chat for older `ollama`s when
using `--oss`, but you can also use the built-in ollama provider via
`model_provider = "ollama"` in your config
drifkin force-pushed the drifkin/ollama-oss-responses branch from 21c37a1 to d6d0347 on December 19, 2025 00:19
etraut-openai (Collaborator) commented

@drifkin, looks like there are formatting issues. A quick `just fmt` should clean it up.

drifkin (Author) commented Dec 19, 2025

@drifkin, looks like there are formatting issues. A quick `just fmt` should clean it up.

oops, had those formatting changes staged, but not committed. Just pushed them up as well

etraut-openai removed the needs-response label Dec 19, 2025
etraut-openai (Collaborator) commented

@codex review

chatgpt-codex-connector bot (Contributor) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 273 to 277
("openai", P::create_openai_provider()),
(
OLLAMA_OSS_PROVIDER_ID,
create_oss_provider(DEFAULT_OLLAMA_PORT, WireApi::Chat),
create_oss_provider(DEFAULT_OLLAMA_PORT, WireApi::Responses),
),
chatgpt-codex-connector bot (Contributor) left a comment

[P2] Ensure Ollama fallback runs in all entrypoints

Switching the built‑in Ollama provider default to WireApi::Responses means any binary that loads a Config but does not call detect_ollama_wire_api_if_needed will now send /v1/responses to older Ollama servers (<0.13.4) and fail instead of falling back to /v1/chat/completions. I checked codex-rs/app-server/src/lib.rs and codex-rs/mcp-server/src/lib.rs (and several CLI subcommands) and they load config without the new detection call, so those entrypoints regress while exec/tui do not. Consider moving the detection into a shared config loading path or wiring it into other entrypoints so all Ollama uses get the same compatibility fallback.

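One way to picture the shared-loader suggestion, as a rough sketch (the wrapper name is hypothetical, and the parameter types are assumed from the snippet quoted just below):

// Hypothetical shared loader so exec, tui, app-server, and mcp-server all
// apply the same Ollama compatibility fallback after loading config.
pub async fn load_config_with_ollama_detection(
    cli_kv_overrides: Vec<(String, toml::Value)>,
    overrides: ConfigOverrides,
) -> anyhow::Result<Config> {
    let mut config =
        Config::load_with_cli_overrides_and_harness_overrides(cli_kv_overrides, overrides)
            .await?;
    // Downgrade the built-in Ollama provider to /v1/chat/completions when the
    // local server predates Responses support.
    detect_ollama_wire_api_if_needed(&mut config).await;
    Ok(config)
}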

    Config::load_with_cli_overrides_and_harness_overrides(cli_kv_overrides, overrides).await?;

    if !oss {
        detect_ollama_wire_api_if_needed(&mut config).await;
bolinfest (Collaborator) left a comment

I'll be honest: I don't love this idea that we have to know to mutate Config after it is constructed in these cases. Ideas:

  • If there is an issue where we detect a runtime mismatch that the user can fix by modifying their config.toml, could we return an error that asks them to do that?
  • Should we move this logic inside Config's constructor so we don't have to post-process this here?

drifkin (Author) replied

Yeah, I don't super love this either; I'll look into doing the constructor change (though is it surprising to have the constructor make a network call?). The main motivation behind this kind of involved change is that I don't know how people can edit any of the built-in ollama flows to change the wire format in config.toml (either via --oss or the model_provider = "ollama" path, the latter of which I learned about from codex's review above). Do you know if that's even currently possible?

And the intent is to remove all of this code once the chat completions wire format is hard deprecated

drifkin (Author) commented Dec 19, 2025

@bolinfest: thanks for the review, I can make those code changes tomorrow. I think the main point of conversation is around configuring built-in providers (I might just not understand how they interact with the config.toml)

etraut-openai added the needs-response label Dec 19, 2025
etraut-openai removed the needs-response label Dec 25, 2025
devdev-automation commented
It appears that the failing check had an error that was fixed in the latest release. Might be worth re-running the checks.

EmbarkStudios/cargo-deny-action#101

bolinfest (Collaborator) left a comment

@drifkin OK, so we still haven't addressed the fundamental issue, which is that now Config is mut in a bunch of places where it really should not be.

One of my original suggestions was:

If there is an issue where we detect a runtime mismatch that the user can fix by modifying their config.toml, could we return an error that asks them to do that?

and you rightfully pointed out that this is a bit tricky given how the built-in ollama provider created by built_in_model_providers() works today.

What if we took a different approach where:

  • The built-in ollama is updated to assume the Responses API by default.
  • We introduce an additional built-in provider, ollama-chat, that is the same as ollama except it uses the chat wire format.
  • At runtime, if someone tries using ollama and the server doesn't support the Responses API, we error and tell them to set model_provider = "ollama-chat" in config.toml, but maybe also point them to #7782 that chat API support is going away in Codex in February 2026.

Comment on lines +62 to +65
if *version == Version::new(0, 0, 0) {
    return WireApi::Responses;
}
if *version >= min_responses_version() {
Collaborator left a comment

Suggested change
-if *version == Version::new(0, 0, 0) {
-    return WireApi::Responses;
-}
-if *version >= min_responses_version() {
+if *version == Version::new(0, 0, 0) {
+    WireApi::Responses
+} else if *version >= min_responses_version() {

Collaborator replied

Though this should probably be combined with ||?
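For reference, the combined form would read roughly like this (the Chat arm is assumed from context, since the rest of the function isn't quoted here):

if *version == Version::new(0, 0, 0) || *version >= min_responses_version() {
    WireApi::Responses
} else {
    WireApi::Chat
}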

drifkin (Author) commented Jan 6, 2026

What if we took a different approach where:

  • The built-in ollama is updated to assume the Responses API by default.
  • We introduce an additional built-in provider, ollama-chat, that is the same as ollama except it uses the chat wire format.
  • At runtime, if someone tries using ollama and the server doesn't support the Responses API, we error and tell them to set model_provider = "ollama-chat" in config.toml, but maybe also point them to Deprecating `chat/completions` support in Codex #7782 that chat API support is going away in Codex in February 2026.

I like this! I'll try out a new PR with that approach.
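Concretely, the ollama-chat idea might look roughly like the following (a sketch only; OLLAMA_OSS_PROVIDER_ID, DEFAULT_OLLAMA_PORT, and create_oss_provider are taken from the snippet earlier in this thread, while OLLAMA_CHAT_PROVIDER_ID and the error text are hypothetical):

// Built-in providers: "ollama" assumes the Responses API by default, and a
// temporary "ollama-chat" escape hatch keeps the old chat wire format.
(
    OLLAMA_OSS_PROVIDER_ID,
    create_oss_provider(DEFAULT_OLLAMA_PORT, WireApi::Responses),
),
(
    OLLAMA_CHAT_PROVIDER_ID, // hypothetical id, e.g. "ollama-chat"
    create_oss_provider(DEFAULT_OLLAMA_PORT, WireApi::Chat),
),

// At runtime, instead of silently mutating Config, fail with a pointer to
// the escape hatch when the detected Ollama predates /v1/responses support.
if ollama_version < min_responses_version() {
    anyhow::bail!(
        "this Ollama version does not support /v1/responses; set \
         model_provider = \"ollama-chat\" in config.toml (note that chat \
         completions support in Codex is deprecated, see #7782)"
    );
}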

drifkin added a commit to drifkin/codex that referenced this pull request Jan 6, 2026
This is an alternate PR solving the same problem as
<openai#8227>.

In this PR, when Ollama is used via `--oss` (or via
`model_provider = "ollama"`), we default it to the Responses format.
At runtime, we do an Ollama version check, and if the version is older
than the one that added Responses support to Ollama, we print a warning.

Because there's no way of configuring the wire API for a built-in
provider, we temporarily add a new `oss_provider`/`model_provider`
called `"ollama-chat"` that forces the chat format.

Once the `"chat"` format is fully removed (see
<openai#7782>), `ollama-chat` can
be removed as well.
drifkin added a commit to drifkin/codex that referenced this pull request Jan 6, 2026
drifkin (Author) commented Jan 6, 2026

@bolinfest: here's a new PR that implements the ollama-chat provider approach: #8798

etraut-openai (Collaborator) commented

Closing this PR in favor of #8798.
