
server : implement /api/version endpoint for ollama compatibility (#15167 ) #15177


Open
albert-polak wants to merge 2 commits into master

Conversation

@albert-polak commented Aug 8, 2025

This PR implements a minimal /api/version endpoint to make llama.cpp compatible with tools expecting the Ollama API, such as the Copilot Chat VS Code extension.

Fixes #15167
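For context, Ollama's /api/version endpoint returns a single JSON object containing a version string. Below is a minimal, self-contained sketch of such a handler, assuming the cpp-httplib and nlohmann::json stack that llama-server already uses; the standalone main() and the returned version string are illustrative placeholders, not the exact code in this PR.

```cpp
// Minimal sketch of an Ollama-style /api/version handler.
// NOTE: standalone example only; the real server registers handlers on its
// existing httplib::Server instance, and the version string here is a placeholder.
#include "httplib.h"            // cpp-httplib, already a llama-server dependency
#include <nlohmann/json.hpp>    // nlohmann::json, already a llama-server dependency

int main() {
    httplib::Server svr;

    // GET /api/version -> {"version": "..."}, matching the Ollama response shape.
    svr.Get("/api/version", [](const httplib::Request &, httplib::Response & res) {
        nlohmann::json body = { { "version", "0.0.0" } };   // placeholder value
        res.set_content(body.dump(), "application/json");
    });

    svr.listen("127.0.0.1", 8080);
    return 0;
}
```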

@65a (Contributor) commented Aug 9, 2025

Drive-by comment, not an approver. Maybe we should return the actual llama.cpp version on this endpoint, and have a generic LLAMA_API_VERSION_OVERRIDE env var for cases where it's necessary to return specific values?
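A rough sketch of how that override could be wired in, assuming the LLAMA_API_VERSION_OVERRIDE name proposed here; get_version_string is a hypothetical helper, not existing llama.cpp code:

```cpp
// Sketch only: env-var override for the reported version.
// get_version_string is a hypothetical helper; LLAMA_API_VERSION_OVERRIDE is the
// env var name suggested in this comment, not something llama.cpp currently reads.
#include <cstdlib>
#include <string>

static std::string get_version_string(const std::string & build_version) {
    // If the override is set, report exactly that (for clients expecting a
    // specific Ollama-style version); otherwise report the real build version.
    if (const char * override_value = std::getenv("LLAMA_API_VERSION_OVERRIDE")) {
        return override_value;
    }
    return build_version;   // e.g. "6121" from the llama.cpp build number
}
```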

@Green-Sky (Collaborator):

> Drive-by comment, not an approver. Maybe we should return the actual llama.cpp version on this endpoint, and have a generic LLAMA_API_VERSION_OVERRIDE env var for cases where it's necessary to return specific values?

I think so too.
I also don't really see the point in faking/pretending to be ollama by default.

@albert-polak (Author):

> Drive-by comment, not an approver. Maybe we should return the actual llama.cpp version on this endpoint, and have a generic LLAMA_API_VERSION_OVERRIDE env var for cases where it's necessary to return specific values?
>
> I think so too. I also don't really see the point in faking/pretending to be ollama by default.

Yeah, that is probably a better idea, but the llama.cpp versioning convention probably doesn't follow the Ollama one.

[image]

Would you suggest splitting it manually by inserting dots?

@65a (Contributor) commented Aug 11, 2025

In the example, I'd return 6121 unless overridden, I guess.

@albert-polak (Author):

> Drive-by comment, not an approver. Maybe we should return the actual llama.cpp version on this endpoint, and have a generic LLAMA_API_VERSION_OVERRIDE env var for cases where it's necessary to return specific values?
>
> I think so too. I also don't really see the point in faking/pretending to be ollama by default.
>
> Yeah, that is probably a better idea, but the llama.cpp versioning convention probably doesn't follow the Ollama one.
>
> [image] Would you suggest splitting it manually by inserting dots?

It actually does comply with the Ollama versioning; it is treated as 6121.0.0. Committed some changes.

@ngxson (Collaborator) commented Aug 11, 2025

If it's purely for compatibility, why don't we hard-code the version number to something like 99.99.99.99?

Tbh I don't feel confident spending a lot of code just to match a short-lived integration. VSCode will eventually have OAI-compat support; the Ollama-compat path is currently a short-term solution.

@ngxson (Collaborator) commented Aug 11, 2025

> Drive-by comment, not an approver. Maybe we should return the actual llama.cpp version on this endpoint, and have a generic LLAMA_API_VERSION_OVERRIDE env var for cases where it's necessary to return specific values?

What's the use case? Does any downstream app check for this version? And even if it checks, does an incorrect version number block you from doing certain things?

@albert-polak (Author):

> Drive-by comment, not an approver. Maybe we should return the actual llama.cpp version on this endpoint, and have a generic LLAMA_API_VERSION_OVERRIDE env var for cases where it's necessary to return specific values?
>
> What's the use case? Does any downstream app check for this version? And even if it checks, does an incorrect version number block you from doing certain things?

That's exactly right: if the endpoint isn't there, the VS Code Copilot Chat extension can't get the model list due to a certain commit (linked in issue #15167). It's connected to PR #12896. But just returning the llama.cpp build version works, as I commented above. It gets treated as 6121.0.0, which I don't think will ever be surpassed.
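For illustration, a client doing a semver-style minimum-version check typically pads missing components with zeros, so a bare build number like "6121" compares as 6121.0.0 and clears any realistic minimum. A small sketch of that comparison logic (not the Copilot Chat extension's actual code; the minimum value below is just an example):

```cpp
// Illustrative semver-style comparison; this is NOT the Copilot Chat extension's
// actual check, just a sketch of how "6121" ends up being read as 6121.0.0.
#include <array>
#include <cstdio>
#include <sstream>
#include <string>

static std::array<int, 3> parse_version(const std::string & s) {
    std::array<int, 3> parts = {0, 0, 0};   // missing components default to 0
    std::istringstream in(s);
    std::string tok;
    for (size_t i = 0; i < 3 && std::getline(in, tok, '.'); ++i) {
        parts[i] = std::stoi(tok);
    }
    return parts;
}

int main() {
    const auto server  = parse_version("6121");   // llama.cpp build number -> {6121, 0, 0}
    const auto minimum = parse_version("0.6.4");  // example minimum a client might require
    std::printf("server >= minimum: %s\n", server >= minimum ? "yes" : "no"); // prints "yes"
    return 0;
}
```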

@Green-Sky (Collaborator):

ggml presents its version as 0.0.xxxx.

@albert-polak (Author):

> ggml presents its version as 0.0.xxxx.

The llama.cpp build version, as shown by ./llama-cli --version, is being treated as "build_version.0.0".

Successfully merging this pull request may close these issues.

Misc. bug: VSCode copilot chat now asks for a minimum version
4 participants