Releases: thushan/olla
olla-v0.0.24
This is a bugfix release addressing agentic workloads in translator mode and agent tooling, with improved logging.
What's Changed
Changelog
- 3d00f3b coderabbit recommendation
- 37c416b default to Olla proxy engine
- a0d1941 fix duplicate increment
- 857c75d fix anthropic tooling bug
- 5827fb1 flush for sherpa interface
- 7a94b70 missing outputconfig in anthropic requests
- 7649f80 readme tweak
- f98eeae show translation mode in logs
Full Changelog: v0.0.23...v0.0.24
olla-v0.0.23
This is a major release bringing in some exciting features:
- New Backends: Docker Model Runner and vLLM-MLX
- Support for Anthropic Passthrough on supported backends (vLLM etc.), so Olla doesn't need to translate the requests itself
- Documentation Refinements based on feedback
- Sensible defaults, so most users can run with a lean config file and override only what they need
- BUGFIX: Proxy Path (/olla/proxy) resolution issues when certain mixes of backends were present
- Additional integration and passthrough tests (Python) for internal verification before shipping
- Security & dependency updates
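To illustrate the leaner defaults, a minimal config might now look something like this. This is a sketch only — the key names and structure are assumptions based on Olla's static-discovery style of configuration; check the Olla documentation for the authoritative schema:

```yaml
# Hypothetical minimal config relying on the new sensible defaults.
# Only the endpoints are declared; everything else (timeouts, proxy
# engine, health checking) falls back to Olla's built-in defaults.
discovery:
  static:
    endpoints:
      - url: "http://localhost:11434"
        name: "local-ollama"
        type: "ollama"
```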
What's Changed
- Bump actions/cache from 4 to 5 by @dependabot[bot] in #91
- chore: february 2026 dependency updates + CI fixes by @thushan in #101
- docs: February 2026 updates by @thushan in #102
- feat: endpoint optional by @thushan in #103
- Bump github.com/expr-lang/expr from 1.17.6 to 1.17.7 by @dependabot[bot] in #93
- feat: Anthropic Pass-through by @thushan in #105
- feat: backend/docker-model-runner by @thushan in #106
- fix: Proxy Path issue & sensible defaults by @thushan in #107
- fix: pass through failure by @thushan in #108
- feature: backend/vllm-mlx by @thushan in #109
- docs: vllm-mlx by @thushan in #110
- feat: python integration tests by @thushan in #112
- Bump github.com/expr-lang/expr from 1.17.7 to 1.17.8 by @dependabot[bot] in #111
Full Changelog: v0.0.22...v0.0.23
olla-v0.0.22
This release was largely for fixing model_url resolution but also contains some maintenance fixes.
What's Changed
- Bump golang.org/x/sync from 0.17.0 to 0.18.0 by @dependabot[bot] in #83
- fix: ensure model_url is used from endpoint config by @thushan in #88
- chore: december 2025 by @thushan in #89
- Bump actions/checkout from 5 to 6 by @dependabot[bot] in #85
- fix: Alternative method of resolving profile paths by @thushan in #90
Full Changelog: v0.0.21...v0.0.22
Changelog
- 6bf7c45 Bump actions/checkout from 5 to 6
- 83fbc91 Bump golang.org/x/sync from 0.17.0 to 0.18.0
- e2b4351 alternative way to join paths for OpenAI compatible profiles
- eebe4f1 copy paste issue
- b47df2f ensure that model_url is used from endpoint config and fallback is the profile.
- a6f8af2 feedback
- 1a3325a format
- 499a154 handle absolute URLs a bit better and expand test cases
- 72ddfcd lib update & validation
- 7428298 small refactor
- fd2733d update doc 4
olla-v0.0.21
This release helps address path translation issues (see #80) between tools like Docker Model Runner and Olla's default behaviour of not preserving paths.
We added a new setting for endpoints to instruct Olla to preserve paths when proxying requests.
```yaml
- url: "http://localhost:12434/engines/llama.cpp/"
  name: "local-docker"
  type: "openai-compatible"
  priority: 100
  preserve_path: true # this way, /v1/completions will forward properly to Docker Model Runner
  model_url: "/models"
  health_check_url: "/"
  check_interval: 2s
  check_timeout: 1s
```

There's also a bugfix for the missing OpenAI routing (for `type: openai-compatible`), with refreshed profiles for OpenAI.
What's Changed
Full Changelog: v0.0.20...v0.0.21
olla-v0.0.20
This release brings back llamacpp integration and adds experimental Anthropic message support (disabled by default) at /olla/anthropic, so you can easily point Claude Code and other tools at it.
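With the experimental endpoint enabled, pointing Claude Code at Olla could look like the sketch below. The port and the enabling step are assumptions that depend on your server config, and remember the feature ships disabled by default:

```shell
# Assumes Olla is listening on localhost:40114 (adjust to your config)
# and the experimental Anthropic endpoint has been enabled.
# Claude Code honours ANTHROPIC_BASE_URL, so launching `claude` after
# this export routes its Anthropic Messages traffic through Olla.
export ANTHROPIC_BASE_URL="http://localhost:40114/olla/anthropic"
```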
What's Changed
- feat: Backend llamacpp by @thushan in #73
- feat: anthropic / message logger (development only) by @thushan in #77
- feat: Anthropic Message format Support by @thushan in #76
- Bump github.com/pterm/pterm from 0.12.81 to 0.12.82 by @dependabot[bot] in #75
- Bump golang.org/x/time from 0.13.0 to 0.14.0 by @dependabot[bot] in #72
- prepare: v0.0.20 by @thushan in #78
Full Changelog: v0.0.19...v0.0.20
olla-v0.0.19
This release has several performance fixes (a noticeable uplift on ARM), critical fixes for all architectures, and adds support for sglang and LemonadeSDK.
We encourage everyone to upgrade to this release.
What's Changed
- feat: backend/sglang by @thushan in #69
- feat: backend/lemonade by @thushan in #70
- fixes: October 2025 performance improvements by @thushan in #71
Full Changelog: v0.0.18...v0.0.19
Changelog
- 554b2fa GetHealthyEndpointsForModel could leak targets that no longer exist.
- 4d3e12d adds parser
- dcf3c52 adds the parser and converter
- 267dcd2 atomic catalog store
- 716e57f avoid alloc on response times
- 203ce4a cleanup
- 9cb11c9 constants for linting, will add more later
- c7a7fc9 doc refresh
- 7aeb09f documentation
- 6748a50 documentation updates
- c688fce factory too
- 6ab4a15 fixed warnings and missed sglang reference
- 16fa9d5 handler bits
- ccc8f58 hotpath: reduce allocations
- e2be222 initial SGLang work
- 4d3d3e4 initial configuration based on what's available
- 12a7d14 initial lemonade bits
- 1d65097 note about format
- 1b9ffd6 openai
- c091490 perf: avoid resolvereference call if endpoint URL has no path
- 985d8eb perf: avoid GC pressure and preallocate
- fbaece8 perf: reduce string allocations
- dcb9050 race fix: method instead of module level
- e012a30 reduce hashing and allocations
- 21de3da refactor and slightly different way to infer capabilities
- 3b19336 refactor to use benchmark
- 77a4b8c refactor test
- 53e83a6 rune fix
- 3fcf132 slightly more complex fix to improve allocations in unified memory registry
- 319f442 update docs and make supported backends a table.
- 35b6cab update readme
- 837dc42 use map rather than MapOf (deprecated)
- f9e8a69 wire up handler too and initial profile
olla-v0.0.18
This is mostly a maintenance release and includes internal consolidation of the Sherpa and Olla proxy configurations.
What's Changed
- chore: Consolidate Converters by @thushan in #58
- September 2025 updates by @thushan in #68
- Bump actions/upload-pages-artifact from 3 to 4 by @dependabot[bot] in #60
- refactor: Proxy Configurations by @thushan in #59
- Bump actions/setup-python from 5 to 6 by @dependabot[bot] in #63
- Bump actions/setup-go from 5 to 6 by @dependabot[bot] in #62
- Bump actions/configure-pages from 4 to 5 by @dependabot[bot] in #55
- Bump actions/checkout from 4 to 5 by @dependabot[bot] in #54
Changelog
- b5be024 Bump actions/checkout from 4 to 5
- 8141cb2 Bump actions/configure-pages from 4 to 5
- 720cc7c Bump actions/setup-go from 5 to 6
- 29afbd9 Bump actions/setup-python from 5 to 6
- 11c06d3 Bump actions/upload-pages-artifact from 3 to 4
- 10efb5a September 2025 updates
- 9c03ec9 cache time
- c4bbb98 fix remaining converters
- dbf6dee initial consolidation of Proxy Configuration
- 0d27e9b introduce a base converter for conversion to avoid duplication
- d7bec85 update the olla service config and fallback too
- 6951956 update workflows.
- a83c209 use the specific settings and fallback if unavailable
Full Changelog: v0.0.17...v0.0.18
olla-v0.0.17
This release brings support for litellm and the ability to filter generically within the config using include/exclude globs. For now, this lets you exclude profiles you don't want loaded and exclude models from an endpoint.
Learn more about filters.
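As a sketch of what such a filter could look like in config — the key names here are illustrative assumptions, not the exact schema, so see the filter documentation for the real shape:

```yaml
# Illustrative only — exact keys may differ; see the filter docs.
filters:
  profiles:
    exclude: ["vllm*"]     # skip loading profiles you don't want
  models:
    exclude: ["*embed*"]   # hide matching models from an endpoint
```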
What's Changed
- docs: comparisons by @thushan in #53
- feat: backend/litellm by @thushan in #56
- feat: filtering adapter by @thushan in #57
Full Changelog: v0.0.16...v0.0.17
Changelog
- 7b7c96e Comparison docs for Olla from becky & wilson
- b9b7a5d doc updates
- ee3b07f fix default ports for vllm and lmstudio
- 33c2a04 fix links
- 140833f implements checks for filter breakages
- e18f829 initial bits of a global filter config
- 978eb03 initial litellm profile
- edb7a66 model and profile filtering, tests and refactor glob to be a bit more reusable
- bdb66d7 readme refresher
- 846f37f update docs
- be278a1 update docs
olla-v0.0.16
This release has two big features.
Improved Recovery & Transparent Healing
Health is monitored during every request, and if a request routes to an endpoint that has just failed (before the health check has run), it will transparently move to another healthy endpoint that serves the model. This is invisible to the caller, but CLI logs and response headers let you see what happened.
We think this makes olla pretty awesome.
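As a usage illustration, you can inspect the response headers on a proxied request to see which backend actually served it after any transparent failover. The port and header prefix here are assumptions — check the response-header documentation for the exact names:

```shell
# Hypothetical: inspect Olla's response headers on a proxied request
# to see which endpoint served it (header names may differ).
curl -si http://localhost:40114/olla/proxy/v1/models | grep -i '^x-olla'
```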
Intercepts Stats from Endpoints
We also capture the last packet of the stream/payload and extract common metrics from endpoints (tokens per second, etc.), and track other metrics that are (currently) shown in the logs.
Later these will feed a new robusta balancer.
What's Changed
Full Changelog: v0.0.15...v0.0.16
Changelog
- 2b9860c Add documentation for provider metrics feature
- db2bab4 Adds VHS tapes.
- 76bfdd3 Constant'ine.
- 909dfb9 Fix compilation errors and shadow variable warning after merge
- 428d968 alloc changes
- 95d345d avoid blocking healthchecks during recovery, tricky.
- 3ea69ae cleanup factory and add test basic coverage
- c5d2922 cleanup & refactor
- a414331 cleanup constant use and new retry constants
- 29a03c8 coderabbit feedback, routing strategy fixes
- e49731f coderabbit feedback about routing
- 6f17a2c configuration updates
- c1bec59 constants and retry logic
- 2beb7f8 doc refresh
- df99db7 doc updates
- 6b11bbc doc updates for trailer
- 14a8653 doc: max-retries still in config overview
- 3e6e272 docs updated
- 7e122f4 documentation for fallback types and routing to fallback_behaviour by default
- e22cee0 documentation updates
- 988b753 doh we miss target url
- ab05356 fix gitignore not to ignore olla but rather olla in the root
- 68234fb fix n+1 logic issue
- 9e40a30 implements routing similar to scout
- 14963c3 initial request metrics
- 4308adb lab test fixes for profiles
- 982cdc6 lab: float issues
- e3558a6 make the jsonpath a bit more robust from lab tests
- 696ceb8 new routing strategy for registry
- 832ced6 profiling revealed some performance issues with custom written parser, adopting gjson and expr
- 0728c23 rabbit feedback around discovery issues, but refactored at the same time
- ec0a15b reduce allocations and cleanup constants
- 55ccb0f refactor a bit and move to core/metrics
- 9cb6e06 remove from intro
- 3f0e306 remove integration test no longer used post impl.
- 417a76d removed debug in hotpath and try to compile expressions at compiletime
- 2dde757 reorg docs and add more detail
- 369958c retry logic for post endpoint health changes
- feb17cb separate contexts to avoid failure issues
- 99acb83 test updates
- 9a8b53a test scripts
- 7b6ed53 tweaks
- 09cac43 update docs after changes in profiles
olla-v0.0.15
tl;dr
This release adds proper cross-platform Docker images thanks to @ghostdevv, native support for vLLM, and proxy profiles so you can target streaming or buffered proxies. It also finally adds documentation (via mkdocs), plus a few fixes and improvements.
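Per the commit list below, a proxy profile can also be selected via the OLLA_PROXY_PROFILE environment variable. The values shown are inferred from this release's commits ("auto" detection and the buffered-to-standard rename), so treat them as assumptions and check the docs for the authoritative list:

```shell
# OLLA_PROXY_PROFILE selects the proxy behaviour before launching Olla.
# Values here ("auto", "streaming", "standard") are inferred from this
# release's commits — consult the documentation for the real options.
export OLLA_PROXY_PROFILE=streaming
```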
What's Changed
- feat: proxy profiles by @thushan in #42
- chore: constants by @thushan in #43
- feat: backend/vllm by @thushan in #44
- feat: add arm64 docker builds and better cross platform support by @ghostdevv in #46
- feat: docs by @thushan in #48
- feat: security & log consolidation by @thushan in #49
New Contributors
- @ghostdevv made their first contribution in #46
Full Changelog: v0.0.14...v0.0.15
Changelog
- 39d77cf Adds Proxy Profile to configuration
- d2be61a Consolidate logging for proxies
- 9b9fbd1 Revert "rabbit feedback of adding goos for context"
- 23f2634 Update readme.md
- c95b460 add more comprehensive tests and constants for content types
- 1af819c add proxy profile as an env var
- ca3faa5 adds detecting stream type for 'auto' and profiles properly
- 9c822a3 adds global constants properly for content / request bits.
- aa36c90 adds global constants properly for content / request bits.
- 7f66f19 adds goos/version to status handlers
- 2ec111e avoid build validation for docs
- af9941a avoid multiple instances of vllm responses
- 39ad794 avoid non go files
- 75c3282 change scripts and other files for standard behaviour
- c7eb705 claude update
- ff4f405 coderabbit feedback about having a bin, can't bin that feedback can we?
- 23ed305 detect streaming mode from scout
- 86768c3 feat: add arm64 docker builds
- 8f7acd4 findVLLMNativeName naively checks slashes, better to remove that
- 9bdbe14 fix TUI issues for long version numbers
- b7a5bb4 fix: normalize line endings for docs and workflow files
- 05b2fe0 fleshing out things
- e15ef9c forgot we can test arm64 with qemu
- 0e7d631 initial mkdocs-material integration
- 4f4fc05 initial streaming vs buffered tests
- 2711f82 initial test case infra
- cac160a initial updates
- e4ee050 initial vllm implementation
- 40cf1bf just show basic version info
- 1b81eb8 line ending normalisation
- cb61406 lint & allow local dev profiles to use anyhost to avoid breakage
- e533da2 missed type of Trailer header!
- 0a0805f missing configuration
- 2e42b9b missing config
- 08ceebe rabbit feedback of adding goos for context
- 1670ec2 randomise port for test runs
- 661adba readme update
- c025112 readme update
- 2d8cb97 readme update
- f23e247 remove URLs from being visible in endpoints
- c27e364 renaming buffered to standard
- 3ad6b49 renormalise
- 1087472 revert the fmt issue fix
- 21530ad run test results in test/results
- d11fdd7 show proxy setup in the status
- a988c45 slightly better way to handle status
- e94aefe update CI with builds across all platforms
- 2ae68ee update converters to use constants
- 684185a update default configuration
- bd579d7 update docs for OLLA_PROXY_PROFILE
- 2ac4e87 update readme
- 9fe1904 update readme for native vllm
- 8716a45 update readme for profile
- 6daef17 update remaining constants
- 05870bd update remaining constants
- 8393afd update test scripts for vllm support (mirrors existing).
- 1c261e9 update vllm profile
- 9c27c77 vLLM integration test that uses OLLA_TEST_SERVER_VLLM var for test