Releases: thushan/olla
olla-v0.0.24
This is a bugfix release addressing agentic workloads in translator mode and agent tooling, with improved logging.
What's Changed
Changelog
- 3d00f3b coderabbit recommendation
- 37c416b default to Olla proxy engine
- a0d1941 fix duplicate increment
- 857c75d fix anthropic tooling bug
- 5827fb1 flush for sherpa interface
- 7a94b70 missing outputconfig in anthropic requests
- 7649f80 readme tweak
- f98eeae show translation mode in logs
Full Changelog: v0.0.23...v0.0.24
olla-v0.0.23
This is a major release bringing in some exciting features:
- New Backends: Docker Model Runner and vLLM-MLX
- Support for Anthropic Passthrough on supported backends (vLLM etc.), so Olla doesn't need to translate the requests itself
- Documentation Refinements based on feedback
- Sensible defaults, so most users can run with a lean config file and override only what they need
- BUGFIX: Proxy Path (/olla/proxy) resolution issues when certain mixes of backends were present
- Additional integration and passthrough tests (Python) for internal verification before shipping
- Security & dependency updates
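To illustrate the leaner defaults, a minimal config might now look something like this. This is a sketch only — the key names and structure are assumptions based on Olla's static-discovery style of configuration; check the Olla documentation for the authoritative schema:

```yaml
# Hypothetical minimal config relying on the new sensible defaults.
# Only the endpoints are declared; everything else (timeouts, proxy
# engine, health checking) falls back to Olla's built-in defaults.
discovery:
  static:
    endpoints:
      - url: "http://localhost:11434"
        name: "local-ollama"
        type: "ollama"
```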
What's Changed
- Bump actions/cache from 4 to 5 by @dependabot[bot] in #91
- chore: february 2026 dependency updates + CI fixes by @thushan in #101
- docs: February 2026 updates by @thushan in #102
- feat: endpoint optional by @thushan in #103
- Bump github.com/expr-lang/expr from 1.17.6 to 1.17.7 by @dependabot[bot] in #93
- feat: Anthropic Pass-through by @thushan in #105
- feat: backend/docker-model-runner by @thushan in #106
- fix: Proxy Path issue & sensible defaults by @thushan in #107
- fix: pass through failure by @thushan in #108
- feature: backend/vllm-mlx by @thushan in #109
- docs: vllm-mlx by @thushan in #110
- feat: python integration tests by @thushan in #112
- Bump github.com/expr-lang/expr from 1.17.7 to 1.17.8 by @dependabot[bot] in #111
Full Changelog: v0.0.22...v0.0.23
olla-v0.0.22
This release was largely for fixing model_url resolution but also contains some maintenance fixes.
What's Changed
- Bump golang.org/x/sync from 0.17.0 to 0.18.0 by @dependabot[bot] in #83
- fix: ensure model_url is used from endpoint config by @thushan in #88
- chore: december 2025 by @thushan in #89
- Bump actions/checkout from 5 to 6 by @dependabot[bot] in #85
- fix: Alternative method of resolving profile paths by @thushan in #90
Full Changelog: v0.0.21...v0.0.22
Changelog
- 6bf7c45 Bump actions/checkout from 5 to 6
- 83fbc91 Bump golang.org/x/sync from 0.17.0 to 0.18.0
- e2b4351 alternative way to join paths for OpenAI compatible profiles
- eebe4f1 copy paste issue
- b47df2f ensure that model_url is used from endpoint config and fallback is the profile.
- a6f8af2 feedback
- 1a3325a format
- 499a154 handle absolute URLs a bit better and expand test cases
- 72ddfcd lib update & validation
- 7428298 small refactor
- fd2733d update doc 4
olla-v0.0.21
This release helps address path translation issues (see #80) between tools like Docker Model Runner and Olla's default behaviour of not preserving paths.
We added a new setting for endpoints to instruct Olla to preserve paths when proxying requests.
```yaml
- url: "http://localhost:12434/engines/llama.cpp/"
  name: "local-docker"
  type: "openai-compatible"
  priority: 100
  preserve_path: true # this way, /v1/completions will forward properly to Docker Model Runner
  model_url: "/models"
  health_check_url: "/"
  check_interval: 2s
  check_timeout: 1s
```

There's also a bugfix for the missing OpenAI routing (for `type: openai-compatible`), with refreshed profiles for OpenAI.
What's Changed
Full Changelog: v0.0.20...v0.0.21
olla-v0.0.20
This release brings back llamacpp integration and adds experimental Anthropic message support (disabled by default) at /olla/anthropic, so you can easily point Claude Code and other tools at it.
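With the experimental endpoint enabled, pointing Claude Code at Olla could look like the sketch below. The port and the enabling step are assumptions that depend on your server config, and remember the feature ships disabled by default:

```shell
# Assumes Olla is listening on localhost:40114 (adjust to your config)
# and the experimental Anthropic endpoint has been enabled.
# Claude Code honours ANTHROPIC_BASE_URL, so launching `claude` after
# this export routes its Anthropic Messages traffic through Olla.
export ANTHROPIC_BASE_URL="http://localhost:40114/olla/anthropic"
```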
What's Changed
- feat: Backend llamacpp by @thushan in #73
- feat: anthropic / message logger (development only) by @thushan in #77
- feat: Anthropic Message format Support by @thushan in #76
- Bump github.com/pterm/pterm from 0.12.81 to 0.12.82 by @dependabot[bot] in #75
- Bump golang.org/x/time from 0.13.0 to 0.14.0 by @dependabot[bot] in #72
- prepare: v0.0.20 by @thushan in #78
Full Changelog: v0.0.19...v0.0.20
olla-v0.0.19
This release has several performance fixes (a noticeable uplift on ARM), critical fixes for all architectures, and adds support for sglang and LemonadeSDK.
We encourage everyone to upgrade to this release.
What's Changed
- feat: backend/sglang by @thushan in #69
- feat: backend/lemonade by @thushan in #70
- fixes: October 2025 performance improvements by @thushan in #71
Full Changelog: v0.0.18...v0.0.19
Changelog
- 554b2fa GetHealthyEndpointsForModel could leak targets that no longer exist.
- 4d3e12d adds parser
- dcf3c52 adds the parser and converter
- 267dcd2 atomic catalog store
- 716e57f avoid alloc on response times
- 203ce4a cleanup
- 9cb11c9 constants for linting, will add more later
- c7a7fc9 doc refresh
- 7aeb09f documentation
- 6748a50 documentation updates
- c688fce factory too
- 6ab4a15 fixed warnings and missed sglang reference
- 16fa9d5 handler bits
- ccc8f58 hotpath: reduce allocations
- e2be222 initial SGLang work
- 4d3d3e4 initial configuration based on what's available
- 12a7d14 initial lemonade bits
- 1d65097 note about format
- 1b9ffd6 openai
- c091490 perf: avoid resolvereference call if endpoint URL has no path
- 985d8eb perf: avoid GC pressure and preallocate
- fbaece8 perf: reduce string allocations
- dcb9050 race fix: method instead of module level
- e012a30 reduce hashing and allocations
- 21de3da refactor and slightly different way to infer capabilities
- 3b19336 refactor to use benchmark
- 77a4b8c refactor test
- 53e83a6 rune fix
- 3fcf132 slightly more complex fix to improve allocations in unified memory registry
- 319f442 update docs and make supported backends a table.
- 35b6cab update readme
- 837dc42 use map rather than MapOf (deprecated)
- f9e8a69 wire up handler too and initial profile
olla-v0.0.18
This is mostly a maintenance release and includes internal consolidation of the Sherpa and Olla proxy configurations.
What's Changed
- chore: Consolidate Converters by @thushan in #58
- September 2025 updates by @thushan in #68
- Bump actions/upload-pages-artifact from 3 to 4 by @dependabot[bot] in #60
- refactor: Proxy Configurations by @thushan in #59
- Bump actions/setup-python from 5 to 6 by @dependabot[bot] in #63
- Bump actions/setup-go from 5 to 6 by @dependabot[bot] in #62
- Bump actions/configure-pages from 4 to 5 by @dependabot[bot] in #55
- Bump actions/checkout from 4 to 5 by @dependabot[bot] in #54
Changelog
- b5be024 Bump actions/checkout from 4 to 5
- 8141cb2 Bump actions/configure-pages from 4 to 5
- 720cc7c Bump actions/setup-go from 5 to 6
- 29afbd9 Bump actions/setup-python from 5 to 6
- 11c06d3 Bump actions/upload-pages-artifact from 3 to 4
- 10efb5a September 2025 updates
- 9c03ec9 cache time
- c4bbb98 fix remaining converters
- dbf6dee initial consolidation of Proxy Configuration
- 0d27e9b introduce a base converter for conversion to avoid duplication
- d7bec85 update the olla service config and fallback too
- 6951956 update workflows.
- a83c209 use the specific settings and fallback if unavailable
Full Changelog: v0.0.17...v0.0.18
olla-v0.0.17
This release brings support for litellm and the ability to filter generically within the config using include/exclude globs. For now, this lets you exclude profiles you don't want loaded and exclude models from an endpoint.
Learn more about filters.
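As a sketch of what such a filter could look like in config — the key names here are illustrative assumptions, not the exact schema, so see the filter documentation for the real shape:

```yaml
# Illustrative only — exact keys may differ; see the filter docs.
filters:
  profiles:
    exclude: ["vllm*"]     # skip loading profiles you don't want
  models:
    exclude: ["*embed*"]   # hide matching models from an endpoint
```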
What's Changed
- docs: comparisons by @thushan in #53
- feat: backend/litellm by @thushan in #56
- feat: filtering adapter by @thushan in #57
Full Changelog: v0.0.16...v0.0.17
Changelog
- 7b7c96e Comparison docs for Olla from becky & wilson
- b9b7a5d doc updates
- ee3b07f fix default ports for vllm and lmstudio
- 33c2a04 fix links
- 140833f implements checks for filter breakages
- e18f829 initial bits of a global filter config
- 978eb03 initial litellm profile
- edb7a66 model and profile filtering, tests and refactor glob to be a bit more reusable
- bdb66d7 readme refresher
- 846f37f update docs
- be278a1 update docs
olla-v0.0.16
This release has two big features.
Improved Recovery & Transparent Healing
Health is monitored during every request, and if a request routes to an endpoint that has just failed (before the health check has run), it will transparently move to another healthy endpoint that serves the model. This is invisible to the caller, but CLI logs and response headers let you see what happened.
We think this makes olla pretty awesome.
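As a usage illustration, you can inspect the response headers on a proxied request to see which backend actually served it after any transparent failover. The port and header prefix here are assumptions — check the response-header documentation for the exact names:

```shell
# Hypothetical: inspect Olla's response headers on a proxied request
# to see which endpoint served it (header names may differ).
curl -si http://localhost:40114/olla/proxy/v1/models | grep -i '^x-olla'
```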
Intercepts Stats from Endpoints
We also capture the last packet of the stream/payload and extract common metrics from endpoints (tokens per second, etc.), and track other metrics that are (currently) shown in the logs.
Later these will feed a new robusta balancer.
What's Changed
Full Changelog: v0.0.15...v0.0.16
Changelog
- 2b9860c Add documentation for provider metrics feature
- db2bab4 Adds VHS tapes.
- 76bfdd3 Constant'ine.
- 909dfb9 Fix compilation errors and shadow variable warning after merge
- 428d968 alloc changes
- 95d345d avoid blocking healthchecks during recovery, tricky.
- 3ea69ae cleanup factory and add test basic coverage
- c5d2922 cleanup & refactor
- a414331 cleanup constant use and new retry constants
- 29a03c8 coderabbit feedback, routing strategy fixes
- e49731f coderabbit feedback about routing
- 6f17a2c configuration updates
- c1bec59 constants and retry logic
- 2beb7f8 doc refresh
- df99db7 doc updates
- 6b11bbc doc updates for trailer
- 14a8653 doc: max-retries still in config overview
- 3e6e272 docs updated
- 7e122f4 documentation for fallback types and routing to fallback_behaviour by default
- e22cee0 documentation updates
- 988b753 doh we miss target url
- ab05356 fix gitignore not to ignore olla but rather olla in the root
- 68234fb fix n+1 logic issue
- 9e40a30 implements routing similar to scout
- 14963c3 initial request metrics
- 4308adb lab test fixes for profiles
- 982cdc6 lab: float issues
- e3558a6 make the jsonpath a bit more robust from lab tests
- 696ceb8 new routing strategy for registry
- 832ced6 profiling revealed some performance issues with custom written parser, adopting gjson and expr
- 0728c23 rabbit feedback around discovery issues, but refactored at the same time
- ec0a15b reduce allocations and cleanup constants
- 55ccb0f refactor a bit and move to core/metrics
- 9cb6e06 remove from intro
- 3f0e306 remove integration test no longer used post impl.
- 417a76d removed debug in hotpath and try to compile expressions at compiletime
- 2dde757 reorg docs and add more detail
- 369958c retry logic for post endpoint health changes
- feb17cb separate contexts to avoid failure issues
- 99acb83 test updates
- 9a8b53a test scripts
- 7b6ed53 tweaks
- 09cac43 update docs after changes in profiles
olla-v0.0.15
tl;dr
This release adds proper cross-platform Docker images thanks to @ghostdevv, native support for vLLM, and proxy profiles so you can target streaming or buffered proxies. It also finally adds documentation (via mkdocs), plus a few fixes and improvements.
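Per the commit list below, a proxy profile can also be selected via the OLLA_PROXY_PROFILE environment variable. The values shown are inferred from this release's commits ("auto" detection and the buffered-to-standard rename), so treat them as assumptions and check the docs for the authoritative list:

```shell
# OLLA_PROXY_PROFILE selects the proxy behaviour before launching Olla.
# Values here ("auto", "streaming", "standard") are inferred from this
# release's commits — consult the documentation for the real options.
export OLLA_PROXY_PROFILE=streaming
```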
What's Changed
- feat: proxy profiles by @thushan in #42
- chore: constants by @thushan in #43
- feat: backend/vllm by @thushan in #44
- feat: add arm64 docker builds and better cross platform support by @ghostdevv in #46
- feat: docs by @thushan in #48
- feat: security & log consolidation by @thushan in #49
New Contributors
- @ghostdevv made their first contribution in #46
Full Changelog: v0.0.14...v0.0.15
Changelog
- 39d77cf Adds Proxy Profile to configuration
- d2be61a Consolidate logging for proxies
- 9b9fbd1 Revert "rabbit feedback of adding goos for context"
- 23f2634 Update readme.md
- c95b460 add more comprehensive tests and constants for content types
- 1af819c add proxy profile as an env var
- ca3faa5 adds detecting stream type for 'auto' and profiles properly
- 9c822a3 adds global constants properly for content / request bits.
- aa36c90 adds global constants properly for content / request bits.
- 7f66f19 adds goos/version to status handlers
- 2ec111e avoid build validation for docs
- af9941a avoid multiple instances of vllm responses
- 39ad794 avoid non go files
- 75c3282 change scripts and other files for standard behaviour
- c7eb705 claude update
- ff4f405 coderabbit feedback about having a bin, can't bin that feedback can we?
- 23ed305 detect streaming mode from scout
- 86768c3 feat: add arm64 docker builds
- 8f7acd4 findVLLMNativeName naively checks slashes, better to remove that
- 9bdbe14 fix TUI issues for long version numbers
- b7a5bb4 fix: normalize line endings for docs and workflow files
- 05b2fe0 fleshing out things
- e15ef9c forgot we can test arm64 with qemu
- 0e7d631 initial mkdocs-material integration
- 4f4fc05 initial streaming vs buffered tests
- 2711f82 initial test case infra
- cac160a initial updates
- e4ee050 initial vllm implementation
- 40cf1bf just show basic version info
- 1b81eb8 line ending normalisation
- cb61406 lint & allow local dev profiles to use anyhost to avoid breakage
- e533da2 missed type of Trailer header!
- 0a0805f missing configuration
- 2e42b9b missing config
- 08ceebe rabbit feedback of adding goos for context
- 1670ec2 randomise port for test runs
- 661adba readme update
- c025112 readme update
- 2d8cb97 readme update
- f23e247 remove URLs from being visible in endpoints
- c27e364 renaming buffered to standard
- 3ad6b49 renormalise
- 1087472 revert the fmt issue fix
- 21530ad run test results in test/results
- d11fdd7 show proxy setup in the status
- a988c45 slightly better way to handle status
- e94aefe update CI with builds across all platforms
- 2ae68ee update converters to use constants
- 684185a update default configuration
- bd579d7 update docs for OLLA_PROXY_PROFILE
- 2ac4e87 update readme
- 9fe1904 update readme for native vllm
- 8716a45 update readme for profile
- 6daef17 update remaining constants
- 05870bd update remaining constants
- 8393afd update test scripts for vllm support (mirrors existing).
- 1c261e9 update vllm profile
- 9c27c77 vLLM integration test that uses OLLA_TEST_SERVER_VLLM var for test