Releases: kaito-project/kaito
v0.9.3
v0.9.2
v0.9.1
v0.9.0
v0.9.0 - 2026-02-27
vLLM Runtime: Run Any vLLM-Compatible Model
KAITO now supports running arbitrary vLLM-compatible models β all you need is a HuggingFace repo ID. GPU memory, node count, storage, and agentic configuration (tool-call parsers, reasoning parsers) are automatically determined β no manual tuning required. The bundled vLLM version is bumped to v0.14.1.
Transformers Runtime: OpenAI-Compatible API
The Transformers-based serving engine now exposes an OpenAI-compatible API, while continuing to support the full breadth of HuggingFace models.
New WorkspaceStatus.State Field
A new state field has been added to WorkspaceStatus, giving users a clear, at-a-glance view of the current lifecycle state of a Workspace resource (e.g., provisioning, ready, failed).
Azure Linux Node Support
KAITO now supports Azure Linux node pools, expanding the range of AKS configurations that can be used for GPU workloads.
Retrieve API for RAG Service
A new /retrieve API has been added to the RAG engine, allowing callers to fetch retrieved document chunks directly β enabling more flexible, agentic RAG pipelines without a full generate step.
Changelog
Features π
- dc99157 feat: improve logs of workspace controller (#1802)
- 0896cbb feat: update gpu-provisioner version to v0.4.1 for kaito
- 9b1bdc2 feat: support azure linux node (#1784)
- 353cc69 feat: generate supported model arch list supported by vLLM v0.14.1 (#1791)
- d8c2bf6 feat: add e2e pipeline for azure linux scenario (#1792)
- 43f3959 feat: publishing main image to GHCR (#1776)
- cd5a67c feat: integrate transformers OpenAI-compatible serve engine (#1384) (#1765)
- 714207c feat: update state of workspace status (#1758)
- bf977bc feat: support tool-call-parser and reasoning-parser for more fine tuned models (#1766)
- 5221259 feat: add retrieve API for RAG service (#1732)
- 9fa77e1 feat: add status field into WorkspaceStatus for presenting Workspace current state. (#1745)
- 29a58f4 feat: support generic huggingface vLLM inference model (#1727)
- 18d7e85 feat: implement preset generator in golang (#1726)
- e42bb04 feat: simplify vLLM inference model support flow (#1713)
- 319419b feat: Make NVIDIA device plugin deployment optional via feature gate (#1707)
Bug Fixes π
- 9150e1a fix: remove ec2nodeclasss crd (#1803)
- 88061de fix: ghcr service image name
- 22fc27b fix: [transformers] log adapter name (#1790)
- 6928398 fix: add required python packages to support generic huggingface models (#1781)
- 35efc39 fix: remove expired VM sizes and fixed breaking references in tests (#1780)
- a007b81 fix: Default InferenceSet replicas to 1 when unspecified (#1774)
- 8a5fbeb fix: enlarge system storage cost overhead (#1772)
- 67524e5 fix: trivy unknown severitis for test/kaito-base image (#1747)
- 15ce2ce fix: use intermediate image tags to avoid race condition (#1746)
- 6d70627 fix: install missing helm tool (#1743)
- 52e9785 fix: conflict python deps in flaky unit test (#1741)
- aaed80a fix: Disable flux2 NetworkPolicies blocking webhook communication (#1733)
- e8430af fix: pin huggingface-hub dep version in UT (#1734)
- 9f12aa3 fix: existing Workspace cannot be deleted or modified during version upgrade due to instanceType webhook validation (#1719)
- b315bc7 fix: change the default GFD chart values (#1712)
Code Refactoring π
Documentation π
- ae2c5cb docs: refine keda-autoscaler-inference doc (#1759)
- b0980f7 chore(docs): update gwaie doc (#1757)
- ed0fdb6 fix(docs): gwaie doc example manifests (#1755)
- 2a4ac56 docs: provide docs for supporting generic huggingface vLLM inference model (#1756)
- 5d8baa3 docs: fix preset_generator.py doc links due to rename (#1709)
- 97d9c56 docs: post release doc update for v0.8.0 (#1705)
- 3efea61 docs: add versioned documentation for v0.8.x (#1703)
Maintenance π§
- 130351b chore: bump sentencepiece from 0.2.0 to 0.2.1 in /presets/workspace/dependencies (#1738)
- dac0af6 chore: clean deadcode, outdated e2e tests (#1799)
- 7880217 chore: bump vllm to 0.14.1 (#1785)
- 6a3c0cb chore: add more tool-call-parser and reasoning-parser for a few models (#1793)
- d5a477d chore: Bump GAIE version to v1.3.1 (#1788)
- c5d0bbd chore: flatten oci artifact index and use temporary helm chart repository (#1753)
- c9e4d51 chore: bump actions/setup-node from 4.3.0 to 6.2.0 (#1771)
- 5131041 chore: bump webpack from 5.99.9 to 5.105.1 in /website (#1767)
- 104cb50 chore: bump lodash from 4.17.21 to 4.17.23 in /website (#1737)
- 0794168 chore: bump qs and express in /website (#1716)
- 1fa77a7 chore: bump actions/checkout from 4 to 5 (#1644)
- 2e5fe12 chore: force annotation and label values to be strings (#1748)
- 27bc149 chore: upgrade ragengine crd to v1beta1 (#1701)
- 57bdf3c chore: add more parameters to helm chart (#1724)
- 1e351a0 chore: add estimator logging (#1715)
- b2b8805 chore: publish helm charts as oci artifacts (#1717)
Testing π
v0.8.1
v0.8.1 - 2026-01-24
Changelog
Features π
- ea7c46d feat: Make NVIDIA device plugin deployment optional via feature gate (#1707) [release-0.8] (#1739)
Bug Fixes π
- e50f715 fix: conflict python deps in flaky unit test (#1741)
- 98d8eb7 fix: existing Workspace cannot be deleted in byo node (#1719) [release-0.8] (#1736)
- a543a90 fix: controller crash when Karpenter CRDs absent with isableNodeAutoProvisioning enabled (#1725) [release-0.8] (#1729)
Maintenance π§
- e9a05ae chore: add more parameters to helm chart (#1724) (#1728)
- e3c3640 chore: publish helm charts as oci artifacts (#1717) (#1718)
Testing π
v0.8.0
v0.8.0 - 2025-12-20
This release introduces a breaking change such that the inference workload is unified to StatefulSet. The Deployment resources created by existing workspaces will be removed by the controller and new StatefulSet resources will be created instead. No manual operation is required for this migration, and it is expected that the inference server hits a short period of downtime due to the Pod recreation.
Changelog
Breaking Changes π₯
Features π
- b966484 feat: update gpu-provisioner version to v0.3.8 for kaito (#1698)
- 91819b9 feat: preset-generator support generic model format and attn arch (#1690)
Bug Fixes π
- 1366f9a fix: set imagePullPolicy to Always (#1702)
- 8945b5b fix: workload type in ragengine e2e test (#1697)
- dffd5f3 fix: invalid indentation in artifacthub links (#1683)
- e5d77e5 fix: cancel latest release when it's perrelease (#1680)
- e813c46 fix: release tag validation rule (#1677)
Code Refactoring π
Documentation π
- 318bf01 docs: fix namespace doc issue in keda-kaito-scaler (#1699)
- 87c9c32 docs: use kaito-workspace in keda install (#1694)
- eefd2b8 docs: add keda-autoscaler-inference scaling example in doc (#1682)
- bbe61d7 docs: refine naming in docs and examples (#1681)
- c78d68b docs: add keda-autoscaler-inference doc (#1679)
Maintenance π§
- 67deec5 chore: bump golang to 1.24.11 (#1695)
- 89aba34 chore: use pv cleaner from localcsi manager (#1687)
- 7911b00 chore: fix huggingface_hub version in preset_generator (#1693)
- 0fabc5c chore: bump ray to 0.25.1 (#1684)
- 3d33b89 chore: bump js-yaml from 3.14.1 to 3.14.2 in /website (#1647)
- 601ad7b chore: bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /website (#1657)
- e1efaa8 chore: e2e tests for pv support in RAG engine service (#1671)
Testing π
v0.8.0-rc.0
v0.8.0-rc.0 - 2025-12-08
Changelog
Breaking Changes π₯
- 57feef8 chore: [BREAKING] deprecate phi-2 model (#1667)
- 78a76de feat: [BREAKING] remove /query api call and adding FastAPI info and Tags (#1621)
Features π
- 475f94e feat: support minor release version format (x.y.z-rc.w) (#1675)
- f1cba23 feat: add mistral3 series models (#1668)
- 295068f feat: add PV support to RAG service (#1660)
- 5a8b580 feat: leverage AIKit for preset image packing (#1649)
- a97f28d feat: add support for generic BYO nodes using NVIDIA GPU feature discovery (#1536)
- 0a6ee77 feat: use skopeo mcr images (#1630)
- 9b1f6bd feat: RAG benchmarking based on documents (#1615)
- 38f6846 feat: add version info to ua and cmd (#1633)
- 7933a2c feat: add user-agent header for RAG oai client (#1622)
- f73f9b7 feat: add webhook validation for BYO nodes using GPU feature discovery (#1587)
- a68bb82 feat: Provide token usage in rag service (#1605)
- c963272 feat: update gpu-provisioner version to v0.3.7 for kaito (#1604)
- 251d9a2 Revert "feat: support arm64 container images" (#1603)
- 198408a feat: support arm64 container images (#1585)
- 22afc33 feat: add a new InferenceSet CRD and Controller for scaling inference workloads automatically (#1522)
- ff7dd8d feat: add NVIDIA GPU feature discovery Helm chart (#1586)
- 45f85fd feat: add gemma-3 4B and 27B models (#1572)
- 42331a4 feat: adding total_items to RAG list documents response (#1578)
Bug Fixes π
- fe21280 fix: correct csi-local-node ds label (#1672)
- cef240d fix: move GatewayAPIInferenceExtension into InferenceSet Controller (#1656)
- 670d30b fix: add enableInferenceSetController in helm chart config (#1651)
- 6437e6c fix: add new label to inference pods generated by InfefenceSet (#1645)
- 45d68fd fix: add missing inferenceset CRD in charts (#1643)
- cd30f25 fix: add findutils as runtime dependency of skopeo image (#1632)
- 3eeb372 fix: bump pip to 25.3 in kaito-base image (#1629)
- fe73c4b fix: add missing steps to skopeo workflow (#1614)
- e8d798b fix: use crypto/rand package to generate random string (#1600)
- 173bd5a fix: switch e2e test usage of Standard_NC6s_v3 to Standard_NV36ads_A10_v5 as NC6 no longer has quota (#1581)
- bfd6a53 fix: handle no context found with passthrough to LLM (#1542)
- 712b2ed fix: ResourceReady condition is never set to true for BYO (#1547)
Code Refactoring π
Continuous Integration π
Documentation π
- dced196 docs: fix enableInferenceSetController install method (#1669)
- 9fb4087 docs: fix gateway-api-inference-extension doc (#1662)
- 19c1476 docs: update preset list (#1597)
- 0150971 docs: Update the custom model deployment guide (#1592)
- 37d928c docs: update docs for using generic BYO nodes (#1588)
- 90c1031 docs: update gateway-api-infernece-extension setup (#1528)
- c41057a docs: Adding proposal for AutoIndexer CRD (#1538)
- 1a0ae1c docs: Add proposal for using NVIDIA GPU feature discovery to support generic cloud provider nodes (#1548)
- 37a9622 docs: fix typo in model-as-oci-artifacts.md (#1557)
- 9bb19c5 docs: add proposal for gemma 3 models (#1540)
- 6f14e25 docs: Update rag.md (#1534)
- 0d75ff6 docs: Introduce a new InferenceSet CRD and Controller for scaling inference workloads automatically (#1503)
- f41784c docs: add versioned documentation for v0.7.x (#1521)
Maintenance π§
- 819c093 chore: bump node-forge from 1.3.1 to 1.3.2 in /website (#1654)
- cfb8e7f chore: bump vllm to 0.12.0 (#1663)
- 68b5859 chore: migrate unit-test to self-hosted runner (#1650)
- 93a6686 chore: use built-in GenerateName to generate random workspace name by InferenceSet (#1637)
- 04229d0 chore: bump actions/github-script from 7 to 8 (#1639)
- e5df64e chore: make local-csi-driver a helm dependency (#1483)
- ca164c1 chore: bump docker/login-action from 3.5.0 to 3.6.0 (#1618)
- ae8b6bf chore: add workflow for building and pushing skopeo image (#1613)
- 1fe9a8f chore: bump @docusaurus/core from 3.9.1 to 3.9.2 in /website (#1601)
- 51182cd chore: bump peter-evans/repository-dispatch from 3 to 4 (#1606)
- b1795b6 chore: bump sigs.k8s.io/controller-runtime from 0.21.0 to 0.22.2 and k8s.io/* to 0.34.1 (#1571)
- ec162d8 chore: disable InferenceSetController by default (#1599)
- f6b5140 chore: bump gateway-api-inference-extension to v1.0.1 (#1566)
- 2b8e565 Revert "chore: bump python from 3.12-slim to 3.13-slim in /docker/presets/models/tfs" (#1584)
- f9fe6d3 chore: bump step-security/harden-runner from 2.12.0 to 2.13.1 (#1574)
- c70caf1 chore: rename GB to GiB in GPUConfig (#1565)
- a7b09ee chore: bump azurerm provider in terraofrm and update example (#1564)
- b27f41c chore: bump azure/CLI from 2.1.0 to 2.2.0 (#1552)
- a458d87 chore: bump react from 19.1.1 to 19.2.0 in /website (#1541)
- 7560f07 chore: bump @docusaurus/module-type-aliases from 3.9.0 to 3.9.1 in /website (#1535)
- 16e3032 chore: bump python from 3.12-slim to 3.13-slim in /docker/presets/models/tfs (#1451)
- ab1f640 chore: bump python from 3.12-slim to 3.13-slim in /docker/ragengine/service (#1456)
- 046b678 chore: bump actions/cache from 4.2.2 to 4.3.0 (#1530)
- 9e7d731 chore: bump @docusaurus/core from 3.8.1 to 3.9.1 in /website (#1527)
- 94a1fe4 chore: bump @docusaurus/types from 3.8.1 to 3.9.1 in /website (#1526)
- 9d38499 chore: bump @docusaurus/module-type-aliases from 3.8.1 to 3.9.0 in /website (#1525)
Testing π
- 71c6747 test: refine inferenceset example and AIKit test (#1674)
- 79383f8 test: ignore more paths in e2e test (#1664)
- 2552a54 test: add ListWorkspaces unit test (#1655)
- 23ea347 test: add InferenceSet e2e test (#1642)
- 28f1872 test: add keda-kaito-scaler test in AIKit test suite (#1652)
- 83e1164 test: fix unstable ut failure (#1646)
- 1da9e34 test: add codespell github action (#1506)
v0.7.2
v0.7.2 - 2025-11-01
Changelog
Features π
Bug Fixes π
Maintenance π§
v0.7.1
v0.7.0
v0.7.0 - 2025-09-24
Changelog
Breaking Changes π₯
Features π
- 3285259 feat: update gpu-provisioner version to v0.3.6 for kaito (#1512)
- dd80318 feat: refactor workspace controller to support NodeEstimator and Workspace.Status.TargetNodeCount (#1477)
- d242413 feat: Max token calculator (#1507)
- fedf505 feat: node calculator (#1435)
- 509f7ac feat: add support for GPT-OSS 20B & 120B models (#1442)
- 434e8ee feat: configure workspace replicas, perReplicaNodeCount and TargetNodeCount (#1473)
- 852b863 feat: add node manager for ensuring device plugin and accelerator label (#1475)
- e8c84e7 feat: separate EnsureNodeClaims into ScaleUpNodeClaims and ScaleDownNodeClaims (#1461)
- 2e8e8d9 feat: only run vector search on latest user message (#1427)
- b12801d feat: adding breaking change handling into goreleaser (#1450)
- 8cbf9be feat: add nodeclaim manager for creating/deleting nodeclaims of workspace (#1417)
- f7d2142 feat: add node estimator for calculating PerReplicaNodeCount (#1414)
- 4f91b0c feat: Enhanced Token Management and Context Selection for RAGEngine (#1404)
- f46c916 feat: update workspace CRD for supporting scale subresource API (#1349)
- 24cea27 feat: Add Gateway API Inference Extension support (#1252)
- d8f24e4 feat: add a feature gate to disable node auto-provisioning and validate the preferredNodes (#1337)
- 3a92e98 feat: offload kv cache to cpu RAM on vllm v1 (#1326)
- 154e3ff feat: add Flux Helm controller as optional dependency in Helm chart (#1363)
Bug Fixes π
- ceeda03 fix: incompatible liveness probe check (#1505)
- 0b1f9b6 fix: fixed failing rag e2e test (#1502)
- 1db155c fix: helm upgrade --install needs release name (#1510)
- df55167 fix: ensure a non-empty volumnMount is appended in puller containers (#1480)
- c55c457 fix: add presets/ragengine to rag e2e test workflow (#1462)
- 2355ccb fix: add nodeSelector on Linux to avoid crash on Windows node (#1446)
- 9d57e99 fix: add nodeaffinity for tuning job (#1429)
- fe7648f Revert "fix: add missing pynvml package dependency" (#1426)
- 909dcd6 fix: add missing pynvml package dependency (#1424)
- 20bd85a fix: fix chat template for phi4mini (#1422)
- 3a91190 fix: cannot read kv-cache-cpu-memory-utilization from ConfigMap (#1415)
Code Refactoring π
Continuous Integration π
- 8f63cb2 ci: add workflow_dispatch options to publish pipelines (#1490)
- 54dab79 ci: update .github/dependabot.yaml with the right file paths (#1433)
- 9e20cce ci: add pipeline to generate versioned docs for each minor version (#1333)
- 3182f25 ci: aikit test (#1395)
Documentation π
- 9cbe558 docs: Fix KAITO acronym alignment (#1501)
- c174bdb docs: install helm charts via helm repository instead of using tarballs (#1491)
- 2cb4f12 docs: add implementation strategy for BYO nodes proposal (#1474)
- 6a550e1 docs: update preset onboarding doc with examples (#1471)
- c670417 docs: add BYO nodes redesign proposal (#1412)
- 9b18fe8 docs: add links and note to GPU benchmarks doc (#1466)
- b0fca84 docs: update istio environment variable in installation step (#1464)
- 1434241 docs: fix bits of spellings and revise code of conduct for clarity (#1460)
- 6a97477 docs: update slack community links to cncf workspace (#1459)
- 6db2ce6 docs: add docs for Gateway API Inference Extension (#1434)
- eebfeff docs: update aikit docs to fix links (#1445)
- da23805 docs: update benchmarks to use vLLM data for Phi 4 Mini and Llama 8B (#1438)
- d750208 docs: rename the file to resolve the "Page Not Found" issue (#1430)
- f7c269b docs: add page to website for performance benchmarks with Phi-4-mini (#1401)
- 8c8585a docs: add hideable sidebar option to Docusaurus config (#1408)
- b9d4c48 docs: fix typos, grammar, and clarity (#1396)
- fe7d6be docs: corrected typo in README.md (#1391)
- f45a377 docs: add versioned docs for v0.6.x (#1377)
- 91b5f6e docs: fix EKS hyperlink (#1365)
- 395bc4b docs: add post release doc update Makefile cmd (#1372)
Maintenance π§
- 5d7777a chore: update base image for workspace and ragengine (#1519)
- c47c418 chore: bump codecov/codecov-action from 5.5.0 to 5.5.1 (#1492)
- 8b4e2fd chore: bump gateway-api-inference-extension to v1.0.0 (#1488)
- 096cbb2 chore: remove preset testing pipeline (#1482)
- 596ea56 chore: bump aquasecurity/trivy-action from 0.32.0 to 0.33.1 (#1476)
- 4c200b2 chore: bump react and react-dom in /website (#1458)
- 8518b21 chore: bump @mdx-js/react from 3.1.0 to 3.1.1 in /website (#1457)
- 22aecf7 chore: bump goreleaser/goreleaser-action from 6.3.0 to 6.4.0 (#1452)
- e2c2422 chore: bump kaito-base tag to 0.0.7 (#1449)
- 59f3306 chore: bump local-csi-driver to v0.2.5 (#1440)
- aaf2b6b chore: bump codecov/codecov-action from 5.4.3 to 5.5.0 (#1439)
- 8cd8bc2 chore: bump docker/login-action from 3.4.0 to 3.5.0 (#1437)
- 9b79ccc chore: bump vLLM to 0.10.1 (#1405)
- 88d5e73 chore: cleanup verbose logs (#1420)
- 9b1250e chore: bump mermaid from 11.9.0 to 11.10.0 in /website (#1410)
- 5d85652 chore: bump local-csi-driver to v0.2.4 (#1406)
- 870a797 chore: linter/formatter for python code (#1351)
- 516d61e chore: polish local-csi-driver charts (#1386)
- 026be28 chore: rename files to use underscore instead of dashes (#1389)
- cfab8e9 chore: bump actions/checkout from 4.2.2 to 5.0.0 (#1387)
- cf4bee1 chore: bump actions/cache from 4.2.3 to 4.2.4 (#1385)
- 363038f chore: bump aquasecurity/trivy-action from 0.31.0 to 0.32.0 (#1383)