Skip to content

Releases: kaito-project/kaito

v0.9.3

19 Mar 09:34
15229f7

Choose a tag to compare

v0.9.3 - 2026-03-19

Changelog

Bug Fixes 🐞

Maintenance πŸ”§

  • 4ce9f87 chore: bump google.golang.org/grpc from 1.78.0 to 1.79.3 (#1856)

v0.9.2

16 Mar 11:52
f32b085

Choose a tag to compare

v0.9.2 - 2026-03-16

Changelog

Maintenance πŸ”§

v0.9.1

10 Mar 22:21
df31226

Choose a tag to compare

v0.9.1 - 2026-03-10

Changelog

Bug Fixes 🐞

  • 54ebf1e fix: prevent webhook panic on auto-generated vLLM presets with empty GPU params (#1824) (#1834)

Continuous Integration πŸ’œ

  • 4c52d47 ci: pin trivy binary version to v0.69.2 in trivy workflow (#1817)

Maintenance πŸ”§

v0.9.0

27 Feb 12:07
18a0177

Choose a tag to compare

v0.9.0 - 2026-02-27

vLLM Runtime: Run Any vLLM-Compatible Model
KAITO now supports running arbitrary vLLM-compatible models β€” all you need is a HuggingFace repo ID. GPU memory, node count, storage, and agentic configuration (tool-call parsers, reasoning parsers) are automatically determined β€” no manual tuning required. The bundled vLLM version is bumped to v0.14.1.

Transformers Runtime: OpenAI-Compatible API
The Transformers-based serving engine now exposes an OpenAI-compatible API, while continuing to support the full breadth of HuggingFace models.

New WorkspaceStatus.State Field
A new state field has been added to WorkspaceStatus, giving users a clear, at-a-glance view of the current lifecycle state of a Workspace resource (e.g., provisioning, ready, failed).

Azure Linux Node Support
KAITO now supports Azure Linux node pools, expanding the range of AKS configurations that can be used for GPU workloads.

Retrieve API for RAG Service
A new /retrieve API has been added to the RAG engine, allowing callers to fetch retrieved document chunks directly β€” enabling more flexible, agentic RAG pipelines without a full generate step.

Changelog

Features 🌈

  • dc99157 feat: improve logs of workspace controller (#1802)
  • 0896cbb feat: update gpu-provisioner version to v0.4.1 for kaito
  • 9b1bdc2 feat: support azure linux node (#1784)
  • 353cc69 feat: generate supported model arch list supported by vLLM v0.14.1 (#1791)
  • d8c2bf6 feat: add e2e pipeline for azure linux scenario (#1792)
  • 43f3959 feat: publishing main image to GHCR (#1776)
  • cd5a67c feat: integrate transformers OpenAI-compatible serve engine (#1384) (#1765)
  • 714207c feat: update state of workspace status (#1758)
  • bf977bc feat: support tool-call-parser and reasoning-parser for more fine tuned models (#1766)
  • 5221259 feat: add retrieve API for RAG service (#1732)
  • 9fa77e1 feat: add status field into WorkspaceStatus for presenting Workspace current state. (#1745)
  • 29a58f4 feat: support generic huggingface vLLM inference model (#1727)
  • 18d7e85 feat: implement preset generator in golang (#1726)
  • e42bb04 feat: simplify vLLM inference model support flow (#1713)
  • 319419b feat: Make NVIDIA device plugin deployment optional via feature gate (#1707)

Bug Fixes 🐞

  • 9150e1a fix: remove ec2nodeclasss crd (#1803)
  • 88061de fix: ghcr service image name
  • 22fc27b fix: [transformers] log adapter name (#1790)
  • 6928398 fix: add required python packages to support generic huggingface models (#1781)
  • 35efc39 fix: remove expired VM sizes and fixed breaking references in tests (#1780)
  • a007b81 fix: Default InferenceSet replicas to 1 when unspecified (#1774)
  • 8a5fbeb fix: enlarge system storage cost overhead (#1772)
  • 67524e5 fix: trivy unknown severitis for test/kaito-base image (#1747)
  • 15ce2ce fix: use intermediate image tags to avoid race condition (#1746)
  • 6d70627 fix: install missing helm tool (#1743)
  • 52e9785 fix: conflict python deps in flaky unit test (#1741)
  • aaed80a fix: Disable flux2 NetworkPolicies blocking webhook communication (#1733)
  • e8430af fix: pin huggingface-hub dep version in UT (#1734)
  • 9f12aa3 fix: existing Workspace cannot be deleted or modified during version upgrade due to instanceType webhook validation (#1719)
  • b315bc7 fix: change the default GFD chart values (#1712)

Code Refactoring πŸ’Ž

  • 807626c refactor: rename GPUConfig.GPUMemGiB β†’ GPUMem, change type to resource.Quantity (#1779)

Documentation πŸ“˜

  • ae2c5cb docs: refine keda-autoscaler-inference doc (#1759)
  • b0980f7 chore(docs): update gwaie doc (#1757)
  • ed0fdb6 fix(docs): gwaie doc example manifests (#1755)
  • 2a4ac56 docs: provide docs for supporting generic huggingface vLLM inference model (#1756)
  • 5d8baa3 docs: fix preset_generator.py doc links due to rename (#1709)
  • 97d9c56 docs: post release doc update for v0.8.0 (#1705)
  • 3efea61 docs: add versioned documentation for v0.8.x (#1703)

Maintenance πŸ”§

  • 130351b chore: bump sentencepiece from 0.2.0 to 0.2.1 in /presets/workspace/dependencies (#1738)
  • dac0af6 chore: clean deadcode, outdated e2e tests (#1799)
  • 7880217 chore: bump vllm to 0.14.1 (#1785)
  • 6a3c0cb chore: add more tool-call-parser and reasoning-parser for a few models (#1793)
  • d5a477d chore: Bump GAIE version to v1.3.1 (#1788)
  • c5d0bbd chore: flatten oci artifact index and use temporary helm chart repository (#1753)
  • c9e4d51 chore: bump actions/setup-node from 4.3.0 to 6.2.0 (#1771)
  • 5131041 chore: bump webpack from 5.99.9 to 5.105.1 in /website (#1767)
  • 104cb50 chore: bump lodash from 4.17.21 to 4.17.23 in /website (#1737)
  • 0794168 chore: bump qs and express in /website (#1716)
  • 1fa77a7 chore: bump actions/checkout from 4 to 5 (#1644)
  • 2e5fe12 chore: force annotation and label values to be strings (#1748)
  • 27bc149 chore: upgrade ragengine crd to v1beta1 (#1701)
  • 57bdf3c chore: add more parameters to helm chart (#1724)
  • 1e351a0 chore: add estimator logging (#1715)
  • b2b8805 chore: publish helm charts as oci artifacts (#1717)

Testing πŸ’š

v0.8.1

24 Jan 01:17
d56734a

Choose a tag to compare

v0.8.1 - 2026-01-24

Changelog

Features 🌈

  • ea7c46d feat: Make NVIDIA device plugin deployment optional via feature gate (#1707) [release-0.8] (#1739)

Bug Fixes 🐞

  • e50f715 fix: conflict python deps in flaky unit test (#1741)
  • 98d8eb7 fix: existing Workspace cannot be deleted in byo node (#1719) [release-0.8] (#1736)
  • a543a90 fix: controller crash when Karpenter CRDs absent with isableNodeAutoProvisioning enabled (#1725) [release-0.8] (#1729)

Maintenance πŸ”§

Testing πŸ’š

  • 1725b51 test: add upgrade compatibility test for BYO mode (#1730) [release-0.8] (#1740)

v0.8.0

20 Dec 06:36
v0.8.0
4fc5af6

Choose a tag to compare

v0.8.0 - 2025-12-20

This release introduces a breaking change such that the inference workload is unified to StatefulSet. The Deployment resources created by existing workspaces will be removed by the controller and new StatefulSet resources will be created instead. No manual operation is required for this migration, and it is expected that the inference server hits a short period of downtime due to the Pod recreation.

Changelog

Breaking Changes πŸ’₯

  • 3ab3f3d feat: [BREAKING] use statefulset for all workspace (#1523)

Features 🌈

  • b966484 feat: update gpu-provisioner version to v0.3.8 for kaito (#1698)
  • 91819b9 feat: preset-generator support generic model format and attn arch (#1690)

Bug Fixes 🐞

Code Refactoring πŸ’Ž

  • 47fcd2e refactor: make sku-calculation a generic preset generator (#1689)

Documentation πŸ“˜

  • 318bf01 docs: fix namespace doc issue in keda-kaito-scaler (#1699)
  • 87c9c32 docs: use kaito-workspace in keda install (#1694)
  • eefd2b8 docs: add keda-autoscaler-inference scaling example in doc (#1682)
  • bbe61d7 docs: refine naming in docs and examples (#1681)
  • c78d68b docs: add keda-autoscaler-inference doc (#1679)

Maintenance πŸ”§

  • 67deec5 chore: bump golang to 1.24.11 (#1695)
  • 89aba34 chore: use pv cleaner from localcsi manager (#1687)
  • 7911b00 chore: fix huggingface_hub version in preset_generator (#1693)
  • 0fabc5c chore: bump ray to 0.25.1 (#1684)
  • 3d33b89 chore: bump js-yaml from 3.14.1 to 3.14.2 in /website (#1647)
  • 601ad7b chore: bump mdast-util-to-hast from 13.2.0 to 13.2.1 in /website (#1657)
  • e1efaa8 chore: e2e tests for pv support in RAG engine service (#1671)

Testing πŸ’š

v0.8.0-rc.0

08 Dec 11:07
v0.8.0-rc.0
7a718d0

Choose a tag to compare

v0.8.0-rc.0 Pre-release
Pre-release

v0.8.0-rc.0 - 2025-12-08

Changelog

Breaking Changes πŸ’₯

  • 57feef8 chore: [BREAKING] deprecate phi-2 model (#1667)
  • 78a76de feat: [BREAKING] remove /query api call and adding FastAPI info and Tags (#1621)

Features 🌈

  • 475f94e feat: support minor release version format (x.y.z-rc.w) (#1675)
  • f1cba23 feat: add mistral3 series models (#1668)
  • 295068f feat: add PV support to RAG service (#1660)
  • 5a8b580 feat: leverage AIKit for preset image packing (#1649)
  • a97f28d feat: add support for generic BYO nodes using NVIDIA GPU feature discovery (#1536)
  • 0a6ee77 feat: use skopeo mcr images (#1630)
  • 9b1f6bd feat: RAG benchmarking based on documents (#1615)
  • 38f6846 feat: add version info to ua and cmd (#1633)
  • 7933a2c feat: add user-agent header for RAG oai client (#1622)
  • f73f9b7 feat: add webhook validation for BYO nodes using GPU feature discovery (#1587)
  • a68bb82 feat: Provide token usage in rag service (#1605)
  • c963272 feat: update gpu-provisioner version to v0.3.7 for kaito (#1604)
  • 251d9a2 Revert "feat: support arm64 container images" (#1603)
  • 198408a feat: support arm64 container images (#1585)
  • 22afc33 feat: add a new InferenceSet CRD and Controller for scaling inference workloads automatically (#1522)
  • ff7dd8d feat: add NVIDIA GPU feature discovery Helm chart (#1586)
  • 45f85fd feat: add gemma-3 4B and 27B models (#1572)
  • 42331a4 feat: adding total_items to RAG list documents response (#1578)

Bug Fixes 🐞

  • fe21280 fix: correct csi-local-node ds label (#1672)
  • cef240d fix: move GatewayAPIInferenceExtension into InferenceSet Controller (#1656)
  • 670d30b fix: add enableInferenceSetController in helm chart config (#1651)
  • 6437e6c fix: add new label to inference pods generated by InfefenceSet (#1645)
  • 45d68fd fix: add missing inferenceset CRD in charts (#1643)
  • cd30f25 fix: add findutils as runtime dependency of skopeo image (#1632)
  • 3eeb372 fix: bump pip to 25.3 in kaito-base image (#1629)
  • fe73c4b fix: add missing steps to skopeo workflow (#1614)
  • e8d798b fix: use crypto/rand package to generate random string (#1600)
  • 173bd5a fix: switch e2e test usage of Standard_NC6s_v3 to Standard_NV36ads_A10_v5 as NC6 no longer has quota (#1581)
  • bfd6a53 fix: handle no context found with passthrough to LLM (#1542)
  • 712b2ed fix: ResourceReady condition is never set to true for BYO (#1547)

Code Refactoring πŸ’Ž

  • c46be63 refactor: move the Inferenceset controller to a top-level pkg (#1670)

Continuous Integration πŸ’œ

  • 3791175 ci: reuse pip package cache when building image (#1516)

Documentation πŸ“˜

  • dced196 docs: fix enableInferenceSetController install method (#1669)
  • 9fb4087 docs: fix gateway-api-inference-extension doc (#1662)
  • 19c1476 docs: update preset list (#1597)
  • 0150971 docs: Update the custom model deployment guide (#1592)
  • 37d928c docs: update docs for using generic BYO nodes (#1588)
  • 90c1031 docs: update gateway-api-infernece-extension setup (#1528)
  • c41057a docs: Adding proposal for AutoIndexer CRD (#1538)
  • 1a0ae1c docs: Add proposal for using NVIDIA GPU feature discovery to support generic cloud provider nodes (#1548)
  • 37a9622 docs: fix typo in model-as-oci-artifacts.md (#1557)
  • 9bb19c5 docs: add proposal for gemma 3 models (#1540)
  • 6f14e25 docs: Update rag.md (#1534)
  • 0d75ff6 docs: Introduce a new InferenceSet CRD and Controller for scaling inference workloads automatically (#1503)
  • f41784c docs: add versioned documentation for v0.7.x (#1521)

Maintenance πŸ”§

  • 819c093 chore: bump node-forge from 1.3.1 to 1.3.2 in /website (#1654)
  • cfb8e7f chore: bump vllm to 0.12.0 (#1663)
  • 68b5859 chore: migrate unit-test to self-hosted runner (#1650)
  • 93a6686 chore: use built-in GenerateName to generate random workspace name by InferenceSet (#1637)
  • 04229d0 chore: bump actions/github-script from 7 to 8 (#1639)
  • e5df64e chore: make local-csi-driver a helm dependency (#1483)
  • ca164c1 chore: bump docker/login-action from 3.5.0 to 3.6.0 (#1618)
  • ae8b6bf chore: add workflow for building and pushing skopeo image (#1613)
  • 1fe9a8f chore: bump @docusaurus/core from 3.9.1 to 3.9.2 in /website (#1601)
  • 51182cd chore: bump peter-evans/repository-dispatch from 3 to 4 (#1606)
  • b1795b6 chore: bump sigs.k8s.io/controller-runtime from 0.21.0 to 0.22.2 and k8s.io/* to 0.34.1 (#1571)
  • ec162d8 chore: disable InferenceSetController by default (#1599)
  • f6b5140 chore: bump gateway-api-inference-extension to v1.0.1 (#1566)
  • 2b8e565 Revert "chore: bump python from 3.12-slim to 3.13-slim in /docker/presets/models/tfs" (#1584)
  • f9fe6d3 chore: bump step-security/harden-runner from 2.12.0 to 2.13.1 (#1574)
  • c70caf1 chore: rename GB to GiB in GPUConfig (#1565)
  • a7b09ee chore: bump azurerm provider in terraofrm and update example (#1564)
  • b27f41c chore: bump azure/CLI from 2.1.0 to 2.2.0 (#1552)
  • a458d87 chore: bump react from 19.1.1 to 19.2.0 in /website (#1541)
  • 7560f07 chore: bump @docusaurus/module-type-aliases from 3.9.0 to 3.9.1 in /website (#1535)
  • 16e3032 chore: bump python from 3.12-slim to 3.13-slim in /docker/presets/models/tfs (#1451)
  • ab1f640 chore: bump python from 3.12-slim to 3.13-slim in /docker/ragengine/service (#1456)
  • 046b678 chore: bump actions/cache from 4.2.2 to 4.3.0 (#1530)
  • 9e7d731 chore: bump @docusaurus/core from 3.8.1 to 3.9.1 in /website (#1527)
  • 94a1fe4 chore: bump @docusaurus/types from 3.8.1 to 3.9.1 in /website (#1526)
  • 9d38499 chore: bump @docusaurus/module-type-aliases from 3.8.1 to 3.9.0 in /website (#1525)

Testing πŸ’š

v0.7.2

01 Nov 06:29
v0.7.2
65ecc95

Choose a tag to compare

v0.7.2 - 2025-11-01

Changelog

Features 🌈

Bug Fixes 🐞

  • 763b413 fix: bump pip to 25.3 in kaito-base and ragservice images [release-0.7] (#1631)

Maintenance πŸ”§

  • c048027 chore: make local-csi-driver a helm dependency (#1483) [release-0.7] (#1634)

v0.7.1

09 Oct 20:53
v0.7.1
d51873a

Choose a tag to compare

v0.7.1 - 2025-10-09

Changelog

Bug Fixes 🐞

  • 87db0e6 fix: handle no context found with passthrough to LLM (#1542) [release-0.7] (#1550)
  • c9b1a0f fix: ResourceReady condition is never set to true for BYO [release-0.7] (#1549)

v0.7.0

24 Sep 03:25
v0.7.0
cbc20c1

Choose a tag to compare

v0.7.0 - 2025-09-24

Changelog

Breaking Changes πŸ’₯

  • dc15b16 feat: [BREAKING] adding context size window to rag spec (#1392)

Features 🌈

  • 3285259 feat: update gpu-provisioner version to v0.3.6 for kaito (#1512)
  • dd80318 feat: refactor workspace controller to support NodeEstimator and Workspace.Status.TargetNodeCount (#1477)
  • d242413 feat: Max token calculator (#1507)
  • fedf505 feat: node calculator (#1435)
  • 509f7ac feat: add support for GPT-OSS 20B & 120B models (#1442)
  • 434e8ee feat: configure workspace replicas, perReplicaNodeCount and TargetNodeCount (#1473)
  • 852b863 feat: add node manager for ensuring device plugin and accelerator label (#1475)
  • e8c84e7 feat: separate EnsureNodeClaims into ScaleUpNodeClaims and ScaleDownNodeClaims (#1461)
  • 2e8e8d9 feat: only run vector search on latest user message (#1427)
  • b12801d feat: adding breaking change handling into goreleaser (#1450)
  • 8cbf9be feat: add nodeclaim manager for creating/deleting nodeclaims of workspace (#1417)
  • f7d2142 feat: add node estimator for calculating PerReplicaNodeCount (#1414)
  • 4f91b0c feat: Enhanced Token Management and Context Selection for RAGEngine (#1404)
  • f46c916 feat: update workspace CRD for supporting scale subresource API (#1349)
  • 24cea27 feat: Add Gateway API Inference Extension support (#1252)
  • d8f24e4 feat: add a feature gate to disable node auto-provisioning and validate the preferredNodes (#1337)
  • 3a92e98 feat: offload kv cache to cpu RAM on vllm v1 (#1326)
  • 154e3ff feat: add Flux Helm controller as optional dependency in Helm chart (#1363)

Bug Fixes 🐞

  • ceeda03 fix: incompatible liveness probe check (#1505)
  • 0b1f9b6 fix: fixed failing rag e2e test (#1502)
  • 1db155c fix: helm upgrade --install needs release name (#1510)
  • df55167 fix: ensure a non-empty volumnMount is appended in puller containers (#1480)
  • c55c457 fix: add presets/ragengine to rag e2e test workflow (#1462)
  • 2355ccb fix: add nodeSelector on Linux to avoid crash on Windows node (#1446)
  • 9d57e99 fix: add nodeaffinity for tuning job (#1429)
  • fe7648f Revert "fix: add missing pynvml package dependency" (#1426)
  • 909dcd6 fix: add missing pynvml package dependency (#1424)
  • 20bd85a fix: fix chat template for phi4mini (#1422)
  • 3a91190 fix: cannot read kv-cache-cpu-memory-utilization from ConfigMap (#1415)

Code Refactoring πŸ’Ž

  • 3ffe0d0 refactor: move image building step out of PRs (#1423)

Continuous Integration πŸ’œ

  • 8f63cb2 ci: add workflow_dispatch options to publish pipelines (#1490)
  • 54dab79 ci: update .github/dependabot.yaml with the right file paths (#1433)
  • 9e20cce ci: add pipeline to generate versioned docs for each minor version (#1333)
  • 3182f25 ci: aikit test (#1395)

Documentation πŸ“˜

  • 9cbe558 docs: Fix KAITO acronym alignment (#1501)
  • c174bdb docs: install helm charts via helm repository instead of using tarballs (#1491)
  • 2cb4f12 docs: add implementation strategy for BYO nodes proposal (#1474)
  • 6a550e1 docs: update preset onboarding doc with examples (#1471)
  • c670417 docs: add BYO nodes redesign proposal (#1412)
  • 9b18fe8 docs: add links and note to GPU benchmarks doc (#1466)
  • b0fca84 docs: update istio environment variable in installation step (#1464)
  • 1434241 docs: fix bits of spellings and revise code of conduct for clarity (#1460)
  • 6a97477 docs: update slack community links to cncf workspace (#1459)
  • 6db2ce6 docs: add docs for Gateway API Inference Extension (#1434)
  • eebfeff docs: update aikit docs to fix links (#1445)
  • da23805 docs: update benchmarks to use vLLM data for Phi 4 Mini and Llama 8B (#1438)
  • d750208 docs: rename the file to resolve the "Page Not Found" issue (#1430)
  • f7c269b docs: add page to website for performance benchmarks with Phi-4-mini (#1401)
  • 8c8585a docs: add hideable sidebar option to Docusaurus config (#1408)
  • b9d4c48 docs: fix typos, grammar, and clarity (#1396)
  • fe7d6be docs: corrected typo in README.md (#1391)
  • f45a377 docs: add versioned docs for v0.6.x (#1377)
  • 91b5f6e docs: fix EKS hyperlink (#1365)
  • 395bc4b docs: add post release doc update Makefile cmd (#1372)

Maintenance πŸ”§

  • 5d7777a chore: update base image for workspace and ragengine (#1519)
  • c47c418 chore: bump codecov/codecov-action from 5.5.0 to 5.5.1 (#1492)
  • 8b4e2fd chore: bump gateway-api-inference-extension to v1.0.0 (#1488)
  • 096cbb2 chore: remove preset testing pipeline (#1482)
  • 596ea56 chore: bump aquasecurity/trivy-action from 0.32.0 to 0.33.1 (#1476)
  • 4c200b2 chore: bump react and react-dom in /website (#1458)
  • 8518b21 chore: bump @mdx-js/react from 3.1.0 to 3.1.1 in /website (#1457)
  • 22aecf7 chore: bump goreleaser/goreleaser-action from 6.3.0 to 6.4.0 (#1452)
  • e2c2422 chore: bump kaito-base tag to 0.0.7 (#1449)
  • 59f3306 chore: bump local-csi-driver to v0.2.5 (#1440)
  • aaf2b6b chore: bump codecov/codecov-action from 5.4.3 to 5.5.0 (#1439)
  • 8cd8bc2 chore: bump docker/login-action from 3.4.0 to 3.5.0 (#1437)
  • 9b79ccc chore: bump vLLM to 0.10.1 (#1405)
  • 88d5e73 chore: cleanup verbose logs (#1420)
  • 9b1250e chore: bump mermaid from 11.9.0 to 11.10.0 in /website (#1410)
  • 5d85652 chore: bump local-csi-driver to v0.2.4 (#1406)
  • 870a797 chore: linter/formatter for python code (#1351)
  • 516d61e chore: polish local-csi-driver charts (#1386)
  • 026be28 chore: rename files to use underscore instead of dashes (#1389)
  • cfab8e9 chore: bump actions/checkout from 4.2.2 to 5.0.0 (#1387)
  • cf4bee1 chore: bump actions/cache from 4.2.3 to 4.2.4 (#1385)
  • 363038f chore: bump aquasecurity/trivy-action from 0.31.0 to 0.32.0 (#1383)