
Releases: kubernetes-sigs/gateway-api-inference-extension

v1.2.0-rc.1

21 Nov 18:27
318cd7c

v1.2.0-rc.1 Pre-release

What's Changed

  • Add openai api link for request format by @learner0810 in #1757
  • Docs: Fix incorrect stream_options value in Observability example by @aman4433 in #1758
  • Docs: Bumps Quickstart to Use Kgateway v2.2.0-main by @danehans in #1761
  • Docs: Updates Latest/Main Quickstart by @danehans in #1747
  • Docs: Versioned Quickstart Install All CRDs by @danehans in #1762
  • chore: fixed meeting link by @nirrozenbaum in #1734
  • Add Produces and Consumes methods to Plugin by @rahulgurnani in #1754
  • Docs: Removes Agentgateway Docs by @danehans in #1771
  • Record EPP NormalizedTimePerOutputToken metric on streaming mode by @dharaneeshvrd in #1706
  • chore(deps): bump github.com/onsi/ginkgo/v2 from 2.26.0 to 2.27.2 by @dependabot[bot] in #1776
  • chore(deps): bump github.com/prometheus/prometheus from 0.307.1 to 0.307.2 by @dependabot[bot] in #1774
  • fix tracing configuration in helm epp-deployment template by @sallyom in #1777
  • Fix for kustomization missing path for inferencepoolimport.yaml. by @bexxmodd in #1782
  • fix inferenceobjective api types link by @learner0810 in #1739
  • update release quickstart to use v1.1.0 by @nirrozenbaum in #1785
  • [metrics]: Allow EPP to register metrics from extension by @JeffLuoo in #1787
  • feat (reports): add infrastructure to run NGF conformance tests and i… by @sindhushiv in #1788
  • Add Install Gateway section in Getting Started Latest guide by @dharaneeshvrd in #1759
  • quickstart cleanup by @nirrozenbaum in #1805
  • fix(release): update quickstart guide version automatically by @AvineshTripathi in #1803
  • chore(deps): bump github.com/prometheus/prometheus from 0.307.2 to 0.307.3 by @dependabot[bot] in #1809
  • chore(deps): bump github.com/prometheus/common from 0.67.1 to 0.67.2 by @dependabot[bot] in #1807
  • logging cleanup of scheduler pkg by @nirrozenbaum in #1806
  • chore(deps): bump sigs.k8s.io/controller-runtime from 0.22.3 to 0.22.4 by @dependabot[bot] in #1808
  • allow overriding the runner's containing executable name by @elevran in #1813
  • quickstart numbering by @nirrozenbaum in #1819
  • [SLO Routing] Add Latency Predictor sidecars and EPP tools by @BenjaminBraunDev in #1791
  • update inferencepool helm chart flags to be map instead of an array by @nirrozenbaum in #1818
  • feat: Configure LRUCacheSize using the numGPUBlocks for approximate prefix cache by @zetxqx in #1748
  • don't use cluster scope permissions when metrics auth is disabled by @nirrozenbaum in #1804
  • Add benchmarking folder by @rlakhtakia in #1689
  • Add prompt_cached_tokens metrics from each response. by @zetxqx in #1814
  • hotfix to helm chart. missing quotes by @nirrozenbaum in #1825
  • Correct the InferencePoolResolvedRefsCondition conformance tests. by @zetxqx in #1756
  • Adjust default scorer weights to favor more prefix cache affinity by @liu-cong in #1827
  • refactor: Flatten Flow Control queue plugin directory structure by @LukeAVanDrie in #1824
  • Update docs on prefix cache plugin related metrics by @liu-cong in #1828
  • Add prefix cache aware benchmarking config by @rlakhtakia in #1822
  • feat: add validation and fallback for prefix cache config fields by @googs1025 in #1846
  • chore(deps): bump github.com/envoyproxy/go-control-plane/envoy from 1.35.0 to 1.36.0 by @dependabot[bot] in #1844
  • chore(deps): bump golang.org/x/sync from 0.17.0 to 0.18.0 by @dependabot[bot] in #1845
  • Improvements to the E2E Test utilities by @shmuelk in #1853
  • Conformance: Adds Data Parallelism Test by @danehans in #1769
  • fix incorrect interface input parameter names by @googs1025 in #1865
  • docs: Adding the Gateway inference support documentation for Nginx Ga… by @sindhushiv in #1789
  • helm support for sidecar injection in EPP by @capri-xiyue in #1821
  • Helm: Adds istio as a provider-scoped value for the inferencepool Chart by @danehans in #1831
  • refactor: Improve Flow Control queue contracts for clarity and correctness by @LukeAVanDrie in #1836
  • fix training server indentation bug and test yaml to build script by @kaushikmitr in #1854
  • Validate datalayer with additional testing by @elevran in #1857
  • Add PrepareData and Admission control plugins by @rahulgurnani in #1796
  • feat(api): Introduce InferenceModelRewrite API by @zetxqx in #1816
  • Add owners files to subsections by @kfswain in #1874
  • Additional data layer tests by @irar2 in #1876
  • chore(deps): bump the kubernetes group with 6 updates by @dependabot[bot] in #1873
  • feat: Extend the text based configuration to include feature flags and the SaturationDetector's configuration by @shmuelk in #1492
  • refactor bbr main as a prep for pluggability by @nirrozenbaum in #1867
  • use a dispatch ticker to dispatch requests periodically in ShardProcessor… by @delavet in #1850
  • feat(conformance): add responseReceived plugin to support verifying destination endpoint. by @zetxqx in #1855
  • some cleanup in runner and config loading + deprecation notes by @nirrozenbaum in #1880
  • fix bbr dockerfile post build by @nirrozenbaum in #1881
  • add shmuelk as code reviewer by @nirrozenbaum in #1882
  • SLO Aware Routing Plugins Only by @BenjaminBraunDev in #1849
  • Upload prefill and decode heavy benchmarking configs by @rlakhtakia in #1848
  • Update outdated documentation for monitoring config of GKE by @JeffLuoo in #1837
  • Enable EPP to support endpoint discovery using pod selector by @c...

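#1818 above changes the inferencepool Helm chart flags from an array to a map. A map keyed by flag name lets a values override replace one flag without restating the whole list. The Go sketch below illustrates that merge-then-render pattern under stated assumptions: `mergeFlags`, `renderFlags`, and the `example-flag-*` names are hypothetical, and the real chart does its rendering in Helm templates, whose exact output format may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeFlags applies overrides on top of defaults. An override replaces only
// the entry with the same key, which is the main convenience of a map over a list.
// (Hypothetical helper for illustration.)
func mergeFlags(defaults, overrides map[string]string) map[string]string {
	out := make(map[string]string, len(defaults)+len(overrides))
	for k, v := range defaults {
		out[k] = v
	}
	for k, v := range overrides {
		out[k] = v
	}
	return out
}

// renderFlags turns a map of flag name -> value into "--name=value" arguments.
// Keys are sorted so the output is deterministic (Go map iteration order is random).
// (Hypothetical helper; the assumed "--name=value" format is for illustration only.)
func renderFlags(flags map[string]string) []string {
	keys := make([]string, 0, len(flags))
	for k := range flags {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	args := make([]string, 0, len(keys))
	for _, k := range keys {
		args = append(args, fmt.Sprintf("--%s=%s", k, flags[k]))
	}
	return args
}

func main() {
	defaults := map[string]string{"example-flag-a": "1", "example-flag-b": "false"}
	overrides := map[string]string{"example-flag-b": "true"} // override a single flag by name
	fmt.Println(renderFlags(mergeFlags(defaults, overrides)))
}
```

With an array, the same override would require copying every default entry just to change `example-flag-b`.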
v1.1.0

27 Oct 21:03

New and noteworthy

This release primarily focuses on enabling users to try the experimental features we are developing:

  • Flow Control is available as an experimental feature! To enable it, set the ENABLE_EXPERIMENTAL_FLOW_CONTROL_LAYER environment variable to true (this can be done from the Helm chart). Docs are a work in progress and coming soon!

  • Multi-port support is available with Gateway implementations that also support it. This enables sophisticated features such as Wide EP. Support from Gateway providers is forthcoming.

  • Multi-cluster support: the API surface has been extended to experimentally support multi-cluster deployments. Docs are a work in progress and coming soon!
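The Flow Control toggle above is an ordinary environment variable. The Go sketch below shows one way a process could interpret it; the helper name `flowControlEnabled` and the use of `strconv.ParseBool` semantics (so "true", "1", and "TRUE" enable it, while unset or unparsable values leave it off) are assumptions for illustration, not the EPP's actual parsing code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// flowControlEnvVar is the variable named in the release notes.
const flowControlEnvVar = "ENABLE_EXPERIMENTAL_FLOW_CONTROL_LAYER"

// flowControlEnabled reports whether the experimental Flow Control layer should
// be turned on. Assumption: any value accepted by strconv.ParseBool as true
// enables it; an unset or unparsable value leaves it disabled.
// The lookup parameter has the same shape as os.LookupEnv so it can be faked in tests.
func flowControlEnabled(lookup func(string) (string, bool)) bool {
	v, ok := lookup(flowControlEnvVar)
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(v)
	return err == nil && enabled
}

func main() {
	if flowControlEnabled(os.LookupEnv) {
		fmt.Println("experimental Flow Control layer: enabled")
	} else {
		fmt.Println("experimental Flow Control layer: disabled")
	}
}
```

Running it as `ENABLE_EXPERIMENTAL_FLOW_CONTROL_LAYER=true go run .` prints the "enabled" message; leaving the variable unset prints "disabled".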

What's Changed

v1.1.0-rc.1

22 Oct 16:21

v1.1.0-rc.1 Pre-release

New and noteworthy

This release primarily focuses on enabling users to try the experimental features we are developing:

  • Flow Control is available as an experimental feature! To enable it, set the ENABLE_EXPERIMENTAL_FLOW_CONTROL_LAYER environment variable to true (this can be done from the Helm chart). Docs are a work in progress and coming soon!

  • Multi-port support is available with Gateway implementations that also support it. This enables sophisticated features such as Wide EP. Support from Gateway providers is forthcoming.

  • Multi-cluster support: the API surface has been extended to experimentally support multi-cluster deployments. Docs are a work in progress and coming soon!

What's Changed

v1.0.2

17 Oct 18:08
cd03ff7

What's Changed

Full Changelog: v1.0.1...v1.0.2

v1.0.1

25 Sep 23:43

What's Changed

Bug fixes to the Helm charts; no changes to the EPP image or IGW APIs.

Full Changelog: v1.0.0...v1.0.1

v1.0.1-rc.1

23 Sep 02:56

v1.0.1-rc.1 Pre-release

This is a small patch release to fix helm issues.

Context: #1616

v1.0.0

09 Sep 00:04

Inference Gateway v1

This release marks the v1 of Inference Gateway, and with it the promotion of the InferencePool CRD to v1.

We're excited to announce our v1 release of Inference Gateway! A huge thank you to our contributors, gateway implementers, and downstream community for helping to shape IGW into something we are proud of.

If you're new, please take a look at our guide to get started! Or learn more about IGW here: https://gateway-api-inference-extension.sigs.k8s.io/

There is still much to do and more enhancements to come. Namely:

  • SLO-based predictive scheduling
  • Flow Control for multi-tenancy support
  • An improved pluggable Data Layer system
  • Multi-modal support
  • APIs to support meeting multiple different SLOs in a single InferencePool

We look forward to what's next in the inference space and to continuing to grow with it.

Onwards!

Cheers,
The IGW maintainer team

What's Changed

v1.0.0-rc.4

08 Sep 12:10
7ce1f47

v1.0.0-rc.4 Pre-release

PRs cherry-picked into RC4:

CRD updates:

#1521

Performance issues fixed in pickers:

#1523
#1514
#1528

Helm chart fixes:

#1522
#1540
#1542

Bug fix in prefix when no request ID header is supplied by the gateway:

#1490 (was on the original list but was missed; without it, the prefix cache won't work under bursty workloads)

Test flake fix, required for llm-d to use the official IGW image:

#1534

All items in this list have been successfully cherry-picked into the release branch.

v1.0.0-rc.3

05 Sep 11:57
9c24d20

v1.0.0-rc.3 Pre-release

Cherry-picked PRs:
#1508 - critical bug fix to allow setting custom plugins config through helm chart
#1509 - prefix writing its state to CycleState.
#1412 - new weighted random picker

v1.0.0-rc.2

29 Aug 00:11

v1.0.0-rc.2 Pre-release

This release primarily updates the InferencePool API and conformance tests following the completion of the API review conducted in this PR: #1173

NOTE: Barring any breaking change after this RC, the APIs are considered frozen for the remainder of the v1.0 release cycle.