
kaushikmitr
Contributor

@kaushikmitr kaushikmitr commented Oct 3, 2025

PR #1677 – Add batch prediction capability and LightGBM support to prediction sidecars

Overview

This PR enhances the latency predictor and scheduling pipeline in the Gateway API Inference Extension. It introduces batch prediction support, consistent SLO header handling, improved test and deployment flows, and infrastructure updates. Batch prediction (predicting TTFT/TPOT for all pods in a single API call to the sidecars) replaces one sidecar call per pod and is therefore much more efficient.
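
To illustrate why a single batched call is cheaper, here is a minimal sketch that compares per-pod calls with one batched call. It is not the code from this PR (the real implementation lives in latencypredictor_async.go); the names `FakeSidecar`, `predict_per_pod`, `predict_batch`, and the `queue_depth` feature are hypothetical stand-ins.

```python
from typing import Callable, Dict, List

# Hypothetical per-pod feature record and prediction record (illustrative fields only).
PodFeatures = Dict[str, float]
Prediction = Dict[str, float]  # e.g. {"ttft_ms": ..., "tpot_ms": ...}
SidecarCall = Callable[[List[PodFeatures]], List[Prediction]]


class FakeSidecar:
    """Stand-in for the prediction sidecar that counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, batch: List[PodFeatures]) -> List[Prediction]:
        self.calls += 1
        # Toy model: TTFT grows with queue depth, TPOT is constant.
        return [{"ttft_ms": 10.0 * p["queue_depth"], "tpot_ms": 1.0} for p in batch]


def predict_per_pod(pods: List[PodFeatures], call_sidecar: SidecarCall) -> List[Prediction]:
    """One sidecar call per pod: N pods cost N round trips."""
    return [call_sidecar([pod])[0] for pod in pods]


def predict_batch(pods: List[PodFeatures], call_sidecar: SidecarCall) -> List[Prediction]:
    """One sidecar call for all pods: N pods cost a single round trip."""
    if not pods:
        return []
    predictions = call_sidecar(pods)
    if len(predictions) != len(pods):
        raise ValueError("sidecar returned a different number of predictions than pods")
    return predictions
```

Both helpers return the same predictions in pod order; the difference is only the number of round trips to the sidecar, which the `calls` counter makes visible.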

Key Changes

Batch Prediction & SLO Headers

  • Added batch prediction support in the async latency predictor (latencypredictor_async.go) and updated tests.
  • Normalized all SLO-related HTTP headers to lowercase for consistent handling across clients and proxies.
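
HTTP header names are case-insensitive, so clients and proxies may deliver the same header in different casings. The sketch below shows one way to read an SLO header after lowercasing all names. The header name `x-slo-ttft-ms` and both helper functions are assumptions made for illustration, not identifiers taken from this PR.

```python
from typing import Dict, Mapping, Optional


def normalize_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of the headers with every header name lowercased."""
    return {name.lower(): value for name, value in headers.items()}


def get_slo_ms(headers: Mapping[str, str], name: str) -> Optional[float]:
    """Look up a numeric SLO header regardless of the casing the client used.

    Returns None when the header is absent or not a valid number.
    """
    raw = normalize_headers(headers).get(name.lower())
    if raw is None:
        return None
    try:
        return float(raw)
    except ValueError:
        return None


# Hypothetical header name; the actual header names are defined in the PR.
TTFT_SLO_HEADER = "x-slo-ttft-ms"
```

Normalizing once at the boundary means the rest of the pipeline can compare header names with plain string equality.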

Prediction Server & Model Support

  • Added LightGBM as a supported model, with proper runtime dependency installation (libgomp1) to prevent OpenMP errors.
  • Updated prediction_server.py logic to support multiple models and fallback handling.
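
The fallback idea can be sketched as trying model backends in order of preference and using the first one that loads. This is a hedged illustration, not the logic in prediction_server.py: `select_model`, the loader registry, and the backend names are hypothetical. It assumes that a missing native library such as libgomp typically surfaces as an `OSError` or `ImportError` when the backend is loaded.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]
Loader = Callable[[], Predictor]


def select_model(preferred_order: List[str], loaders: Dict[str, Loader]) -> Tuple[str, Predictor]:
    """Return the first backend in preferred_order that loads successfully."""
    errors: List[str] = []
    for name in preferred_order:
        loader = loaders.get(name)
        if loader is None:
            errors.append(f"{name}: not registered")
            continue
        try:
            return name, loader()
        except (ImportError, OSError) as exc:
            # Assumption: a missing package or shared library fails at load time.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no usable model backend: " + "; ".join(errors))


def broken_lightgbm_loader() -> Predictor:
    """Simulates a LightGBM load on an image without the OpenMP runtime."""
    raise OSError("libgomp.so.1: cannot open shared object file")


def linear_loader() -> Predictor:
    """Toy fallback model: the prediction is the sum of the features."""
    return lambda features: float(sum(features))


name, model = select_model(
    ["lightgbm", "linear"],
    {"lightgbm": broken_lightgbm_loader, "linear": linear_loader},
)
```

Installing libgomp1 in the image, as this PR does, removes the need to fall back for that particular failure; the sketch only shows how a server could stay usable if a backend cannot be loaded.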

Testing & CI/CD

  • Introduced a dedicated Dockerfile-test that builds a containerized test image running pytest by default.
  • Extended build-deploy.sh with new commands (test, test-deploy, all, images) to automate build → deploy → test workflows.
  • Added a Kubernetes batch job manifest (test-dual-server-deployment.yaml) for end-to-end CI-like test execution.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 3, 2025
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Oct 3, 2025
@kaushikmitr
Contributor Author

kaushikmitr commented Oct 3, 2025

@kfswain
Collaborator

kfswain commented Oct 3, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 3, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaushikmitr, kfswain

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 3, 2025
@k8s-ci-robot k8s-ci-robot merged commit 0901896 into kubernetes-sigs:slo-prediction-experimental Oct 3, 2025
8 checks passed
kaushikmitr added a commit to tomatillo-and-multiverse/gateway-api-inference-extension-slo that referenced this pull request Oct 7, 2025
* add latency predictor build readme

* update test dual server

* allow batch prediction

* allow batch prediction, update slo headers to all small