This repository was archived by the owner on Oct 15, 2025. It is now read-only.
Implement upstream inference gateway integration with separated vLLM components (fixes #312) #321
Open
jeremyeder wants to merge 1 commit into llm-d:main from
Conversation
…components

Addresses issue llm-d#312 by creating a modular architecture that leverages upstream inference gateway charts while maintaining existing llm-d patterns.

## New Charts:
- **llm-d-vllm**: Dedicated vLLM model serving components
- **llm-d-umbrella**: Orchestration chart using upstream inferencepool

## Key Benefits:
- True upstream integration with kubernetes-sigs/gateway-api-inference-extension
- Modular design with clean separation of concerns
- Intelligent load balancing and endpoint selection via InferencePool
- Maintains backward compatibility with existing deployments

## Validation:
- Comprehensive test suite with 4 test templates
- Helm dependency build and lint pass successfully
- Deployment-ready charts following existing patterns

Uses correct OCI registry: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts

Fixes vLLM capitalization throughout codebase
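The umbrella chart's wiring to the upstream InferencePool chart could look something like the `Chart.yaml` sketch below. This is illustrative only: the chart name, description, and version pins are assumptions, while the OCI registry path is the one stated in the PR description.

```yaml
# charts/llm-d-umbrella/Chart.yaml (hypothetical sketch; version pins and
# dependency names are assumptions, only the registry path is from the PR)
apiVersion: v2
name: llm-d-umbrella
description: Orchestration chart combining the upstream InferencePool with llm-d-vllm
version: 0.1.0
dependencies:
  # Upstream inference gateway chart pulled from the staging OCI registry
  - name: inferencepool
    version: "0.x.x"   # pin to a published upstream chart version
    repository: oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts
  # The dedicated vLLM serving chart introduced by this PR
  - name: llm-d-vllm
    version: "0.1.0"
    repository: file://../llm-d-vllm
```

With a dependency block like this, `helm dependency build` fetches the upstream chart so the umbrella chart can be installed as a single release.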
39b6b4e to 963d9fb
ahg-g reviewed Jun 17, 2025
ahg-g reviewed Jul 1, 2025
> ### 2. `llm-d-umbrella` Chart
> **Purpose**: Combines upstream InferencePool with vLLM chart
I am not totally against an llm-d umbrella chart; we could have that. But I believe it is key to have instructions to deploy the two core components of vllm-d independently:
- A Helm chart to deploy the vLLM server (with the sidecar and set up with the right flags)
- Instructions to deploy an inference gateway (InferencePool resource + vllm-d EPP image) via the upstream chart [1] that points to the vLLM deployment above.

This allows composing with customers' existing infra (most already have a gateway deployed, for example) and composes with the IGW much better.
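The two-step deployment the reviewer describes could be sketched with `helm` commands along these lines. The release names, values files, and version pin are hypothetical; only the chart path and the OCI registry come from this PR.

```shell
# 1. Deploy the vLLM server from its dedicated chart. The release name and
#    values file are placeholders; the chart path matches charts/llm-d-vllm
#    from this PR.
helm install vllm-serving ./charts/llm-d-vllm \
  -f my-vllm-values.yaml

# 2. Deploy the inference gateway (InferencePool + EPP) via the upstream
#    chart, pointed at the vLLM deployment above. The version pin and the
#    values overrides are assumptions; consult the upstream chart's values
#    for the exact keys that select the model-server backend.
helm install inference-gateway \
  oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool \
  --version 0.3.0 \
  -f my-gateway-values.yaml
```

Keeping the two installs separate is what allows a customer with an existing gateway to reuse it and only install the vLLM chart.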
Summary
This PR implements a modular architecture that leverages upstream inference gateway charts while maintaining existing llm-d patterns, fully addressing issue #312.
Key Changes
🆕 New Charts:
- `llm-d-vllm`: dedicated vLLM model serving components
- `llm-d-umbrella`: orchestration chart using the upstream inferencepool

🏗️ Architecture Benefits:
- True upstream integration with kubernetes-sigs/gateway-api-inference-extension
- Modular design with clean separation of concerns
- Intelligent load balancing and endpoint selection via InferencePool
- Backward compatibility with existing deployments

🧪 Testing & Validation:

✅ Comprehensive Test Suite:
- 4 test templates covering the new charts

✅ Test Results:
- Helm dependency build and lint pass successfully
Files Added
- `charts/llm-d-vllm/` - Complete vLLM model serving chart (9 files)
- `charts/llm-d-umbrella/` - Umbrella orchestration chart (10 files)
- `charts/IMPLEMENTATION_SUMMARY.md` - Complete architecture documentation

Test Plan
Migration Path
The implementation provides a clear migration path from the monolithic `llm-d` chart to the new modular architecture while maintaining full backward compatibility.

Implementation Details
See `charts/IMPLEMENTATION_SUMMARY.md` for a complete architectural overview, benefits achieved, and future enhancement opportunities.

Closes #312