-
Notifications
You must be signed in to change notification settings - Fork 195
feat: Implement Model Rewrite and Traffic Splitting Logic #1820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: zetxqx The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
5302959 to
2c27884
Compare
|
/retest |
2c27884 to
48f4afa
Compare
48f4afa to
8485ba6
Compare
ae26313 to
3e9939f
Compare
pkg/epp/datastore/datastore.go
Outdated
| // key: InferenceObjective name, value: *InferenceObjective | ||
| objectives map[string]*v1alpha2.InferenceObjective | ||
| // key: types.NamespacedName, value: *v1alpha2.InferenceModelRewrite | ||
| rewrites map[types.NamespacedName]*v1alpha2.InferenceModelRewrite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally I don't want to fetch ALL Rewrite objects on every request (assuming there will be objects that are model specific).
I think this should be reflected here in the way we store the objects.
keeping NamespacedName as the key in the name doesn't give us any "smart" mapping.
I think we should map from model_name -> rewrite object, while keeping empty model name as match for all requests. so on every request we call datastore to get the relevant rewrite objects and get back the:
- specific rewrite objects for the requested model; and
- the general rewrite objects that match all requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, working on some patches to improve it will let you know when it's ready to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a new ModelRewriteStore for the modelRewrite, and making the rewrite request fetching efficiently. O(1). Adding ModelRewrite may not be very efficient but that operation should not happen very often.
Notably I changed the API rule conflict precedence rule , now:
- we'll always consider Exact match.
- if exact match not matching anything or multiple rules have exact match, we then compare the createTimestamp, and older rule wins.
29119d4 to
bf1ddce
Compare
| // Across all rules specified on applicable rewrites, precedence MUST be | ||
| // given to the match having an "Exact" model match over a generic match | ||
| // (a rule with an empty `matches` array). | ||
| // | ||
| // If ties still exist across multiple InferenceModelRewrite resources (e.g. | ||
| // two rewrites both have an exact match for the same model), matching | ||
| // precedence MUST be determined by the oldest resource based on | ||
| // creation timestamp. | ||
| // | ||
| // If ties still exist within a single InferenceModelRewrite resource, the | ||
| // FIRST matching rule (in list order) is used. | ||
| // +required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated the precedence rules for conflicting matches to better align with the HTTPRoute specification in the Kubernetes Gateway API. https://github.com/kubernetes-sigs/gateway-api/blob/f24f3a61f398c65ab629da1843cb65fd5ec9419f/apis/v1/httproute_types.go#L148-L209
The new precedence order is:
- More specific wins: An Exact match always takes precedence over an All match (where the matches array is empty).
- Tie-Breaker (Oldest Rule): If the specificity of the rules is the same (a tie), the rule that was created or deployed first (the older rule) wins.
This approach is more intuitive and simplifies the implementation of efficient RewriteRule fetching per request. Specifically, when we find an exact match, we no longer need to compare it against less specific, generic rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SG, this matches what we had with InferenceModel also.
bf1ddce to
dc0cca4
Compare
|
This is ready for review now. I didn't split things up because I feel most of part is relevant but I'm open to split it up if this is too large for review. The main changes are:
|
| // ModelRewriteStore encapsulates the logic for storing and retrieving | ||
| // InferenceModelRewrite rules, handling precedence correctly. This struct is not | ||
| // thread-safe; concurrency must be managed by its consumer. | ||
| type ModelRewriteStore struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are listing the methods in the datastore, and having the datastore handle thread safety, should we merge this with the store, and if we intent for it to act as a substruct to help organize logic, can we instead unexport these functions?
| } | ||
| } | ||
|
|
||
| func (rr rewriteRuleWithMetadata) isGeneric() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whats the intended usecase here? a flat rewrite for any incoming req?
| func (ms *ModelRewriteStore) GetRule(modelName string) *v1alpha2.InferenceModelRewriteRule { | ||
| // Exact matches have the highest precedence. | ||
| if rulesWithMd, ok := ms.rulesByExactModelMatch[modelName]; ok && len(rulesWithMd) > 0 { | ||
| return &rulesWithMd[0].rule // The list is pre-sorted, so the first element is the oldest. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we intent to flag this status in a controller in the future?
|
|
||
| for i := range infModelRewrite.Spec.Rules { | ||
| ruleWithMetadata := newRuleWithMetadata(infModelRewrite, i) | ||
| if ruleWithMetadata == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having read this function, im not sure how this would ever be possible to be nil, but also, this function is so small, (and the name isnt exactly clear of its intent) can we just inline it?
| // object into individual rules and stores them in the appropriate data structures, | ||
| // ensuring they remain sorted by precedence. | ||
| func (ms *ModelRewriteStore) Set(infModelRewrite *v1alpha2.InferenceModelRewrite) { | ||
| nn := getNN(infModelRewrite) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to get the namespace name if we have already associated the rewrite with the pool? (hence the same namespace)
| }) | ||
|
|
||
| for model := range ms.rulesByExactModelMatch { | ||
| sort.Slice(ms.rulesByExactModelMatch[model], func(i, j int) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to think through when you would want multiple alias models to map to a range of target models. By allowing an array:
| Matches []Match `json:"matches,omitempty"` |
We increase the chance of a name collision and make our reconciler code more complex, which is more opportunity for bugs
| // It always returns the requestContext even in the error case, as the request context is used in error handling. | ||
| func (d *Director) HandleRequest(ctx context.Context, reqCtx *handlers.RequestContext) (*handlers.RequestContext, error) { | ||
| logger := log.FromContext(ctx) | ||
| d.applyWeightedModelRewrite(reqCtx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we validated that this works? We are changing the content length of the body and i think we need to change the corresponding header, I don't remember if we removed that code of not.
0359789 to
3c2c178
Compare
|
@zetxqx: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What type of PR is this?
/kind feature
What this PR does / why we need it:
This pull request introduces the core logic for model rewriting and weighted traffic
splitting within the request control director. It also includes the reconciler logic for
InferenceModelRewriteresources.Key changes:
pkg/epp/requestcontrol:applyWeightedModelRewritefunction to handle model rewriting based onInferenceModelRewriterules.InferenceModelRewriteresource is respected in case ofduplicate rules
pkg/epp/controller:InferenceModelRewriteresourcesWhich issue(s) this PR fixes:
Fixes partially #1811
Does this PR introduce a user-facing change?: