Conversation

@irar2 irar2 (Contributor) commented Jan 12, 2026

This PR adds the ability to collect model information from the /v1/models API and store it in the endpoint's attributes.

Closes #466

Signed-off-by: irar2 <[email protected]>
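
For context, a typical /v1/models response follows the OpenAI list format (id/object/created/owned_by); the parent field is the vLLM-specific extension discussed in the review thread below, reported on LoRA adapters. Model names and timestamps here are illustrative only:

```json
{
  "object": "list",
  "data": [
    {
      "id": "base-model",
      "object": "model",
      "created": 1736640000,
      "owned_by": "vllm"
    },
    {
      "id": "my-lora-adapter",
      "object": "model",
      "created": 1736640000,
      "owned_by": "vllm",
      "parent": "base-model"
    }
  ]
}
```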
```go
// ModelInfo defines the model data returned from the /v1/models API.
type ModelInfo struct {
	ID     string `json:"id"`
	Parent string `json:"parent,omitempty"`
}
```
Collaborator:
The parent field is not part of the OpenAI standard; it is specific to vLLM and might not work with other model servers. I also don't think it's used (or should be used) anywhere. I recommend removing this field.

OpenAI standard here:
https://platform.openai.com/docs/api-reference/models/list

Collaborator:

A few comments:

  • If not present, omitempty kicks in, so I don't see the downside of keeping it (see the marshaling sketch after this comment).
  • For use cases that need the parent information for Base/LoRA relations: if it is not provided by model extraction, one must assume the base model name is provided elsewhere. There is currently no other source of truth...

I think it is fine to rely on vLLM-specific behavior for that:

  1. It can be treated as part of the "contract" (same as the case where other model servers are expected to provide the MSP metrics, even if under a different name).
  2. Configuration of data sources is per EPP, so you can always leave this disabled for other model servers. This is valid usage as long as we use a homogeneous model server in a pool (other code breaks as well when this is not the case...).
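
A minimal, self-contained sketch of the omitempty behavior referenced above (model names are hypothetical):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ModelInfo mirrors the struct under review.
type ModelInfo struct {
	ID     string `json:"id"`
	Parent string `json:"parent,omitempty"`
}

func main() {
	// Base model from a server that does not report parent (the
	// OpenAI-standard shape): the field is omitted on marshal.
	base, _ := json.Marshal(ModelInfo{ID: "base-model"})
	fmt.Println(string(base)) // {"id":"base-model"}

	// LoRA adapter as reported by vLLM: parent names the base model.
	lora, _ := json.Marshal(ModelInfo{ID: "my-adapter", Parent: "base-model"})
	fmt.Println(string(lora)) // {"id":"my-adapter","parent":"base-model"}

	// On unmarshal, a missing parent simply leaves the field empty.
	var m ModelInfo
	_ = json.Unmarshal([]byte(`{"id":"base-model"}`), &m)
	fmt.Printf("%q\n", m.Parent) // ""
}
```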

@vMaroon vMaroon requested a review from nirrozenbaum January 13, 2026 10:19
@elevran elevran (Collaborator) commented Jan 14, 2026

/hold
This should go in post-v0.5.

@github-actions github-actions bot added the hold label Jan 14, 2026
@elevran elevran added this to the v0.6 milestone Jan 22, 2026
@elevran elevran moved this to In review in llm-d-inference-scheduler Jan 22, 2026
@elevran elevran removed the hold label Jan 26, 2026
```go
}

// NewModelExtractor returns a new model extractor.
func NewModelExtractor() (*ModelExtractor, error) {
```
Collaborator:
nit: at least in theory, the plugin could have a name...

Contributor Author (irar2):
What do you mean?

Collaborator:
ModelExtractor is a plugin, and a plugin has a type and an optional name. The code does not support setting a plugin name, and it should.
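
A hypothetical sketch of what this suggestion could look like; the type string and accessor names are assumptions, not the repository's actual plugin interface:

```go
package extractor

// ModelExtractorType is an assumed plugin type string, for illustration only.
const ModelExtractorType = "model-extractor"

// ModelExtractor carries a fixed plugin type and an optional instance name.
type ModelExtractor struct {
	name string
}

// NewModelExtractor accepts an optional name; an empty name falls back
// to the plugin type, so existing configurations keep working.
func NewModelExtractor(name string) (*ModelExtractor, error) {
	if name == "" {
		name = ModelExtractorType
	}
	return &ModelExtractor{name: name}, nil
}

// Type returns the plugin type.
func (m *ModelExtractor) Type() string { return ModelExtractorType }

// Name returns the configured plugin instance name.
func (m *ModelExtractor) Name() string { return m.name }
```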

```go
	}
}

ds := http.NewHTTPDataSource(cfg.Scheme, cfg.Path, cfg.InsecureSkipVerify, ModelsDataSourceType,
```
Collaborator:
Q: does NewHTTPDataSource validate the scheme?

Contributor Author (irar2):
No, there is only a check for whether it's https.

Collaborator:
Since we use the scheme passed in by the user, it should at least be sanitized to ensure it's one of a known set of acceptable values (e.g., "http" and "https"). This can be done in this PR, or scheme validation can be added to the HTTPDataSource separately.
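
A sketch of the suggested sanitization; the function name and error wording are assumptions, not part of the PR:

```go
package datasource

import (
	"fmt"
	"strings"
)

// validateScheme restricts a user-supplied URL scheme to a known set
// before it is handed to the HTTP data source.
func validateScheme(scheme string) (string, error) {
	s := strings.ToLower(strings.TrimSpace(scheme))
	switch s {
	case "http", "https":
		return s, nil
	default:
		return "", fmt.Errorf("unsupported scheme %q: expected \"http\" or \"https\"", scheme)
	}
}
```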

@elevran elevran (Collaborator) commented Feb 3, 2026

/lgtm
/approve
/hold

Overall looks good. Minor comments left, so placing a hold. Leaving it to your discretion whether to amend, or to cancel the hold and merge as-is.

@github-actions github-actions bot added the hold and lgtm labels Feb 3, 2026
github-actions bot previously approved these changes Feb 3, 2026

Labels

hold, lgtm ("Looks good to me", indicates that a PR is ready to be merged)

Projects

llm-d-inference-scheduler: In review

Development

Successfully merging this pull request may close these issues:

Enable collection of configured / loaded models in each inference serving endpoint

3 participants