@elieserr elieserr commented Oct 15, 2025

This PR includes various changes:

  • use ghcr.io/llm-d/llm-d-cuda:v0.3.0 for prefill and decode deployments
  • use ghcr.io/llm-d/llm-d-inference-scheduler:v0.3.1
  • update the inference scheduler template to match the v0.3.1 args
  • use default-pd-config.yaml for EPP config
  • set the VLLM_NIXL_SIDE_CHANNEL_PORT variable to the default vLLM port
  • disable the default, deprecated inferencemodels
  • replace the RBAC rules for inferencemodels with rules for inferenceobjectives
  • reduce CPU requirements, since the example model fits on a smaller node
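
For illustration, the RBAC change in the list above might look roughly like the fragment below. This is a sketch, not the chart's actual template: the Role name is hypothetical, and the API group and verbs are assumptions based on the upstream Gateway API Inference Extension, so check them against the CRDs installed in your cluster.

```yaml
# Sketch only: name, apiGroups, and verbs are assumptions, not copied from this PR.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: example-epp-role            # hypothetical name
rules:
  - apiGroups: ["inference.networking.x-k8s.io"]   # assumed API group
    resources:
      - inferenceobjectives         # previously: inferencemodels
    verbs: ["get", "list", "watch"]
```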

Fixes #130

@elieserr elieserr changed the title from "update pd disaggregation example" to "update pd disaggregation templates and example" Oct 15, 2025
@elieserr elieserr force-pushed the update-pd-disagregation-example branch from c6f5b5e to 1254d08 Compare October 20, 2025 18:48
Signed-off-by: Elieser Pereira <[email protected]>
@elieserr elieserr force-pushed the update-pd-disagregation-example branch from c7ecf31 to 7d78630 Compare October 20, 2025 23:13
@kalantar
Collaborator

We are thinking that the endpoint picker and InferencePool pieces should all be removed from the modelservice chart. There is an upstream chart defined here: https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/config/charts/inferencepool (released versions at oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool) that already has all these updates. Can you try it and let us know where you get stuck?

See #135.
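
As a rough sketch of trying the upstream chart, a helmfile release could point at the OCI location mentioned above. The release name, namespace, and version are placeholders rather than values taken from this thread, and any chart values you need should come from the upstream chart's own documentation.

```yaml
# Sketch only: release name, namespace, and version are placeholders.
releases:
  - name: example-inferencepool     # hypothetical release name
    namespace: example-namespace    # hypothetical namespace
    chart: oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
    version: <released-version>     # fill in a released chart version
```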

Development

Successfully merging this pull request may close these issues.

Update PD disaggregation example