@rajinator rajinator commented Oct 9, 2025

This change integrates the readiness probe script from llm-d PR #330 into llm-d-inference-sim, following that PR's Dockerfile-only approach, with no Go code modifications.

This change addresses issue #300 in the llm-d/llm-d repo. In PR #330, E2E testing with sim images currently breaks because the readiness probe script is not available in the image.

The readiness probe provides comprehensive 3-stage health validation:

  • Stage 1: Basic health endpoint (/health) responding
  • Stage 2: Models endpoint (/v1/models) available
  • Stage 3: Model metadata properly returned (non-empty data)
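As a rough illustration of the three stages above, a check along these lines could be written in POSIX shell. This is only a sketch, not the actual scripts/readiness_probe.sh: the function names `has_model_data` and `probe_ready`, the default port 8000, the 5-second timeout, and the grep-based JSON check are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of a three-stage readiness check; this is NOT the
# script shipped in the PR. Names and defaults are illustrative assumptions.

# Stage 3 helper: succeeds if the JSON on stdin contains a non-empty
# "data" array. It uses grep instead of a JSON parser, so it is a heuristic.
has_model_data() {
  tr -d ' \n\t\r' | grep -q '"data":\[[^]]'
}

# Runs the three stages in order against host:port.
# Returns 0 when ready, or the number of the stage that failed.
probe_ready() {
  host="${1:-localhost}"   # assumed default host
  port="${2:-8000}"        # assumed default port
  timeout="${3:-5}"        # assumed per-request timeout in seconds
  base="http://${host}:${port}"

  # Stage 1: /health answers with a successful HTTP status.
  curl -fsS --max-time "$timeout" -o /dev/null "${base}/health" || return 1

  # Stage 2: /v1/models answers with a successful HTTP status.
  body=$(curl -fsS --max-time "$timeout" "${base}/v1/models") || return 2

  # Stage 3: the models response lists at least one entry.
  printf '%s' "$body" | has_model_data || return 3
}
```

Calling `probe_ready localhost 8000 5` and exiting with its return code would make the failing stage visible to whatever runs the probe.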

Changes included:

  • Add scripts/readiness_probe.sh: Comprehensive health check script that validates vLLM-compatible API readiness with configurable timeouts and flexible host/port parameters
  • Update Dockerfile: Install curl runtime dependency and copy the readiness probe script to /usr/local/bin/ with executable permissions

The script is available in the container for users who need advanced readiness validation but does not change existing behavior. Users can optionally configure exec-based probes using this script while the default HTTP-based probes continue to work as before.

This addresses the need for more robust model-loading verification in production deployments where the basic HTTP health check may return success before the model is fully loaded and ready to serve requests.
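To illustrate the gap between "the server answers" and "the model is ready", a caller might retry a readiness command until it succeeds or a deadline passes. The helper below is a minimal sketch under that assumption; `wait_until` is a hypothetical name, and the deadline is approximate because it counts one-second sleeps rather than wall-clock time.

```shell
#!/bin/sh
# Hypothetical helper, not part of the PR: retry a command until it
# succeeds or roughly <deadline_seconds> have elapsed.
# Usage: wait_until <deadline_seconds> <command> [args...]
wait_until() {
  deadline="$1"
  shift
  elapsed=0
  # Re-run the command after each failure, sleeping one second in between.
  until "$@"; do
    elapsed=$((elapsed + 1))
    # Give up once the attempt budget is used; time spent inside the
    # command itself is not counted.
    [ "$elapsed" -ge "$deadline" ] && return 1
    sleep 1
  done
  return 0
}
```

For example, `wait_until 60 /usr/local/bin/readiness_probe.sh` would keep retrying the installed script for about a minute, assuming it can be run without arguments.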

Signed-off-by: rajinator <[email protected]>
@rajinator rajinator changed the title from "WIP - Add vLLM readiness probe script to container image for supporting" to "WIP - Add vLLM readiness probe script to sim container image" Oct 9, 2025
@mayabar
Collaborator

mayabar commented Oct 23, 2025

@rajinator is it still WIP or ready for review?

@rajinator
Author

@mayabar It's still WIP, waiting on @Gregory-Pereira to test the updated alternate approach in llm-d/llm-d#330.

That testing will tell us whether to close this PR or move forward with it.

@rajinator
Author

Closing this because llm-d/llm-d#330 has been tested and merged with the alternate approach. Thank you, @Gregory-Pereira!

@rajinator rajinator closed this Oct 25, 2025