feat(storage): add model storage support for multi-node LLM workloads (LWS) #813
Conversation
… (LWS) Port the existing storage-related functionality from single-node workloads to support PVC, OCI, Hugging Face, and S3 storage in multi-node setups. Both leader and worker pods in the LeaderWorkerSet receive the model artifact configuration, and P/D (prefill/decode) configurations are also covered. The mutation of the `main` container command to specify the model location had to be removed from the code and hard-coded in the templates, because multi-node (LWS-based) deployments use a smarter start-up script that makes the command more complex. This effectively aligns the LLMIsvc implementation with how InferenceServices work: the templates consistently use /mnt/models as the location for loading the model. Signed-off-by: Edgar Hernández <[email protected]>
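As a rough sketch of the resulting shape (not this PR's actual output; the resource, container, and claim names below are made up for illustration), the reconciler would attach the same model volume and /mnt/models mount to both pod templates of the LeaderWorkerSet, e.g. for PVC storage:

```yaml
# Hypothetical example: "my-llm", "main", and "model-pvc" are placeholders.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: my-llm
spec:
  replicas: 1
  leaderWorkerTemplate:
    size: 2
    leaderTemplate:
      spec:
        containers:
          - name: main
            volumeMounts:
              - name: model-storage
                mountPath: /mnt/models
                readOnly: true
        volumes:
          - name: model-storage
            persistentVolumeClaim:
              claimName: model-pvc
    workerTemplate:
      spec:
        containers:
          - name: main
            volumeMounts:
              - name: model-storage
                mountPath: /mnt/models
                readOnly: true
        volumes:
          - name: model-storage
            persistentVolumeClaim:
              claimName: model-pvc
```

The key point mirrored from the PR description is that the leader and worker templates get identical model artifact wiring, so every rank of the multi-node server sees the model at the same path.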
Add test coverage for PVC storage configuration in multi-node LLM workloads using LeaderWorkerSet, ensuring reconcileMultiNodeWorkload correctly attaches model artifacts from PVC storage. Signed-off-by: Edgar Hernández <[email protected]>
Add test coverage for OCI storage configuration in multi-node LLM workloads using LeaderWorkerSet, ensuring reconcileMultiNodeWorkload correctly attaches model artifacts from OCI storage. Signed-off-by: Edgar Hernández <[email protected]>
Add test coverage for HF storage configuration in multi-node LLM workloads using LeaderWorkerSet, ensuring reconcileMultiNodeWorkload correctly attaches model artifacts from HF storage. Signed-off-by: Edgar Hernández <[email protected]>
Add test coverage for S3 storage configuration in multi-node LLM workloads using LeaderWorkerSet, ensuring reconcileMultiNodeWorkload correctly attaches model artifacts from S3 storage. Signed-off-by: Edgar Hernández <[email protected]>
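The four storage backends exercised by these tests typically differ only in the model URI scheme. A hedged sketch of what such URIs look like (KServe-style scheme conventions; the bucket, registry, and repo names below are invented):

```yaml
# Illustrative model URIs only; all names are placeholders.
# PVC: claim name followed by a path inside the volume
uri: pvc://model-pvc/llama-3-8b
# OCI: a model image pulled as an OCI artifact
uri: oci://registry.example.com/models/llama-3-8b:latest
# Hugging Face: a repository id on the Hub
uri: hf://meta-llama/Meta-Llama-3-8B
# S3: bucket plus key prefix
uri: s3://my-bucket/models/llama-3-8b
```

Whichever scheme is used, per the commits above the reconciler is expected to resolve it into the same /mnt/models mount on both leader and worker pods.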
- For multi-node, the prefill spec should also include workers and parallelism configurations.
- Remove the assertions on the command specifying /mnt/models; they are not compatible with the multi-worker case. Since the templates have been changed to hard-code /mnt/models, the assertion would no longer be applicable (i.e., asserting on a hard-coded value may not make sense).
- config_presets_test.go is updated to use /mnt/models.

Signed-off-by: Edgar Hernández <[email protected]>
Force-pushed from 219b860 to b24e105
Signed-off-by: Edgar Hernández <[email protected]>
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: israel-hdez, pierDipi.
New changes are detected. LGTM label has been removed.
Signed-off-by: Edgar Hernández <[email protected]>
Force-pushed from eb0ac8c to e7fdaa5
Merged commit 7142a8b into opendatahub-io:release-v0.15
What this PR does / why we need it:
Port the existing storage-related functionality from single-node workloads to support PVC, OCI, Hugging Face, and S3 storage in multi-node setups.
Both leader and worker pods in the LeaderWorkerSet receive the model artifact configuration, and P/D (prefill/decode) configurations are also covered.
The mutation of the `main` container command to specify the model location had to be removed from the code and hard-coded in the templates, because multi-node (LWS-based) deployments use a smarter start-up script that makes the command more complex. This effectively aligns the LLMIsvc implementation with how InferenceServices work: the templates consistently use /mnt/models as the location for loading the model.

Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):
Related to https://issues.redhat.com/browse/RHOAIENG-27817
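With the command mutation removed, the templates themselves now point the server at the mount path. A minimal sketch of what such a hard-coded template container might look like (the vLLM image and flags here are assumptions for illustration, not taken from this PR's templates):

```yaml
# Hypothetical template fragment; image and served-model-name are placeholders.
containers:
  - name: main
    image: vllm/vllm-openai:latest
    args:
      - --model
      - /mnt/models
      - --served-model-name
      - my-model
```

Hard-coding /mnt/models in the template trades flexibility for consistency: per the description above, every workload (single- or multi-node) loads the model from the same well-known path, so the controller no longer needs to rewrite container commands.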
Type of changes
Feature/Issue validation/testing:
Checklist: