Make lustre default in A series blueprints #5396
saara-tyagi27 wants to merge 4 commits into GoogleCloudPlatform:develop
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request transitions the default shared file system for the A3 UltraGPU and A4 HighGPU SLURM blueprints from Filestore to Managed Lustre. The change aims to leverage the performance benefits of Managed Lustre for high-performance computing workloads by updating the blueprint configurations to enable and use the new file system, providing a more optimized storage solution for these environments.

Highlights
958b61f to dff3234
Code Review
This pull request updates the a3ultra-slurm-blueprint.yaml and a4high-slurm-blueprint.yaml files to transition from using Filestore for /home to Managed Lustre. This involves removing the filestore_ip_range variable, setting install_managed_lustre to true, switching the homefs module source to managed-lustre, and configuring its settings. For a4high-slurm-blueprint.yaml, the Managed Lustre-related variables (lustre_instance_id, lustre_size_gib, per_unit_storage_throughput) are uncommented and initialized. Feedback from the review indicates that in a3ultra-slurm-blueprint.yaml, the Managed Lustre variables are still commented out, which will cause the blueprint to fail. Additionally, obsolete comments related to filestore_ip_range and Managed Lustre instructions in a4high-slurm-blueprint.yaml should be removed to improve clarity and maintainability.
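The module switch the review describes can be sketched as a blueprint fragment. This is a minimal, illustrative sketch only: the module source paths follow the Cluster Toolkit convention (`modules/file-system/filestore`, `modules/file-system/managed-lustre`), but the exact `use` references and settings in the actual PR may differ.

```yaml
# Before (illustrative): Filestore-backed /home
#   - id: homefs
#     source: modules/file-system/filestore
#     settings:
#       local_mount: /home

# After (illustrative): Managed Lustre-backed /home.
# The private_service_access reference is an assumption based on the
# review's note that private service access was added.
  - id: homefs
    source: modules/file-system/managed-lustre
    use: [sysnet, private_service_access]
    settings:
      local_mount: /home
```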
examples/machine-learning/a4-highgpu-8g/a4high-slurm-blueprint.yaml (outdated; resolved)
examples/machine-learning/a4-highgpu-8g/a4high-slurm-blueprint.yaml (outdated; resolved)
dc224fa to f46cbc0
/gemini review

Warning: Gemini encountered an error creating the review. You can try again by commenting `/gemini review`.

/gemini review
Code Review
This pull request updates several machine learning blueprints (A3 HighGPU, A3 MegaGPU, A3 UltraGPU, A4 HighGPU, A4x HighGPU) to replace Filestore with Managed Lustre for shared home directories. The changes involve updating blueprint variables, switching the file system module, adding private service access, and enabling Managed Lustre installation. A consistent improvement opportunity was identified across all updated blueprints: the lustre_instance_id is currently a static string, which could cause conflicts upon multiple deployments. It is recommended to incorporate the deployment_name variable to ensure unique instance IDs.
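The reviewer's suggestion about unique instance IDs could be addressed with the Cluster Toolkit's blueprint variable syntax. A rough sketch, assuming the module exposes the instance ID as a setting (the setting name `instance_id` and the `$(vars.…)` interpolation into the string are assumptions, not confirmed from the PR):

```yaml
# Derive the Lustre instance ID from the deployment name so that
# deploying the blueprint twice does not produce a naming conflict.
  - id: homefs
    source: modules/file-system/managed-lustre
    settings:
      instance_id: $(vars.deployment_name)-lustre
      local_mount: /home
```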
Submission Checklist
NOTE: Community submissions can take up to 2 weeks to be reviewed.
Please take the following actions before submitting this pull request.