
Make lustre default in A series blueprints#5396

Draft
saara-tyagi27 wants to merge 4 commits into GoogleCloudPlatform:develop from saara-tyagi27:filesystem-a-series

Conversation

@saara-tyagi27
Contributor

Submission Checklist

NOTE: Community submissions can take up to 2 weeks to be reviewed.

Please take the following actions before submitting this pull request.

  • Fork your PR branch from the Toolkit "develop" branch (not main)
  • Test all changes with pre-commit in a local branch
  • Confirm that "make tests" passes all tests
  • Add or modify unit tests to cover code changes
  • Ensure that unit test coverage remains above 80%
  • Update all applicable documentation
  • Follow Cluster Toolkit Contribution guidelines

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request transitions the default shared file system for the A3 UltraGPU and A4 HighGPU SLURM blueprints from Filestore to Managed Lustre. This change aims to leverage the performance benefits of Managed Lustre for high-performance computing workloads by updating the blueprint configurations to enable and utilize the new file system, ensuring a more optimized storage solution for these environments.

Highlights

  • Managed Lustre Integration: The A3 UltraGPU and A4 HighGPU SLURM blueprints have been updated to use Managed Lustre as the default shared file system for /home, replacing Filestore.
  • Configuration Updates: Variables related to Filestore IP ranges were removed, and Managed Lustre specific variables (instance ID, size, throughput) were enabled and configured in the A4 HighGPU blueprint.
  • Module Switching: The homefs module in both blueprints was switched from modules/file-system/filestore to modules/file-system/managed-lustre, and the private_service_access module was explicitly enabled.
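
The module switch described in the highlights might look roughly like the fragment below. This is a hedged sketch, not the actual diff: only the module source paths (modules/file-system/managed-lustre, the private_service_access module) and the variable names (lustre_instance_id, lustre_size_gib, per_unit_storage_throughput) come from this PR; the private-service-access source path, the settings keys, and all values are illustrative assumptions.

```yaml
# Illustrative sketch of the homefs module switch (not the actual PR diff).
vars:
  deployment_name: a4high-cluster          # assumed value
  lustre_instance_id: homefs-lustre        # variable enabled by this PR; value illustrative
  lustre_size_gib: 18000                   # illustrative size
  per_unit_storage_throughput: 1000        # illustrative throughput tier

deployment_groups:
  - group: primary
    modules:
      - id: private_service_access         # explicitly enabled for Managed Lustre
        source: community/modules/network/private-service-access  # assumed path
        use: [network]

      - id: homefs
        source: modules/file-system/managed-lustre  # was modules/file-system/filestore
        use: [network, private_service_access]
        settings:                          # setting names are assumptions
          name: $(vars.lustre_instance_id)
          size_gib: $(vars.lustre_size_gib)
          per_unit_storage_throughput: $(vars.per_unit_storage_throughput)
          local_mount: /home
```

The filestore_ip_range variable, which only applied to the old Filestore module, is removed in the same change.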
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command              | Description
---------------------|----------------------|------------
Code Review          | /gemini review       | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary      | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist  | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help         | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the a3ultra-slurm-blueprint.yaml and a4high-slurm-blueprint.yaml files to transition from using Filestore for /home to Managed Lustre. This involves removing the filestore_ip_range variable, setting install_managed_lustre to true, switching the homefs module source to managed-lustre, and configuring its settings. For a4high-slurm-blueprint.yaml, the Managed Lustre-related variables (lustre_instance_id, lustre_size_gib, per_unit_storage_throughput) are uncommented and initialized. Feedback from the review indicates that in a3ultra-slurm-blueprint.yaml, the Managed Lustre variables are still commented out, which will cause the blueprint to fail. Additionally, obsolete comments related to filestore_ip_range and Managed Lustre instructions in a4high-slurm-blueprint.yaml should be removed to improve clarity and maintainability.
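
The failure mode flagged above follows from how blueprints resolve variables: a setting that references a variable which is still commented out is undefined, so the blueprint fails at creation time. A hedged before/after sketch of the vars block (variable names taken from the review; values are illustrative, not from the diff):

```yaml
# a3ultra-slurm-blueprint.yaml (problem): Lustre variables still commented
# out, so references like $(vars.lustre_instance_id) are undefined and the
# blueprint fails.
# vars:
#   lustre_instance_id: ...
#   lustre_size_gib: ...
#   per_unit_storage_throughput: ...

# a4high-slurm-blueprint.yaml (as updated): variables uncommented and
# initialized. Values below are illustrative.
vars:
  lustre_instance_id: homefs-lustre
  lustre_size_gib: 18000
  per_unit_storage_throughput: 1000
```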

@saara-tyagi27 changed the title from "use lustre in a3u a4h" to "Make lustre default in A series blueprints" on Mar 25, 2026
@saara-tyagi27
Contributor Author

/gemini review

@gemini-code-assist
Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@saara-tyagi27
Contributor Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates several machine learning blueprints (A3 HighGPU, A3 MegaGPU, A3 UltraGPU, A4 HighGPU, A4x HighGPU) to replace Filestore with Managed Lustre for shared home directories. The changes involve updating blueprint variables, switching the file system module, adding private service access, and enabling Managed Lustre installation. A consistent improvement opportunity was identified across all updated blueprints: the lustre_instance_id is currently a static string, which could cause conflicts upon multiple deployments. It is recommended to incorporate the deployment_name variable to ensure unique instance IDs.
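
The reviewer's recommendation could be addressed by deriving the instance ID from the deployment name using the Toolkit's $(vars.*) expansion. A minimal sketch (the variable name lustre_instance_id comes from the review; the value pattern is an assumption):

```yaml
vars:
  deployment_name: a4high-cluster
  # Incorporate deployment_name so each deployment gets a unique Lustre
  # instance ID, rather than a static string that collides when the same
  # blueprint is deployed more than once:
  lustre_instance_id: $(vars.deployment_name)-lustre
```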
