Commit 76760fb

Merge remote-tracking branch 'origin/develop' into resource-policy
2 parents 9f213e6 + cc16e8e commit 76760fb

111 files changed: +1186 additions, -1227 deletions

community/examples/AMD/hpc-amd-slurm.yaml

Lines changed: 1 addition & 1 deletion
@@ -168,7 +168,7 @@ deployment_groups:
 # these images must match the images used by Slurm modules below because
 # we are building OpenMPI with PMI support in libraries contained in
 # Slurm installation
-family: slurm-gcp-6-11-hpc-rocky-linux-8
+family: slurm-gcp-6-12-hpc-rocky-linux-8
 project: schedmd-slurm-public
 
 - id: low_cost_nodeset
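The comment in this hunk ties the image used to build OpenMPI to the images used by the Slurm modules, and this commit moves every slurm-gcp family from the 6-11 release to 6-12 in lockstep. A minimal sketch of a consistency check follows. It assumes the `slurm-gcp-<major>-<minor>-<os>` naming pattern inferred from the family names in this diff; `parse_family` and `distinct_releases` are hypothetical helpers, not part of the Toolkit.

```python
import re
from typing import NamedTuple

# Assumed naming pattern, inferred from the family names in this commit:
#   slurm-gcp-<major>-<minor>-<os and variant>
_FAMILY_RE = re.compile(r"^slurm-gcp-(\d+)-(\d+)-(.+)$")


class SlurmImageFamily(NamedTuple):
    major: int
    minor: int
    os_variant: str


def parse_family(family: str) -> SlurmImageFamily:
    """Split a slurm-gcp image family name into release and OS parts."""
    match = _FAMILY_RE.match(family)
    if match is None:
        raise ValueError(f"not a slurm-gcp image family: {family!r}")
    major, minor, os_variant = match.groups()
    return SlurmImageFamily(int(major), int(minor), os_variant)


def distinct_releases(families: list[str]) -> set[tuple[int, int]]:
    """Collect the (major, minor) slurm-gcp releases used by the given families."""
    return {(f.major, f.minor) for f in map(parse_family, families)}


if __name__ == "__main__":
    families = [
        "slurm-gcp-6-12-hpc-rocky-linux-8",
        "slurm-gcp-6-12-ubuntu-2204-lts-nvidia-570",
    ]
    releases = distinct_releases(families)
    if len(releases) > 1:
        print(f"mixed slurm-gcp releases: {sorted(releases)}")
    else:
        print(f"all images use slurm-gcp release {next(iter(releases))}")
```

If `distinct_releases` over every family referenced in a blueprint returns more than one release, at least one image was missed by a version bump.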

community/examples/hpc-build-slurm-image.yaml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ vars:
 image_build_machine_type: n2d-standard-16
 build_from_image_family: hpc-rocky-linux-8
 build_from_image_project: cloud-hpc-image-public
-build_from_git_ref: 6.10.6
+build_from_git_ref: 6.12.1
 built_image_family: my-custom-slurm
 built_instance_image:
 family: $(vars.built_image_family)

Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
+# Copyright 2026 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+---
+
+blueprint_name: hpc-slurm-kms
+
+vars:
+  project_id: ## Set GCP Project ID Here ##
+  deployment_name: hpc-slurm-kms
+  region: us-central1
+  zone: us-central1-a
+  kms_key: "projects/my-project/locations/us-central1/keyRings/my-keyring/cryptoKeys/my-key"
+  kms_service_account: "my-sa@my-project.iam.gserviceaccount.com"
+
+# Documentation for each of the modules used below can be found at
+# https://github.com/GoogleCloudPlatform/hpc-toolkit/blob/main/modules/README.md
+
+deployment_groups:
+- group: primary
+  modules:
+  # Source is an embedded module, denoted by "modules/*" without ./, ../, /
+  # as a prefix. To refer to a local module, prefix with ./, ../ or /
+  - id: network
+    source: modules/network/vpc
+
+  # Private Service Access (PSA) requires the compute.networkAdmin role which is
+  # included in the Owner role, but not Editor.
+  # PSA is a best practice for Filestore instances, but can be optionally
+  # removed by deleting the private_service_access module and any references to
+  # the module by Filestore modules.
+  # https://cloud.google.com/vpc/docs/configure-private-services-access#permissions
+  - id: private_service_access
+    source: modules/network/private-service-access
+    use: [network]
+
+  - id: homefs
+    source: modules/file-system/filestore
+    use: [network, private_service_access]
+    settings:
+      local_mount: /home
+
+  - id: debug_nodeset
+    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
+    use: [network]
+    settings:
+      node_count_dynamic_max: 4
+      machine_type: n2-standard-2
+      allow_automatic_updates: false
+      disk_encryption_key_service_account: $(vars.kms_service_account)
+      disk_encryption_key: $(vars.kms_key)
+
+  - id: debug_partition
+    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
+    use:
+    - debug_nodeset
+    settings:
+      partition_name: debug
+      exclusive: false # allows nodes to stay up after jobs are done
+      is_default: true
+
+  - id: compute_nodeset
+    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
+    use: [network]
+    settings:
+      node_count_dynamic_max: 20
+      bandwidth_tier: gvnic_enabled
+      allow_automatic_updates: false
+      disk_encryption_key: $(vars.kms_key)
+      disk_encryption_key_service_account: $(vars.kms_service_account)
+
+  - id: compute_partition
+    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
+    use:
+    - compute_nodeset
+    settings:
+      partition_name: compute
+
+  - id: h3_nodeset
+    source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
+    use: [network]
+    settings:
+      node_count_dynamic_max: 20
+      # Note that H3 is available in only specific zones. https://cloud.google.com/compute/docs/regions-zones
+      machine_type: h3-standard-88
+      # H3 does not support pd-ssd and pd-standard
+      # https://cloud.google.com/compute/docs/compute-optimized-machines#h3_disks
+      disk_type: pd-balanced
+      bandwidth_tier: gvnic_enabled
+      allow_automatic_updates: false
+      disk_encryption_key: $(vars.kms_key)
+      disk_encryption_key_service_account: $(vars.kms_service_account)
+
+  - id: h3_partition
+    source: community/modules/compute/schedmd-slurm-gcp-v6-partition
+    use:
+    - h3_nodeset
+    settings:
+      partition_name: h3
+
+  - id: slurm_login
+    source: community/modules/scheduler/schedmd-slurm-gcp-v6-login
+    use: [network]
+    settings:
+      machine_type: n2-standard-4
+      enable_login_public_ips: true
+      disk_encryption_key: $(vars.kms_key)
+      disk_encryption_key_service_account: $(vars.kms_service_account)
+
+  - id: slurm_controller
+    source: community/modules/scheduler/schedmd-slurm-gcp-v6-controller
+    use:
+    - network
+    - debug_partition
+    - compute_partition
+    - h3_partition
+    - homefs
+    - slurm_login
+    settings:
+      enable_controller_public_ips: true
+      disk_encryption_key: $(vars.kms_key)
+      disk_encryption_key_service_account: $(vars.kms_service_account)
+      slurm_bucket_kms_key: $(vars.kms_key)
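The new blueprint passes the same `kms_key` to every boot disk and to the Slurm bucket, so a malformed key name would surface in several modules at once. A Cloud KMS CryptoKey resource name has the form `projects/<project>/locations/<location>/keyRings/<key_ring>/cryptoKeys/<key>`; the sketch below parses one so it can be checked before deployment. `parse_kms_key` and `CryptoKeyName` are hypothetical names, and comparing the key location with the blueprint `region` is only a prompt to review location compatibility, not a rule this sketch enforces.

```python
import re
from typing import NamedTuple

# Cloud KMS CryptoKey resource names have the form
#   projects/<project>/locations/<location>/keyRings/<key_ring>/cryptoKeys/<key>
_KEY_RE = re.compile(
    r"^projects/([^/]+)/locations/([^/]+)/keyRings/([^/]+)/cryptoKeys/([^/]+)$"
)


class CryptoKeyName(NamedTuple):
    project: str
    location: str
    key_ring: str
    key: str


def parse_kms_key(name: str) -> CryptoKeyName:
    """Split a CryptoKey resource name into its parts.

    Rejects names with missing segments and key *version* paths
    (.../cryptoKeys/<key>/cryptoKeyVersions/<n>), since the pattern is anchored.
    """
    match = _KEY_RE.match(name)
    if match is None:
        raise ValueError(f"not a Cloud KMS CryptoKey resource name: {name!r}")
    return CryptoKeyName(*match.groups())


if __name__ == "__main__":
    # Values copied from the example blueprint's vars block.
    blueprint_vars = {
        "region": "us-central1",
        "kms_key": "projects/my-project/locations/us-central1/keyRings/my-keyring/cryptoKeys/my-key",
    }
    key = parse_kms_key(blueprint_vars["kms_key"])
    print(key)
    if key.location != blueprint_vars["region"]:
        print(f"review: key location {key.location} differs from region {blueprint_vars['region']}")
```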

community/examples/hpc-slurm-ubuntu2204.yaml

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ vars:
 slurm_image:
 # Please refer to the following link for the latest images:
 # https://github.com/GoogleCloudPlatform/slurm-gcp/blob/master/docs/images.md#supported-operating-systems
-family: slurm-gcp-6-11-ubuntu-2204-lts-nvidia-570
+family: slurm-gcp-6-12-ubuntu-2204-lts-nvidia-570
 project: schedmd-slurm-public
 
 deployment_groups:

community/examples/hpc-slurm6-apptainer.yaml

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ deployment_groups:
 settings:
 source_image_project_id: [schedmd-slurm-public]
 # see latest in https://github.com/GoogleCloudPlatform/slurm-gcp/blob/master/docs/images.md#published-image-family
-source_image_family: slurm-gcp-6-11-hpc-rocky-linux-8
+source_image_family: slurm-gcp-6-12-hpc-rocky-linux-8
 # You can find size of source image by using following command
 # gcloud compute images describe-from-family <source_image_family> --project schedmd-slurm-public
 disk_size: $(vars.disk_size)

community/examples/xpk-n2-filestore/xpk-n2-filestore.yaml

Lines changed: 0 additions & 5 deletions
@@ -101,15 +101,10 @@ deployment_groups:
 - source: $(vars.storage_crd_path)
 # Server-side applies avoid last-applied-configuration and associated annotation length issues
 - source: https://raw.githubusercontent.com/kubernetes-sigs/kjob/$(vars.kjob_version)/config/crd/bases/kjobctl.x-k8s.io_applicationprofiles.yaml
-server_side_apply: true
 - source: https://raw.githubusercontent.com/kubernetes-sigs/kjob/$(vars.kjob_version)/config/crd/bases/kjobctl.x-k8s.io_jobtemplates.yaml
-server_side_apply: true
 - source: https://raw.githubusercontent.com/kubernetes-sigs/kjob/$(vars.kjob_version)/config/crd/bases/kjobctl.x-k8s.io_rayclustertemplates.yaml
-server_side_apply: true
 - source: https://raw.githubusercontent.com/kubernetes-sigs/kjob/$(vars.kjob_version)/config/crd/bases/kjobctl.x-k8s.io_rayjobtemplates.yaml
-server_side_apply: true
 - source: https://raw.githubusercontent.com/kubernetes-sigs/kjob/$(vars.kjob_version)/config/crd/bases/kjobctl.x-k8s.io_volumebundles.yaml
-server_side_apply: true
 
 - id: homefs
 source: modules/file-system/filestore
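The comment kept in this hunk gives the reason server-side apply was used: a client-side `kubectl apply` stores a full JSON copy of the object in the `kubectl.kubernetes.io/last-applied-configuration` annotation, and Kubernetes limits the combined size of an object's annotation keys and values to 256 KiB (262144 bytes), which large CRDs can exceed. The sketch below estimates that size for a manifest given as a Python dict. The helper names are hypothetical, and compact `json.dumps` output only approximates the JSON that kubectl actually writes.

```python
import json

# Kubernetes rejects objects whose annotations (keys plus values) exceed 256 KiB.
MAX_ANNOTATIONS_BYTES = 256 * 1024
LAST_APPLIED_KEY = "kubectl.kubernetes.io/last-applied-configuration"


def estimated_annotation_bytes(manifest: dict) -> int:
    """Approximate the annotation size after a client-side apply.

    Client-side apply adds a JSON copy of the manifest under LAST_APPLIED_KEY;
    compact json.dumps output is used here as an approximation of it.
    """
    metadata = manifest.get("metadata") or {}
    annotations = dict(metadata.get("annotations") or {})
    annotations[LAST_APPLIED_KEY] = json.dumps(manifest, separators=(",", ":"))
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in annotations.items())


def needs_server_side_apply(manifest: dict) -> bool:
    """True if a client-side apply would likely exceed the annotation limit."""
    return estimated_annotation_bytes(manifest) > MAX_ANNOTATIONS_BYTES


if __name__ == "__main__":
    small = {"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "demo"}}
    # Synthetic stand-in for a large CRD: one 300,000-character field.
    large = {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        "metadata": {"name": "demo"},
        "spec": {"padding": "x" * 300_000},
    }
    for manifest in (small, large):
        print(manifest["kind"], needs_server_side_apply(manifest))
```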

community/front-end/ofe/requirements.txt

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ cfgv==3.5.0
 charset-normalizer==3.4.4
 click==8.3.1
 crispy-bootstrap5==2026.3
-cryptography==46.0.5
+cryptography==46.0.6
 decorator==5.2.1
 defusedxml==0.7.1
 dill==0.4.1
@@ -81,7 +81,7 @@ python3-openid==3.2.0
 pytz==2023.3
 PyYAML==6.0.3
 referencing==0.37.0
-requests==2.32.5
+requests==2.33.0
 requests-oauthlib==2.0.0
 retry==0.9.2
 rpds-py==0.30.0

community/front-end/ofe/website/ghpcfe/templates/blueprint/cluster_config.yaml.j2

Lines changed: 2 additions & 2 deletions
@@ -80,7 +80,7 @@ deployment_groups:
 id: slurm_controller
 settings:
 instance_image:
-family: slurm-gcp-6-11-hpc-rocky-linux-8
+family: slurm-gcp-6-12-hpc-rocky-linux-8
 project: schedmd-slurm-public
 cloud_parameters:
 resume_rate: 0
@@ -118,7 +118,7 @@ deployment_groups:
 id: slurm_login
 settings:
 instance_image:
-family: slurm-gcp-6-11-hpc-rocky-linux-8
+family: slurm-gcp-6-12-hpc-rocky-linux-8
 project: schedmd-slurm-public
 num_instances: {{ cluster.num_login_nodes }}
 subnetwork_self_link: "projects/{{ cluster.project_id }}/regions/{{ cluster.cloud_region }}/subnetworks/{{ cluster.subnet.cloud_id }}"

community/front-end/ofe/website/ghpcfe/templates/blueprint/partition_config.yaml.j2

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ limitations under the License.
 family: {% if part.image.source_image_family == "Imported" %}{{ part.image.family }}{% else %}image-{{ part.image.family }}{% endif %}
 project: {{ cluster.project_id }}
 {% else %}
-family: slurm-gcp-6-11-hpc-rocky-linux-8
+family: slurm-gcp-6-12-hpc-rocky-linux-8
 project: schedmd-slurm-public
 {% endif %}
 {% if part.additional_disk_count > 0 %}
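The context lines show how this template picks a partition image. Inside a branch whose condition lies above this hunk, an image whose `source_image_family` is `"Imported"` keeps its family name, any other image gets an `image-` prefix, and both use the cluster's project; otherwise the default `slurm-gcp-6-12-hpc-rocky-linux-8` family from `schedmd-slurm-public` applies. A Python sketch of the same branching follows, where `use_custom_image` is a hypothetical stand-in for the outer condition, which is not visible in this diff.

```python
from typing import Optional

# Default used by the template's else-branch after this commit.
DEFAULT_IMAGE = {
    "family": "slurm-gcp-6-12-hpc-rocky-linux-8",
    "project": "schedmd-slurm-public",
}


def partition_image(
    use_custom_image: bool,  # hypothetical stand-in for the elided outer {% if %}
    source_image_family: Optional[str],
    family: str,
    cluster_project_id: str,
) -> dict[str, str]:
    """Mirror the family/project selection in partition_config.yaml.j2."""
    if not use_custom_image:
        return dict(DEFAULT_IMAGE)
    if source_image_family == "Imported":
        # Imported images keep their family name unchanged.
        resolved = family
    else:
        # Other images are referenced through an "image-" prefixed family.
        resolved = f"image-{family}"
    return {"family": resolved, "project": cluster_project_id}


if __name__ == "__main__":
    print(partition_image(True, "Imported", "my-imported-family", "my-project"))
    print(partition_image(False, None, "", "my-project"))
```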

community/modules/compute/schedmd-slurm-gcp-v6-nodeset-dynamic/README.md

Lines changed: 1 addition & 1 deletion
@@ -102,7 +102,7 @@ No resources.
 | <a name="input_enable_spot_vm"></a> [enable\_spot\_vm](#input\_enable\_spot\_vm) | Enable the partition to use spot VMs (https://cloud.google.com/spot-vms). | `bool` | `false` | no |
 | <a name="input_feature"></a> [feature](#input\_feature) | The node feature, used to bind nodes to the nodeset. If not set, the nodeset name will be used. | `string` | `null` | no |
 | <a name="input_guest_accelerator"></a> [guest\_accelerator](#input\_guest\_accelerator) | List of the type and count of accelerator cards attached to the instance. | <pre>list(object({<br/> type = string,<br/> count = number<br/> }))</pre> | `[]` | no |
-| <a name="input_instance_image"></a> [instance\_image](#input\_instance\_image) | Defines the image that will be used in the Slurm node group VM instances.<br/><br/>Expected Fields:<br/>name: The name of the image. Mutually exclusive with family.<br/>family: The image family to use. Mutually exclusive with name.<br/>project: The project where the image is hosted.<br/><br/>For more information on creating custom images that comply with Slurm on GCP<br/>see the "Slurm on GCP Custom Images" section in docs/vm-images.md. | `map(string)` | <pre>{<br/> "family": "slurm-gcp-6-11-hpc-rocky-linux-8",<br/> "project": "schedmd-slurm-public"<br/>}</pre> | no |
+| <a name="input_instance_image"></a> [instance\_image](#input\_instance\_image) | Defines the image that will be used in the Slurm node group VM instances.<br/><br/>Expected Fields:<br/>name: The name of the image. Mutually exclusive with family.<br/>family: The image family to use. Mutually exclusive with name.<br/>project: The project where the image is hosted.<br/><br/>For more information on creating custom images that comply with Slurm on GCP<br/>see the "Slurm on GCP Custom Images" section in docs/vm-images.md. | `map(string)` | <pre>{<br/> "family": "slurm-gcp-6-12-hpc-rocky-linux-8",<br/> "project": "schedmd-slurm-public"<br/>}</pre> | no |
 | <a name="input_instance_image_custom"></a> [instance\_image\_custom](#input\_instance\_image\_custom) | A flag that designates that the user is aware that they are requesting<br/>to use a custom and potentially incompatible image for this Slurm on<br/>GCP module.<br/><br/>If the field is set to false, only the compatible families and project<br/>names will be accepted. The deployment will fail with any other image<br/>family or name. If set to true, no checks will be done.<br/><br/>See: https://goo.gle/hpc-slurm-images | `bool` | `false` | no |
 | <a name="input_labels"></a> [labels](#input\_labels) | Labels to add to partition compute instances. Key-value pairs. | `map(string)` | `{}` | no |
 | <a name="input_machine_type"></a> [machine\_type](#input\_machine\_type) | Compute Platform machine type to use for this partition compute nodes. | `string` | `"c2-standard-60"` | no |
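The `instance_image` description says `name` and `family` are mutually exclusive, and the `instance_image_custom` description says that when the flag is `false` only compatible families and projects are accepted. A minimal validation sketch of those two rules follows. The allow-lists are hypothetical (the module's actual list of compatible images is not part of this diff), and the sketch additionally treats an image with neither `name` nor `family` as invalid, which is an assumption rather than documented behavior.

```python
# Hypothetical allow-lists; the module's real compatibility list is not shown in this diff.
COMPATIBLE_IMAGES = {"slurm-gcp-6-12-hpc-rocky-linux-8"}
COMPATIBLE_PROJECTS = {"schedmd-slurm-public"}


def validate_instance_image(image: dict[str, str], instance_image_custom: bool = False) -> list[str]:
    """Return a list of problems with an instance_image map (empty if none)."""
    errors: list[str] = []
    name, family, project = image.get("name"), image.get("family"), image.get("project")

    # README: name and family are mutually exclusive.
    if name and family:
        errors.append("'name' and 'family' are mutually exclusive")
    # Assumption of this sketch: one of them must be set.
    if not name and not family:
        errors.append("one of 'name' or 'family' must be set")
    # README lists project as an expected field.
    if not project:
        errors.append("'project' is required")

    # README: unless instance_image_custom is true, only compatible images pass.
    if not instance_image_custom:
        chosen = family or name
        if project not in COMPATIBLE_PROJECTS:
            errors.append(f"project {project!r} is not a compatible image project")
        if chosen and chosen not in COMPATIBLE_IMAGES:
            errors.append(f"image {chosen!r} is not compatible; set instance_image_custom to use it")
    return errors


if __name__ == "__main__":
    print(validate_instance_image({"family": "slurm-gcp-6-12-hpc-rocky-linux-8", "project": "schedmd-slurm-public"}))
    print(validate_instance_image({"family": "my-custom-slurm", "project": "my-project"}))
```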
