
Commit d3bbd4b

Merge branch 'release/2.0' into feature/ec2-ss2-install
2 parents ed6c7a4 + 57d03d8 commit d3bbd4b

File tree: 11 files changed (+243 / -132 lines)


configs/common/modules_lmod.yaml

Lines changed: 6 additions & 0 deletions

```diff
@@ -266,6 +266,12 @@ modules:
           set:
             'UPP_INC': '{prefix}/include'
             'UPP_LIB': '{prefix}/lib/libupp.a'
+      # NOTE: This will be fixed upstream in udunits Spack package
+      # Once this is done, remove this section
+      udunits:
+        environment:
+          set:
+            'UDUNITS2_XML_PATH': '{prefix}/share/udunits/udunits2.xml'
 
       hierarchy:
       - mpi
```

configs/common/modules_tcl.yaml

Lines changed: 6 additions & 0 deletions

```diff
@@ -285,3 +285,9 @@ modules:
           set:
             'UPP_INC': '{prefix}/include'
             'UPP_LIB': '{prefix}/lib/libupp.a'
+      # NOTE: This will be fixed upstream in udunits Spack package
+      # Once this is done, remove this section
+      udunits:
+        environment:
+          set:
+            'UDUNITS2_XML_PATH': '{prefix}/share/udunits/udunits2.xml'
```
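
Both module configs gain the same entry: the generated udunits module will now set `UDUNITS2_XML_PATH`. A minimal sanity check after the module trees are refreshed, sketched under the assumption that a udunits module is loadable by that name (the module name and expected prefix below are illustrative, not taken from this commit):

```bash
# Hypothetical check once the tcl/lmod module trees have been regenerated.
# "udunits" and the expected prefix are illustrative placeholders.
module load udunits
echo "$UDUNITS2_XML_PATH"
# expected: <udunits install prefix>/share/udunits/udunits2.xml
```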

configs/sites/tier1/nas/README.md

Lines changed: 184 additions & 89 deletions

````diff
@@ -1,180 +1,275 @@
-# How to build spack-stack at NAS
+# How to Build **spack-stack** at NAS
 
-In the commands below some will be run on login nodes (with internet access) and some
-on compute nodes as, at NAS, you aren't allowed more than 2 processes on a login node.
+This guide documents how to build **spack-stack** on NASA NAS systems, where login nodes have internet access but are CPU-restricted, while compute nodes allow parallel builds but have *no* internet access. Several packages (Rust/Cargo, ecFlow, CRTM) require special handling due to these constraints.
 
-## Machines
+---
 
-For the below you will need to login to both an `afe01` node for one step. You'll
-also want to get a Rome compute node for the rest of the steps.
+## Table of Contents
+
+- [Overview](#overview)
+- [Machines Required](#machines-required)
+- [Clone spack-stack](#clone-spack-stack)
+- [Obtain an Interactive Compute Node](#obtain-an-interactive-compute-node)
+- [Setup spack-stack](#setup-spack-stack)
+- [Create Environments](#create-environments)
+  - [oneAPI Environment](#oneapi-environment)
+  - [GCC Environment](#gcc-environment)
+- [Activate the Environment](#activate-the-environment)
+- [Concretize the Environment](#concretize-the-environment)
+- [Create Source Cache (LOGIN NODE ONLY)](#create-source-cache-login-node-only)
+- [Pre-Fetch Cargo Dependencies (LOGIN NODE ONLY)](#pre-fetch-cargo-dependencies-login-node-only)
+- [Install Packages](#install-packages)
+  - [Step 1 — Dependencies of Rust codes and ecFlow (COMPUTE NODE)](#step-1--dependencies-of-rust-codes-and-ecflow-compute-node)
+  - [Step 2 — Rust codes and ecFlow (AFE LOGIN NODE)](#step-2--rust-codes-and-ecflow-afe-login-node)
+  - [Step 3 — Remaining Packages (COMPUTE NODE)](#step-3--remaining-packages-compute-node)
+  - [Packages Requiring Internet](#packages-requiring-internet)
+- [Update Module Files](#update-module-files)
+- [Deactivate the Environment](#deactivate-the-environment)
+- [Debugging Package Builds](#debugging-package-builds)
+
+---
+
+## Overview
+
+Due to NAS system architecture and network restrictions:
+
+- **Login nodes**:
+  - Have internet
+  - Limited to **2 processes**
+  - `pfe` nodes use **Sandy Bridge** (too old for x86_64_v3 builds)
+
+- **Compute nodes** (Milan / Rome):
+  - No internet
+  - Allow parallel builds
+
+Some packages (Cargo/Rust, ecFlow, CRTM) require internet or newer CPU features, so the install is broken into multiple steps across different node types.
+
+---
+
+## Machines Required
+
+You will need:
+
+- **An `afe01` login node**
+  Supports x86_64_v3 binaries → required for building Rust packages and ecFlow.
+
+- **A Rome or Milan compute node**
+  Used for the main installation with multiple cores.
+
+---
 
 ## Clone spack-stack
 
-```
-git clone --recurse-submodules https://github.com/mathomp4/spack-stack.git -b feature/nas_install_spack_v1 spack-stack-2.0.0-test
+Use the appropriate branch or tag:
+
+```bash
+git clone --recurse-submodules https://github.com/JCSDA/spack-stack.git \
+  -b spack-stack-2.0.0 spack-stack-2.0.0
 ```
 
-## Grab interactive node
+---
 
-Since NAS limits you to 2 processes on a login node, you'll need to grab an interactive node. For example:
-```
-qsub -I -V -X -l select=1:ncpus=128:mpiprocs=128:model=mil_ait -l walltime=12:00:00 -W group_list=s1873 -m b -N Interactive
+## Obtain an Interactive Compute Node
+
+NAS login nodes allow only **2 processes**, so use:
+
+```bash
+qsub -I -V -X \
+  -l select=1:ncpus=128:mpiprocs=128:model=mil_ait \
+  -l walltime=12:00:00 \
+  -W group_list=s1873 \
+  -m b \
+  -N Interactive
 ```
-will get you a Milan node for 12 hours
 
-## Setup spack-stack on each node
+This gives a **Milan** compute node for up to 12 hours.
 
-We will start on a login node with internet access. This is mainly needed for the
-`spack mirror create` command which downloads all the source code for the packages.
+---
 
-```
-cd spack-stack-2.0.0-test
+## Setup spack-stack
+
+Run on a **login node with internet**:
+
+```bash
+cd spack-stack-2.0.0
 . setup.sh
 ```
 
-## Create environments
+---
 
-We create two different environments, one for oneAPI and one for GCC. The commands below
-are used to create the environments. You only need to do this once.
+## Create Environments
 
-### oneAPI
+You only need to create each environment once.
 
-To create the oneAPI environment, do:
+### oneAPI Environment
 
-```
-spack stack create env --name ue-oneapi-2024.2.0 --template unified-dev --site nas --compiler=oneapi-2024.2.0
+```bash
+spack stack create env --name ue-oneapi-2024.2.0 \
+  --template unified-dev --site nas --compiler=oneapi-2024.2.0
 cd envs/ue-oneapi-2024.2.0
 ```
 
-### GCC
-
-To create the GCC environment, do:
+### GCC Environment
 
-```
-spack stack create env --name ue-gcc-13.2.0 --template unified-dev --site nas --compiler gcc-13.2.0
+```bash
+spack stack create env --name ue-gcc-13.2.0 \
+  --template unified-dev --site nas --compiler=gcc-13.2.0
 cd envs/ue-gcc-13.2.0
 ```
 
-## Activate environment
+---
 
-Now enter the spack environment you just created:
+## Activate the Environment
 
-```
+```bash
 spack env activate .
 ```
 
-NOTE: You need to make sure you do this in *any* terminal where you want to do any commmand
-below with this environment.
+> **Important:** Run this in *every* terminal where you plan to run Spack commands.
+
+---
+
+## Concretize the Environment
 
-## Concretize and create source cache
+Run on a **login node** (internet required for bootstrapping Clingo and other tools):
 
+```bash
+spack concretize 2>&1 | tee log.concretize ; bell
 ```
-spack concretize 2>&1 | tee log.concretize
+
+### Optional `bell` helper
+
+```bash
+bell() { tput bel ; printf "\nFinished at: " ; date; }
 ```
 
-NOTE: The first time you do this on a new build, you should do it on a *LOGIN* node. This is because
-it might need to bootstrap things and so it will reach out to the internet.
+---
 
-## Create source cache (LOGIN NODE ONLY)
+## Create Source Cache (LOGIN NODE ONLY)
 
-Because this step downloads all the source code for all packages and all versions, it
-should be done on a login node with internet access.
+This downloads all source tarballs for your environment:
 
-```
-spack mirror create -a -d /swbuild/gmao_SIteam/spack-stack/source-cache
+```bash
+spack mirror create -a \
+  -d /swbuild/gmao_SIteam/spack-stack/source-cache
 ```
 
-NOTE: Make sure you are in an environment when you run that `spack mirror create` command. Otherwise,
-you will download *EVERY* package and *EVERY* version in spack!
+> ⚠️ **Do not run this outside an activated environment.**
+> Otherwise Spack will attempt to mirror **every** known package/version.
 
-## Pre-fetch cargo packages (LOGIN NODE ONLY)
+---
 
-Some packages use Rust/Cargo for dependencies. These need internet access to build. So we pre-fetch them here.
+## Pre-Fetch Cargo Dependencies (LOGIN NODE ONLY)
 
-We need to set `CARGO_HOME` to a location where the Cargo deps have been downloaded
+Rust packages frequently require network access during build. Pre-fetch their dependencies:
 
-```
+```bash
 export CARGO_HOME=/swbuild/gmao_SIteam/spack-stack/cargo-cache
 ../../util/fetch_cargo_deps.py
 ```
 
-NOTE: `CARGO_HOME` should be set as well on the COMPUTE node!
+> ⚠️ **You must also set `CARGO_HOME` on compute nodes** before building.
 
-## Install packages
+---
 
-Our install process will actually have (at least) three steps. This is because of the `crtm` package
-which requires internet access at build time.
+## Install Packages
 
-### Install Step 1: Dependencies of Rust codes and ecflow (COMPUTE NODE)
+Installation requires three stages:
 
-We currently have some codes that use rust/cargo for dependencies. And, for some reason,
-even doing the "cargo dependencies" as above, they still need internet
-access to build/install.
+| Step | Node Type | Why |
+|------|-----------|-----|
+| Step 1 | Compute | Build dependencies in parallel, avoids CPU limits |
+| Step 2 | `afe` login | Needed for x86_64_v3 Python and internet access |
+| Step 3 | Compute | Finish main installation at high parallelism |
 
-As for ecflow, we built QT on a login node (as it was the only complete node), so we
-then have to build ecflow on a login node as well.
+---
 
-So we first install all the dependencies of then codes.
+### Step 1 — Dependencies of Rust codes and ecFlow (COMPUTE NODE)
 
-```
+```bash
 export CARGO_HOME=/swbuild/gmao_SIteam/spack-stack/cargo-cache
-spack install -j 16 --verbose --fail-fast --show-log-on-error --no-check-signature --only dependencies py-cryptography py-maturin py-rpds-py ecflow 2>&1 | tee log.install.deps-for-rust-and-ecflow
+spack install -j 16 --verbose --fail-fast --show-log-on-error \
+  --no-check-signature \
+  --only dependencies py-cryptography py-maturin py-rpds-py ecflow \
+  2>&1 | tee log.install.deps-for-rust-and-ecflow ; bell
 ```
 
-### Install Step 2: Rust Codes and ecflow (AFE LOGIN NODE)
+---
 
-NOTE: You *MUST* run this on an afe login node. The reason is the pfe login nodes are Sandy
-Bridge but we are building Spack with `x86_64_v3` and these are too old (`_v2`). So
-you will get an illegal instruction error when the install below calls python3.
+### Step 2 — Rust codes and ecFlow (AFE LOGIN NODE)
 
-So go back to an afe login node and run:
+`pfe` nodes use Sandy Bridge CPUs, which **cannot run** spack-stack’s x86_64_v3 Python interpreter → results in `Illegal instruction`.
 
-```
+So this must be done on **afe**:
+
+```bash
 export CARGO_HOME=/swbuild/gmao_SIteam/spack-stack/cargo-cache
-spack install -j 2 -p 1 --verbose --fail-fast --show-log-on-error --no-check-signature py-cryptography py-maturin py-rpds-py ecflow 2>&1 | tee log.install.rust-and-ecflow
+spack install -j 2 -p 1 --verbose --fail-fast --show-log-on-error \
+  --no-check-signature \
+  py-cryptography py-maturin py-rpds-py ecflow \
+  2>&1 | tee log.install.rust-and-ecflow ; bell
 ```
 
-Note we are only using 2 processes here because NAS limits you to 2 processes on a login node.
+NAS limits login nodes to 2 processes, hence `-j 2`.
 
-### Install Step 3: The rest (COMPUTE NODE)
+---
 
-```
+### Step 3 — Remaining Packages (COMPUTE NODE)
+
+```bash
 export CARGO_HOME=/swbuild/gmao_SIteam/spack-stack/cargo-cache
-spack install -j 16 --verbose --fail-fast --show-log-on-error --no-check-signature 2>&1 | tee log.install.after-cargo
+spack install -j 16 --verbose --fail-fast --show-log-on-error \
+  --no-check-signature \
+  2>&1 | tee log.install.after-cargo ; bell
 ```
 
-NOTE: You might need to run the `spack install` command multiple times because sometimes
-it just fails. But then you run it more and more and it will eventually succeed.
+> **Note:** You may need to re-run this command multiple times. Some builds fail intermittently but succeed on retry.
 
-### Packages needing internet access to build
+---
 
-If you encounter other packages that need internet access to build, you can install them with:
+### Packages Requiring Internet
 
-```
-spack install -j 2 --verbose --fail-fast --show-log-on-error --no-check-signature <package> |& tee log.install.<package>
+If you encounter another package that insists on network access:
+
+```bash
+spack install -j 2 --verbose --fail-fast --show-log-on-error \
+  --no-check-signature <package> \
+  |& tee log.install.<package> ; bell
 ```
 
-Then, once that package is built, you can go back to the compute node and run the `spack install` command again.
+Once built, return to the compute node and resume the full installation.
 
-## Update module files and setup meta-modules
+---
 
-```
-spack module tcl refresh -y --delete-tree
+## Update Module Files
+
+After installation completes:
+
+```bash
+spack module tcl refresh -y --delete-tree ; bell
 spack stack setup-meta-modules
 ```
 
-## Deactivate environment
+---
 
-```
+## Deactivate the Environment
+
+```bash
 spack env deactivate
 ```
 
-# Debugging a package
+---
 
-When things go wrong, a good way to debug a failure is:
+## Debugging Package Builds
 
-```
+```bash
 spack clean
 spack stage <package>
 spack build-env <package> -- bash --norc --noprofile
 ```
+
+This drops you into a clean build environment with the package’s full compiler/runtime environment loaded.
+
+---
+
+
````
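
As a hedged aside on the debugging recipe that closes the new README: the usual next move inside that `build-env` shell is to change into the staged source and re-run the build steps by hand. A minimal sketch, using `esmf` purely as an illustrative package name (the commit itself only shows the generic `<package>` form):

```bash
# Illustrative only: substitute the failing package for "esmf".
spack clean
spack stage esmf
cd "$(spack location --stage-dir esmf)"    # the package's stage directory (expanded source lives under it)
spack build-env esmf -- bash --norc --noprofile
# Inside this shell the compilers and dependency paths match what
# `spack install` would use, so configure/make can be re-run manually.
```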

configs/sites/tier1/navy-aws/bootstrap.yaml

Lines changed: 0 additions & 9 deletions
This file was deleted.
