# How to Build **spack-stack** at NAS

This guide documents how to build **spack-stack** on NASA NAS systems, where login nodes have internet access but are limited to two processes, while compute nodes allow parallel builds but have *no* internet access. Several packages (Rust/Cargo, ecFlow, CRTM) require special handling due to these constraints.

---

## Table of Contents

- [Overview](#overview)
- [Machines Required](#machines-required)
- [Clone spack-stack](#clone-spack-stack)
- [Obtain an Interactive Compute Node](#obtain-an-interactive-compute-node)
- [Setup spack-stack](#setup-spack-stack)
- [Create Environments](#create-environments)
  - [oneAPI Environment](#oneapi-environment)
  - [GCC Environment](#gcc-environment)
- [Activate the Environment](#activate-the-environment)
- [Concretize the Environment](#concretize-the-environment)
- [Create Source Cache (LOGIN NODE ONLY)](#create-source-cache-login-node-only)
- [Pre-Fetch Cargo Dependencies (LOGIN NODE ONLY)](#pre-fetch-cargo-dependencies-login-node-only)
- [Install Packages](#install-packages)
  - [Step 1: Dependencies of Rust Codes and ecFlow (COMPUTE NODE)](#step-1-dependencies-of-rust-codes-and-ecflow-compute-node)
  - [Step 2: Rust Codes and ecFlow (AFE LOGIN NODE)](#step-2-rust-codes-and-ecflow-afe-login-node)
  - [Step 3: Remaining Packages (COMPUTE NODE)](#step-3-remaining-packages-compute-node)
  - [Packages Requiring Internet](#packages-requiring-internet)
- [Update Module Files](#update-module-files)
- [Deactivate the Environment](#deactivate-the-environment)
- [Debugging Package Builds](#debugging-package-builds)

---

## Overview

Due to NAS system architecture and network restrictions:

- **Login nodes**:
  - Have internet access
  - Limited to **2 processes**
  - `pfe` nodes use **Sandy Bridge** CPUs (too old for x86_64_v3 builds)

- **Compute nodes** (Milan / Rome):
  - No internet access
  - Allow parallel builds

Some packages (Rust/Cargo, ecFlow, CRTM) require internet access or newer CPU features, so the install is broken into multiple steps across different node types.

---

## Machines Required

You will need:

- **An `afe01` login node**
  These support x86_64_v3 binaries, which is required for building the Rust packages and ecFlow.

- **A Rome or Milan compute node**
  Used for the main installation with multiple cores.

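If you are unsure whether a given node can run x86_64_v3 binaries, a quick check is to look for AVX2 support, which x86_64_v3 requires and Sandy Bridge lacks. This is just a convenience check, not part of the official procedure:

```bash
# x86_64_v3 requires AVX2 (among other features); Sandy Bridge lacks it
if grep -q avx2 /proc/cpuinfo; then
  echo "This node can run x86_64_v3 binaries"
else
  echo "This node is too old for x86_64_v3 builds"
fi
```
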
---

## Clone spack-stack

Use the appropriate branch or tag:

```bash
git clone --recurse-submodules https://github.com/JCSDA/spack-stack.git \
    -b spack-stack-2.0.0 spack-stack-2.0.0
```

---

## Obtain an Interactive Compute Node

NAS limits you to **2 processes** on a login node, so grab an interactive compute node. For example:

```bash
qsub -I -V -X \
    -l select=1:ncpus=128:mpiprocs=128:model=mil_ait \
    -l walltime=12:00:00 \
    -W group_list=s1873 \
    -m b \
    -N Interactive
```

This gives you a **Milan** compute node for up to 12 hours.

---

## Setup spack-stack

Run this on a **login node** with internet access:

```bash
cd spack-stack-2.0.0
. setup.sh
```

---

## Create Environments

We create two environments, one for oneAPI and one for GCC. You only need to create each one once.

### oneAPI Environment

```bash
spack stack create env --name ue-oneapi-2024.2.0 \
    --template unified-dev --site nas --compiler=oneapi-2024.2.0
cd envs/ue-oneapi-2024.2.0
```

### GCC Environment

```bash
spack stack create env --name ue-gcc-13.2.0 \
    --template unified-dev --site nas --compiler=gcc-13.2.0
cd envs/ue-gcc-13.2.0
```

---

## Activate the Environment

```bash
spack env activate .
```

> **Important:** Run this in *every* terminal where you plan to run Spack commands against this environment.

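As a sketch, a fresh terminal needs the full setup sequence before any of the commands below will work (paths assume the oneAPI environment created above):

```bash
cd spack-stack-2.0.0
. setup.sh
cd envs/ue-oneapi-2024.2.0
spack env activate .
```
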
---

## Concretize the Environment

The first time you concretize a new build, do it on a **login node**: Spack may need to reach the internet to bootstrap Clingo and other tools.

```bash
spack concretize 2>&1 | tee log.concretize ; bell
```

### Optional `bell` helper

The trailing `bell` used throughout this guide is an optional shell function that rings the terminal bell and prints the finish time:

```bash
bell() { tput bel ; printf "\nFinished at: " ; date; }
```

---

## Create Source Cache (LOGIN NODE ONLY)

This downloads the source tarballs for every package in your environment, so it must run on a login node with internet access:

```bash
spack mirror create -a \
    -d /swbuild/gmao_SIteam/spack-stack/source-cache
```

> ⚠️ **Do not run this outside an activated environment.**
> Otherwise Spack will attempt to mirror **every** package and **every** version it knows about.

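If you want to confirm which mirrors the environment will consult at install time, `spack mirror list` shows them (this assumes the source cache is registered in the site or environment configuration):

```bash
# List mirrors visible to the active environment
spack mirror list
```
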
---

## Pre-Fetch Cargo Dependencies (LOGIN NODE ONLY)

Some packages use Rust/Cargo dependencies that are fetched over the network at build time, so pre-fetch them into a shared cache:

```bash
export CARGO_HOME=/swbuild/gmao_SIteam/spack-stack/cargo-cache
../../util/fetch_cargo_deps.py
```

> ⚠️ **You must also set `CARGO_HOME` on compute nodes** before building.

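As a quick sanity check (assuming the standard Cargo cache layout), the pre-fetched crates should now be visible in the cache:

```bash
# Downloaded crates land under registry/cache in a standard CARGO_HOME
ls "$CARGO_HOME/registry/cache"
```
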
---

## Install Packages

Installation proceeds in three stages, because the Rust codes, ecFlow, and `crtm` need internet access and/or an x86_64_v3-capable node at build time:

| Step | Node type | Purpose |
|------|-----------|---------|
| Step 1 | Compute | Build the dependencies of the Rust codes and ecFlow in parallel |
| Step 2 | `afe` login | Build the Rust codes and ecFlow (needs x86_64_v3 Python and internet access) |
| Step 3 | Compute | Finish the main installation at high parallelism |

---

### Step 1: Dependencies of Rust Codes and ecFlow (COMPUTE NODE)

First, install all the dependencies of the Rust codes and ecFlow on the compute node:

```bash
export CARGO_HOME=/swbuild/gmao_SIteam/spack-stack/cargo-cache
spack install -j 16 --verbose --fail-fast --show-log-on-error \
    --no-check-signature \
    --only dependencies py-cryptography py-maturin py-rpds-py ecflow \
    2>&1 | tee log.install.deps-for-rust-and-ecflow ; bell
```

---

### Step 2: Rust Codes and ecFlow (AFE LOGIN NODE)

Even with the pre-fetched Cargo cache, the Rust codes still need internet access to build, so this step runs on a login node. It *must* be an `afe` node: the `pfe` nodes are Sandy Bridge, which cannot execute spack-stack's x86_64_v3 Python interpreter, so the install fails with an `Illegal instruction` error. ecFlow is built here too, since its Qt dependency was built on a login node.

On an `afe` login node, run:

```bash
export CARGO_HOME=/swbuild/gmao_SIteam/spack-stack/cargo-cache
spack install -j 2 -p 1 --verbose --fail-fast --show-log-on-error \
    --no-check-signature \
    py-cryptography py-maturin py-rpds-py ecflow \
    2>&1 | tee log.install.rust-and-ecflow ; bell
```

NAS limits login nodes to 2 processes, hence `-j 2`.

---

### Step 3: Remaining Packages (COMPUTE NODE)

Back on the compute node, install everything else:

```bash
export CARGO_HOME=/swbuild/gmao_SIteam/spack-stack/cargo-cache
spack install -j 16 --verbose --fail-fast --show-log-on-error \
    --no-check-signature \
    2>&1 | tee log.install.after-cargo ; bell
```

> **Note:** You may need to re-run this command several times: some builds fail intermittently but succeed on retry. A retry loop like the sketch below can save some babysitting.

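A minimal retry sketch (the attempt count and log names are arbitrary choices, not part of the documented procedure):

```bash
set -o pipefail   # make the 'if' see spack's exit status, not tee's

# Retry the install up to 5 times, stopping as soon as it succeeds
for attempt in 1 2 3 4 5; do
  if spack install -j 16 --verbose --fail-fast --show-log-on-error \
       --no-check-signature 2>&1 | tee "log.install.attempt-${attempt}"; then
    break
  fi
done
```
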
---

### Packages Requiring Internet

If you encounter another package that insists on network access at build time, build it by itself on a login node:

```bash
spack install -j 2 --verbose --fail-fast --show-log-on-error \
    --no-check-signature <package> \
    |& tee log.install.<package> ; bell
```

Once it is built, return to the compute node and resume the full installation.

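Before resuming, you can confirm the one-off package actually landed; for example, with `crtm`:

```bash
# -l shows the hash, -v shows the variants of the installed spec
spack find -lv crtm
```
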
---

## Update Module Files

After installation completes, refresh the module tree and set up the meta-modules:

```bash
spack module tcl refresh -y --delete-tree ; bell
spack stack setup-meta-modules
```

---

## Deactivate the Environment

```bash
spack env deactivate
```

---

## Debugging Package Builds

When a build goes wrong, a good way to debug the failure is:

```bash
spack clean
spack stage <package>
spack build-env <package> -- bash --norc --noprofile
```

This drops you into a clean shell (no rc files) with the package's full build environment loaded: compilers, flags, and dependency paths.
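
For example, a session debugging ecFlow might look like this (`spack location -s` prints the stage directory, assuming a recent Spack):

```bash
spack clean
spack stage ecflow
cd "$(spack location -s ecflow)"    # jump to the staged source tree
spack build-env ecflow -- bash --norc --noprofile
# ...then re-run the failing configure/cmake/make step by hand
```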