`docs/Documentation/Systems/Gila/index.md`
# About Gila
Gila is an OpenHPC-based cluster. The [nodes](./running.md#gila-compute-nodes) run as virtual machines in a local virtual private cloud (OpenStack). Gila is allocated for NLR workloads and intended for LDRD, SPP or Office of Science workloads. Check back regularly as the configuration and capabilities for Gila are augmented over time.
---

`docs/Documentation/Systems/Gila/modules.md`
# Modules on Gila
On Gila, modules are deployed and organized slightly differently than on other NLR HPC systems.
While the basic concepts of using modules remain the same, there are important differences in how modules are structured, discovered, and loaded. These differences are intentional and designed to improve compatibility, reproducibility, and long-term maintainability. The upcoming sections of this document will walk through these differences step by step.
The module system used on this cluster is [Lmod](../../Environment/lmod.md).
When you log in to Gila, three modules are loaded automatically by default: `Core/25.05`, `DefApps`, and `gcc/14.2.0`.

!!! note
    The `DefApps` module is a convenience module that ensures both `Core` and `GCC` are loaded upon login or when you use `module restore`. It does not load additional software itself but guarantees that the essential environment is active.
## x86 vs ARM

Gila has two separate module stacks, one for each hardware architecture. The appropriate stack is automatically loaded based on which login node you use.

The two hardware stacks are almost identical in terms of available modules. However, some modules might be missing or have different versions depending on the architecture. For requests regarding module availability or version changes, please email [HPC-Help](mailto:[email protected]).

To ensure proper module compatibility, connect to the login node corresponding to your target compute architecture:

- **x86 architecture**: Use `gila-login-1`
- **ARM architecture**: Use `gila-hopper-login1` (Grace Hopper nodes)
!!! warning
    Do not submit jobs to Grace Hopper (ARM) compute nodes from the x86 login node, or vice versa. Doing so is not allowed and will cause module problems.
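Since the stack you get depends on the login node, a quick self-check can help before loading modules. This is a sketch: the hostname prefixes are assumptions inferred from the login node names listed above.

```bash
# Sketch: infer which module stack this session should see from the hostname.
# The hostname prefixes are assumptions based on the login node names above.
gila_stack() {
  case "$(hostname -s 2>/dev/null || hostname)" in
    gila-hopper-*) echo "ARM (Grace Hopper) stack" ;;
    gila-*)        echo "x86 stack" ;;
    *)             echo "unknown host: check which Gila login node you are on" ;;
  esac
}
gila_stack
```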
## Module Structure on Gila
This separation between Base and Core modules ensures:
* Reduced risk of mixing incompatible software
* A cleaner and more predictable module environment
## Module Commands: restore, avail, and spider

### module restore

The `module restore` command reloads the set of modules that were active at the start of your login session or at the last checkpoint. This is useful if you have unloaded or swapped modules and want to return to your original environment.

Example:

```bash
module restore
```

This will restore the default modules that were loaded at login, such as `Core/25.05`, `DefApps`, and `gcc/14.2.0`.

### module avail

The `module avail` command lists all modules that are **currently visible** in your environment. This includes modules that are compatible with the loaded compiler, MPI, or CUDA base modules.

Example:

```bash
module avail
```

You can also search for a specific software package:

```bash
module avail python
```

### module spider

The `module spider` command provides a **complete listing of all versions and configurations** of a software package, including those that are **not currently visible** with `module avail`. It also shows **which modules need to be loaded** to make a specific software configuration available.

Example:

```bash
module spider python/3.10
```

This output will indicate any prerequisite modules you need to load before the software becomes available.

!!! tip
    Use `module avail` for quick checks and `module spider` when you need full details or to resolve dependencies for specific versions.
## MPI-Enabled Software
MPI-enabled software modules are identified by a `-mpi` suffix at the end of the module name.

Similar to compiler modules, MPI-enabled software is **not visible by default**. These modules only appear after an MPI implementation is loaded. Supported MPI implementations include `openmpi`, `mpich`, and `intelmpi`.

Loading an MPI implementation makes MPI-enabled software built with that specific MPI stack available when running `module avail`.

This behavior ensures that only software built against the selected MPI implementation is exposed, helping users avoid mixing incompatible MPI libraries.

### Example: Finding and Loading MPI-Enabled HDF5

Use `module spider` to find all available variants of **HDF5**.
```bash
[USER@gila-login-1 ~]$ ml spider hdf5
  hdf5/1.14.5-mpi
```
Each version of **HDF5** requires dependency modules to be loaded before it becomes available. Please refer to the [module spider section](modules.md#module-spider) for more details.

To find the dependencies needed for `hdf5/1.14.5-mpi`:
```bash
[USER@gila-login-1 ~]$ ml spider hdf5/1.14.5-mpi
  oneapi/2025.1.3  openmpi/5.0.5
```
Before loading the dependencies:
```bash
[USER@gila-login-1 ~]$ ml avail hdf5
--------------- [ gcc/14.2.0 ] -------------
hdf5/1.14.5
```
This version of **HDF5** is not MPI-enabled.

After loading the dependencies, both versions are now visible:
```bash
[USER@gila-login-1 ~]$ ml gcc/14.2.0 openmpi/5.0.5
```
!!! tip
    To determine whether a software package is available on the cluster, use `module spider`. This command lists **all available versions and configurations** of a given software package, including those that are not currently visible with `module avail`.

    To find out which modules must be loaded in order to access a specific software configuration, run `module spider` using the **full module name**. This will show the required modules that need to be loaded to make that software available.
## Building on Gila
Building on Gila should be done on compute nodes and **NOT** login nodes.
Some important build tools are not available by default and require loading them from the module stack.
These build tools are:
- automake
- m4
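Because these tools are not on `PATH` until their modules are loaded, a quick check before configuring a build can save a failed `./configure` run. A sketch (the list above appears truncated on this page, so only the tools it shows are checked):

```bash
# Report which of the listed build tools are already on PATH.
# Tool names are taken from the (possibly truncated) list above.
check_build_tools() {
  for tool in automake m4; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: missing (try 'module load $tool')"
    fi
  done
}
check_build_tools
```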
Please see [here](./running.md#example-compiling-a-program-on-gila) for a full example of compiling a program on Gila.
## Frequently Asked Questions
??? note "Can I use Miniforge alongside the module system?"
    While it is technically possible, Miniforge is intended to provide an isolated environment separate from external modules. Be careful with the order in which modules are loaded, as this can impact your `PATH` and `LD_LIBRARY_PATH`.
??? note "What if I want a different CUDA version?"
    Other CUDA versions are available under **Core** modules. If you need additional versions, please reach out to [HPC-Help](mailto:[email protected]). Note that CUDA modules under **Core** do **not** automatically make CUDA-enabled software available; only CUDA modules under **Base** modules will load CUDA-enabled packages.
---

`docs/Documentation/Systems/Gila/running.md`
*Learn about compute nodes and job partitions on Gila.*
## Gila Compute Nodes
Gila compute nodes are not configured as exclusive and can be shared by multiple users or jobs. Be sure to request the resources that your job needs, including memory and cores. If you need exclusive use of a node, add the `--exclusive` flag to your job submission.
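If your job must not share its node, the flag can go directly in a submission script. A minimal sketch (the account name and resource values are placeholders; adjust them for your allocation):

```bash
# Sketch: write a batch script that requests a node exclusively.
# Account and time values below are placeholders, not Gila policy.
cat > exclusive_job.sh <<'EOF'
#!/bin/bash
#SBATCH --exclusive          # do not share the node with other jobs
#SBATCH --nodes=1
#SBATCH --time=01:00:00
#SBATCH --account=aurorahpc  # replace with your project handle
srun hostname
EOF
echo "submit with: sbatch exclusive_job.sh"
```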
### CPU Nodes
The CPU nodes in Gila are single-threaded virtualized nodes. Each compute node has two sockets and two NUMA nodes, with each socket containing 30 __AMD EPYC Milan__ (x86-64) cores. Each node has 220GB of usable RAM.
### GPU Nodes
GPU nodes in Gila have 8 NVIDIA A100 GPUs running on x86-64 __Intel Xeon Icelake CPUs__. There are 42 cores on a GPU node, with one socket and NUMA node. Each GPU node has 910GB of RAM, and each NVIDIA A100 GPU has 80GB of VRAM.
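Since GPU nodes are shared, a job should request only the GPUs it needs. A sketch of a single-GPU request (the generic `--gres=gpu:1` form and the account name are assumptions; confirm the exact GRES name on Gila with `sinfo -o "%N %G"`):

```bash
# Sketch: batch script requesting one of a GPU node's A100s.
# '--gres=gpu:1' is the generic Slurm form, assumed here, not confirmed for Gila.
cat > gpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --gres=gpu:1         # one GPU on a shared GPU node
#SBATCH --mem=100G           # well under the node's 910GB
#SBATCH --time=01:00:00
#SBATCH --account=aurorahpc  # replace with your project handle
srun nvidia-smi
EOF
echo "submit with: sbatch gpu_job.sh"
```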
### Grace Hopper Nodes
Gila has 6 NVIDIA Grace Hopper nodes. To use the Grace Hopper nodes, submit your jobs to the `gh` partition from the `gila-hopper-login1.hpc.nrel.gov` login node. Each Grace Hopper node has a 72 core NVIDIA Grace CPU and an NVIDIA GH200 GPU, with 96GB of VRAM and 470GB of RAM. They have one socket and NUMA node.
Please note - the __NVIDIA Grace CPUs__ run on a different processing architecture (ARM64) than both the __Intel Xeon Icelake CPUs__ (x86-64) and the __AMD EPYC Milan__ (x86-64). Any application that is manually compiled by a user and intended to be used on the Grace Hopper nodes __MUST__ be compiled on the Grace Hopper nodes themselves.
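Because a binary built on x86-64 will not run on the Grace CPUs, checking the architecture before compiling is cheap insurance. `uname -m` reports `aarch64` on ARM64 hosts such as the Grace Hopper nodes and `x86_64` on the other Gila nodes:

```bash
# Print the CPU architecture so you know which nodes a build will run on.
arch="$(uname -m)"
case "$arch" in
  aarch64) echo "ARM64: binaries built here target the Grace Hopper nodes" ;;
  x86_64)  echo "x86-64: binaries built here target the CPU and A100 GPU nodes" ;;
  *)       echo "unexpected architecture: $arch" ;;
esac
```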
## Partitions
A list of partitions can be found by running the `sinfo` command. Here are the partitions on Gila:

| Partition Name | CPU | GPU | Qty | RAM | Cores/node |
|----------------|-----|-----|-----|-----|------------|
Gila is optimized for single-node workloads, and multi-node jobs may experience degraded performance. All MPI flavors work on Gila, with Intel MPI showing notably good performance. Gila nodes run one thread per core, so applications compiled to use multiple hardware threads per core cannot take advantage of the extra threads.
## Example: Compiling a Program on Gila
In this section we will describe how to compile an MPI-based application using an Intel toolchain from the module system. Please see the [Modules page](./modules.md) for additional information on the Gila module system.
### Requesting an interactive session
First, we will begin by requesting an interactive session. This will give us a compute node from where we can carry out our work. An example command for requesting such a session is as follows:
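The command block itself was truncated from this page; this is a sketch reconstructed from the description that follows (the flags are standard Slurm `salloc` options, and the partition is left to the scheduler's default, which is an assumption):

```bash
# Hypothetical interactive request matching the surrounding description:
# one node, 60 cores, 60GB of memory, one hour, on the aurorahpc account.
SALLOC_OPTS="--nodes=1 --ntasks=60 --mem=60G --time=01:00:00 --account=aurorahpc"
if command -v salloc >/dev/null 2>&1; then
  salloc $SALLOC_OPTS
else
  echo "salloc not found; on a Gila login node run: salloc $SALLOC_OPTS"
fi
```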
This will request a single node from the AMD partition with 60 cores and 60 GB of memory for one hour. We request this node using the ```aurorahpc``` account that is open to all NLR staff, but if you have an HPC allocation, please replace ```aurorahpc``` with the project handle.
### Loading necessary modules
Once we have an allocated node, we need to load the `oneapi` module for the Intel toolchain, and then the `intel-oneapi-mpi` module for Intel MPI. You can always check which modules are available with `module avail`, and which modules you have loaded with `module list`. The commands for loading the modules we need are as follows:
```bash
module load oneapi
module load intel-oneapi-mpi
```
### Copying program files
We now have the tools we need from the Intel toolchain to compile a program. First, create a directory called `program-compilation` under `/projects` or `/scratch`.
```bash
mkdir program-compilation
cd program-compilation
```
Now we are going to copy the `phostone.c` file from `/nopt/nrel/apps/210929a` to our `program-compilation` directory.
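The copy command itself was truncated from this page; a sketch using `rsync` flags that match the description that follows (progress display and attribute preservation), guarded so it only attempts the copy where the source file exists:

```bash
# Reconstructed copy command (flags assumed from the description):
# -a preserves file attributes, -v is verbose, -P shows transfer progress.
SRC=/nopt/nrel/apps/210929a/phostone.c
if [ -f "$SRC" ]; then
  rsync -avP "$SRC" .
else
  echo "source not found: run this on Gila ($SRC)"
fi
```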
`rsync` is a copy command that is commonly used for transferring files; the parameters we pass allow us to see the progress of the transfer and preserve important file characteristics.
### Program compilation
Once the file is copied, we can compile the program with the following command:
```bash
mpiicx -qopenmp phostone.c -o phost.intelmpi
```
The `mpiicx` command is the Intel MPI compiler wrapper provided by the `intel-oneapi-mpi` module, and the `-qopenmp` flag enables the OpenMP portions of the program. The `-o` flag names the output executable `phost.intelmpi`.
### Submitting a job
The following batch script requests two cores to run two MPI ranks on a single node. Save this script to a file such as `submit_intel.sh`, and submit it with `sbatch submit_intel.sh`. Again, if you have an HPC allocation, we request that you replace `aurorahpc` with the project handle.
??? example "Batch Submission Script - Intel MPI"

    ```bash
    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=2
    #SBATCH --cpus-per-task=2
    #SBATCH --time=00:01:00
    #SBATCH --mem=20GB
    #SBATCH --account=aurorahpc

    module load oneapi
    module load intel-oneapi-mpi

    srun --cpus-per-task 2 -n 2 ./phost.intelmpi -F
    ```
Your output should look similar to the following:

```
MPI VERSION Intel(R) MPI Library 2021.14 for Linux* OS
```

We can now follow these steps using OpenMPI as well. First, we will unload the Intel modules from the Intel toolchain. We will then load GNU modules and OpenMPI using the `module load` command from earlier. The commands are as follows:

```bash
module unload intel-oneapi-mpi
module unload oneapi
module load gcc
module load openmpi
```

We can then compile the phost program again using the following command:

```bash
mpicc -fopenmp phostone.c -o phost.openmpi
```

Once the program has been compiled against OpenMPI, we can submit another batch script to test the program.
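The closing submission script was truncated from this page; this sketch mirrors the Intel MPI script above, with the module loads and executable name swapped for the OpenMPI build (the resource values are copied from that example and are assumptions here):

```bash
# Sketch: OpenMPI submission script mirroring the Intel example above.
# Resource values are copied from that example; adjust as needed.
cat > submit_openmpi.sh <<'EOF'
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=2
#SBATCH --cpus-per-task=2
#SBATCH --time=00:01:00
#SBATCH --mem=20GB
#SBATCH --account=aurorahpc

module load gcc
module load openmpi

srun --cpus-per-task 2 -n 2 ./phost.openmpi -F
EOF
echo "submit with: sbatch submit_openmpi.sh"
```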