54 changes: 41 additions & 13 deletions docs/alps/hardware.md
@@ -3,8 +3,8 @@

Alps is an HPE Cray EX3000 system: a liquid-cooled, blade-based, high-density system.

!!! todo
this is a skeleton - all of the details need to be filled in
!!! under-construction
This page is a work in progress - contact us if you want us to prioritise documenting specific information that would be useful for your work.

## Alps Cabinets

@@ -40,13 +40,13 @@ Alps was installed in phases, starting with the installation of 1024 AMD Rome du

There are currently five node types in Alps:

| type | abbreviation | blades | nodes | CPU sockets | GPU devices |
| ---- | ------- | ------:| -----:| -----------:| -----------:|
| NVIDIA GH200 | gh200 | 1344 | 2688 | 10,752 | 10,752 |
| AMD Rome | zen2 | 256 | 1024 | 2,048 | -- |
| NVIDIA A100 | a100 | 72 | 144 | 144 | 576 |
| AMD MI250x | mi200 | 12 | 24 | 24 | 96 |
| AMD MI300A | mi300 | 64 | 128 | 512 | 512 |
| type | abbreviation | blades | nodes | CPU sockets | GPU devices |
| ---- | ------- | ------:| -----:| -----------:| -----------:|
| [NVIDIA GH200][ref-alps-gh200-node] | gh200 | 1344 | 2688 | 10,752 | 10,752 |
| [AMD Rome][ref-alps-zen2-node] | zen2 | 256 | 1024 | 2,048 | -- |
| [NVIDIA A100][ref-alps-a100-node] | a100 | 72 | 144 | 144 | 576 |
| [AMD MI250x][ref-alps-mi200-node] | mi200 | 12 | 24 | 24 | 96 |
| [AMD MI300A][ref-alps-mi300-node] | mi300 | 64 | 128 | 512 | 512 |
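
The per-node layout of each node type follows from the table by simple division. The sketch below is illustrative only: the counts are copied from the table, while the names `NODE_TYPES` and `per_node_layout` are hypothetical helpers introduced here and are not part of any Alps tooling.

```python
# Counts copied from the node-type table:
# abbreviation -> (blades, nodes, CPU sockets, GPU devices).
# AMD Rome nodes have no GPUs, so the "--" entry is recorded as 0.
NODE_TYPES = {
    "gh200": (1344, 2688, 10752, 10752),
    "zen2": (256, 1024, 2048, 0),
    "a100": (72, 144, 144, 576),
    "mi200": (12, 24, 24, 96),
    "mi300": (64, 128, 512, 512),
}


def per_node_layout(abbreviation):
    """Derive nodes per blade, and sockets and GPUs per node, for one node type."""
    blades, nodes, sockets, gpus = NODE_TYPES[abbreviation]
    return {
        "nodes_per_blade": nodes // blades,
        "sockets_per_node": sockets // nodes,
        "gpus_per_node": gpus // nodes,
    }


if __name__ == "__main__":
    for name in NODE_TYPES:
        print(name, per_node_layout(name))
```

For example, the `zen2` row gives four nodes per blade and two CPU sockets per node, and the `a100` row gives two nodes per blade with one CPU socket and four GPUs per node.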

[](){#ref-alps-gh200-node}
### NVIDIA GH200 GPU Nodes
@@ -81,16 +81,44 @@ Each node contains four Grace-Hopper modules and four corresponding network inte
[](){#ref-alps-zen2-node}
### AMD Rome CPU Nodes

!!! todo
These nodes have two [AMD Epyc 7742](https://en.wikichip.org/wiki/amd/epyc/7742) 64-core CPU sockets, and are used primarily for the [Eiger][ref-cluster-eiger] system. They come in two memory configurations:

* *Standard-memory*: 256 GB in 16x16 GB DDR4 DIMMs.
* *Large-memory*: 512 GB in 16x32 GB DDR4 DIMMs.

!!! note "Not all memory is available"
The total memory available to jobs is roughly 245 GB on the standard-memory nodes and 497 GB on the large-memory nodes.

The amount of memory available to your job also depends on the number of MPI ranks per node -- each MPI rank has a memory overhead.
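
As a rough planning aid, the memory share per MPI rank can be estimated by dividing the available memory by the number of ranks per node. This is a minimal sketch: the 245 GB and 497 GB figures come from the note above, but the per-rank overhead is not specified on this page, so `overhead_per_rank_gb` is a hypothetical parameter that you would need to measure for your own application.

```python
# Approximate memory available to jobs, in GB, taken from the note above.
AVAILABLE_GB = {"standard": 245, "large": 497}


def memory_per_rank_gb(node_kind, ranks_per_node, overhead_per_rank_gb=0.0):
    """Estimate the memory left for each MPI rank on one node.

    overhead_per_rank_gb is an assumed, user-supplied value; this page
    does not state the actual per-rank overhead.
    """
    if ranks_per_node < 1:
        raise ValueError("ranks_per_node must be at least 1")
    share = AVAILABLE_GB[node_kind] / ranks_per_node
    # Never report a negative amount of memory.
    return max(share - overhead_per_rank_gb, 0.0)


if __name__ == "__main__":
    # One rank per physical core on a standard-memory node.
    print(f"{memory_per_rank_gb('standard', 128):.2f} GB per rank")
```

With one rank per physical core (128 ranks), a standard-memory node leaves a little under 2 GB per rank before any overhead is subtracted.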

The schematic below shows the CPU cores and [NUMA nodes](https://www.kernel.org/doc/html/v4.18/vm/numa.html) of a *standard-memory node*.(1)
{.annotate}

EX425
1. Obtained with the command `lstopo --no-caches --no-io --no-legend eiger-topo.png` on Eiger.

![Screenshot](../images/slurm/eiger-topo.png)

* The two sockets are labelled Package L#0 and Package L#1.
* Each socket has 4 NUMA nodes, with 16 cores each, for a total of 64 cores per socket.

Each core supports [simultaneous multithreading (SMT)](https://www.amd.com/en/blogs/2025/simultaneous-multithreading-driving-performance-a.html): it can execute two threads concurrently, and these are presented as two processing units (PUs) per physical core:

* the first PUs of the cores are numbered 0:63 on socket 0, and 64:127 on socket 1;
* the second PUs of the cores are numbered 128:191 on socket 0, and 192:255 on socket 1;
* hence, core `n` has PUs `n` and `n+128`.
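
The numbering rule can be written as a small lookup. This is a minimal sketch rather than an official tool: the socket and PU arithmetic follows the description above, while the NUMA node index assumes that cores are numbered contiguously in blocks of 16 per NUMA node, which should be checked against the `lstopo` output. The function names are hypothetical.

```python
CORES_PER_NUMA_NODE = 16
CORES_PER_SOCKET = 64
CORES_PER_NODE = 128  # two sockets of 64 cores


def core_info(core):
    """Return the socket, the assumed NUMA node, and the two PUs of a physical core."""
    if not 0 <= core < CORES_PER_NODE:
        raise ValueError("core must be in the range 0..127")
    return {
        "socket": core // CORES_PER_SOCKET,
        # Assumption: NUMA nodes hold contiguous blocks of 16 cores.
        "numa_node": core // CORES_PER_NUMA_NODE,
        # Core n has PUs n and n + 128.
        "pus": (core, core + CORES_PER_NODE),
    }


def core_of_pu(pu):
    """Return the physical core that a processing unit belongs to."""
    if not 0 <= pu < 2 * CORES_PER_NODE:
        raise ValueError("pu must be in the range 0..255")
    return pu % CORES_PER_NODE


if __name__ == "__main__":
    print(core_info(72))
    print(core_of_pu(200))
```

For the last core on the node, core 127, this gives PUs 127 and 255, so PU indices span 0 to 255 in total.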

Each node has two Slingshot 11 network interface cards (NICs), which are not illustrated on the diagram.

[](){#ref-alps-a100-node}
### NVIDIA A100 GPU Nodes

!!! todo
Each Grizzly Peak blade contains two nodes, and each node has:

Grizzly Peak
* One 64-core Zen3 CPU socket
* 512 GB DDR4 Memory
* 4 NVIDIA A100 GPUs with 80 GB HBM2e memory each
    * The MCH system is the same, except that its A100 GPUs have 96 GB of memory each.
* 4 NICs -- one per GPU.

[](){#ref-alps-mi200-node}
### AMD MI250x GPU Nodes
Binary file added docs/images/slurm/eiger-topo.png