
Commit 5857792

port ugent infra page to user docs

1 parent: 92ee74d

2 files changed: +93, -1 lines

mkdocs/docs/HPC/index.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ Use the OS dropdown in the top bar to switch to a different operating system.
  {%- if site == 'Gent' %}
  - [Recording of HPC-UGent intro](https://www.ugent.be/hpc/en/training/introhpcugent-recording)
  - [Linux Tutorial](linux-tutorial/index.md)
- - [Hardware overview](https://www.ugent.be/hpc/en/infrastructure)
+ - [Hardware overview](infrastructure.md)
  - [Available software](./only/gent/available_software/index.md)
  - [Migration of cluster and login nodes to RHEL9 (starting Sept'24)](rhel9.md)
  {%- endif %}

mkdocs/docs/HPC/infrastructure.md

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
# Infrastructure

## Tier-2 clusters of Ghent University

The Stevin computing infrastructure consists of several Tier-2 clusters
which are hosted in the S10 datacenter of Ghent University.

This infrastructure is co-financed by FWO and the Department of Economy,
Science and Innovation (EWI).

## Tier-2 login nodes

Log in to the HPC-UGent Tier-2 infrastructure using SSH via `login.hpc.ugent.be`.
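
As a minimal sketch, a connection from a terminal looks like this (the account
name `vsc40000` is a placeholder; use your own VSC account):

```shell
# Log in to an HPC-UGent Tier-2 login node over SSH.
# vsc40000 is a placeholder account name; this assumes your SSH key has
# already been registered for your VSC account.
ssh vsc40000@login.hpc.ugent.be
```
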
## Tier-2 compute clusters

### CPU clusters

The HPC-UGent Tier-2 infrastructure currently includes several standard
CPU-only clusters of different generations (listed from oldest to newest).

For basic information on using these clusters, see our
[documentation](running_batch_jobs.md); a minimal job submission sketch
is shown below the table.

| ***cluster name*** | ***# nodes*** | ***Processor architecture*** | ***Usable memory/node*** | ***Local diskspace/node*** | ***Interconnect*** | ***Operating system*** |
| --- | --- | --- | --- | --- | --- | --- |
| skitty | 72 | 2x 18-core Intel Xeon Gold 6140 (Skylake @ 2.3 GHz) | 177 GiB | 1 TB & 240 GB SSD | EDR InfiniBand | RHEL 9 |
| doduo (default cluster) | 128 | 2x 48-core AMD EPYC 7552 (Rome @ 2.2 GHz) | 250 GiB | 180 GB SSD | HDR-100 InfiniBand | RHEL 8 |
| gallade (*) | 16 | 2x 64-core AMD EPYC 7773X (Milan-X @ 2.2 GHz) | 940 GiB | 1.5 TB NVMe | HDR-100 InfiniBand | RHEL 9 |
| shinx | 48 | 2x 96-core AMD EPYC 9654 (Genoa @ 2.4 GHz) | 370 GiB | 500 GB NVMe | NDR-200 InfiniBand | RHEL 9 |

(*) also see this [extra information](donphan-gallade.md#gallade-large-memory-cluster)
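
As a minimal sketch (see the batch jobs documentation linked above for details),
submitting a job to one of these clusters involves selecting the cluster and
then submitting a job script; the script name and resource values below are
placeholders:

```shell
# Select the cluster to submit to (doduo is the default cluster).
module swap cluster/doduo
# Submit a job script with placeholder resource requests.
qsub -l nodes=1:ppn=8 -l walltime=01:00:00 myjob.sh
```
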
### Interactive debug cluster

A special-purpose interactive debug cluster is available,
where you should always be able to get a job running quickly,
**without waiting in the queue**.

Intended usage is mainly for interactive work,
either via an interactive job or using the [HPC-UGent web portal](web_portal).

This cluster is heavily over-provisioned, so jobs may
run more slowly when the cluster is being used more heavily.

Strict limits are in place per user:

* max. 5 jobs in queue
* max. 3 jobs running
* max. 8 cores and 27 GB of memory in total for running jobs

For more information, see our [documentation](interactive_gent);
a minimal sketch of starting an interactive job is shown below the table.

| ***cluster name*** | ***# nodes*** | ***Processor architecture*** | ***Usable memory/node*** | ***Local diskspace/node*** | ***Interconnect*** | ***Operating system*** |
| --- | --- | --- | --- | --- | --- | --- |
| donphan (*) | 16 | 2x 18-core Intel Xeon Gold 6240 (Cascade Lake @ 2.6 GHz) + 1x shared NVIDIA Ampere A2 GPU (16GB GPU memory) | 738 GiB | 1.6 TB NVMe | HDR-100 InfiniBand | RHEL 8 |

(*) also see this [extra information](donphan-gallade.md#donphan-debuginteractive-cluster)
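
As a minimal sketch (the interactive documentation linked above has the full
details), an interactive session on the debug cluster can be started along
these lines; the resource values are placeholders and must stay within the
per-user limits listed above:

```shell
# Switch to the interactive/debug cluster.
module swap cluster/donphan
# Request an interactive job (placeholder resource values).
qsub -I -l nodes=1:ppn=2 -l walltime=02:00:00
```
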
### GPU clusters

GPU clusters are available in the HPC-UGent Tier-2 infrastructure,
with different generations of NVIDIA GPUs.

These are well suited for workloads with software that
can leverage GPU resources (like TensorFlow, PyTorch, GROMACS, AlphaFold, etc.).

For more information on using these clusters, see our documentation;
a minimal sketch of targeting a GPU cluster is shown below the table.

| ***cluster name*** | ***# nodes*** | ***Processor architecture & GPUs*** | ***Usable memory/node*** | ***Local diskspace/node*** | ***Interconnect*** | ***Operating system*** |
| --- | --- | --- | --- | --- | --- | --- |
| joltik | 10 | 2x 16-core Intel Xeon Gold 6242 (Cascade Lake @ 2.8 GHz) + 4x NVIDIA Volta V100 GPUs (32GB GPU memory) | 256 GiB | 800 GB SSD | double EDR InfiniBand | RHEL 9 |
| accelgor | 9 | 2x 24-core AMD EPYC 7413 (Milan @ 2.2 GHz) + 4x NVIDIA Ampere A100 GPUs (80GB GPU memory) | 500 GiB | 180 GB SSD | HDR InfiniBand | RHEL 8 |
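
As a rough sketch, targeting a GPU cluster follows the same pattern as the CPU
clusters: switch to the cluster first, then submit the job script. How to
request GPU resources for a job is described in the documentation referred to
above; `myscript.sh` is a placeholder:

```shell
# Switch to one of the GPU clusters (joltik or accelgor).
module swap cluster/accelgor
# Submit a job script; see the GPU documentation for how to request
# the number of GPUs per node.
qsub myscript.sh
```
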
## Tier-2 shared storage

| ***Filesystem name*** | ***Intended usage*** | ***Total storage space*** | ***Personal storage space*** | ***VO storage space (^)*** |
| --- | --- | --- | --- | --- |
| $VSC_HOME | Home directory, entry point to the system | 90 TB | 3 GB (fixed) | (none) |
| $VSC_DATA | Long-term storage of large data files | 1.9 PB | 25 GB (fixed) | 250 GB |
| $VSC_SCRATCH | Temporary fast storage of 'live' data for calculations | 1.7 PB | 25 GB (fixed) | 250 GB |
| $VSC_SCRATCH_ARCANINE | Temporary very fast storage of 'live' data for calculations (recommended for very I/O-intensive jobs) | 70 TB NVMe | (none) | upon request |

(^) Storage space for a group of users (Virtual Organisation, or VO for short) can be
increased significantly on request. For more information, see our
[documentation](running_jobs_with_input_output_data.md#virtual-organisations).
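
The filesystem names above are available as environment variables in your
session and in jobs. A minimal, illustrative staging pattern (file and
directory names are placeholders):

```shell
# Work on the fast scratch filesystem during the calculation.
cd $VSC_SCRATCH
# Copy input data from long-term storage (placeholder file name).
cp $VSC_DATA/myinput.dat .
# ... run the calculation here, writing output under $VSC_SCRATCH ...
# Copy results back to long-term storage afterwards (placeholder directory).
cp -r results/ $VSC_DATA/
```
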
## Infrastructure status

[Check the system status](https://www.ugent.be/hpc/en/infrastructure/status)
