# Infrastructure

## Tier-2 clusters of Ghent University

The Stevin computing infrastructure consists of several Tier-2 clusters
which are hosted in the S10 datacenter of Ghent University.

This infrastructure is co-financed by FWO and Department of Economy,
Science and Innovation (EWI).

## Tier-2 login nodes

Log in to the HPC-UGent Tier-2 infrastructure via [https://login.hpc.ugent.be](https://login.hpc.ugent.be)
or using SSH via `login.hpc.ugent.be`.

More information on using the web portal can be found [here](web_portal),
and on connecting with SSH [here](connecting).
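
For example, assuming your VSC account name is `vsc40000` (a placeholder, replace it with your own account),
logging in over SSH from a terminal looks like this:

```shell
# Connect to the HPC-UGent Tier-2 login nodes over SSH
# (vsc40000 is a placeholder VSC account name)
ssh vsc40000@login.hpc.ugent.be
```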

## Tier-2 compute clusters

### CPU clusters

The HPC-UGent Tier-2 infrastructure currently includes several standard
CPU-only clusters of different generations (listed from oldest to newest).

For basic information on using these clusters, see our
[documentation](running_batch_jobs.md).

| ***cluster name*** | ***# nodes*** | ***Processor architecture*** | ***Usable memory/node*** | ***Local diskspace/node*** | ***Interconnect*** | ***Operating system*** |
| --- | --- | --- | --- | --- | --- | --- |
| skitty | 72 | 2x 18-core Intel Xeon Gold 6140 (Skylake @ 2.3 GHz) | 177 GiB | 1 TB & 240 GB SSD | EDR InfiniBand | RHEL 9 |
| doduo (default cluster) | 128 | 2x 48-core AMD EPYC 7552 (Rome @ 2.2 GHz) | 250 GiB | 180 GB SSD | HDR-100 InfiniBand | RHEL 8 |
| gallade (*) | 16 | 2x 64-core AMD EPYC 7773X (Milan-X @ 2.2 GHz) | 940 GiB | 1.5 TB NVMe | HDR-100 InfiniBand | RHEL 9 |
| shinx | 48 | 2x 96-core AMD EPYC 9654 (Genoa @ 2.4 GHz) | 370 GiB | 500 GB NVMe | NDR-200 InfiniBand | RHEL 9 |

(*) also see this [extra information](./only/gent/2023/donphan-gallade#gallade-large-memory-cluster)
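
As a minimal sketch (the full workflow is covered in the documentation linked above), targeting one of
the clusters from the table and submitting a batch job could look like this, where `jobscript.sh`
stands for your own job script:

```shell
# Select the cluster to submit to (doduo is the default cluster)
module swap cluster/doduo

# Submit a job script to the selected cluster
qsub jobscript.sh
```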

### Interactive debug cluster

A special-purpose interactive debug cluster is available,
where you should always be able to get a job running quickly,
**without waiting in the queue**.

Intended usage is mainly for interactive work,
either via an interactive job or using the [HPC-UGent web portal](web_portal).

This cluster is heavily oversubscribed, so jobs may
run slower when the cluster is being used more heavily.

Strict limits are in place per user:

* max. 5 jobs in queue
* max. 3 jobs running
* max. of 8 cores and 27 GB of memory in total for running jobs

For more information, see our [documentation](interactive_gent).

| ***cluster name*** | ***# nodes*** | ***Processor architecture*** | ***Usable memory/node*** | ***Local diskspace/node*** | ***Interconnect*** | ***Operating system*** |
| --- | --- | --- | --- | --- | --- | --- |
| donphan (*) | 16 | 2x 18-core Intel Xeon Gold 6240 (Cascade Lake @ 2.6 GHz) + 1x shared NVIDIA Ampere A2 GPU (16GB GPU memory) | 738 GiB | 1.6 TB NVMe | HDR-100 InfiniBand | RHEL 8 |

(*) also see this [extra information](./only/gent/2023/donphan-gallade#donphan-debuginteractive-cluster)
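
Within the limits listed above, a small interactive session on this cluster could be started as
sketched below; the requested resources are only an illustration, adjust them to your needs:

```shell
# Switch to the interactive/debug cluster
module swap cluster/donphan

# Start an interactive job with 2 cores for at most 1 hour
qsub -I -l nodes=1:ppn=2 -l walltime=1:00:00
```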

### GPU clusters

GPU clusters are available in the HPC-UGent Tier-2 infrastructure,
with different generations of NVIDIA GPUs.

These clusters are well suited for workloads that can make use of GPU resources,
for example with software like TensorFlow, PyTorch, GROMACS, or AlphaFold.

For more information on using these clusters, see our documentation.
| 73 | + |
| 74 | +| ***cluster name*** | ***# nodes*** | ***Processor architecture & GPUs*** | ***Usable memory/node*** | ***Local diskspace/node*** | ***Interconnect*** | ***Operating system*** | |
| 75 | +| --- | --- | --- | --- | --- | --- | --- | |
| 76 | +| joltik | 10 | 2x 16-core Intel Xeon Gold 6242 (Cascade Lake @ 2.8 GHz) + 4x NVIDIA Volta V100 GPUs (32GB GPU memory) | 256 GiB | 800GB SSD | double EDR Infiniband | RHEL 9 | |
| 77 | +| accelgor | 9 | 2x 24-core AMD EPYC 7413 (Milan @ 2.2 GHz) + 4x NVIDIA Ampere A100 GPUs (80GB GPU memory) | 500 GiB | 180GB SSD | HDR InfiniBand | RHEL 8 | |
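
As an illustration only (how GPU resources are allocated to jobs is described in the documentation
and is not covered by this sketch), a minimal GPU job that merely checks GPU visibility could look
like this; `gpu_check.sh` is a hypothetical job script name:

```shell
# Create a hypothetical minimal job script that verifies GPU visibility
cat > gpu_check.sh << 'EOF'
#!/bin/bash
#PBS -N gpu_check
#PBS -l walltime=00:10:00
nvidia-smi
EOF

# Switch to one of the GPU clusters listed above and submit the job
module swap cluster/accelgor
qsub gpu_check.sh
```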

## Tier-2 shared storage

| ***Filesystem name*** | ***Intended usage*** | ***Total storage space*** | ***Personal storage space*** | ***VO storage space (^)*** |
| --- | --- | --- | --- | --- |
| $VSC_HOME | Home directory, entry point to the system | 90 TB | 3 GB (fixed) | (none) |
| $VSC_DATA | Long-term storage of large data files | 1.9 PB | 25 GB (fixed) | 250 GB |
| $VSC_SCRATCH | Temporary fast storage of 'live' data for calculations | 1.7 PB | 25 GB (fixed) | 250 GB |
| $VSC_SCRATCH_ARCANINE | Temporary very fast storage of 'live' data for calculations (recommended for very I/O-intensive jobs) | 70 TB NVMe | (none) | upon request |

(^) Storage space for a group of users (Virtual Organisation, or VO for short) can be
increased significantly on request. For more information, see our
[documentation](running_jobs_with_input_output_data#virtual-organisations).
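
These filesystems are available inside jobs through the environment variables listed in the table.
A common pattern, sketched below with hypothetical file names, is to stage data from `$VSC_DATA`
to the faster `$VSC_SCRATCH` for the duration of a calculation:

```shell
# Create a unique working directory on the fast scratch filesystem
workdir=$(mktemp -d -p "$VSC_SCRATCH" myrun_XXXXXX)

# Stage input data (hypothetical file name) from long-term storage to scratch
cp "$VSC_DATA/input.dat" "$workdir/"

# ... run the calculation in $workdir ...

# Copy results (hypothetical file name) back to long-term storage and clean up
cp "$workdir/output.dat" "$VSC_DATA/"
rm -rf "$workdir"
```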

## Infrastructure status

[Check the system status](https://www.ugent.be/hpc/en/infrastructure/status)