
Commit 998eefd (1 parent: cfe4fe8)

fix links; add more links to ML tutorials/pytorch; add guides/tutorials section to landing page
File tree: 7 files changed, +52 -30 lines changed

docs/access/jupyterlab.md (1 addition, 1 deletion)

@@ -199,7 +199,7 @@ Examples of notebooks with `ipcmagic` can be found [here](https://github.com/
 
 While it is generally recommended to submit long-running machine learning training and inference jobs via `sbatch`, certain use cases can benefit from an interactive Jupyter environment.
 
-A popular approach to run multi-GPU ML workloads is with [`accelerate`](https://github.com/huggingface/accelerate) and [`torchrun`](https://docs.pytorch.org/docs/stable/elastic/run.html) as demonstrated in the [tutorials][ref-software-ml-tutorials].
+A popular approach to run multi-GPU ML workloads is with [`accelerate`](https://github.com/huggingface/accelerate) and [`torchrun`](https://docs.pytorch.org/docs/stable/elastic/run.html) as demonstrated in the [tutorials][ref-tutorials-ml].
 In particular, the `accelerate launch` script in the [LLM fine-tuning tutorial][software-ml-llm-fine-tuning-tutorial] can be directly carried over to a Jupyter cell with a `%%bash` header (to run its contents interpreted by bash).
 For `torchrun`, one can adapt the command from the multi-node [nanotron tutorial][software-ml-llm-nanotron-tutorial] to run on a single GH200 node using the following line in a Jupyter cell
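The single-node `torchrun` line mentioned at the end of this hunk is not part of the diff itself. As a rough sketch only, not the tutorial's actual command, such a Jupyter cell could look like the following, assuming a placeholder training script `train.py` and one rank per GPU on a 4-GPU GH200 node:

```shell
%%bash
# Sketch only: single-node launch without an external rendezvous backend.
# train.py is a placeholder name; Alps GH200 nodes provide 4 GPUs,
# which motivates --nproc_per_node=4.
torchrun --standalone --nproc_per_node=4 train.py
```

The `%%bash` cell magic makes Jupyter pass the cell body to bash, as described in the changed passage.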

docs/index.md (39 additions, 22 deletions)

@@ -1,9 +1,3 @@
-!!! info ""
-    This is the new CSCS documentation site, which replaces the [CSCS Knowledge Base](https://confluence.cscs.ch/display/KB).
-
-    The migration of old documentation is still not fully complete.
-    If you find documentation that is missing, please create a ticket on the documentation's [GitHub issue tracker](https://github.com/eth-cscs/cscs-docs/issues).
-
 # CSCS Documentation
 
 <div class="grid cards" markdown>
@@ -66,32 +60,26 @@ The Alps Research infrastructure hosts multiple platforms and clusters targeting
 
 </div>
 
-[](){#ref-get-in-touch}
-## Get in Touch
+## Tutorials and Guides
 
-If you cannot find the information that you need in the documentation, help is available.
+Learn by doing with our guides and tutorials.
 
 <div class="grid cards" markdown>
+
+- :fontawesome-solid-layer-group: __Tutorials__
 
-- :fontawesome-solid-headset: __Get Help__
-
-    Contact the CSCS Service Desk for help.
-
-    [:octicons-arrow-right-24: Service Desk](https://jira.cscs.ch/plugins/servlet/desk)
+    Hands on tutorials that show how to implement workflows on Alps.
 
-- :fontawesome-regular-comments: __Chat__
+    [:octicons-arrow-right-24: Machine Learning][ref-tutorials-ml]
 
-    Discuss Alps with other users and CSCS staff on Slack.
+- :fontawesome-solid-mountain-sun: __Guides__
 
-    [:octicons-arrow-right-24: CSCS User Slack](https://cscs-users.slack.com/)
+    Guides with practical advice, hints and tips for key topics.
 
-<div class="grid cards" markdown>
-- :fontawesome-solid-hammer: __Contribute__
+    [:octicons-arrow-right-24: Using storage effectively][ref-guides-storage]
 
-    The source for the documentation is hosted on GitHub.
+    [:octicons-arrow-right-24: Accessing internet and external services][ref-guides-internet-access]
 
-    [:octicons-arrow-right-24: Contribute to the docs ](contributing/index.md)
-
-</div>
+    [:octicons-arrow-right-24: Using and configuring the terminal][ref-guides-terminal]
 
 </div>
 
@@ -142,3 +130,32 @@ If you cannot find the information that you need in the documentation, help is available.
 
 </div>
 
+[](){#ref-get-in-touch}
+## Get in Touch
+
+If you cannot find the information that you need in the documentation, help is available.
+
+<div class="grid cards" markdown>
+
+- :fontawesome-solid-headset: __Get Help__
+
+    Contact the CSCS Service Desk for help.
+
+    [:octicons-arrow-right-24: Service Desk](https://jira.cscs.ch/plugins/servlet/desk)
+
+- :fontawesome-regular-comments: __Chat__
+
+    Discuss Alps with other users and CSCS staff on Slack.
+
+    [:octicons-arrow-right-24: CSCS User Slack](https://cscs-users.slack.com/)
+
+<div class="grid cards" markdown>
+- :fontawesome-solid-hammer: __Contribute__
+
+    The source for the documentation is hosted on GitHub.
+
+    [:octicons-arrow-right-24: Contribute to the docs ](contributing/index.md)
+</div>
+
+</div>
+
docs/platforms/mlp/index.md (4 additions, 2 deletions)

@@ -4,9 +4,11 @@
 The Machine Learning Platform (MLP) provides compute, storage and expertise to the machine learning and AI community in Switzerland, with the main user being the [Swiss AI Initiative](https://www.swiss-ai.org/).
 
 <div class="grid cards" markdown>
-- :fontawesome-solid-mountain: [__Tutorials__][ref-software-ml-tutorials]
+- :fontawesome-solid-mountain: [__Tutorials__][ref-tutorials-ml]
 
-    Tutorials on how to set up and configure a machine learning environment in order to run LLM workloads such as inference, fine-tuning and multi-node training can be found in the [tutorials section][ref-software-ml-tutorials].
+    Tutorials on how to set up and configure a machine learning environment in order to run LLM workloads such as inference, fine-tuning and multi-node training can be found in the [tutorials section][ref-tutorials-ml].
+
+    Also check out the [PyTorch documentation][ref-software-ml-pytorch] for information about how to run PyTorch.
 
 </div>
 
docs/software/ml/index.md (2 additions, 2 deletions)

@@ -6,7 +6,7 @@ Most ML workloads are containerized to ensure portability, reproducibility, and
 
 Users can choose between running containers, using provided uenv software stacks, or building custom Python environments tailored to their needs.
 
-First time users are recommended to consult the [LLM tutorials][ref-software-ml-tutorials] to get familiar with the concepts of the Machine Learning platform in a series of hands-on examples.
+First time users are recommended to consult the [LLM tutorials][ref-tutorials-ml] to get familiar with the concepts of the Machine Learning platform in a series of hands-on examples.
 
 ## Running ML applications with containers (recommended)
 
@@ -28,7 +28,7 @@ Documented best practices are available for:
 
 Helpful references:
 
-* Introduction to concepts of the Machine Learning platform: [LLM tutorials][ref-software-ml-tutorials]
+* Introduction to concepts of the Machine Learning platform: [LLM tutorials][ref-tutorials-ml]
 * Running containers on Alps: [Container Engine Guide][ref-container-engine]
 * Building custom container images: [Container Build Guide][ref-build-containers]
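For context on the container workflow these references point to: the Container Engine consumes a small environment definition file (EDF). The sketch below is hypothetical, the image reference and mount paths are placeholders, not taken from this commit; consult the Container Engine Guide for the authoritative format.

```toml
# Hypothetical EDF sketch (placeholder values throughout).
image = "nvcr.io#nvidia/pytorch:24.01-py3"              # placeholder NGC PyTorch image
mounts = ["/capstor/scratch/cscs/username:/workspace"]  # placeholder host:container mount
workdir = "/workspace"
```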

docs/software/ml/pytorch.md (2 additions, 1 deletion)

@@ -15,7 +15,8 @@ Running PyTorch from a container ensures maximum portability, reproducibility, a
 3. (optionally) extending with a virtual environment
 4. submitting jobs with CE in SLURM
 
-These steps are illustrated in the [machine learning platform tutorials][ref-software-ml-tutorials] and the instructions detailed in the [podman build guide][ref-build-containers].
+!!! example
+    These steps are illustrated in the [machine learning tutorials][ref-tutorials-ml] and the instructions detailed in the [podman build guide][ref-build-containers].
 
 !!! info "Preliminary steps"
     Before proceeding with the next steps, make sure you have storage for podman configured as in the [build guide][ref-build-containers-configure-podman] and make sure to apply [recommended Lustre settings][ref-guides-storage-lustre] to every directory (e.g. `$SCRATCH/ce-images`) dedicated to container images before importing them with enroot. This is necessary to guarantee good filesystem performance.
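Step 4 in the list above (submitting jobs with CE in SLURM) could be sketched as a batch script like the following. This is an illustration only: the EDF name `my-pytorch-env` and the script `train.py` are placeholders, and the exact options are documented in the Container Engine Guide.

```shell
#!/bin/bash
# Sketch of a Slurm submission using the Container Engine (CE).
# --environment selects a previously defined EDF; the name is a placeholder.
#SBATCH --job-name=pytorch-train
#SBATCH --nodes=1
#SBATCH --environment=my-pytorch-env

# Placeholder training command, executed inside the container
srun python train.py
```

Submitting with `sbatch script.sh` then runs the training command inside the container described by the EDF.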

docs/tutorials/index.md (2 additions, 0 deletions)

@@ -1,2 +1,4 @@
 [](){#ref-tutorials}
 # Tutorials
+
+Currently there is one set of tutorials, for [machine learning workflows][ref-tutorials-ml].

docs/tutorials/ml/index.md (2 additions, 2 deletions)

@@ -1,4 +1,4 @@
-[](){#ref-software-ml-tutorials}
+[](){#ref-tutorials-ml}
 # Machine Learning Platform Tutorials
 
 The LLM tutorials gradually introduce key concepts of the Machine Learning Platform in a series of hands-on examples. A particular focus is on the [Container Engine][ref-container-engine] for managing the runtime environment.
@@ -10,4 +10,4 @@ Building on the first tutorial, in the [second tutorial][software-ml-llm-fine-tu
 In the [third tutorial][software-ml-llm-nanotron-tutorial], you will apply the techniques from the previous tutorials to enable distributed (pre-)training of a model in `nanotron` on multiple nodes. In particular, this tutorial makes use of model-parallelism and introduces the usage of `torchrun` to manage jobs on individual nodes.
 
 !!! note
-    The focus for these tutorials is on introducing concepts of the Machine Learning Platform. As such, they do not necessarily discuss the latest advancements or steps required to obtain maximum performance. For this purpose, consult the framework-specific pages, such as the one for [PyTorch][ref-software-ml-pytorch].
+    The focus for these tutorials is on introducing concepts of the Machine Learning Platform. As such, they do not necessarily discuss the latest advancements or steps required to obtain maximum performance. For this purpose, consult the framework-specific pages, such as the one for [PyTorch][ref-software-ml-pytorch].
