Commit a33a84b

Let rumdl fix the fixable errors it found
$ rumdl check getting_started.md
...
Issues: Found 91 issues in 1 file (69ms)
Run `rumdl fmt` to automatically fix 45 of the 91 issues

$ rumdl check --fix getting_started.md
Fixed: Fixed 45/91 issues in 1 file (116ms)

$ rumdl check getting_started.md
...
Issues: Found 61 issues in 1 file (69ms)
Run `rumdl fmt` to automatically fix 16 of the 61 issues

$ rumdl check --fix getting_started.md
Fixed: Fixed 16/61 issues in 1 file (103ms)

$ rumdl check getting_started.md
Issues: Found 45 issues in 1 file (68ms)

$ rumdl check getting_started.md | grep 'Line length' -c
45

=> rumdl can not fix the line length errors automatically
1 parent a69fc66 commit a33a84b
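
The counts reported in the commit message can be tallied with a few lines of shell. The sketch below is not part of the commit: it hard-codes the summary lines quoted in the message (the variable names `log` and `remaining` are illustrative) and prints how many issues each `--fix` pass left behind according to its own `Fixed:` line.

```shell
#!/bin/sh
# Summary lines copied from the commit message; nothing here invokes rumdl.
log='Issues: Found 91 issues in 1 file (69ms)
Fixed: Fixed 45/91 issues in 1 file (116ms)
Issues: Found 61 issues in 1 file (69ms)
Fixed: Fixed 16/61 issues in 1 file (103ms)
Issues: Found 45 issues in 1 file (68ms)'

# Pull "fixed total" out of every "Fixed:" line, then print total - fixed.
remaining=$(printf '%s\n' "$log" |
  sed -n 's|^Fixed: Fixed \([0-9]*\)/\([0-9]*\) issues.*|\1 \2|p' |
  while read -r fixed total; do
    echo "$((total - fixed))"
  done)

printf '%s\n' "$remaining"
```

This prints 46 and 45. The second `rumdl check` nevertheless reports 61 issues rather than 46, so the first fix pass appears to have surfaced 15 further findings. The final 45 matches the number of 'Line length' issues that rumdl leaves unfixed.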

1 file changed: +82 -45 lines changed

mkdocs/docs/HPC/getting_started.md

Lines changed: 82 additions & 45 deletions
@@ -1,4 +1,7 @@
1+
# Title
2+
13
{% set exampleloc="mkdocs/docs/HPC/examples/Getting_Started/tensorflow_mnist" %}
4+
25
# Getting Started
36

47
Welcome to the "Getting Started" guide. This chapter will lead you through the initial steps of logging into the {{hpcinfra}} and submitting your very first job. We'll also walk you through the process step by step using a practical example.
@@ -7,25 +10,25 @@ In addition to this chapter, you might find the [recording of the *Introduction
710

811
Before proceeding, read [the introduction to HPC](introduction.md) to gain an understanding of the {{ hpcinfra }} and related terminology.
912

10-
### Getting Access
13+
## Getting Access
1114

1215
To get access to the {{hpcinfra}}, visit [Getting an HPC Account](account.md).
1316

14-
If you have not used Linux before,
17+
If you have not used Linux before,
1518
{%- if site == 'Gent' %}
1619
now would be a good time to follow our [Linux Tutorial](linux-tutorial/index.md).
1720
{%- else %}
1821
please learn some basics first before continuing. (see [Appendix C - Useful Linux Commands](useful_linux_commands.md))
1922
{%- endif %}
2023

21-
#### A typical workflow looks like this:
24+
### A typical workflow looks like this
2225

23-
1. Connect to the login nodes
24-
2. Transfer your files to the {{hpcinfra}}
25-
3. Optional: compile your code and test it
26-
4. Create a job script and submit your job
27-
5. Wait for job to be executed
28-
6. Study the results generated by your jobs, either on the cluster or
26+
1. Connect to the login nodes
27+
2. Transfer your files to the {{hpcinfra}}
28+
3. Optional: compile your code and test it
29+
4. Create a job script and submit your job
30+
5. Wait for job to be executed
31+
6. Study the results generated by your jobs, either on the cluster or
2932
after downloading them locally.
3033

3134
We will walk through an illustrative workload to get you started. In this example, our objective is to train a deep learning model for recognizing hand-written digits (MNIST dataset) using [TensorFlow](https://www.tensorflow.org/);
@@ -38,10 +41,10 @@ There are two options to connect
3841
- Using a terminal to connect via SSH (for power users) (see [First Time connection to the {{ hpcinfra}}](connecting.md#first-time-connection-to-the-hpc-infrastructure))
3942
- [Using the web portal](web_portal.md)
4043

41-
Considering your operating system is **{{OS}}**,
44+
Considering your operating system is **{{OS}}**,
4245

4346
{%- if OS == linux %}
44-
it is recommended to make use of the `ssh` command in a terminal to get the most flexibility.
47+
it is recommended to make use of the `ssh` command in a terminal to get the most flexibility.
4548

4649
Assuming you have already generated SSH keys in the previous step ([Getting Access](#getting-access)), and that they are in a default location, you should now be able to login by running the following command:
4750

@@ -50,12 +53,16 @@ ssh {{userid}}@{{loginnode}}
5053
```
5154

5255
!!! Warning "Use your own VSC account id"
53-
54-
Replace **{{userid}}** with your VSC account id (see <https://account.vscentrum.be>)
56+
57+
```text
58+
Replace **{{userid}}** with your VSC account id (see <https://account.vscentrum.be>)
59+
```
5560

5661
!!! Tip
5762

58-
You can also still use the web portal (see [shell access on web portal](web_portal.md#shell-access))
63+
```text
64+
You can also still use the web portal (see [shell access on web portal](web_portal.md#shell-access))
65+
```
5966

6067
{%- else %}
6168
{%- if OS == windows %} it is recommended to use the web portal.
@@ -72,16 +79,17 @@ Make sure you can get to a shell access to the {{hpcinfra}} before proceeding wi
7279

7380
!!! Info
7481

75-
When having problems see the [connection issues section on the troubleshooting page](troubleshooting.md#sec:connecting-issues).
76-
82+
```text
83+
When having problems see the [connection issues section on the troubleshooting page](troubleshooting.md#sec:connecting-issues).
84+
```
7785

7886
### Transfer your files
7987

8088
Now that you can login, it is time to transfer files from your local computer to your **home directory** on the {{hpcinfra}}.
8189

8290
Download the following example scripts to your computer:
8391

84-
- [tensorflow_mnist.py](https://raw.githubusercontent.com/hpcugent/vsc_user_docs/main/{{exampleloc}}/tensorflow_mnist.py)
92+
- [tensorflow_mnist.py](https://raw.githubusercontent.com/hpcugent/vsc_user_docs/main/{{exampleloc}}/tensorflow_mnist.py)
8593
- [run.sh](https://raw.githubusercontent.com/hpcugent/vsc_user_docs/main/{{exampleloc}}/run.sh)
8694

8795
You can also find the example scripts in our git repo: [https://github.com/hpcugent/vsc_user_docs/](https://github.com/hpcugent/vsc_user_docs/tree/main/mkdocs/docs/HPC/examples/Getting_Started/tensorflow_mnist).
@@ -95,17 +103,21 @@ Upload both files (`run.sh` and `tensorflow-mnist.py`) to your **home directory*
95103

96104
!!! Info
97105

98-
As an alternative, you can use WinSCP (see [our section](connecting.md#winscp))
106+
```text
107+
As an alternative, you can use WinSCP (see [our section](connecting.md#winscp))
108+
```
99109

100110
{%- else %}
101111

102112
On your local machine you can run:
113+
103114
```shell
104115
curl -OL https://raw.githubusercontent.com/hpcugent/vsc_user_docs/main/{{exampleloc}}/tensorflow_mnist.py
105116
curl -OL https://raw.githubusercontent.com/hpcugent/vsc_user_docs/main/{{exampleloc}}/run.sh
106117
```
107118

108119
Using the `scp` command, the files can be copied from your local host to your *home directory* (`~`) on the remote host (HPC).
120+
109121
```shell
110122
scp tensorflow_mnist.py run.sh {{userid}}@{{ loginnode }}:~
111123
```
@@ -115,18 +127,22 @@ ssh {{userid}}@{{ loginnode }}
115127
```
116128

117129
!!! Warning "Use your own VSC account id"
118-
119-
Replace **{{userid}}** with your VSC account id (see <https://account.vscentrum.be>)
130+
131+
```text
132+
Replace **{{userid}}** with your VSC account id (see <https://account.vscentrum.be>)
133+
```
120134

121135
!!! Info
122136

123-
For more information about transfering files or `scp`, see [tranfer files from/to hpc](connecting.md#transfer-files-tofrom-the-hpc).
137+
```text
138+
For more information about transfering files or `scp`, see [tranfer files from/to hpc](connecting.md#transfer-files-tofrom-the-hpc).
139+
```
124140

125141
{%- endif %}
126142

127143
When running `ls` in your session on the {{hpcinfra}}, you should see the two files listed in your home directory (`~`):
128144

129-
```
145+
```text
130146
$ ls ~
131147
run.sh tensorflow_mnist.py
132148
```
@@ -137,8 +153,8 @@ When you do not see these files, make sure you uploaded the files to your **home
137153

138154
Jobs are submitted and executed using job scripts. In our case **run.sh** can be used as a (very minimal) job script.
139155

140-
A job script is a shell script, a text file that specifies the resources,
141-
the software that is used (via `module load` statements),
156+
A job script is a shell script, a text file that specifies the resources,
157+
the software that is used (via `module load` statements),
142158
and the steps that should be executed to run the calculation.
143159

144160
Our job script looks like this:
@@ -150,8 +166,8 @@ module load TensorFlow/2.15.1-foss-2023a
150166

151167
python tensorflow_mnist.py
152168
```
153-
As you can see this job script will run the Python script named **tensorflow_mnist.py**.
154169

170+
As you can see this job script will run the Python script named **tensorflow_mnist.py**.
155171

156172
The jobs you submit are executed by default on **cluster/{{defaultcluster}}**; you can swap to another cluster by issuing the following command.
157173

@@ -160,32 +176,42 @@ module swap cluster/{{othercluster}}
160176
```
161177

162178
!!! Tip
163-
164-
When submitting jobs with limited amount of resources, it is recommended to use the [debug/interactive cluster](interactive_debug.md#interactive-and-debug-cluster): `donphan`.
179+
180+
```text
181+
When submitting jobs with limited amount of resources, it is recommended to use the [debug/interactive cluster](interactive_debug.md#interactive-and-debug-cluster): `donphan`.
182+
```
165183

166184
{%- if site == 'Gent' %}
167185

168-
To get a list of all clusters and their hardware, see <https://www.ugent.be/hpc/en/infrastructure>.
186+
```text
187+
To get a list of all clusters and their hardware, see <https://www.ugent.be/hpc/en/infrastructure>.
188+
```
169189

170190
{%- endif %}
171191

172192
This job script can now be submitted to the cluster's job system for execution, using the qsub (**q**ueue **sub**mit) command:
173193

174-
```
194+
```text
175195
$ qsub run.sh
176196
{{jobid}}
177197
```
178198

179199
This command returns a job identifier (*{{jobid}}*) on the HPC cluster. This is a unique identifier for the job which can be used to monitor and manage your job.
180200

181201
!!! Warning "Make sure you understand what the `module` command does"
182-
183-
Note that the module commands only modify environment variables. For instance, running `module swap cluster/{{othercluster}}` will update your shell environment so that `qsub` submits a job to the `{{othercluster}}` cluster,
184-
but our active shell session is still running on the login node.
185-
186-
It is important to understand that while `module` commands affect your session environment, they do ***not*** change where the commands your are running are being executed: they will still be run on the login node you are on.
187-
188-
When you submit a job script however, the commands ***in*** the job script will be run on a workernode of the cluster the job was submitted to (like `{{othercluster}}`).
202+
203+
```text
204+
Note that the module commands only modify environment variables. For instance, running `module swap cluster/{{othercluster}}` will update your shell environment so that `qsub` submits a job to the `{{othercluster}}` cluster,
205+
but our active shell session is still running on the login node.
206+
```
207+
208+
```text
209+
It is important to understand that while `module` commands affect your session environment, they do ***not*** change where the commands your are running are being executed: they will still be run on the login node you are on.
210+
```
211+
212+
```text
213+
When you submit a job script however, the commands ***in*** the job script will be run on a workernode of the cluster the job was submitted to (like `{{othercluster}}`).
214+
```
189215

190216
For detailed information about `module` commands, read the [running batch jobs](running_batch_jobs.md) chapter.
191217

@@ -195,15 +221,17 @@ Your job is put into a queue before being executed, so it may take a while befor
195221
(see [when will my job start?](running_batch_jobs.md#when-will-my-job-start) for scheduling policy).
196222

197223
You can get an overview of the active jobs using the `qstat` command:
198-
```
224+
225+
```text
199226
$ qstat
200227
Job ID Name User Time Use S Queue
201228
---------- ---------------- --------------- -------- - -------
202229
{{jobid}} run.sh {{userid}} 0:00:00 Q {{othercluster}}
203230
```
204231

205232
Eventually, after entering `qstat` again you should see that your job has started running:
206-
```
233+
234+
```text
207235
$ qstat
208236
Job ID Name User Time Use S Queue
209237
---------- ---------------- --------------- -------- - -------
@@ -227,25 +255,30 @@ By default located in the directory where you issued `qsub`.
227255

228256
!!! Info
229257

230-
For more information about the stdout and stderr output channels, see this [section](linux-tutorial/beyond_the_basics.md#inputoutput).
258+
```text
259+
For more information about the stdout and stderr output channels, see this [section](linux-tutorial/beyond_the_basics.md#inputoutput).
260+
```
231261

232262
{%- endif %}
233263

234264
In our example when running `ls` in the current directory you should see 2 new files:
235-
265+
236266
- **run.sh.o{{jobid}}**, containing *normal output messages* produced by job {{jobid}};
237267
- **run.sh.e{{jobid}}**, containing *errors and warnings* produced by job {{jobid}}.
238268

239269
!!! Info
240-
270+
241271
run.sh.e{{jobid}} should be empty (no errors or warnings).
242272

243273
!!! Warning "Use your own job ID"
244274

245-
Replace **{{jobid}}** with the jobid you got from the `qstat` command (see above) or simply look for added files in your current directory by running `ls`.
275+
```text
276+
Replace **{{jobid}}** with the jobid you got from the `qstat` command (see above) or simply look for added files in your current directory by running `ls`.
277+
```
246278

247279
When examining the contents of ``run.sh.o{{jobid}}`` you will see something like this:
248-
```
280+
281+
```text
249282
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
250283
11493376/11490434 [==============================] - 1s 0us/step
251284
Epoch 1/5
@@ -265,9 +298,13 @@ Hurray 🎉, we trained a deep learning model and achieved 97,64 percent accurac
265298

266299
!!! Warning
267300

268-
When using TensorFlow specifically, you should actually submit jobs to a GPU cluster for better performance, see [GPU clusters](gpu.md).
301+
```text
302+
When using TensorFlow specifically, you should actually submit jobs to a GPU cluster for better performance, see [GPU clusters](gpu.md).
303+
```
269304

270-
For the purpose of this example, we are running a very small TensorFlow workload on a CPU-only cluster.
305+
```text
306+
For the purpose of this example, we are running a very small TensorFlow workload on a CPU-only cluster.
307+
```
271308

272309
### Next steps
273310
