Commit c36f6c3

fix line length
and some spelling
1 parent a33a84b commit c36f6c3

1 file changed

+107
-46
lines changed

mkdocs/docs/HPC/getting_started.md

Lines changed: 107 additions & 46 deletions
@@ -4,11 +4,18 @@
44

55
# Getting Started
66

7-
Welcome to the "Getting Started" guide. This chapter will lead you through the initial steps of logging into the {{hpcinfra}} and submitting your very first job. We'll also walk you through the process step by step using a practical example.
7+
Welcome to the "Getting Started" guide. This chapter will lead you through the
8+
initial steps of logging into the {{hpcinfra}} and submitting your very first
9+
job. We'll also walk you through the process step by step using a practical
10+
example.
811

9-
In addition to this chapter, you might find the [recording of the *Introduction to HPC-UGent* training session](https://www.ugent.be/hpc/en/training/introhpcugent-recording) to be a useful resource.
12+
In addition to this chapter, you might find the [recording of the *Introduction
13+
to HPC-UGent* training
14+
session](https://www.ugent.be/hpc/en/training/introhpcugent-recording) to be a
15+
useful resource.
1016

11-
Before proceeding, read [the introduction to HPC](introduction.md) to gain an understanding of the {{ hpcinfra }} and related terminology.
17+
Before proceeding, read [the introduction to HPC](introduction.md) to gain an
18+
understanding of the {{ hpcinfra }} and related terminology.
1219

1320
## Getting Access
1421

@@ -18,7 +25,8 @@ If you have not used Linux before,
1825
{%- if site == 'Gent' %}
1926
now would be a good time to follow our [Linux Tutorial](linux-tutorial/index.md).
2027
{%- else %}
21-
please learn some basics first before continuing. (see [Appendix C - Useful Linux Commands](useful_linux_commands.md))
28+
please learn some basics first before continuing. (see [Appendix C - Useful
29+
Linux Commands](useful_linux_commands.md))
2230
{%- endif %}
2331

2432
### A typical workflow looks like this
@@ -31,22 +39,30 @@ please learn some basics first before continuing. (see [Appendix C - Useful Linu
3139
6. Study the results generated by your jobs, either on the cluster or
3240
after downloading them locally.
3341

34-
We will walk through an illustrative workload to get you started. In this example, our objective is to train a deep learning model for recognizing hand-written digits (MNIST dataset) using [TensorFlow](https://www.tensorflow.org/);
42+
We will walk through an illustrative workload to get you started. In this
43+
example, our objective is to train a deep learning model for recognizing
44+
hand-written digits (MNIST dataset) using
45+
[TensorFlow](https://www.tensorflow.org/);
3546
see the [example scripts](https://github.com/hpcugent/vsc_user_docs/tree/main/{{exampleloc}}).
3647

3748
### Getting Connected
3849

3950
There are two options to connect:
4051

41-
- Using a terminal to connect via SSH (for power users) (see [First Time connection to the {{ hpcinfra}}](connecting.md#first-time-connection-to-the-hpc-infrastructure))
52+
- Using a terminal to connect via SSH (for power users)
53+
(see [First Time connection to the
54+
{{hpcinfra}}](connecting.md#first-time-connection-to-the-hpc-infrastructure))
4255
- [Using the web portal](web_portal.md)
4356

4457
Considering your operating system is **{{OS}}**,
4558

4659
{%- if OS == linux %}
47-
it is recommended to make use of the `ssh` command in a terminal to get the most flexibility.
60+
it is recommended to make use of the `ssh` command in a terminal to get the
61+
most flexibility.
4862

49-
Assuming you have already generated SSH keys in the previous step ([Getting Access](#getting-access)), and that they are in a default location, you should now be able to login by running the following command:
63+
Assuming you have already generated SSH keys in the previous step ([Getting
64+
Access](#getting-access)), and that they are in a default location, you should
65+
now be able to log in by running the following command:
5066

5167
```shell
5268
ssh {{userid}}@{{loginnode}}
@@ -55,51 +71,64 @@ ssh {{userid}}@{{loginnode}}
5571
!!! Warning "Use your own VSC account id"
5672

5773
```text
58-
Replace **{{userid}}** with your VSC account id (see <https://account.vscentrum.be>)
74+
Replace **{{userid}}** with your VSC account id (see
75+
<https://account.vscentrum.be>)
5976
```
6077

6178
!!! Tip
6279

6380
```text
64-
You can also still use the web portal (see [shell access on web portal](web_portal.md#shell-access))
81+
You can also still use the web portal (see [shell access on web
82+
portal](web_portal.md#shell-access))
6583
```
6684

6785
{%- else %}
6886
{%- if OS == windows %} it is recommended to use the web portal.
69-
{%- else %} it should be easy to make use of the `ssh` command in a terminal, but the web portal will work too. {%- endif %}
87+
{%- else %} it should be easy to make use of the `ssh` command in a terminal,
88+
but the web portal will work too. {%- endif %}
7089

71-
The [web portal](web_portal.md) offers a convenient way to upload files and gain shell access to the {{hpcinfra}} from a standard web browser (no software installation or configuration required).
90+
The [web portal](web_portal.md) offers a convenient way to upload files and
91+
gain shell access to the {{hpcinfra}} from a standard web browser (no software
92+
installation or configuration required).
7293

7394
See [shell access](web_portal.md#shell-access) when using the web portal, or
74-
[connection to the {{hpcinfra}}](connecting.md#first-time-connection-to-the-hpc-infrastructure) when using a terminal.
95+
[connection to the
96+
{{hpcinfra}}](connecting.md#first-time-connection-to-the-hpc-infrastructure)
97+
when using a terminal.
7598

76-
Make sure you can get to a shell access to the {{hpcinfra}} before proceeding with the next steps.
99+
Make sure you can get shell access to the {{hpcinfra}} before proceeding
100+
with the next steps.
77101

78102
{%- endif %}
79103

80104
!!! Info
81105

82106
```text
83-
When having problems see the [connection issues section on the troubleshooting page](troubleshooting.md#sec:connecting-issues).
107+
If you have problems, see the [connection issues section on the troubleshooting
108+
page](troubleshooting.md#sec:connecting-issues).
84109
```
85110

86111
### Transfer your files
87112

88-
Now that you can login, it is time to transfer files from your local computer to your **home directory** on the {{hpcinfra}}.
113+
Now that you can log in, it is time to transfer files from your local computer
114+
to your **home directory** on the {{hpcinfra}}.
89115

90116
Download the following example scripts to your computer:
91117

92118
- [tensorflow_mnist.py](https://raw.githubusercontent.com/hpcugent/vsc_user_docs/main/{{exampleloc}}/tensorflow_mnist.py)
93119
- [run.sh](https://raw.githubusercontent.com/hpcugent/vsc_user_docs/main/{{exampleloc}}/run.sh)
94120

95-
You can also find the example scripts in our git repo: [https://github.com/hpcugent/vsc_user_docs/](https://github.com/hpcugent/vsc_user_docs/tree/main/mkdocs/docs/HPC/examples/Getting_Started/tensorflow_mnist).
121+
You can also find the example scripts in our git repository:
122+
[https://github.com/hpcugent/vsc_user_docs/](https://github.com/hpcugent/vsc_user_docs/tree/main/mkdocs/docs/HPC/examples/Getting_Started/tensorflow_mnist).
96123

97124
{%- if OS == windows %}
98125

99-
The [HPC-UGent web portal](https://login.hpc.ugent.be) provides a file browser that allows uploading files.
126+
The [HPC-UGent web portal](https://login.hpc.ugent.be) provides a file browser
127+
that allows uploading files.
100128
For more information see the [file browser section](web_portal.md#file-browser).
101129

102-
Upload both files (`run.sh` and `tensorflow-mnist.py`) to your **home directory** and go back to your shell.
130+
Upload both files (`run.sh` and `tensorflow_mnist.py`) to your **home
131+
directory** and go back to your shell.
103132

104133
!!! Info
105134

@@ -116,7 +145,8 @@ curl -OL https://raw.githubusercontent.com/hpcugent/vsc_user_docs/main/{{example
116145
curl -OL https://raw.githubusercontent.com/hpcugent/vsc_user_docs/main/{{exampleloc}}/run.sh
117146
```
118147

119-
Using the `scp` command, the files can be copied from your local host to your *home directory* (`~`) on the remote host (HPC).
148+
Using the `scp` command, the files can be copied from your local host to your
149+
*home directory* (`~`) on the remote host (HPC).
120150

121151
```shell
122152
scp tensorflow_mnist.py run.sh {{userid}}@{{ loginnode }}:~
@@ -135,23 +165,27 @@ Replace **{{userid}}** with your VSC account id (see <https://account.vscentrum.
135165
!!! Info
136166

137167
```text
138-
For more information about transfering files or `scp`, see [tranfer files from/to hpc](connecting.md#transfer-files-tofrom-the-hpc).
168+
For more information about transferring files or `scp`, see [transfer files
169+
from/to hpc](connecting.md#transfer-files-tofrom-the-hpc).
139170
```
140171

141172
{%- endif %}
142173

143-
When running `ls` in your session on the {{hpcinfra}}, you should see the two files listed in your home directory (`~`):
174+
When running `ls` in your session on the {{hpcinfra}}, you should see the two
175+
files listed in your home directory (`~`):
144176

145177
```text
146178
$ ls ~
147179
run.sh tensorflow_mnist.py
148180
```
149181

150-
When you do not see these files, make sure you uploaded the files to your **home directory**.
182+
When you do not see these files, make sure you uploaded the files to your
183+
**home directory**.
151184

152185
### Submitting a job
153186

154-
Jobs are submitted and executed using job scripts. In our case **run.sh** can be used as a (very minimal) job script.
187+
Jobs are submitted and executed using job scripts. In our case **run.sh** can
188+
be used as a (very minimal) job script.
155189

156190
A job script is a shell script, a text file that specifies the resources,
157191
the software that is used (via `module load` statements),
@@ -167,9 +201,11 @@ module load TensorFlow/2.15.1-foss-2023a
167201
python tensorflow_mnist.py
168202
```
169203

170-
As you can see this job script will run the Python script named **tensorflow_mnist.py**.
204+
As you can see, this job script will run the Python script named
205+
**tensorflow_mnist.py**.
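
As a hedged illustration of how a job script can also specify resources, the
sketch below writes a variant of the job script to a file. It assumes the
scheduler accepts standard `#PBS` directives; the file name
`run_with_resources.sh`, the job name, and the requested core count and
walltime are illustrative assumptions, not values taken from this guide.

```shell
# Write a variant of run.sh that also requests resources.
# Assumptions: file name, job name, core count and walltime are illustrative only.
cat > run_with_resources.sh <<'EOF'
#!/bin/bash
#PBS -N mnist_example
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:15:00

# Start in the directory from which the job was submitted
cd "$PBS_O_WORKDIR"

module load TensorFlow/2.15.1-foss-2023a
python tensorflow_mnist.py
EOF
```

The generated file can be inspected with `cat run_with_resources.sh` before it
is used as a job script.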
171206

172-
The jobs you submit are per default executed on **cluser/{{defaultcluster}}**, you can swap to another cluster by issuing the following command.
207+
The jobs you submit are by default executed on **cluster/{{defaultcluster}}**;
208+
you can swap to another cluster by issuing the following command.
173209

174210
```shell
175211
module swap cluster/{{othercluster}}
@@ -178,47 +214,62 @@ module swap cluster/{{othercluster}}
178214
!!! Tip
179215

180216
```text
181-
When submitting jobs with limited amount of resources, it is recommended to use the [debug/interactive cluster](interactive_debug.md#interactive-and-debug-cluster): `donphan`.
217+
When submitting jobs with a limited amount of resources, it is recommended to use
218+
the [debug/interactive
219+
cluster](interactive_debug.md#interactive-and-debug-cluster): `donphan`.
182220
```
183221

184222
{%- if site == 'Gent' %}
185223

186224
```text
187-
To get a list of all clusters and their hardware, see <https://www.ugent.be/hpc/en/infrastructure>.
225+
To get a list of all clusters and their hardware, see
226+
<https://www.ugent.be/hpc/en/infrastructure>.
188227
```
189228

190229
{%- endif %}
191230

192-
This job script can now be submitted to the cluster's job system for execution, using the qsub (**q**ueue **sub**mit) command:
231+
This job script can now be submitted to the cluster's job system for execution,
232+
using the qsub (**q**ueue **sub**mit) command:
193233

194234
```text
195235
$ qsub run.sh
196236
{{jobid}}
197237
```
198238

199-
This command returns a job identifier (*{{jobid}}*) on the HPC cluster. This is a unique identifier for the job which can be used to monitor and manage your job.
239+
This command returns a job identifier (*{{jobid}}*) on the HPC cluster. This is
240+
a unique identifier for the job which can be used to monitor and manage your
241+
job.
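
Since the identifier printed by `qsub` is needed later, it can be convenient to
capture it when submitting. The sketch below is one possible way to do that; the
function name `submit_and_record` and the file name `last_jobid.txt` are
hypothetical and not part of the HPC tooling.

```shell
# Submit a job script with qsub and keep the returned identifier.
# `submit_and_record` and `last_jobid.txt` are hypothetical names.
submit_and_record() {
    jobid="$(qsub "$1")" || return 1
    # Store the identifier so it can be looked up again later
    printf '%s\n' "$jobid" > last_jobid.txt
    printf '%s\n' "$jobid"
}

# Usage (in your shell session on the login node):
#   jobid="$(submit_and_record run.sh)"
```

Storing the identifier in a file is only a convenience; running `qstat` shows
the same identifier for jobs that are still queued or running.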
200242

201243
!!! Warning "Make sure you understand what the `module` command does"
202244

203245
```text
204-
Note that the module commands only modify environment variables. For instance, running `module swap cluster/{{othercluster}}` will update your shell environment so that `qsub` submits a job to the `{{othercluster}}` cluster,
246+
Note that the module commands only modify environment variables. For instance,
247+
running `module swap cluster/{{othercluster}}` will update your shell
248+
environment so that `qsub` submits a job to the `{{othercluster}}` cluster,
205249
but our active shell session is still running on the login node.
206250
```
207251

208252
```text
209-
It is important to understand that while `module` commands affect your session environment, they do ***not*** change where the commands your are running are being executed: they will still be run on the login node you are on.
253+
It is important to understand that while `module` commands affect your session
254+
environment, they do ***not*** change where the commands you are running are
255+
being executed: they will still be run on the login node you are on.
210256
```
211257

212258
```text
213-
When you submit a job script however, the commands ***in*** the job script will be run on a workernode of the cluster the job was submitted to (like `{{othercluster}}`).
259+
When you submit a job script however, the commands ***in*** the job script will
260+
be run on a workernode of the cluster the job was submitted to (like
261+
`{{othercluster}}`).
214262
```
215263

216-
For detailed information about `module` commands, read the [running batch jobs](running_batch_jobs.md) chapter.
264+
For detailed information about `module` commands, read the [running batch
265+
jobs](running_batch_jobs.md) chapter.
217266

218267
### Wait for job to be executed
219268

220-
Your job is put into a queue before being executed, so it may take a while before it actually starts.
221-
(see [when will my job start?](running_batch_jobs.md#when-will-my-job-start) for scheduling policy).
269+
Your job is put into a queue before being executed, so it may take a while
270+
before it actually starts.
271+
(see [when will my job start?](running_batch_jobs.md#when-will-my-job-start)
272+
for scheduling policy).
222273

223274
You can get an overview of the active jobs using the `qstat` command:
224275

@@ -229,7 +280,8 @@ Job ID Name User Time Use S Queue
229280
{{jobid}} run.sh {{userid}} 0:00:00 Q {{othercluster}}
230281
```
231282

232-
Eventually, after entering `qstat` again you should see that your job has started running:
283+
Eventually, after entering `qstat` again you should see that your job has
284+
started running:
233285

234286
```text
235287
$ qstat
@@ -238,9 +290,11 @@ Job ID Name User Time Use S Queue
238290
{{jobid}} run.sh {{userid}} 0:00:01 R {{othercluster}}
239291
```
240292

241-
If you don't see your job in the output of the `qstat` command anymore, your job has likely completed.
293+
If you don't see your job in the output of the `qstat` command anymore, your
294+
job has likely completed.
242295

243-
Read [this section](running_batch_jobs.md#monitoring-and-managing-your-jobs) on how to interpret the output.
296+
Read [this section](running_batch_jobs.md#monitoring-and-managing-your-jobs) on
297+
how to interpret the output.
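
The "job no longer appears in `qstat`" check described above can be scripted.
The following is a minimal sketch, assuming that a plain substring match of the
job identifier against the `qstat` output is good enough; the helper names and
the default polling interval of 30 seconds are arbitrary choices made for
illustration.

```shell
# Succeeds while the given job id still appears in the qstat output.
# Simple substring match; hypothetical helper name.
job_is_listed() {
    qstat 2>/dev/null | grep -qF -- "$1"
}

# Poll qstat until the job is no longer listed.
# $1: job id, $2: polling interval in seconds (default 30, an arbitrary choice)
wait_for_job() {
    wait_jobid="$1"
    wait_interval="${2:-30}"
    while job_is_listed "$wait_jobid"; do
        sleep "$wait_interval"
    done
    echo "Job $wait_jobid is no longer listed by qstat"
}

# Usage: wait_for_job "$jobid" 60
```

Keep the interval reasonably long so that the polling does not put needless
load on the job system.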
244298

245299
### Inspect your results
246300

@@ -256,7 +310,8 @@ By default located in the directory where you issued `qsub`.
256310
!!! Info
257311

258312
```text
259-
For more information about the stdout and stderr output channels, see this [section](linux-tutorial/beyond_the_basics.md#inputoutput).
313+
For more information about the stdout and stderr output channels, see this
314+
[section](linux-tutorial/beyond_the_basics.md#inputoutput).
260315
```
261316

262317
{%- endif %}
@@ -273,10 +328,13 @@ In our example when running `ls` in the current directory you should see 2 new f
273328
!!! Warning "Use your own job ID"
274329

275330
```text
276-
Replace **{{jobid}}** with the jobid you got from the `qstat` command (see above) or simply look for added files in your current directory by running `ls`.
331+
Replace **{{jobid}}** with the jobid you got from the `qstat` command (see
332+
above) or simply look for added files in your current directory by running
333+
`ls`.
277334
```
278335

279-
When examining the contents of ``run.sh.o{{jobid}}`` you will see something like this:
336+
When examining the contents of ``run.sh.o{{jobid}}`` you will see something
337+
like this:
280338

281339
```text
282340
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
@@ -294,16 +352,18 @@ Epoch 5/5
294352
313/313 - 0s - loss: 0.0782 - accuracy: 0.9764
295353
```
296354

297-
Hurray 🎉, we trained a deep learning model and achieved 97,64 percent accuracy.
355+
Hurrah 🎉, we trained a deep learning model and achieved 97.64 percent accuracy.
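
To pull the final accuracy out of the job output without reading the whole
file, a small helper along these lines could be used. It is a sketch that
assumes the output contains lines of the form `accuracy: <number>` as in the
excerpt above; the function name `final_accuracy_percent` is hypothetical.

```shell
# Print the last reported accuracy in a job output file as a percentage.
# `final_accuracy_percent` is a hypothetical helper name.
final_accuracy_percent() {
    # Extract the number after "accuracy: " on each matching line,
    # keep only the last one, and scale it to a percentage.
    sed -n 's/.*accuracy: \([0-9.]*\).*/\1/p' "$1" \
        | tail -n 1 \
        | awk '{ printf "%.2f\n", $1 * 100 }'
}

# Usage: final_accuracy_percent run.sh.o<your job id>
```

If the file contains no accuracy line, the helper prints nothing.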
298356

299357
!!! Warning
300358

301359
```text
302-
When using TensorFlow specifically, you should actually submit jobs to a GPU cluster for better performance, see [GPU clusters](gpu.md).
360+
When using TensorFlow specifically, you should actually submit jobs to a GPU
361+
cluster for better performance, see [GPU clusters](gpu.md).
303362
```
304363

305364
```text
306-
For the purpose of this example, we are running a very small TensorFlow workload on a CPU-only cluster.
365+
For the purpose of this example, we are running a very small TensorFlow
366+
workload on a CPU-only cluster.
307367
```
308368

309369
### Next steps
@@ -313,4 +373,5 @@ For the purpose of this example, we are running a very small TensorFlow workload
313373
- [Multi core jobs/Parallel Computing](multi_core_jobs.md)
314374
- [Interactive and debug cluster](interactive_debug.md#interactive-and-debug-cluster)
315375

316-
For more examples see [Program examples](program_examples.md) and [Job script examples](jobscript_examples.md)
376+
For more examples see [Program examples](program_examples.md) and [Job script
377+
examples](jobscript_examples.md)
