Commit a64e3eb

merge main
2 parents 0599372 + 720fd50 commit a64e3eb

78 files changed (+1653 -465 lines changed)

.github/CODEOWNERS

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,7 @@
 * @bcumming @msimberg @RMeli
 docs/access/jupyterlab.md @rsarm
-docs/services/firecrest @jpdorsch @ekouts
+docs/services/firecrest.md @jpdorsch @ekouts @theely @iB0nes
+docs/services/devportal.md @jpdorsch @ekouts @theely @iB0nes
 docs/services/kubernetes @eliaoggian
 docs/software/communication @Madeeks @msimberg
 docs/software/devtools/linaro @jgphpc
@@ -9,6 +10,6 @@ docs/software/prgenv/linalg.md @finkandreas @msimberg
 docs/software/sciapps/cp2k.md @abussy @RMeli
 docs/software/sciapps/lammps.md @nickjbrowning
 docs/software/sciapps/gromacs.md @kanduri
-docs/software/ml @boeschf
+docs/software/ml @boeschf @henrique @lukasgd
 docs/storage @mpasserini
 docs/alps/storage.md @mpasserini

.github/CONTRIBUTING.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Contributing
+
+Please take some time to get familiar with the [contributing guidelines](https://docs.cscs.ch/contributing/) before making your first contribution.

.github/actions/spelling/allow.txt

Lines changed: 8 additions & 0 deletions
@@ -77,6 +77,8 @@ NICs
 NVMe
 Nordend
 OpenFabrics
+OAuth
+OIDC
 OSS
 OSSs
 OTP
@@ -106,6 +108,7 @@ Scopi
 Signalkuppe
 TOTP
 UANs
+UIs
 UserLab
 Wannier
 XDG
@@ -215,6 +218,7 @@ netlib
 netrc
 nql
 nsight
+nsys
 numa
 nvcr
 nvdashboard
@@ -231,6 +235,7 @@ opls
 osts
 osu
 papi
+paraview
 parmetis
 petsc
 pme
@@ -298,6 +303,7 @@ subtables
 supercomputing
 superlu
 sysadmin
+simberg
 tarball
 tcl
 tcsh
@@ -343,6 +349,7 @@ wikipedia
 wikitext
 wlcg
 workaround
+workarounds
 workflows
 wrf
 xattr
@@ -359,3 +366,4 @@ jobscript
 Scalasca
 tracefile
 Vampir
+XAmz
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+JAX
+nvitop
+NVRTC
+placeholders

.github/actions/spelling/patterns.txt

Lines changed: 4 additions & 0 deletions
@@ -14,6 +14,7 @@ Säntis
 ScaLAPACK
 VSCode
 aarch64
+WSO2
 
 # markdown figure
 !\[.*\]\(.*\)
@@ -41,3 +42,6 @@ https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0
 
 # img tag
 <img.*>
+
+# @name (e.g. github handles)
+@[A-Za-z0-9-_]+
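
As a rough illustration of what the new `@name` pattern covers, the sketch below applies the same expression with Python's `re` module. Using Python is an assumption made for illustration only: the spell-check action may use a different regex engine, `find_handles` is a hypothetical helper name, and the sample string is taken from the CODEOWNERS entries in this commit. Inside the character class, the `-` that follows the `0-9` range is read as a literal hyphen.

```python
import re

# Same expression as the pattern added to patterns.txt.
HANDLE_PATTERN = re.compile(r"@[A-Za-z0-9-_]+")


def find_handles(text: str) -> list:
    """Return every @name-style token found in text."""
    return HANDLE_PATTERN.findall(text)


if __name__ == "__main__":
    # Sample line borrowed from the CODEOWNERS diff in this commit.
    print(find_handles("docs/software/ml @boeschf @henrique @lukasgd"))
```

Note that the pattern is not anchored, so it would also match the `@example` part of an address such as `user@example.com`; whether that matters depends on how the spell checker applies its patterns.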

.github/workflows/welcome.yaml

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+name: First interaction message
+
+on:
+  pull_request_target:
+    types: [opened]
+    branches:
+      - main
+
+jobs:
+  greeting:
+    runs-on: ubuntu-latest
+    permissions:
+      pull-requests: write
+    steps:
+      - uses: actions/first-interaction@v3
+        with:
+          pr-message: |
+            Thank you for your contribution to eth-cscs/cscs-docs.
+
+            If you have not done so already, please take some time to get familiar with the [contributing guidelines](https://docs.cscs.ch/contributing/).
+            Following the guidelines helps us keep the documentation consistent and as useful as possible for users.

LICENSE

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
+Creative Commons Legal Code
+
+CC0 1.0 Universal
+
+CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
+LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
+ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
+INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
+REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
+PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
+THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
+HEREUNDER.
+
+Statement of Purpose
+
+The laws of most jurisdictions throughout the world automatically confer
+exclusive Copyright and Related Rights (defined below) upon the creator
+and subsequent owner(s) (each and all, an "owner") of an original work of
+authorship and/or a database (each, a "Work").
+
+Certain owners wish to permanently relinquish those rights to a Work for
+the purpose of contributing to a commons of creative, cultural and
+scientific works ("Commons") that the public can reliably and without fear
+of later claims of infringement build upon, modify, incorporate in other
+works, reuse and redistribute as freely as possible in any form whatsoever
+and for any purposes, including without limitation commercial purposes.
+These owners may contribute to the Commons to promote the ideal of a free
+culture and the further production of creative, cultural and scientific
+works, or to gain reputation or greater distribution for their Work in
+part through the use and efforts of others.
+
+For these and/or other purposes and motivations, and without any
+expectation of additional consideration or compensation, the person
+associating CC0 with a Work (the "Affirmer"), to the extent that he or she
+is an owner of Copyright and Related Rights in the Work, voluntarily
+elects to apply CC0 to the Work and publicly distribute the Work under its
+terms, with knowledge of his or her Copyright and Related Rights in the
+Work and the meaning and intended legal effect of CC0 on those rights.
+
+1. Copyright and Related Rights. A Work made available under CC0 may be
+protected by copyright and related or neighboring rights ("Copyright and
+Related Rights"). Copyright and Related Rights include, but are not
+limited to, the following:
+
+i. the right to reproduce, adapt, distribute, perform, display,
+communicate, and translate a Work;
+ii. moral rights retained by the original author(s) and/or performer(s);
+iii. publicity and privacy rights pertaining to a person's image or
+likeness depicted in a Work;
+iv. rights protecting against unfair competition in regards to a Work,
+subject to the limitations in paragraph 4(a), below;
+v. rights protecting the extraction, dissemination, use and reuse of data
+in a Work;
+vi. database rights (such as those arising under Directive 96/9/EC of the
+European Parliament and of the Council of 11 March 1996 on the legal
+protection of databases, and under any national implementation
+thereof, including any amended or successor version of such
+directive); and
+vii. other similar, equivalent or corresponding rights throughout the
+world based on applicable law or treaty, and any national
+implementations thereof.
+
+2. Waiver. To the greatest extent permitted by, but not in contravention
+of, applicable law, Affirmer hereby overtly, fully, permanently,
+irrevocably and unconditionally waives, abandons, and surrenders all of
+Affirmer's Copyright and Related Rights and associated claims and causes
+of action, whether now known or unknown (including existing as well as
+future claims and causes of action), in the Work (i) in all territories
+worldwide, (ii) for the maximum duration provided by applicable law or
+treaty (including future time extensions), (iii) in any current or future
+medium and for any number of copies, and (iv) for any purpose whatsoever,
+including without limitation commercial, advertising or promotional
+purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
+member of the public at large and to the detriment of Affirmer's heirs and
+successors, fully intending that such Waiver shall not be subject to
+revocation, rescission, cancellation, termination, or any other legal or
+equitable action to disrupt the quiet enjoyment of the Work by the public
+as contemplated by Affirmer's express Statement of Purpose.
+
+3. Public License Fallback. Should any part of the Waiver for any reason
+be judged legally invalid or ineffective under applicable law, then the
+Waiver shall be preserved to the maximum extent permitted taking into
+account Affirmer's express Statement of Purpose. In addition, to the
+extent the Waiver is so judged Affirmer hereby grants to each affected
+person a royalty-free, non transferable, non sublicensable, non exclusive,
+irrevocable and unconditional license to exercise Affirmer's Copyright and
+Related Rights in the Work (i) in all territories worldwide, (ii) for the
+maximum duration provided by applicable law or treaty (including future
+time extensions), (iii) in any current or future medium and for any number
+of copies, and (iv) for any purpose whatsoever, including without
+limitation commercial, advertising or promotional purposes (the
+"License"). The License shall be deemed effective as of the date CC0 was
+applied by Affirmer to the Work. Should any part of the License for any
+reason be judged legally invalid or ineffective under applicable law, such
+partial invalidity or ineffectiveness shall not invalidate the remainder
+of the License, and in such case Affirmer hereby affirms that he or she
+will not (i) exercise any of his or her remaining Copyright and Related
+Rights in the Work or (ii) assert any associated claims and causes of
+action with respect to the Work, in either case contrary to Affirmer's
+express Statement of Purpose.
+
+4. Limitations and Disclaimers.
+
+a. No trademark or patent rights held by Affirmer are waived, abandoned,
+surrendered, licensed or otherwise affected by this document.
+b. Affirmer offers the Work as-is and makes no representations or
+warranties of any kind concerning the Work, express, implied,
+statutory or otherwise, including without limitation warranties of
+title, merchantability, fitness for a particular purpose, non
+infringement, or the absence of latent or other defects, accuracy, or
+the present or absence of errors, whether or not discoverable, all to
+the greatest extent permissible under applicable law.
+c. Affirmer disclaims responsibility for clearing rights of other persons
+that may apply to the Work or any use thereof, including without
+limitation any person's Copyright and Related Rights in the Work.
+Further, Affirmer disclaims responsibility for obtaining any necessary
+consents, permissions or other rights required for any use of the
+Work.
+d. Affirmer understands and acknowledges that Creative Commons is not a
+party to this document and has no duty or obligation with respect to
+this CC0 or use of the Work.

docs/services/firecrest.md renamed to docs/access/firecrest.md

Lines changed: 9 additions & 71 deletions
@@ -3,9 +3,9 @@
 
 FirecREST is a RESTful API for programmatically accessing High-Performance Computing resources, developed at CSCS.
 
-Users can make use of FirecREST to automate access to HPC, enabling [CI/CD pipelines](https://eth-cscs.github.io/firecrest-v2/use_cases/CI-pipeline/), [workflow managers](https://github.com/eth-cscs/firecrest/tree/master/examples/airflow-operators), and other tools against HPC resources.
+Users can make use of FirecREST to automate access to HPC, enabling [CI/CD pipelines](https://eth-cscs.github.io/firecrest-v2/use_cases/CI-pipeline/), [workflow orchestrators](https://eth-cscs.github.io/firecrest-v2/use_cases/workflow-orchestrator/), and other tools against HPC resources.
 
-Additionally, scientific platform developers can integrate FirecREST into [web-enabled portals](https://my.hpcp.cscs.ch) and [applications](https://github.com/eth-cscs/firecrest/tree/master/examples/UI-client-credentials), allowing them to securely access authenticated and authorized CSCS services such as job submission and data transfer on HPC systems.
+Additionally, scientific platform developers can integrate FirecREST into [web-enabled portals](https://eth-cscs.github.io/firecrest-ui/home/) and [web UI applications](https://eth-cscs.github.io/firecrest-v2/use_cases/UI-client-credentials/), allowing them to securely access authenticated and authorized CSCS services such as job submission and data transfer on HPC systems.
 
 Users can make HTTP requests to perform the following operations:
 
@@ -55,7 +55,7 @@ FirecREST is available for all three major [Alps platforms][ref-alps-platforms],
 
 ### Clients and access tokens
 
-For authenticating requests to FirecREST, **client applications** use an **access token** instead of directly using the user's credentials.
+For authenticating requests to FirecREST, [client applications][ref-devportal-application] use an **access token** instead of directly using the user's credentials.
 The access token is a signed JSON Web Token ([JWT](https://jwt.io/introduction)) which contains user information and is only valid for a short time (5 minutes).
 Behind the API, all commands launched by the client will use the account of the user that registered the client, inheriting their access rights.
 
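
Because the access token described in this hunk is a JWT that is only valid for a short time, a client usually has to decide when to request a fresh one. The following is a minimal sketch, assuming the token carries the standard `exp` (expiry) claim defined for JWTs; the function names and the 30-second safety margin are hypothetical, and the payload is decoded without verifying the signature, so this is only suitable for checking the lifetime of a token the client itself just obtained.

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload_b64 = parts[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_expired(token: str, margin_s: float = 30.0, now: Optional[float] = None) -> bool:
    """Return True if the `exp` claim is within margin_s seconds of `now`.

    Assumption: the payload contains `exp` as seconds since the Unix epoch.
    """
    current = time.time() if now is None else now
    return decode_jwt_payload(token)["exp"] - margin_s <= current
```

A client could call `is_expired` before each request and fetch a new token only when it returns `True`, rather than requesting a token for every call.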
@@ -65,77 +65,14 @@ Every client has a client ID (Consumer Key) and a secret (Consumer Secret) that
 ```
 curl -s -X POST https://auth.cscs.ch/auth/realms/firecrest-clients/protocol/openid-connect/token \
 --data "grant_type=client_credentials" \
---data "client_id=<your_client>" \
---data "client_secret=<your_secret>"
+--data "client_id=<client_id>" \
+--data "client_secret=<client_secret>"
 ```
 
 You can manage your client application on the [CSCS Developer Portal][ref-devportal].
 
-[](){#ref-devportal}
-### CSCS Developer Portal
 
-The [Developer Portal](https://developer.cscs.ch) facilitates CSCS users to manage subscriptions to an API at CSCS (such as FirecREST v1/v2).
-
-Start by browsing to [developer.cscs.ch](https://developer.cscs.ch), then sign in by clicking the "SIGN-IN" button on the top right hand corner of the page.
-
-Once logged in, you will see a list of APIs that are available to your user.
-
-!!! Warning
-    You might not see version 1 or version 2 of some API. You will be able to see all the versions when you *subscribe* your Application to the API.
-
-### Creating an Application
-
-Click on the "Applications" button at the top of the screen to manage your Applications.
-
-![FirecREST Main Page](../images/firecrest/f7t-apis.png)
-
-To create a new application, click on the "ADD NEW APPLICATION" button at the top of the Applications page, and complete the mandatory fields (marked with `*`).
-Make sure to give the application a unique name and select the number of requests per minute.
-When finished, click on the "Save" button.
-
-!!! note
-    To subscribe to an API you need at least one application, for which it is possible to use the DefaultApplication.
-
-!!! note
-    The quota of requests per minute will be shared by all subscribers to the Application over all APIs.
-
-### Configuring Production Keys
-
-Once the Application is created, create the Production Keys (`Client ID` and `Client Secret`) by clicking on "Production Keys"
-
-![FirecREST production keys](../images/firecrest/f7t-keys.png)
-
-
-Use this if this is your first FirecREST application, or if you wish to create new keys.
-
-* click on the "Generate Keys" button at the bottom of the page
-
-![FirecREST existing keys](../images/firecrest/f7t-generate-keys.png)
-
-Once the keys are generated, you will see the pair "Consumer Key" and "Consumer Secret".
-
-![FirecREST keys](../images/firecrest/f7t-keys-overview.png)
-
-!!! warning
-    Store this pair of credentials securely, these are the access keys to your resources at CSCS.
-
-### Subscribe to an API
-
-Once you have set up your Application, is time to subscribe it to an API.
-
-To do so:
-
-* (8a) click on the "Subscriptions" option on the left panel
-* (8b) click the :fontawesome-solid-circle-plus: "Subscribe APIS" button
-* (8c) choose the API you want to subscribe to by clicking the "Subscribe" button
-
-![FirecREST subscriptions](../images/firecrest/f7t-api-subscriptions.png)
-
-Back on the Subscription Management page, you can review your active subscriptions and APIs that your Application has access to.
-
-![FirecREST subscription management](../images/firecrest/f7t-api-subscriptions-management.png)
-
-To use your Application to access FirecREST, follow the [API documentation](https://eth-cscs.github.io/firecrest-v2/openapi).
+To use your client credentials to access FirecREST, follow the [API documentation](https://eth-cscs.github.io/firecrest-v2/openapi).
 
 ## Getting Started
 
@@ -443,8 +380,9 @@ A staging area is used for external transfers and downloading/uploading a file f
 
 ## Further Information
 
-* [FirecREST UI for HPC Platform](https://my.hpcp.cscs.ch)
-* [FirecREST UI for ML Platform](https://my.mlp.cscs.ch)
+* [HPC Platform Dashboard](https://my.hpcp.cscs.ch)
+* [ML Platform Dashboard](https://my.mlp.cscs.ch)
+* [C&W Platform Dashboard](https://my.cwp.cscs.ch)
 * [FirecREST OpenAPI Specification](https://eth-cscs.github.io/firecrest-v2/openapi)
 * [FirecREST Official Docs](https://eth-cscs.github.io/firecrest-v2)
 * [Documentation of pyFirecREST](https://pyfirecrest.readthedocs.io/)
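
The `curl` token request shown in the `firecrest.md` diff above can also be expressed in Python. The sketch below only builds the request with the standard library: `build_token_request` is a hypothetical helper name, the `<client_id>` and `<client_secret>` placeholders are kept from the source, and actually sending the request (for example with `urllib.request.urlopen`) is left out so that the example does not depend on network access or real credentials.

```python
import urllib.parse
import urllib.request

# Token endpoint taken from the curl example in the source.
TOKEN_URL = (
    "https://auth.cscs.ch/auth/realms/firecrest-clients"
    "/protocol/openid-connect/token"
)


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the client-credentials POST request equivalent to the curl call."""
    form = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    # The form-encoded body is passed as `data`; the method is set explicitly.
    return urllib.request.Request(TOKEN_URL, data=form, method="POST")


if __name__ == "__main__":
    # Placeholders only; substitute the credentials of a registered client.
    request = build_token_request("<client_id>", "<client_secret>")
    print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the credentials handling easy to inspect and test. In a standard OAuth 2.0 client-credentials flow the JSON response carries the token in an `access_token` field, but the exact response format of this endpoint is not shown in the source and should be checked against the FirecREST documentation.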

docs/access/index.md

Lines changed: 6 additions & 0 deletions
@@ -26,6 +26,12 @@ This documentation guides users through the process of accessing CSCS systems an
 
     [:octicons-arrow-right-24: SSH][ref-ssh]
 
+- :material-fire-circle: __FirecREST__
+
+    FirecREST is a RESTful API for programmatically accessing High-Performance Computing resources.
+
+    [:octicons-arrow-right-24: FirecREST][ref-firecrest]
+
 - :simple-jupyter: __JupyterLab__
 
     JupyterLab is a feature-rich notebook authoring application and editing environment.

docs/access/jupyterlab.md

Lines changed: 5 additions & 3 deletions
@@ -83,10 +83,10 @@ If the default base images do not meet your requirements, you can specify a cust
 
 1. Avoid mounting all of `$HOME` to avoid subtle issues with cached files, but mount Jupyter kernels
 2. Enable Slurm commands (together with two subsequent mounts)
-3. Currently only required on Daint and Santis, not on Clariden
+3. Required only for Daint and Santis; do not use on Clariden
 4. Set working directory of Jupyter session (file browser root directory)
 5. Use environment settings for optimized communication
-6. Disable CUDA JIT cache
+6. Avoid writing JITed binaries to the (distributed) file system, which could lead to performance issues.
 7. Async error handling when an exception is observed in NCCL watchdog: aborting NCCL communicator and tearing down process upon error
 8. Disable GPU support in MPICH, as it can lead to deadlocks when using together with NCCL
 

@@ -199,7 +199,9 @@ Examples of notebooks with `ipcmagic` can be found [here](https://github.com/
199199

200200
While it is generally recommended to submit long-running machine learning training and inference jobs via `sbatch`, certain use cases can benefit from an interactive Jupyter environment.
201201

202-
A popular approach to run multi-GPU ML workloads is with [`accelerate`](https://github.com/huggingface/accelerate) and [`torchrun`](https://docs.pytorch.org/docs/stable/elastic/run.html) as demonstrated in the [tutorials][ref-guides-mlp-tutorials]. In particular, the `accelerate launch` script in the [LLM fine-tuning tutorial][ref-mlp-llm-fine-tuning-tutorial] can be directly carried over to a Jupyter cell with a `%%bash` header (to run its contents interpreted by bash). For `torchrun`, one can adapt the command from the multi-node [nanotron tutorial][ref-mlp-llm-nanotron-tutorial] to run on a single GH200 node using the following line in a Jupyter cell
202+
A popular approach to run multi-GPU ML workloads is with [`accelerate`](https://github.com/huggingface/accelerate) and [`torchrun`](https://docs.pytorch.org/docs/stable/elastic/run.html) as demonstrated in the [tutorials][ref-tutorials-ml].
203+
In particular, the `accelerate launch` script in the [LLM fine-tuning tutorial][software-ml-llm-fine-tuning-tutorial] can be directly carried over to a Jupyter cell with a `%%bash` header (to run its contents interpreted by bash).
204+
For `torchrun`, one can adapt the command from the multi-node [nanotron tutorial][software-ml-llm-nanotron-tutorial] to run on a single GH200 node using the following line in a Jupyter cell
203205

204206
```bash
205207
!python -m torch.distributed.run --standalone --nproc_per_node=4 run_train.py ...
