Commit 1af13ba

finish reorg; add squashfs + pip guide
1 parent 8abe05d commit 1af13ba

14 files changed

+195
-55
lines changed

File renamed without changes.

docs/build-install/uenv.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
Uenv are user environments that provide scientific applications, libraries and tools on Alps. This article explains how to use them to build software.
22

3-
For more documentation on how to find, download and use uenv in your workflow, see the [env tool documentation][ref-tool-uenv].
3+
For more documentation on how to find, download and use uenv in your workflow, see the [env tool documentation][ref-uenv].
44

55
[](){#ref-building-uenv-spack}
66
## Building software using Spack
File renamed without changes.

docs/vclusters/clariden.md renamed to docs/clusters/clariden.md

Lines changed: 1 addition & 1 deletion
@@ -79,7 +79,7 @@ Users are encouraged to use containers on Clariden.
7979
* Jobs using containers can be easily set up and submitted using the [container engine][ref-container-engine].
8080
* To build images, see the [guide to building container images on Alps][ref-build-containers].
8181

82-
Alternatively, [uenv][ref-tool-uenv] are also available on Clariden. Currently the only uenv that is deployed on Clariden is [prgenv-gnu][ref-uenv-prgenv-gnu].
82+
Alternatively, [uenv][ref-uenv] are also available on Clariden. Currently the only uenv that is deployed on Clariden is [prgenv-gnu][ref-uenv-prgenv-gnu].
8383

8484
??? example "using uenv provided for other clusters"
8585
You can run uenv that were built for other Alps clusters using the `@` notation.
File renamed without changes.
File renamed without changes.
File renamed without changes.

docs/guides/index.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
[](){#ref-guides}
2+
# Guides
3+
4+
Documentation that provides best practices, practical tips, known problems and useful background information.
5+
6+
The guides are grouped around top-level topics

docs/guides/storage.md

Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
1+
[](){#ref-guides-storage}
2+
# Storage
3+
4+
## Many small files vs. HPC File Systems
5+
6+
Workloads that read or create many small files are not well-suited to parallel file systems, which are designed for parallel and distributed I/O.
7+
8+
Workloads that do not play nicely with Lustre include:
9+
10+
* Configuring and compiling applications.
11+
* Using Python virtual environments.
12+
13+
At first it can seem strange that a "high-performance" file system is significantly slower than a laptop drive for a "simple" task like compilation or loading Python modules. However, Lustre is designed for high-bandwidth parallel file access from many nodes at the same time, with the trade-offs that this implies.
14+
15+
Metadata lookups on Lustre are expensive compared to your laptop, where the local file system is able to aggressively cache metadata.
16+
17+
### Python virtual environments with uenv
18+
19+
Python virtual environments can be very slow on Lustre. For example, a simple `import numpy` command run on Lustre might take seconds, compared to milliseconds on your laptop.
20+
21+
The main reasons for this include:
22+
23+
* Python virtual environments contain many small files, on which Python performs `stat()`, `open()` and `read()` calls when loading a module.
24+
* Python caches compiled bytecode in a `.pyc` file for each `.py` file in a project.
25+
* All of these operations create a lot of metadata lookups.
26+
27+
As a result, using virtual environments can be slow, and these problems are only exacerbated when the virtual environment is loaded simultaneously by many ranks in an MPI job.
28+
29+
One solution is to use the tool `mksquashfs` to compress the contents of a directory - files, inodes and sub-directories - into a single file.
30+
This file can be mounted as a read-only [SquashFS](https://en.wikipedia.org/wiki/SquashFS) file system, which is much faster because a single file is accessed instead of the many small files that were in the original environment.
31+
32+
33+
#### Step 1: create the virtual environment
34+
35+
The first step is to create the virtual environment using the usual workflow.
36+
This might be slow, because we are not optimising this stage for file system performance.
37+
38+
```bash
39+
# for the example create a working path on SCRATCH
40+
mkdir $SCRATCH/sqfs-demo
41+
cd $SCRATCH/sqfs-demo
42+
43+
# start the uenv
44+
# in this case the "default" view of prgenv-gnu provides python, cray-mpich, and
45+
# other useful tools
46+
uenv start prgenv-gnu/24.11:v1 --view=default
47+
48+
# create and activate the empty venv
49+
python -m venv ./.pyenv
50+
source ./.pyenv/bin/activate
51+
52+
# install software in the virtual environment
53+
# in this case we install PyTorch
54+
pip install torch torchvision torchaudio \
55+
--index-url https://download.pytorch.org/whl/cu126
56+
```
57+
58+
??? example "how many files did that create?"
59+
An inode is created for every file, directory and symlink on a file system.
60+
In order to optimise performance, we want to reduce the number of inodes (i.e. the number of files and directories).
61+
62+
The following command can be used to count the number of inodes:
63+
```
64+
find $SCRATCH/sqfs-demo/.pyenv -exec stat --format="%i" {} + | sort -u | wc -l
65+
```
66+
`find` is used to list every path and file, and `stat` is called on each of these to get the inode, and then `sort` and `wc` are used to count the number of unique inodes.
67+
68+
In our "simple" pytorch example, I counted **22806 inodes**!
69+
70+
#### Step 2: make a squashfs image of the virtual environment
71+
72+
The next step is to create a single squashfs file that contains the whole `$SCRATCH/sqfs-demo/.pyenv` path.
73+
74+
This is performed using the `mksquashfs` command, which is installed on all Alps clusters.
75+
76+
```
77+
mksquashfs $SCRATCH/sqfs-demo/.pyenv pyenv.squashfs \
78+
-no-recovery -noappend -Xcompression-level 3
79+
```
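
The command above can be wrapped in a small helper that fails early when the source directory is missing. This is only a sketch: the function name `make_image` is hypothetical and not part of the uenv tooling, and the `mksquashfs` flags are the ones used above.

```bash
# make_image SRC_DIR OUT_FILE
# Hypothetical helper: compress SRC_DIR into the squashfs image OUT_FILE.
# Returns non-zero without calling mksquashfs if SRC_DIR is not a directory.
make_image() {
    local src="$1" out="$2"
    if [ ! -d "$src" ]; then
        echo "error: '$src' is not a directory" >&2
        return 1
    fi
    # -noappend replaces any existing image at OUT_FILE instead of appending to it
    mksquashfs "$src" "$out" -no-recovery -noappend -Xcompression-level 3
}

# Example usage:
#   make_image "$SCRATCH/sqfs-demo/.pyenv" pyenv.squashfs
```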
80+
81+
!!! hint
82+
The `-Xcompression-level` flag sets the compression level to a value between 1 and 9, with 9 being the most compressed.
83+
We find that level 3 provides a good trade-off between the size of the compressed image and performance: both [uenv][ref-uenv] and the [container-engine][ref-container-engine] use level 3.
84+
85+
??? warning "I am seeing errors of the form `Unrecognised xattr prefix...`"
86+
You can safely ignore the (possibly many) warning messages of the form:
87+
```
88+
Unrecognised xattr prefix lustre.lov
89+
Unrecognised xattr prefix system.posix_acl_access
90+
Unrecognised xattr prefix lustre.lov
91+
Unrecognised xattr prefix system.posix_acl_default
92+
```
93+
94+
!!! tip
95+
The default installed version of `mksquashfs` on Alps does not support the best `zstd` compression method.
96+
Every uenv contains a better version of `mksquashfs`, which is used by the uenv to compress itself when it is built.
97+
98+
The exact location inside the uenv depends on the target architecture and version, and will be of the form:
99+
```
100+
/user-environment/linux-sles15-${arch}/gcc-7.5.0/squashfs-${version}-${hash}/bin/mksquashfs
101+
```
102+
Use this version for the best results, though it is also perfectly fine to use the system version.
103+
104+
#### Step 3: use the squashfs
105+
106+
To use the optimised virtual environment, mount the squashfs image at the location of the original virtual environment when starting the uenv.
107+
108+
```
109+
cd $SCRATCH/sqfs-demo
110+
uenv start --view=default \
111+
prgenv-gnu/24.11:v1,$PWD/pyenv.squashfs:$SCRATCH/sqfs-demo/.pyenv
112+
cd $SCRATCH/sqfs-demo
113+
source .pyenv/bin/activate
114+
```
115+
116+
Note that the original virtual environment is still installed in `$SCRATCH/sqfs-demo/.pyenv`. However, the squashfs image has been mounted on top of it, so the single squashfs file is being accessed instead of the many files in the original version.
117+
118+
A benefit of this approach is that the squashfs file can be copied to a location that is not subject to the Scratch cleaning policy, and mounted from there.
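
As a sketch of that workflow, the helper below copies the image to a longer-lived directory and prints the path of the copy. The function name `stash_image` and the `$STORE` variable in the usage comment are assumptions made for illustration; substitute whichever location your project uses for long-term storage.

```bash
# stash_image IMAGE DEST_DIR
# Hypothetical helper (not part of uenv): copy a squashfs image into
# DEST_DIR, creating the directory if needed, and print the new path.
stash_image() {
    local image="$1" dest_dir="$2"
    mkdir -p "$dest_dir" || return 1
    cp "$image" "$dest_dir/" || return 1
    echo "$dest_dir/$(basename "$image")"
}

# Example usage (assumes $STORE points at storage that is not cleaned):
#   image=$(stash_image "$SCRATCH/sqfs-demo/pyenv.squashfs" "$STORE/images")
#   uenv start --view=default \
#       prgenv-gnu/24.11:v1,$image:$SCRATCH/sqfs-demo/.pyenv
```

The image is still mounted at the original `.pyenv` path in the usage comment, on the assumption that the scripts inside the virtual environment refer to the path where it was created.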
119+
120+
#### Step 4: (optional) regenerate the virtual environment
121+
122+
The squashfs file is immutable - it is not possible to modify the contents of `.pyenv` while it is mounted.
123+
This means that it is not possible to `pip install` more packages in the virtual environment.
124+
125+
If you need to modify the virtual environment, run the original uenv without the squashfs file mounted, make changes, and run step 2 again to generate a new image.
126+
127+
!!! hint
128+
If you save the updated copy in a different file, you can roll back to the old version of the environment by mounting the old image.
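
One way to keep older images around is to give each regenerated image a timestamped file name. The following is a minimal sketch under that assumption; the function name `versioned_image_name` and the naming scheme are hypothetical, not a convention used by uenv.

```bash
# versioned_image_name [PREFIX]
# Hypothetical helper: print a timestamped image file name of the form
# PREFIX-YYYYMMDD-HHMMSS.squashfs (PREFIX defaults to "pyenv").
versioned_image_name() {
    local prefix="${1:-pyenv}"
    echo "${prefix}-$(date +%Y%m%d-%H%M%S).squashfs"
}

# Example usage, after modifying the venv without the image mounted:
#   mksquashfs $SCRATCH/sqfs-demo/.pyenv "$(versioned_image_name)" \
#       -no-recovery -noappend -Xcompression-level 3
```

Mounting an older file from this set in step 3 then restores the corresponding version of the environment.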
129+

docs/index.md

Lines changed: 28 additions & 26 deletions
@@ -7,59 +7,61 @@
77
[:octicons-arrow-right-24: status.cscs.ch](https://status.cscs.ch/)
88
</div>
99

10-
Start here to get access to CSCS services and Alps
10+
The Alps Research infrastructure hosts multiple platforms and clusters targeting different communities
1111

1212
<div class="grid cards" markdown>
1313

14-
- :fontawesome-solid-layer-group: __Accounts and Projects__
14+
- :fontawesome-solid-layer-group: __Platforms__
1515

16-
The first step is to get an account and a project
16+
Projects at CSCS are granted access to [clusters][ref-alps-clusters], which are managed by platforms.
17+
Start by finding the platform for the cluster that you want to use.
1718

18-
[:octicons-arrow-right-24: Accounts and Projects][ref-account-management]
19+
[:octicons-arrow-right-24: Platforms overview][ref-alps-platforms]
1920

20-
- :fontawesome-solid-key: __Logging In__
21+
Go straight to the documentation for the platform that hosts your project:
2122

22-
Once you have an account, you can set up multi-factor authentication
23+
[:octicons-arrow-right-24: HPC Platform (Daint, Eiger)][ref-platform-hpcp]
2324

24-
[:octicons-arrow-right-24: Setting up MFA][ref-mfa]
25+
[:octicons-arrow-right-24: Machine Learning Platform (Clariden)][ref-platform-mlp]
2526

26-
Then access CSCS services
27+
[:octicons-arrow-right-24: Climate and Weather Platform (Santis)][ref-platform-cwp]
2728

28-
[:octicons-arrow-right-24: Accessing CSCS Web Services][ref-access-web]
29+
- :fontawesome-solid-mountain-sun: __Alps__
2930

30-
[:octicons-arrow-right-24: Using SSH][ref-ssh]
31+
Learn more about the Alps research infrastructure
3132

32-
</div>
33+
[:octicons-arrow-right-24: Alps Overview](alps/index.md)
3334

34-
The Alps Research infrastructure hosts multiple platforms and clusters targeting different communities
35+
Get detailed information about the main components of the infrastructure
3536

36-
<div class="grid cards" markdown>
37+
[:octicons-arrow-right-24: Alps Clusters](alps/clusters.md)
3738

38-
- :fontawesome-solid-layer-group: __Platforms__
39+
[:octicons-arrow-right-24: Alps Hardware](alps/hardware.md)
3940

40-
Once you have a project at CSCS, start here to find your platform:
41+
[:octicons-arrow-right-24: Alps Storage](alps/storage.md)
4142

42-
[:octicons-arrow-right-24: Platforms overview][ref-alps-platforms]
43+
</div>
4344

44-
Go straight to the documentation for the platform that hosts your project:
4545

46-
[:octicons-arrow-right-24: HPC Platform][ref-platform-hpcp]
46+
<div class="grid cards" markdown>
4747

48-
[:octicons-arrow-right-24: Machine Learning Platform][ref-platform-mlp]
48+
- :fontawesome-solid-layer-group: __Accounts and Projects__
4949

50-
[:octicons-arrow-right-24: Climate and Weather Platform][ref-platform-cwp]
50+
The first step is to get an account and a project
5151

52-
- :fontawesome-solid-mountain-sun: __Alps__
52+
[:octicons-arrow-right-24: Accounts and Projects][ref-account-management]
5353

54-
Learn more about the Alps research infrastructure
54+
- :fontawesome-solid-key: __Logging In__
5555

56-
[:octicons-arrow-right-24: Alps Overview](alps/index.md)
56+
Once you have an account, you can set up multi-factor authentication
5757

58-
Get detailed information about the main components of the infrastructure
58+
[:octicons-arrow-right-24: Setting up MFA][ref-mfa]
5959

60-
[:octicons-arrow-right-24: Alps Hardware](alps/hardware.md)
60+
Then access CSCS services
6161

62-
[:octicons-arrow-right-24: Alps Storage](alps/storage.md)
62+
[:octicons-arrow-right-24: Accessing CSCS Web Services][ref-access-web]
63+
64+
[:octicons-arrow-right-24: Using SSH][ref-ssh]
6365

6466
</div>
6567
