File renamed without changes.
2 changes: 1 addition & 1 deletion docs/build-install/uenv.md
@@ -1,6 +1,6 @@
Uenv are user environments that provide scientific applications, libraries and tools on Alps. This article explains how to use them to build software.

For more documentation on how to find, download and use uenv in your workflow, see the [uenv tool documentation](../tools/uenv.md).
For more documentation on how to find, download and use uenv in your workflow, see the [uenv tool documentation][ref-uenv].

[](){#ref-building-uenv-spack}
## Building software using Spack
File renamed without changes.
2 changes: 1 addition & 1 deletion docs/vclusters/clariden.md → docs/clusters/clariden.md
@@ -79,7 +79,7 @@ Users are encouraged to use containers on Clariden.
* Jobs using containers can be easily set up and submitted using the [container engine][ref-container-engine].
* To build images, see the [guide to building container images on Alps][ref-build-containers].

Alternatively, [uenv][ref-tool-uenv] are also available on Clariden. Currently the only uenv that is deployed on Clariden is [prgenv-gnu][ref-uenv-prgenv-gnu].
Alternatively, [uenv][ref-uenv] are also available on Clariden. Currently the only uenv that is deployed on Clariden is [prgenv-gnu][ref-uenv-prgenv-gnu].

??? example "using uenv provided for other clusters"
You can run uenv that were built for other Alps clusters using the `@` notation.
File renamed without changes.
File renamed without changes.
File renamed without changes.
6 changes: 6 additions & 0 deletions docs/guides/index.md
@@ -0,0 +1,6 @@
[](){#ref-guides}
# Guides

Documentation that provides best practices, practical tips, known problems and useful background information.

The guides are grouped around top-level topics
128 changes: 128 additions & 0 deletions docs/guides/storage.md
@@ -0,0 +1,128 @@
[](){#ref-guides-storage}
# Storage

## Many small files vs. HPC File Systems

Workloads that read or create many small files are not well-suited to parallel file systems, which are designed for parallel and distributed I/O.

Workloads that do not play nicely with Lustre include:

* Configuring and compiling applications.
* Using Python virtual environments.

At first it can seem strange that a "high-performance" file system is significantly slower than a laptop drive for a "simple" task like compilation or loading Python modules. However, Lustre is designed for high-bandwidth parallel file access from many nodes at the same time, with the attendant trade-offs this implies.

Metadata lookups on Lustre are expensive compared to your laptop, where the local file system is able to aggressively cache metadata.

### Python virtual environments with uenv

Python virtual environments can be very slow on Lustre. For example, a simple `import numpy` run on Lustre might take seconds, compared to milliseconds on your laptop.

The main reasons for this include:

* Python virtual environments contain many small files, on which Python performs `stat()`, `open()` and `read()` calls when loading a module.
* Python generates a pre-compiled `.pyc` file for each `.py` file in a project.
* All of these operations generate a lot of metadata lookups.

As a result, using virtual environments can be slow, and these problems are only exacerbated when the virtual environment is loaded simultaneously by many ranks in an MPI job.
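
To see where the time goes, one option is CPython's `-X importtime` flag (available since Python 3.7), which writes the time spent on each imported module to stderr. The following is a minimal sketch rather than a benchmark: `json` is used only so that the example runs anywhere, and should be replaced with the module of interest, for example `numpy`.

```bash
# module to profile: json is a stand-in, replace with e.g. numpy
module=json

# -X importtime writes one line per imported module to stderr
report=$(python3 -X importtime -c "import ${module}" 2>&1)

# show the five slowest imports, sorted by cumulative time
# (second column, in microseconds)
echo "$report" | sort -t '|' -k 2,2n | tail -n 5
```

Running the same command with the virtual environment on Lustre, and again from a squashfs image, gives a rough before-and-after comparison.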

One solution is to use the tool `mksquashfs` to compress the contents of a directory - files, inodes and sub-directories - into a single file.
This file can be mounted as a read-only [SquashFS](https://en.wikipedia.org/wiki/SquashFS) file system, which is much faster because a single file is accessed instead of the many small files that were in the original environment.


#### Step 1: create the virtual environment

The first step is to create the virtual environment using the usual workflow.
This might be slow, because we are not optimising this stage for file system performance.

```bash
# for the example create a working path on SCRATCH
mkdir $SCRATCH/sqfs-demo
cd $SCRATCH/sqfs-demo

# start the uenv
# in this case the "default" view of prgenv-gnu provides python, cray-mpich,
# and other useful tools
uenv start prgenv-gnu/24.11:v1 --view=default

# create and activate the empty venv
python -m venv ./.pyenv
source ./.pyenv/bin/activate

# install software in the virtual environment
# in this case we install pytorch
pip install torch torchvision torchaudio \
    --index-url https://download.pytorch.org/whl/cu126
```

??? example "how many files did that create?"
    An inode is created for every file, directory and symlink on a file system.
    In order to optimise performance, we want to reduce the number of inodes (i.e. the number of files and directories).

    The following command can be used to count the number of inodes:
    ```
    find $SCRATCH/sqfs-demo/.pyenv -exec stat --format="%i" {} + | sort -u | wc -l
    ```
    `find` lists every file and directory, `stat` is called on each of these to get its inode number, and `sort -u` and `wc -l` count the number of unique inodes.

    In our "simple" PyTorch example, we counted **22806 inodes**!
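
The `sort -u` step in the inode count matters because hard links share one inode between several paths, so counting paths alone can overestimate. The following self-contained sketch illustrates the difference in a temporary directory; the file names are made up for the example, and it assumes GNU `find` and `stat`.

```bash
# build a tiny directory tree with a hard link and a symlink
demo=$(mktemp -d)
mkdir "$demo/pkg"
touch "$demo/pkg/a.py" "$demo/pkg/b.py"
ln "$demo/pkg/a.py" "$demo/pkg/a_hardlink.py"   # shares the inode of a.py
ln -s a.py "$demo/pkg/a_symlink.py"             # a symlink has its own inode

# number of paths: every file, directory and link is counted once
paths=$(find "$demo" | wc -l)

# number of unique inodes: the hard link is not counted twice
inodes=$(find "$demo" -exec stat --format="%i" {} + | sort -u | wc -l)

echo "paths=$paths inodes=$inodes"
rm -rf "$demo"
```

Here the tree holds six paths but only five inodes, because `a.py` and `a_hardlink.py` refer to the same inode.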

#### Step 2: make a squashfs image of the virtual environment

The next step is to create a single squashfs file that contains the whole `$SCRATCH/sqfs-demo/.pyenv` path.

This is performed using the `mksquashfs` command, which is installed on all Alps clusters.

```bash
mksquashfs $SCRATCH/sqfs-demo/.pyenv pyenv.squashfs \
    -no-recovery -noappend -Xcompression-level 3
```

!!! hint
    The `-Xcompression-level` flag sets the compression level to a value between 1 and 9, with 9 being the most compressed.
    We find that level 3 provides a good trade-off between the size of the compressed image and performance: both [uenv][ref-uenv] and the [container-engine][ref-container-engine] use level 3.

??? warning "I am seeing errors of the form `Unrecognised xattr prefix...`"
    You can safely ignore the (possibly many) warning messages of the form:
    ```
    Unrecognised xattr prefix lustre.lov
    Unrecognised xattr prefix system.posix_acl_access
    Unrecognised xattr prefix lustre.lov
    Unrecognised xattr prefix system.posix_acl_default
    ```

!!! tip
    The default installed version of `mksquashfs` on Alps does not support the best `zstd` compression method.
    Every uenv contains a better version of `mksquashfs`, which is used by the uenv to compress itself when it is built.

    The exact location inside the uenv depends on the target architecture and version, and will be of the form:
    ```
    /user-environment/linux-sles15-${arch}/gcc-7.5.0/squashfs-${version}-${hash}/bin/mksquashfs
    ```
    Use this version for the best results, though it is also perfectly fine to use the system version.
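
Because the architecture, version and hash in that path vary, a shell glob is one way to locate the uenv copy of `mksquashfs`. The sketch below makes two assumptions: a uenv is mounted at `/user-environment`, and `find_mksquashfs` is a helper name invented for this example, not part of uenv. It falls back to the system `mksquashfs` when no uenv copy is found.

```bash
# hypothetical helper: print the path of the uenv-provided mksquashfs,
# or the system one if no uenv copy is found under the given root
find_mksquashfs() {
    local root="${1:-/user-environment}"
    local candidate
    for candidate in "$root"/linux-sles15-*/gcc-7.5.0/squashfs-*/bin/mksquashfs; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    # fall back to whatever mksquashfs is on the PATH
    command -v mksquashfs
}

# usage inside a uenv session (not run here):
#   "$(find_mksquashfs)" "$SCRATCH/sqfs-demo/.pyenv" pyenv.squashfs \
#       -no-recovery -noappend -comp zstd -Xcompression-level 3
```

`-comp zstd` is the standard `mksquashfs` option for selecting the compressor. Whether a given build accepts it depends on how that build was configured.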

#### Step 3: use the squashfs

To use the optimised virtual environment, mount the squashfs image at the location of the original virtual environment when starting the uenv.

```bash
cd $SCRATCH/sqfs-demo
uenv start --view=default \
    prgenv-gnu/24.11:v1,$PWD/pyenv.squashfs:$SCRATCH/sqfs-demo/.pyenv
source .pyenv/bin/activate
```

Note that the original virtual environment is still installed in `$SCRATCH/sqfs-demo/.pyenv`. However, the squashfs image has been mounted on top of it, so the single squashfs file is accessed instead of the many files in the original version.

A benefit of this approach is that the squashfs file can be copied to a location that is not subject to the Scratch cleaning policy.
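
A quick way to confirm that the image is mounted is to ask which file system backs the path. The sketch below assumes GNU `stat`, whose `%T` format prints the file system type when combined with `-f`; `fs_type` is a helper name chosen for this example. We would expect `squashfs` for the mounted environment and `lustre` for the original directory, though this has not been checked on every cluster.

```bash
# print the type of the file system that holds a path (GNU stat)
fs_type() {
    stat -f -c %T "$1"
}

# inside the uenv session from step 3 one would run:
#   fs_type "$SCRATCH/sqfs-demo/.pyenv"
# here we query the current directory so that the sketch runs anywhere
fs_type .
```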

#### Step 4: (optional) regenerate the virtual environment

The squashfs file is immutable - it is not possible to modify the contents of `.pyenv` while it is mounted.
This means that it is not possible to `pip install` more packages in the virtual environment.

If you need to modify the virtual environment, run the original uenv without the squashfs file mounted, make changes, and run step 2 again to generate a new image.

!!! hint
    If you save the updated copy in a different file, you can now "roll back" to the old version of the environment by mounting the old image.
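
One way to keep older images around for such a roll back is to give each new image a numbered file name. The sketch below is only an illustration: `next_image` and the `pyenv-vN.squashfs` naming scheme are assumptions made for this example, not a uenv convention.

```bash
# hypothetical helper: print the first unused name of the form
# <dir>/pyenv-v<N>.squashfs, starting from N=1
next_image() {
    local dir="$1"
    local n=1
    while [ -e "$dir/pyenv-v$n.squashfs" ]; do
        n=$((n + 1))
    done
    echo "$dir/pyenv-v$n.squashfs"
}

# usage when repeating step 2 (not run here):
#   mksquashfs "$SCRATCH/sqfs-demo/.pyenv" "$(next_image "$SCRATCH/sqfs-demo")" \
#       -no-recovery -noappend -Xcompression-level 3
```

To roll back, pass the path of an older image to `uenv start` in place of `pyenv.squashfs` in step 3.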

60 changes: 31 additions & 29 deletions docs/index.md
@@ -7,59 +7,61 @@
[:octicons-arrow-right-24: status.cscs.ch](https://status.cscs.ch/)
</div>

Start here to get access to CSCS services and Alps
The Alps Research infrastructure hosts multiple platforms and clusters targeting different communities

<div class="grid cards" markdown>

- :fontawesome-solid-layer-group: __Accounts and Projects__
- :fontawesome-solid-layer-group: __Platforms__

The first step is to get an account and a project
Projects at CSCS are granted access to [clusters][ref-alps-clusters], which are managed by platforms.
Start by finding the platform for the cluster that you want to use.

[:octicons-arrow-right-24: Accounts and Projects][ref-account-management]
[:octicons-arrow-right-24: Platforms overview][ref-alps-platforms]

- :fontawesome-solid-key: __Logging In__
Go straight to the documentation for the platform that hosts your project:

Once you have an account, you can set up multi-factor authentication
[:octicons-arrow-right-24: HPC Platform (Daint, Eiger)][ref-platform-hpcp]

[:octicons-arrow-right-24: Setting up MFA][ref-mfa]
[:octicons-arrow-right-24: Machine Learning Platform (Clariden)][ref-platform-mlp]

Then access CSCS services
[:octicons-arrow-right-24: Climate and Weather Platform (Santis)][ref-platform-cwp]

[:octicons-arrow-right-24: Accessing CSCS Web Services][ref-access-web]
- :fontawesome-solid-mountain-sun: __Alps__

[:octicons-arrow-right-24: Using SSH][ref-ssh]
Learn more about the Alps research infrastructure

</div>
[:octicons-arrow-right-24: Alps Overview](alps/index.md)

The Alps Research infrastructure hosts multiple platforms and clusters targeting different communities
Get detailed information about the main components of the infrastructure

<div class="grid cards" markdown>
[:octicons-arrow-right-24: Alps Clusters](alps/clusters.md)

- :fontawesome-solid-layer-group: __Platforms__
[:octicons-arrow-right-24: Alps Hardware](alps/hardware.md)

Once you have a project at CSCS, start here to find your platform:
[:octicons-arrow-right-24: Alps Storage](alps/storage.md)

[:octicons-arrow-right-24: Platforms overview][ref-alps-platforms]
</div>

Go straight to the documentation for the platform that hosts your project:

[:octicons-arrow-right-24: HPC Platform][ref-platform-hpcp]
<div class="grid cards" markdown>

[:octicons-arrow-right-24: Machine Learning Platform][ref-platform-mlp]
- :fontawesome-solid-layer-group: __Accounts and Projects__

[:octicons-arrow-right-24: Climate and Weather Platform][ref-platform-cwp]
The first step is to get an account and a project

- :fontawesome-solid-mountain-sun: __Alps__
[:octicons-arrow-right-24: Accounts and Projects][ref-account-management]

Learn more about the Alps research infrastructure
- :fontawesome-solid-key: __Logging In__

[:octicons-arrow-right-24: Alps Overview](alps/index.md)
Once you have an account, you can set up multi-factor authentication

Get detailed information about the main components of the infrastructure
[:octicons-arrow-right-24: Setting up MFA][ref-mfa]

[:octicons-arrow-right-24: Alps Hardware](alps/hardware.md)
Then access CSCS services

[:octicons-arrow-right-24: Alps Storage](alps/storage.md)
[:octicons-arrow-right-24: Accessing CSCS Web Services][ref-access-web]

[:octicons-arrow-right-24: Using SSH][ref-ssh]

</div>

@@ -97,11 +99,11 @@ If you can't find the information that you need in the documentation, help is av
Provide some links to the "how" documentation here.

<div class="grid cards" markdown>
- :fontawesome-solid-hammer: __Tools__
- :fontawesome-solid-hammer: __Services__

CSCS provides tools and software on Alps.
CSCS provides services and software on Alps.

[:octicons-arrow-right-24: Tools Overview](tools/index.md)
[:octicons-arrow-right-24: Services Overview](services/index.md)

- :fontawesome-solid-screwdriver-wrench: __Build and Install Software__

Empty file added docs/running/index.md
Empty file.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
4 changes: 2 additions & 2 deletions docs/software/sciapps/cp2k.md
@@ -12,8 +12,8 @@ transition state optimization using NEB or dimer method. See [CP2K Features] for

!!! note "uenvs"

[CP2K] is provided on [ALPS][platforms-on-alps] via [uenv][ref-tool-uenv].
Please have a look at the [uenv documentation][ref-tool-uenv] for more information about uenvs and how to use them.
[CP2K] is provided on [ALPS][platforms-on-alps] via [uenv][ref-uenv].
Please have a look at the [uenv documentation][ref-uenv] for more information about uenvs and how to use them.

## Dependencies

2 changes: 1 addition & 1 deletion docs/software/sciapps/namd.md
@@ -9,7 +9,7 @@
!!! note "User Environments"

[NAMD] is provided on [ALPS][ref-alps-platforms] as a uenv.
Please have a look at the [uenv documentation][ref-tool-uenv] for more information about UENVs and how to use them.
Please have a look at the [uenv documentation][ref-uenv] for more information about UENVs and how to use them.

[NAMD] is provided in two flavours on [CSCS] systems:

14 changes: 7 additions & 7 deletions docs/tools/uenv.md → docs/software/uenv.md
@@ -1,4 +1,4 @@
[](){#ref-tool-uenv}
[](){#ref-uenv}
# uenv

Uenv are user environments that provide scientific applications, libraries and tools.
@@ -51,7 +51,7 @@ Used to differentiate between _releases_ of a versioned uenv. Some examples of t

The name of the Alps cluster for which the uenv was built.

[](){#ref-tool-uenv-label-uarch}
[](){#ref-uenv-label-uarch}
#### `uarch`

The node type (microarchitecture) that the uenv is built for:
@@ -215,7 +215,7 @@ Tokens are created by CSCS, and stored on SCRATCH in a file that only users who
!!! note
Better token management is under development - tokens will be stored in a central location and be easier to use.

[](){#ref-tool-uenv-start}
[](){#ref-uenv-start}
## Starting a uenv session

The `uenv start` command will start a new shell with one or more uenv images mounted.
@@ -386,7 +386,7 @@ uenv images provide a full upstream Spack configuration to facilitate building y
No view needs to be loaded to use Spack, however all uenv provide a `spack` view that sets some environment variables that contain useful information like the location of the Spack configuration, and the version of Spack that was used to build the uenv.
For more information, see our guide on building software with [Spack and uenv][ref-building-uenv-spack].

[](){#ref-tool-uenv-run}
[](){#ref-uenv-run}
## Running a uenv

The `uenv run` command can be used to run an application or script in a uenv environment, and return control to the calling shell when the command has finished running.
@@ -443,7 +443,7 @@ The command takes two arguments:
* `name` is the name, e.g. `prgenv-gnu`, `gromacs`, `vistools`.
* `version` is a version string, e.g. `24.11`, `v1.2`, `2025-rc2`
* `system` is the CSCS cluster to build on (e.g. `daint`, `santis`, `clariden`, `eiger`)
* `uarch` is the [micro-architecture][ref-tool-uenv-label-uarch].
* `uarch` is the [micro-architecture][ref-uenv-label-uarch].

!!! example "building a uenv"
Call the
@@ -475,7 +475,7 @@ This makes it easy to share your uenv with other users, by giving them the name,
uenv image find service::@daint
```

[](){#ref-tool-uenv-slurm}
[](){#ref-uenv-slurm}
## SLURM integration

The environment to load can be provided directly to SLURM via three arguments:
@@ -552,7 +552,7 @@ it is possible to override the default uenv by passing a different `--uenv` and

* Note how the second call has access to `mpicc`, provided by `prgenv-gnu`.

[](){#ref-tool-uenv-installation}
[](){#ref-uenv-installation}
## Installing the uenv tool

The command line tool can be installed from source, if you are working on a cluster that does not have uenv installed, or if you need to test a new version.