eth-cscs
diff --git a/docs/alps/vclusters.md b/docs/alps/clusters.md
diff --git a/docs/build-install/uenv.md b/docs/build-install/uenv.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/vclusters/bristen.md b/docs/clusters/bristen.md
diff --git a/docs/vclusters/clariden.md b/docs/clusters/clariden.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/vclusters/daint.md b/docs/clusters/daint.md
diff --git a/docs/vclusters/eiger.md b/docs/clusters/eiger.md
diff --git a/docs/vclusters/santis.md b/docs/clusters/santis.md
diff --git a/docs/guides/index.md b/docs/guides/index.md
Lines changed: 6 additions & 0 deletions
diff --git a/docs/guides/storage.md b/docs/guides/storage.md
Lines changed: 129 additions & 0 deletions
diff --git a/docs/index.md b/docs/index.md
Lines changed: 28 additions & 26 deletions
@@ -1,6 +1,6 @@
 Uenv are user environments that provide scientific applications, libraries and tools on Alps. This article use them to build software.
 
-For more documentation on how to find, download and use uenv in your workflow, see the [env tool documentation][ref-tool-uenv].
+For more documentation on how to find, download and use uenv in your workflow, see the [env tool documentation][ref-uenv].
 
 [](){#ref-building-uenv-spack}
 ## Building software using Spack
 
@@ -79,7 +79,7 @@ Users are encouraged to use containers on Clariden.
 * Jobs using containers can be easily set up and submitted using the [container engine][ref-container-engine].
 * To build images, see the [guide to building container images on Alps][ref-build-containers].
 
-Alternatively, [uenv][ref-tool-uenv] are also available on Clariden. Currently the only uenv that is deployed on Clariden is [prgenv-gnu][ref-uenv-prgenv-gnu].
+Alternatively, [uenv][ref-uenv] are also available on Clariden. Currently the only uenv that is deployed on Clariden is [prgenv-gnu][ref-uenv-prgenv-gnu].
 
 ??? example "using uenv provided for other clusters"
     You can run uenv that were built for other Alps clusters using the `@` notation.
 
@@ -0,0 +1,6 @@
+[](){#ref-guides}
+# Guides
+
+Documentation that provides best practices, practical tips, known problems and useful background information.
+
+The guides are grouped around top-level topics
@@ -0,0 +1,129 @@
+[](){#ref-guides-storage}
+# Storage
+
+## Many small files vs. HPC File Systems
+
+Workloads that read or create many small files are not well-suited to parallel file systems, which are designed for parallel and distributed I/O.
+
+Workloads that do not play nicely with Lustre include:
+
+* Configuring and compiling applications.
+* Using Python virtual environments.
+
+At first it can seem strange that a "high-performance" file system is significantly slower than a laptop drive for a "simple" task like compilation or loading Python modules. However, Lustre is designed for high-bandwidth parallel file access from many nodes at the same time, with the attendant trade-offs this implies.
+
+Metadata lookups on Lustre are expensive compared to your laptop, where the local file system is able to aggressively cache metadata.
+
+### Python virtual environments with uenv
+
+Python virtual environments can be very slow on Lustre. For example, a simple `import numpy` run on Lustre might take seconds, compared to milliseconds on your laptop.
+
+The main reasons for this include:
+
+* Python virtual environments contain many small files, on which Python performs `stat()`, `open()` and `read()` calls when loading a module.
+* Python compiles a `.pyc` bytecode file for each `.py` file that it imports.
+* All of these operations generate a lot of metadata lookups.
+
+As a result, using virtual environments can be slow, and these problems are only exacerbated when the virtual environment is loaded simultaneously by many ranks in an MPI job.
+
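The cost of loading an environment can be checked by timing a command on different file systems. The following is a minimal sketch, not part of any CSCS tooling: `time_cmd` is a hypothetical helper, and it assumes GNU `date` for nanosecond resolution.

```bash
# Print the wall-clock time, in milliseconds, taken by the command passed
# as arguments. Hypothetical helper; assumes GNU date (for the %N format).
# The body runs in a subshell so that start/end do not leak into the caller.
time_cmd() (
    start=$(date +%s%N)
    "$@" > /dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
)

# Example (not run here): time an import from an activated environment.
# time_cmd python -c "import numpy"
```

Running the same measurement with the environment stored on Lustre and on a local disk gives a rough comparison of the start-up cost.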
+One solution is to use the tool `mksquashfs` to compress the contents of a directory (files, inodes and sub-directories) into a single file.
+This file can be mounted as a read-only [SquashFS](https://en.wikipedia.org/wiki/SquashFS) file system, which is much faster because a single file is accessed instead of the many small files that were in the original environment.
+
+
+#### Step 1: create the virtual environment
+
+The first step is to create the virtual environment using the usual workflow.
+This might be slow, because we are not optimising this stage for file system performance.
+
+```bash
+# for the example create a working path on SCRATCH
+mkdir $SCRATCH/sqfs-demo
+cd $SCRATCH/sqfs-demo
+
+# start the uenv
+# in this case the "default" view of prgenv-gnu provides python, cray-mpich, and
+# other useful tools
+uenv start prgenv-gnu/24.11:v1 --view=default
+
+# create and activate the empty venv
+python -m venv ./.pyenv
+source ./.pyenv/bin/activate
+
+# install software in the virtual environment
+# in this case we install pytorch
+pip install torch torchvision torchaudio \
+    --index-url https://download.pytorch.org/whl/cu126
+```
+
+??? example "how many files did that create?"
+    An inode is created for every file, directory and symlink on a file system.
+    In order to optimise performance, we want to reduce the number of inodes (i.e. the number of files and directories).
+
+    The following command can be used to count the number of inodes:
+    ```bash
+    find $SCRATCH/sqfs-demo/.pyenv -exec stat --format="%i" {} + | sort -u | wc -l
+    ```
+    `find` lists every file and directory, `stat` prints the inode number of each one, and `sort -u` and `wc -l` count the unique inodes.
+
+    In our "simple" pytorch example, I counted **22806 inodes**!
+
+#### Step 2: make a squashfs image of the virtual environment
+
+The next step is to create a single squashfs file that contains the whole `$SCRATCH/sqfs-demo/.pyenv` path.
+
+This is performed using the `mksquashfs` command, which is installed on all Alps clusters.
+
+```bash
+mksquashfs $SCRATCH/sqfs-demo/.pyenv pyenv.squashfs \
+    -no-recovery -noappend -Xcompression-level 3
+```
+
+!!! hint
+    The `-Xcompression-level` flag sets the compression level to a value between 1 and 9, with 9 being the most compressed.
+    We find that level 3 provides a good trade-off between the size of the compressed image and performance: both [uenv][ref-uenv] and the [container-engine][ref-container-engine] use level 3.
+
+??? warning "I am seeing errors of the form `Unrecognised xattr prefix...`"
+    You can safely ignore the (possibly many) warning messages of the form:
+    ```
+    Unrecognised xattr prefix lustre.lov
+    Unrecognised xattr prefix system.posix_acl_access
+    Unrecognised xattr prefix lustre.lov
+    Unrecognised xattr prefix system.posix_acl_default
+    ```
+
+!!! tip
+    The version of `mksquashfs` installed by default on Alps does not support `zstd`, the best compression method.
+    Every uenv contains a better version of `mksquashfs`, which is used by the uenv to compress itself when it is built.
+
+    The exact location inside the uenv depends on the target architecture and version, and will be of the form:
+    ```
+    /user-environment/linux-sles15-${arch}/gcc-7.5.0/squashfs-${version}-${hash}/bin/mksquashfs
+    ```
+    Use this version for the best results, though it is also perfectly fine to use the system version.
+
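The uenv copy of `mksquashfs` can be located with a glob that follows the path form given in the tip above. This is a sketch under assumptions: `find_mksquashfs` is a hypothetical helper, and its optional root argument exists only so that it can be tried outside a uenv.

```bash
# Print the path of the mksquashfs to use: the copy bundled in a mounted uenv
# if one matches the documented path form, otherwise the one found on PATH.
# Hypothetical helper; the root defaults to /user-environment.
find_mksquashfs() {
    sqfs_root="${1:-/user-environment}"
    for sqfs_candidate in \
        "$sqfs_root"/linux-sles15-*/gcc-7.5.0/squashfs-*/bin/mksquashfs; do
        if [ -x "$sqfs_candidate" ]; then
            echo "$sqfs_candidate"
            return 0
        fi
    done
    # no uenv copy found: fall back to the system version
    command -v mksquashfs
}

# Example (not run here), using the same flags as in step 2:
# "$(find_mksquashfs)" $SCRATCH/sqfs-demo/.pyenv pyenv.squashfs \
#     -no-recovery -noappend -Xcompression-level 3
```

If the glob matches nothing, the function falls back to the system `mksquashfs`, which the tip above notes is also fine to use.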
+#### Step 3: use the squashfs
+
+To use the optimised virtual environment, mount the squashfs image at the location of the original virtual environment when starting the uenv.
+
+```bash
+cd $SCRATCH/sqfs-demo
+uenv start --view=default \
+    prgenv-gnu/24.11:v1,$PWD/pyenv.squashfs:$SCRATCH/sqfs-demo/.pyenv
+cd $SCRATCH/sqfs-demo
+source .pyenv/bin/activate
+```
+
+Note that the original virtual environment is still installed in `$SCRATCH/sqfs-demo/.pyenv`. However, the squashfs image has been mounted on top of it, so the single squashfs file is accessed instead of the many files in the original version.
+
+A benefit of this approach is that the squashfs file can be copied to a location that is not subject to the Scratch cleaning policy, and mounted from there.
+
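Copying the image out of scratch might look like the sketch below. `stash_image` is a hypothetical helper and `IMAGE_DIR` is a placeholder for whichever persistent location your project provides; neither name comes from the CSCS documentation.

```bash
# Copy a squashfs image into a directory outside scratch, verify that the
# copy is byte-identical, and print the path of the copy.
# Hypothetical helper, not part of uenv.
stash_image() {
    img_src="$1"
    img_dest_dir="$2"
    mkdir -p "$img_dest_dir" || return 1
    cp "$img_src" "$img_dest_dir/" || return 1
    img_dest="$img_dest_dir/$(basename "$img_src")"
    cmp -s "$img_src" "$img_dest" || return 1
    echo "$img_dest"
}

# Example (not run here); IMAGE_DIR is an assumed, user-chosen location.
# image=$(stash_image $SCRATCH/sqfs-demo/pyenv.squashfs $IMAGE_DIR)
# uenv start --view=default \
#     prgenv-gnu/24.11:v1,$image:$SCRATCH/sqfs-demo/.pyenv
```

The mount point stays at the original `.pyenv` path, since that is the location where the virtual environment was created.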
+#### Step 4: (optional) regenerate the virtual environment
+
+The squashfs file is immutable - it is not possible to modify the contents of `.pyenv` while it is mounted.
+This means that it is not possible to `pip install` more packages in the virtual environment.
+
+If you need to modify the virtual environment, run the original uenv without the squashfs file mounted, make changes, and run step 2 again to generate a new image.
+
+!!! hint
+    If you save the updated copy in a different file, you can roll back to the old version of the environment by mounting the old image.
+
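One way to keep old images around for rolling back is to give each regenerated image a new numbered name. The sketch below assumes a `pyenv-vN.squashfs` naming scheme, which is purely illustrative, and `next_image_name` is a hypothetical helper.

```bash
# Print the first unused file name of the form pyenv-vN.squashfs in the given
# directory (default: the current directory). The naming scheme is an
# assumption made for this example only.
next_image_name() {
    img_dir="${1:-.}"
    img_n=1
    while [ -e "$img_dir/pyenv-v$img_n.squashfs" ]; do
        img_n=$((img_n + 1))
    done
    echo "$img_dir/pyenv-v$img_n.squashfs"
}

# Example (not run here): write the regenerated image to a new file.
# mksquashfs $SCRATCH/sqfs-demo/.pyenv "$(next_image_name $SCRATCH/sqfs-demo)" \
#     -no-recovery -noappend -Xcompression-level 3
```

Mounting an earlier numbered image in step 3 then restores the corresponding version of the environment.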
@@ -7,59 +7,61 @@
     [:octicons-arrow-right-24: status.cscs.ch](https://status.cscs.ch/)
 </div>
 
-Start here to get access to CSCS services and Alps
+The Alps research infrastructure hosts multiple platforms and clusters targeting different communities.
 
 <div class="grid cards" markdown>
 
--   :fontawesome-solid-layer-group: __Accounts and Projects__
+-   :fontawesome-solid-layer-group: __Platforms__
 
-    The first step is to get an account and a project
+    Projects at CSCS are granted access to [clusters][ref-alps-clusters], which are managed by platforms.
+    Start by finding the platform for the cluster that you want to use.
 
-    [:octicons-arrow-right-24: Accounts and Projects][ref-account-management]
+    [:octicons-arrow-right-24: Platforms overview][ref-alps-platforms]
 
--   :fontawesome-solid-key: __Logging In__
+    Go straight to the documentation for the platform that hosts your project:
 
-    Once you have an account, you can set up multi factor authentification
+    [:octicons-arrow-right-24: HPC Platform (Daint, Eiger)][ref-platform-hpcp]
 
-    [:octicons-arrow-right-24: Setting up MFA][ref-mfa]
+    [:octicons-arrow-right-24: Machine Learning Platform (Clariden)][ref-platform-mlp]
 
-    Then access CSCS services
+    [:octicons-arrow-right-24: Climate and Weather Platform (Santis)][ref-platform-cwp]
 
-    [:octicons-arrow-right-24: Accessing CSCS Web Services][ref-access-web]
+-   :fontawesome-solid-mountain-sun: __Alps__
 
-    [:octicons-arrow-right-24: Using SSH][ref-ssh]
+    Learn more about the Alps research infrastructure
 
-</div>
+    [:octicons-arrow-right-24: Alps Overview](alps/index.md)
 
-The Alps Research infrastructure hosts multiple platforms and clusters targeting different communities
+    Get detailed information about the main components of the infrastructure
 
-<div class="grid cards" markdown>
+    [:octicons-arrow-right-24: Alps Clusters](alps/clusters.md)
 
--   :fontawesome-solid-layer-group: __Platforms__
+    [:octicons-arrow-right-24: Alps Hardware](alps/hardware.md)
 
-    Once you have a project at CSCS, start here to find your platform:
+    [:octicons-arrow-right-24: Alps Storage](alps/storage.md)
 
-    [:octicons-arrow-right-24: Platforms overview][ref-alps-platforms]
+</div>
 
-    Go straight to the documentation for the platform that hosts your project:
 
-    [:octicons-arrow-right-24: HPC Platform][ref-platform-hpcp]
+<div class="grid cards" markdown>
 
-    [:octicons-arrow-right-24: Machine Learning Platform][ref-platform-mlp]
+-   :fontawesome-solid-layer-group: __Accounts and Projects__
 
-    [:octicons-arrow-right-24: Climate and Weather Platform][ref-platform-cwp]
+    The first step is to get an account and a project
 
--   :fontawesome-solid-mountain-sun: __Alps__
+    [:octicons-arrow-right-24: Accounts and Projects][ref-account-management]
 
-    Learn more about the Alps research infrastructure
+-   :fontawesome-solid-key: __Logging In__
 
-    [:octicons-arrow-right-24: Alps Overview](alps/index.md)
+    Once you have an account, you can set up multi-factor authentication
 
-    Get detailed information about the main components of the infrastructre
+    [:octicons-arrow-right-24: Setting up MFA][ref-mfa]
 
-    [:octicons-arrow-right-24: Alps Hardware](alps/hardware.md)
+    Then access CSCS services
 
-    [:octicons-arrow-right-24: Alps Storage](alps/storage.md)
+    [:octicons-arrow-right-24: Accessing CSCS Web Services][ref-access-web]
+
+    [:octicons-arrow-right-24: Using SSH][ref-ssh]
 
 </div>