
Commit 0016de9

Use tabs for the uv/venv installation process
1 parent 3526b02 commit 0016de9


1 file changed (+88 -57 lines)

docs/guides/storage.md

Lines changed: 88 additions & 57 deletions
@@ -34,70 +34,88 @@ This file can be mounted as a read-only [Squashfs](https://en.wikipedia.org/wiki
3434

3535
The first step is to create the virtual environment using the usual workflow.
3636

37-
```bash
38-
# for the example create a working path on SCRATCH
39-
mkdir $SCRATCH/sqfs-demo
40-
cd $SCRATCH/sqfs-demo
41-
42-
# start the uenv
43-
# in this case the "default" view of prgenv-gnu provides python, cray-mpich,
44-
# and other useful tools
45-
uenv start prgenv-gnu/24.11:v1 --view=default
46-
47-
# create and activate the empty venv
48-
python -m venv ./.pyenv
49-
source ./.pyenv/bin/activate
50-
51-
# install software in the virtual environment
52-
# in this case we install install pytorch
53-
pip install torch torchvision torchaudio \
54-
--index-url https://download.pytorch.org/whl/cu126
55-
```
37+
=== "uv"
38+
39+
The recommended way to create a new virtual environment is to use the [uv](https://docs.astral.sh/uv/) tool, which supports _relocatable_ virtual environments and asynchronous package downloads. The main benefit of a relocatable virtual environment is that it does not need to be created at the path from which it will be used. It can therefore be built in shared memory (`/dev/shm`), which speeds up its creation and initialization. Since it can be used from any location, the resulting squashfs image can also be safely shared across projects.
40+
41+
```bash
42+
# start the uenv
43+
# in this case the "default" view of prgenv-gnu provides python, cray-mpich,
44+
# and other useful tools
45+
uenv start prgenv-gnu/24.11:v1 --view=default
46+
47+
# create and activate a new relocatable venv using uv
48+
# in this case we explicitly select python 3.12
49+
uv venv -p 3.12 --relocatable --link-mode=copy /dev/shm/sqfs-demo/.venv
50+
cd /dev/shm/sqfs-demo
51+
source .venv/bin/activate
52+
53+
# install software in the virtual environment using uv
54+
# in this case we install pytorch
55+
uv pip install --link-mode=copy torch torchvision torchaudio \
56+
--index-url https://download.pytorch.org/whl/cu126
57+
58+
# optionally, to reduce the import times, precompile all
59+
# python modules to bytecode before creating the squashfs image
60+
python -m compileall -j 8 -o 0 -o 1 -o 2 .venv/lib/python3.12/site-packages
61+
```
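`/dev/shm` is backed by the node's memory, so it can be worth checking how much space it offers before installing large packages such as PyTorch. The sketch below is a minimal example under some assumptions: the helper name `has_free_space` and the threshold you pass are hypothetical, and `df --output` requires GNU coreutils.

```bash
# Hypothetical helper: succeed if <path> has at least <GiB> gibibytes available.
# Usage: has_free_space <path> <GiB>
has_free_space() {
    local path="$1" need_gib="$2" avail_gib
    # --output=avail prints only the available space, -BG reports it in GiB;
    # tail drops the header line and tr strips the "G" suffix and padding
    avail_gib=$(df --output=avail -BG "$path" | tail -n 1 | tr -dc '0-9')
    [ "$avail_gib" -ge "$need_gib" ]
}
```

For example, `has_free_space /dev/shm 20 || echo "not enough space in /dev/shm"`, where `20` is only an illustrative value.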
62+
63+
=== "venv"
64+
65+
A new virtual environment can also be created using the standard `venv` module. However, virtual environments created by `venv` are not relocatable, so they must be created and initialized at the path from which they will be used. As a result, the installation cannot be moved to faster storage such as shared memory, and it will be slow on Lustre file systems.
66+
67+
```bash
68+
# start the uenv
69+
# in this case the "default" view of prgenv-gnu provides python, cray-mpich,
70+
# and other useful tools
71+
uenv start prgenv-gnu/24.11:v1 --view=default
72+
73+
# for the example create a working path on SCRATCH
74+
mkdir $SCRATCH/sqfs-demo
75+
cd $SCRATCH/sqfs-demo
76+
77+
# create and activate the empty venv
78+
python -m venv ./.venv
79+
source ./.venv/bin/activate
80+
81+
# install software in the virtual environment
82+
# in this case we install pytorch
83+
pip install torch torchvision torchaudio \
84+
--index-url https://download.pytorch.org/whl/cu126
85+
```
5686

5787
??? example "how many files did that create?"
5888
An inode is created for every file, directory and symlink on a file system.
5989
In order to optimise performance, we want to reduce the number of inodes (i.e. the number of files and directories).
6090

6191
The following command can be used to count the number of inodes:
6292
```
63-
find $SCRATCH/sqfs-demo/.pyenv -exec stat --format="%i" {} + | sort -u | wc -l
93+
find $SCRATCH/sqfs-demo/.venv -exec stat --format="%i" {} + | sort -u | wc -l
6494
```
6595
`find` is used to list every path and file, and `stat` is called on each of these to get the inode, and then `sort` and `wc` are used to count the number of unique inodes.
6696

6797
In our "simple" pytorch example, I counted **22806 inodes**!
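To see where the inodes come from, the same `find`/`stat` pipeline can be split by entry type. This is a minimal sketch, and the function name `inode_breakdown` is hypothetical.

```bash
# Hypothetical helper: break down the entries under a directory by type.
# Usage: inode_breakdown <path>
inode_breakdown() {
    local root="$1"
    # regular files, directories and symbolic links each consume an inode
    echo "files:    $(find "$root" -type f | wc -l)"
    echo "dirs:     $(find "$root" -type d | wc -l)"
    echo "symlinks: $(find "$root" -type l | wc -l)"
    # unique inode numbers, counted as in the command above
    # (hard links to the same file share a single inode)
    echo "inodes:   $(find "$root" -exec stat --format="%i" {} + | sort -u | wc -l)"
}
```

For example, run `inode_breakdown $SCRATCH/sqfs-demo/.venv` on the environment created with `venv`.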
6898

69-
##### Alternative virtual environment creation using uv
7099

71-
The installation process described above is not optimized for file system performance and will still be slow on Lustre filesystems. An alternative way to create the virtual environment is to use the [uv](https://docs.astral.sh/uv/) tool, which supports _relocatable_ virtual environments and asynchronous package downloads. The main benefit of a relocatable virtual environment is that it does not need to be created in the final path from where it will be used. This allows the use of shared memory to speed up the creation and initialization of the virtual environment and, since the virtual environment can be used from any location, the resulting squashfs image can be safely shared across projects.
100+
#### Step 2: make a squashfs image of the virtual environment
72101

73-
```bash
74-
# activate the uenv as before
75-
uenv start prgenv-gnu/24.11:v1 --view=default
102+
The next step is to create a single squashfs file that contains the whole virtual environment folder (`/dev/shm/sqfs-demo/.venv` for uv, `$SCRATCH/sqfs-demo/.venv` for venv).
76103

77-
# create and activate a new relocatable venv using uv
78-
# in this case we explicitly select python 3.12
79-
uv venv -p 3.12 --relocatable --link-mode=copy /dev/shm/sqfs-demo/.venv
80-
cd /dev/shm/sqfs-demo
81-
source .venv/bin/activate
104+
This is performed using the `mksquashfs` command, which is installed on all Alps clusters.
82105

83-
# install software in the virtual environment using uv
84-
uv pip install --link-mode=copy torch torchvision torchaudio \
85-
--index-url https://download.pytorch.org/whl/cu126
86-
# optionally, to reduce the import times, precompile all
87-
# python modules to bytecode before creating the squashfs image
88-
python -m compileall -j 8 -o 1 -o 2 .venv/lib/python3.12/site-packages
89-
```
106+
=== "uv"
90107

91-
#### Step 2: make a squashfs image of the virtual environment
92-
93-
The next step is to create a single squashfs file that contains the whole `$SCRATCH/sqfs-demo/.pyenv` path.
108+
```bash
109+
mksquashfs /dev/shm/sqfs-demo/.venv py_venv.squashfs \
110+
-no-recovery -noappend -Xcompression-level 3
111+
```
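With the uv workflow, the command above writes `py_venv.squashfs` to the current directory. After step 1 that directory is `/dev/shm/sqfs-demo`, whereas step 3 looks for the image in `$SCRATCH/sqfs-demo`. Shared memory is also backed by the node's RAM and is not persistent storage. The sketch below copies the image to the location used in step 3. The helper name `persist_image` is hypothetical, and pre-creating the `.venv` mount point is an assumption about what the mount requires.

```bash
# Hypothetical helper: copy the squashfs image out of shared memory.
# Usage: persist_image <source dir> <destination dir>
persist_image() {
    local src_dir="$1" dest_dir="$2"
    # create the destination and an empty .venv directory to mount the image on
    # (assumption: the mount point has to exist before starting the uenv)
    mkdir -p "$dest_dir/.venv"
    cp "$src_dir/py_venv.squashfs" "$dest_dir/"
}
```

For example, `persist_image /dev/shm/sqfs-demo $SCRATCH/sqfs-demo`. Afterwards, `rm -rf /dev/shm/sqfs-demo` releases the memory, but it also deletes the unpacked virtual environment.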
94112

95-
This is performed using the `mksquashfs` command, that is installed on all Alps clusters.
113+
=== "venv"
96114

97-
```bash
98-
mksquashfs $SCRATCH/sqfs-demo/.pyenv pyenv.squashfs \
99-
-no-recovery -noappend -Xcompression-level 3
100-
```
115+
```bash
116+
mksquashfs $SCRATCH/sqfs-demo/.venv py_venv.squashfs \
117+
-no-recovery -noappend -Xcompression-level 3
118+
```
101119

102120
!!! hint
103121
The `-Xcompression-level` flag sets the compression level to a value between 1 and 9, with 9 being the most compressed.
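To choose a level, the trade-off between image size and build time can be measured directly. This sketch assumes that the default compressor accepts `-Xcompression-level` (as gzip does). `compare_levels` is a hypothetical helper, and `-no-progress` only suppresses the progress bar.

```bash
# Hypothetical helper: build one image per compression level and report
# the resulting size and the wall-clock build time in seconds.
# Usage: compare_levels <venv dir> <level> [<level> ...]
compare_levels() {
    local src="$1" level img start
    shift
    for level in "$@"; do
        img="py_venv-l${level}.squashfs"
        start=$(date +%s)
        mksquashfs "$src" "$img" -no-recovery -noappend -no-progress \
            -Xcompression-level "$level" > /dev/null
        echo "level ${level}: $(du -h "$img" | cut -f1), $(( $(date +%s) - start )) s"
    done
}
```

For example, `compare_levels $SCRATCH/sqfs-demo/.venv 1 3 9` writes one image per level to the current directory. Lower levels build faster and produce larger images.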
@@ -126,26 +144,39 @@ mksquashfs $SCRATCH/sqfs-demo/.pyenv pyenv.squashfs \
126144

127145
To use the optimised virtual environment, mount the squashfs image when starting the uenv: at the location of the original virtual environment for `venv`, or at any location for a relocatable `uv` environment.
128146

129-
```bash
130-
cd $SCRATCH/sqfs-demo
131-
uenv start --view=default \
132-
prgenv-gnu/24.11:v1,$PWD/pyenv.squashfs:$SCRATCH/sqfs-demo/.pyenv
133-
source .pyenv/bin/activate
134-
```
147+
=== "uv"
148+
149+
```bash
150+
cd $SCRATCH/sqfs-demo
151+
uenv start --view=default \
152+
prgenv-gnu/24.11:v1,$PWD/py_venv.squashfs:$SCRATCH/sqfs-demo/.venv
153+
source .venv/bin/activate
154+
```
155+
156+
Virtual environments created by `uv` are relocatable only if the `--relocatable` flag is passed to `uv venv`, as in step 1. Such an environment can be mounted at any location, and the same image can be mounted at several locations.
157+
158+
=== "venv"
159+
160+
```bash
161+
cd $SCRATCH/sqfs-demo
162+
uenv start --view=default \
163+
prgenv-gnu/24.11:v1,$PWD/py_venv.squashfs:$SCRATCH/sqfs-demo/.venv
164+
source .venv/bin/activate
165+
```
135166

136-
Note that the original virtual environment is still installed in `$SCRATCH/sqfs-demo/.pyenv`, however the squashfs image has been mounted on top of it, so the single squashfs file is being accessed instead of the many files in the original version.
167+
In the `venv` workflow, the original virtual environment is still installed in `$SCRATCH/sqfs-demo/.venv`. However, the squashfs image is mounted on top of it, so the single squashfs file is accessed instead of the many files of the original version.
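Inside the uenv session, you can confirm that the directory is served from the image rather than from the files underneath by asking `findmnt` for the file system type at that path. This sketch assumes `findmnt` (from util-linux) is available and that the mount reports the type `squashfs`; a FUSE-based mount would report a different type. The helper name is hypothetical.

```bash
# Hypothetical helper: succeed if <path> lives on a squashfs mount.
# Usage: is_squashfs_mount <path>
is_squashfs_mount() {
    # --target selects the mount that contains <path>;
    # -n drops the header and -o FSTYPE prints only the file system type
    [ "$(findmnt -n -o FSTYPE --target "$1")" = "squashfs" ]
}
```

For example, `is_squashfs_mount $SCRATCH/sqfs-demo/.venv && echo "served from py_venv.squashfs"`.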
137168

138-
A benefit of this approach is that the squashfs file can be copied to a location that is not subject to the Scratch cleaning policy.
169+
A benefit of this approach is that the squashfs file can be copied to a location that is not subject to the Scratch cleaning policy.
139170

140-
!!! warning
141-
Virtual environments are not relocatable by default as they contain symlinks to absolute locations inside the virtual environment. This means that the squashfs file must be mounted in the exact same location where the virtual environment was created, unless it contains a virtual environment specifically created using a tool with support for relocatable virtual environments (e.g. `uv venv --relocatable` as mentioned in step 1), in which case it can be mounted in any location.
171+
!!! warning
172+
Virtual environments created by `venv` are not relocatable, because they hard-code the absolute path of the environment (for example in the `activate` scripts and in the shebang lines of installed console scripts such as `pip`). This means that the squashfs file must be mounted at exactly the same location where the virtual environment was created.
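To see which files hard-code the creation path, the scripts in `bin/` can be searched for it. This is a minimal sketch; `find_abs_refs` is a hypothetical helper, and it must be given the same absolute path that was used when the environment was created.

```bash
# Hypothetical helper: list files under <venv>/bin that contain the
# absolute path of the virtual environment.
# Usage: find_abs_refs <absolute venv path>
find_abs_refs() {
    local venv="${1%/}"   # strip a trailing slash, if any
    # -r does not follow symlinks found while recursing (e.g. bin/python),
    # -l prints file names only, -F matches the path as a fixed string
    grep -rlF -- "$venv" "$venv/bin"
}
```

For the `venv` example, `find_abs_refs $SCRATCH/sqfs-demo/.venv` is expected to list the `activate` scripts and console scripts. For a fully relocatable uv environment, `find_abs_refs /dev/shm/sqfs-demo/.venv` is expected to print nothing.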
142173

143174
#### Step 4: (optional) regenerate the virtual environment
144175

145-
The squashfs file is immutable: it is not possible to modify the contents of `.venv` while it is mounted.
176+
The squashfs file is immutable - it is not possible to modify the contents of `.venv` while it is mounted.
146177
This means that it is not possible to `pip install` more packages in the virtual environment.
147178

148-
If you need to modify the virtual environment, run the original uenv without the squashfs file mounted, make changes, and run step 2 again to generate a new image.
179+
If you need to modify the virtual environment, run the original uenv without the squashfs file mounted, make changes to the virtual environment, and run step 2 again to generate a new image.
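For a relocatable uv environment, the copy in `/dev/shm` may no longer exist when you return to it. Because the environment can live at any path, one option is to unpack the existing image into shared memory with `unsquashfs`, modify it there, and then run step 2 again. This sketch assumes `unsquashfs` is installed alongside `mksquashfs`; the helper name is hypothetical.

```bash
# Hypothetical helper: unpack a squashfs image into a new directory.
# Usage: unpack_image <image> <destination dir>
unpack_image() {
    local img="$1" dest="$2"
    # unsquashfs creates <dest> itself, but its parent directory must exist
    mkdir -p "$(dirname "$dest")"
    # -d sets the directory the image contents are extracted to
    unsquashfs -d "$dest" "$img" > /dev/null
}
```

For example, start the uenv without the image mounted and run `unpack_image $SCRATCH/sqfs-demo/py_venv.squashfs /dev/shm/sqfs-demo/.venv`. Then activate `/dev/shm/sqfs-demo/.venv` and install further packages with `uv pip install --link-mode=copy`.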
149180

150181
!!! hint
151182
If you save the updated copy in a different file, you can now "roll back" to the old version of the environment by mounting the old image.
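One way to keep older versions around is to tag each image with its build time. This sketch uses a hypothetical naming scheme and helper name, with the same `mksquashfs` options as step 2.

```bash
# Hypothetical helper: build an image named after the current time and print
# its path, so that earlier images are kept for rolling back.
# Usage: build_versioned_image <venv dir> <output dir>
build_versioned_image() {
    local venv="$1" outdir="$2" img
    img="$outdir/py_venv-$(date +%Y%m%d-%H%M%S).squashfs"
    mksquashfs "$venv" "$img" -no-recovery -noappend -Xcompression-level 3 \
        > /dev/null && echo "$img"
}
```

`ls -1t $SCRATCH/sqfs-demo/py_venv-*.squashfs` lists the images newest first. To roll back, mount the chosen file in step 3 in place of `py_venv.squashfs`.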
