
Commit 85da491

Update storage section to explain relocatable python venvs with uv (#47)
1 parent 80adbb5 commit 85da491


1 file changed: 87 additions, 36 deletions


docs/guides/storage.md

````diff
@@ -33,50 +33,89 @@ This file can be mounted as a read-only [Squashfs](https://en.wikipedia.org/wiki
 #### Step 1: create the virtual environment

 The first step is to create the virtual environment using the usual workflow.
-This might be slow, because we are not optimising this stage for file system performance.

-```bash
-# for the example create a working path on SCRATCH
-mkdir $SCRATCH/sqfs-demo
-cd $SCRATCH/sqfs-demo
+=== "uv"

-# start the uenv
-# in this case the "default" view of prgenv-gnu provides python, cray-mpich,
-# and other useful tools
-uenv start prgenv-gnu/24.11:v1 --view=default
+    The recommended way to create a new virtual environment is to use the [uv](https://docs.astral.sh/uv/) tool, which supports _relocatable_ virtual environments and asynchronous package downloads. The main benefit of a relocatable virtual environment is that it does not need to be created in the final path from where it will be used. This allows the use of shared memory to speed up the creation and initialization of the virtual environment and, since the virtual environment can be used from any location, the resulting squashfs image can be safely shared across projects.

-# create and activate the empty venv
-python -m venv ./.pyenv
-source ./.pyenv/bin/activate
+    ```bash
+    # start the uenv
+    # in this case the "default" view of prgenv-gnu provides python, cray-mpich,
+    # and other useful tools
+    uenv start prgenv-gnu/24.11:v1 --view=default

-# install software in the virtual environment
-# in this case we install install pytorch
-pip install torch torchvision torchaudio \
-    --index-url https://download.pytorch.org/whl/cu126
-```
+    # create and activate a new relocatable venv using uv
+    # in this case we explicitly select python 3.12
+    uv venv -p 3.12 --relocatable --link-mode=copy /dev/shm/sqfs-demo/.venv
+    cd /dev/shm/sqfs-demo
+    source .venv/bin/activate
+
+    # install software in the virtual environment using uv
+    # in this case we install install pytorch
+    uv pip install --link-mode=copy torch torchvision torchaudio \
+        --index-url https://download.pytorch.org/whl/cu126
+
+    # optionally, to reduce the import times, precompile all
+    # python modules to bytecode before creating the squashfs image
+    python -m compileall -j 8 -o 1 -o 2 .venv/lib/python3.12/site-packages
+    ```
+
+=== "venv"
+
+    A new virtual environment can also be created using the standard `venv` module. However, virtual environments created by `venv` are not relocatable, and thus they need to be created and initialized in the path from where they will be used. This implies that the installation process can not be optimized for file system performance and will still be slow on Lustre filesystems.
+
+    ```bash
+    # start the uenv
+    # in this case the "default" view of prgenv-gnu provides python, cray-mpich,
+    # and other useful tools
+    uenv start prgenv-gnu/24.11:v1 --view=default
+
+    # for the example create a working path on SCRATCH
+    mkdir $SCRATCH/sqfs-demo
+    cd $SCRATCH/sqfs-demo
+
+    # create and activate the empty venv
+    python -m venv ./.venv
+    source ./.venv/bin/activate
+
+    # install software in the virtual environment
+    # in this case we install install pytorch
+    pip install torch torchvision torchaudio \
+        --index-url https://download.pytorch.org/whl/cu126
+    ```

 ??? example "how many files did that create?"
     An inode is created for every file, directory and symlink on a file system.
     In order to optimise performance, we want to reduce the number of inodes (i.e. the number of files and directories).

     The following command can be used to count the number of inodes:
     ```
-    find $SCRATCH/sqfs-demo/.pyenv -exec stat --format="%i" {} + | sort -u | wc -l
+    find $SCRATCH/sqfs-demo/.venv -exec stat --format="%i" {} + | sort -u | wc -l
     ```
     `find` is used to list every path and file, and `stat` is called on each of these to get the inode, and then `sort` and `wc` are used to count the number of unique inodes.

     In our "simple" pytorch example, I counted **22806 inodes**!

````
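The inode-counting pipeline from the "how many files did that create?" box can be wrapped in a pair of shell functions to see which packages contribute most of the inodes, which can help when judging how much an image will save. This is a minimal sketch: `count_inodes` and `inodes_per_entry` are hypothetical helper names, and like the original command it assumes GNU `find`, `stat`, `sort` and `wc`.

```bash
#!/usr/bin/env bash
# count_inodes PATH: number of unique inodes under PATH. Same pipeline as in
# the example box: find lists every path (PATH itself included), stat prints
# each inode number, and sort -u | wc -l counts the distinct values, so hard
# links to the same file are counted once.
count_inodes() {
    find "$1" -exec stat --format="%i" {} + | sort -u | wc -l
}

# inodes_per_entry DIR: run count_inodes on every direct child of DIR and
# print "<count><TAB><name>" lines, largest count first.
# Note: the * glob skips entries whose names start with a dot.
inodes_per_entry() {
    local entry
    for entry in "$1"/*; do
        printf '%s\t%s\n' "$(count_inodes "$entry")" "${entry##*/}"
    done | sort -rn
}
```

For example, `inodes_per_entry .venv/lib/python3.12/site-packages | head` would list the ten entries of `site-packages` with the most inodes.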

````diff
+
 #### Step 2: make a squashfs image of the virtual environment

-The next step is to create a single squashfs file that contains the whole `$SCRATCH/sqfs-demo/.pyenv` path.
+The next step is to create a single squashfs file that contains the whole virtual environment folder (i.e. `/dev/shm/sqfs-demo/.venv` or `$SCRATCH/sqfs-demo/.venv`).

 This is performed using the `mksquashfs` command, that is installed on all Alps clusters.

-```bash
-mksquashfs $SCRATCH/sqfs-demo/.pyenv pyenv.squashfs \
-    -no-recovery -noappend -Xcompression-level 3
-```
+=== "uv"
+
+    ```bash
+    mksquashfs /dev/shm/sqfs-demo/.venv py_venv.squashfs \
+        -no-recovery -noappend -Xcompression-level 3
+    ```
+
+=== "venv"
+
+    ```bash
+    mksquashfs $SCRATCH/sqfs-demo/.venv py_venv.squashfs \
+        -no-recovery -noappend -Xcompression-level 3
+    ```

 !!! hint
     The `-Xcompression-level` flag sets the compression level to a value between 1 and 9, with 9 being the most compressed.
@@ -105,27 +144,39 @@ mksquashfs $SCRATCH/sqfs-demo/.pyenv pyenv.squashfs \

 To use the optimised virtual environment, mount the squashfs image at the location of the original virtual environment when starting the uenv.

-```bash
-cd $SCRATCH/sqfs-demo
-uenv start --view=default \
-    prgenv-gnu/24.11:v1,$PWD/pyenv.squashfs:$SCRATCH/sqfs-demo/.pyenv
-source .pyenv/bin/activate
-```
+=== "uv"

-Note that the original virtual environment is still installed in `$SCRATCH/sqfs-demo/.pyenv`, however the squashfs image has been mounted on top of it, so the single squashfs file is being accessed instead of the many files in the original version.
+    ```bash
+    cd $SCRATCH/sqfs-demo
+    uenv start --view=default \
+        prgenv-gnu/24.11:v1,$PWD/py_venv.squashfs:$SCRATCH/sqfs-demo/.venv
+    source .venv/bin/activate
+    ```

-A benefit of this approach is that the squashfs file can be copied to a location that is not subject to the Scratch cleaning policy.
+    Remember that virtual environments created by `uv` are relocatable only if the `--relocatable` option flag is passed to the `uv venv` command as mentioned in step 1. In that case, the generated environment is relocatable and thus it is possible to mount it in multiple locations without problems.

-!!! warning
-    Virtual environment are usually not relocatable as they contain symlinks to absolute locations inside the virtual environment. Therefore, you need to mount the image in the exact same location where you created the virtual environment.
+=== "venv"
+
+    ```bash
+    cd $SCRATCH/sqfs-demo
+    uenv start --view=default \
+        prgenv-gnu/24.11:v1,$PWD/py_venv.squashfs:$SCRATCH/sqfs-demo/.venv
+    source .venv/bin/activate
+    ```
+
+    Note that the original virtual environment is still installed in `$SCRATCH/sqfs-demo/.venv`, however the squashfs image has been mounted on top of it, so the single squashfs file is being accessed instead of the many files in the original version.
+
+    A benefit of this approach is that the squashfs file can be copied to a location that is not subject to the Scratch cleaning policy.
+
+    !!! warning
+        Virtual environments created by `venv` are not relocatable as they contain symlinks to absolute locations inside the virtual environment. This means that the squashfs file must be mounted in the exact same location where the virtual environment was created.

````
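Whether an environment can be mounted somewhere other than where it was built can be roughly checked before running `mksquashfs`: in a standard `venv` environment, the `activate` script and the shebang lines of installed console scripts typically contain the absolute creation path. The sketch below is a heuristic built on that assumption, not a guarantee of relocatability; `find_hardcoded_paths` is a hypothetical helper, and it only scans text files in `bin/` (compiled binaries and other directories are not inspected).

```bash
#!/usr/bin/env bash
# find_hardcoded_paths VENV PREFIX: list text files under VENV/bin that
# contain PREFIX, the absolute path the environment was created in.
# Heuristic only: an empty result is consistent with a relocatable
# environment; a non-empty result names scripts that would still point at
# PREFIX if the squashfs image were mounted somewhere else.
find_hardcoded_paths() {
    local venv=$1 prefix=$2
    # -r: recurse (symlinks inside the tree, e.g. bin/python, are not followed)
    # -l: print matching file names only
    # -I: skip binary files
    # -F: treat PREFIX as a fixed string, not a regular expression
    # grep exits with status 1 when nothing matches; treat that as success.
    grep -rlIF -- "$prefix" "$venv/bin" || true
}
```

For the uv example, `find_hardcoded_paths /dev/shm/sqfs-demo/.venv /dev/shm/sqfs-demo` would ideally print nothing when `--relocatable` was used, while the same check on a plain `venv` environment is expected to list at least `activate`.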

````diff
 #### Step 4: (optional) regenerate the virtual environment

-The squashfs file is immutable - it is not possible to modify the contents of `.pyenv` while it is mounted.
+The squashfs file is immutable - it is not possible to modify the contents of `.venv` while it is mounted.
 This means that it is not possible to `pip install` more packages in the virtual environment.

-If you need to modify the virtual environment, run the original uenv without the squashfs file mounted, make changes, and run step 2 again to generate a new image.
+If you need to modify the virtual environment, run the original uenv without the squashfs file mounted, make changes to the virtual environment, and run step 2 again to generate a new image.

 !!! hint
     If you save the updated copy in a different file, you can now "roll back" to the old version of the environment by mounting the old image.
-
````
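The roll-back hint relies on keeping each regenerated image in its own file. One way to organise that is sketched below, assuming a hypothetical `py_venv-<tag>.squashfs` naming scheme: one helper picks the most recently modified image, and another builds the single `uenv start` image argument in the same `uenv,image:mountpoint` form used in step 3. Both function names are illustrative, not part of uenv.

```bash
#!/usr/bin/env bash
# newest_image DIR: print the most recently modified py_venv-*.squashfs in
# DIR (the naming scheme is an assumption). Prints nothing if none exist.
newest_image() {
    ls -1t -- "$1"/py_venv-*.squashfs 2>/dev/null | head -n 1
}

# uenv_image_arg UENV IMAGE MOUNTPOINT: build the comma/colon separated
# argument passed to `uenv start` in step 3.
uenv_image_arg() {
    printf '%s,%s:%s\n' "$1" "$2" "$3"
}

# Example (not run here): mount the newest image; to roll back, pass an
# older py_venv-*.squashfs file instead of "$img".
#   img=$(newest_image "$SCRATCH/sqfs-demo")
#   uenv start --view=default \
#       "$(uenv_image_arg prgenv-gnu/24.11:v1 "$img" "$SCRATCH/sqfs-demo/.venv")"
```

Keeping a tag in the file name (for instance a date) means older images stay on disk untouched; since each image is a single file, keeping a few versions costs few inodes.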
