DrDaveD (Collaborator, Author) commented Nov 14, 2024

@fwyzard @ocaisa @kpedro88 Please test this new script and let me know what you think.

DrDaveD force-pushed the add-bindexec branch 2 times, most recently from 2a976a5 to c4e808b, on November 20, 2024 21:58

ocaisa commented Mar 10, 2025

I feel bad for not trying this out yet given that I asked for it, but I do plan to come back to it. My understanding from reading the code is that this can't mount a squashfs, is that correct? (Sorry for my ignorance, I don't even know if that would be possible).

Right now a squashfs is the most likely use case we have, since we expect the parallel filesystem to groan under a fully unpacked software stack.

DrDaveD commented Mar 10, 2025

If you need to use a squashfs filesystem, I think the thing to do is to run squashfuse_ll on it and then bind-mount its mountpoint into /cvmfs using this script. If that works for you but is too complicated, I might consider integrating that piece of complication into this script too, similar to the way that cvmfsexec supports fuse2fs mounts.

It also occurs to me that the problems you experienced with apptainer and MPI applications might reappear with just the bindexec script, depending on what the root cause was.

FYI I have some known issues with this version, in particular it doesn't work within apptainer. When I last worked on it I was running out of things to try, and since then it hasn't made it back to the top of the priority stack. I do hope to get back to it sometime however. Since you were trying to avoid apptainer anyway that particular issue probably won't bother you.

DrDaveD commented Apr 10, 2025

I pushed an updated version. This now works for me within apptainer and nested. By default it places a writable overlay over everything except nfs filesystems, /tmp, and /var/tmp. If there's anything you want to be able to do persistent writes on, you need to bind it yourself from the host onto the same place with a <path>:<path> option for the given <path>.
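
As a rough illustration of the <path>:<path> form, the sketch below builds one such argument per directory that should keep persistent writes. The helper name same_path_binds and the example directories are made up for illustration; only the <path>:<path> syntax itself comes from this thread.

```shell
# Hypothetical helper (not part of bindexec): print one <path>:<path>
# argument per directory given, so each is bound from the host onto itself.
same_path_binds() {
    for dir in "$@"; do
        printf '%s:%s\n' "$dir" "$dir"
    done
}

# Example with made-up directories; prints the arguments one per line.
same_path_binds /scratch /home/user/results
```

A single result could then be passed along in the same way as the other bind arguments shown in this thread, for example `bash bindexec $(same_path_binds /scratch) -- touch /scratch/file`, assuming the path contains no whitespace.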

This may be too onerous of a user interface. Maybe it needs to be redone with an "underlay" algorithm like we used to have in apptainer rather than using fuse-overlayfs, where it binds everything in from the host except for places where custom bind mounts are added. The algorithm can get kind of hairy however depending on how deep the custom bind points are added.

For now fuse-overlayfs needs to be in its PATH. I usually do

PATH=/usr/libexec/apptainer/bin:$PATH

when testing because that has a very recent version of fuse-overlayfs. It could also come from

PATH=/cvmfs/oasis.opensciencegrid.org/mis/apptainer/current/$(arch)/libexec/apptainer/bin:$PATH
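
Since the script depends on finding fuse-overlayfs in PATH, a quick check before running it may save some confusion. This is only a sketch; the function name require_tool is invented here and nothing like it is claimed to exist in the script.

```shell
# Hypothetical pre-flight check: report where a required helper would be
# found on the current PATH, and return non-zero if it is missing.
require_tool() {
    if path=$(command -v "$1"); then
        printf '%s: %s\n' "$1" "$path"
    else
        printf '%s: not found in PATH\n' "$1" >&2
        return 1
    fi
}

require_tool fuse-overlayfs ||
    echo "add a directory containing fuse-overlayfs to PATH first"
```

Running it after each of the PATH settings above shows which copy of fuse-overlayfs would actually be picked up.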

ocaisa commented Apr 11, 2025

@DrDaveD The bindexec command is missing from the last force-push

DrDaveD commented Apr 11, 2025

> @DrDaveD The bindexec command is missing from the last force-push

Oops. It's there now.

ocaisa commented Apr 11, 2025

I took this for a test drive on Ubuntu, and apart from the warning

mount: /dev/shm/bindexec/overlay/tmp: wrong fs type, bad option, bad superblock on /tmp, missing codepage or helper program, or other error.

it worked very well:

ocaisa@~$ ls test
random_file

ocaisa@~$ mksquashfs $PWD/test test.sqsh
Parallel mksquashfs: Using 8 processors
Creating 4.0 filesystem on test.sqsh, block size 131072.
....

ocaisa@~$ squashfuse_ll test.sqsh chicken

ocaisa@~$ ls chicken/
random_file

ocaisa@~$ bash bindexec $PWD/chicken:/cvmfake -- ls /cvmfake
mount: /dev/shm/bindexec/overlay/tmp: wrong fs type, bad option, bad superblock on /tmp, missing codepage or helper program, or other error.
random_file

DrDaveD commented Apr 21, 2025

@kpedro88 @fwyzard please check out the bindexec script for your use cases.

DrDaveD marked this pull request as draft on April 21, 2025 15:18

ocaisa commented Apr 23, 2025

I tried the same test at Barcelona Supercomputing Centre. I could find squashfuse_ll on the system but I had to download the fuse-overlayfs binary (v1.14). When I try to reproduce my previous experiment I get a

mkdir: cannot create directory '.old-root': Read-only file system
pivot_root: failed to change root from `.' to `.old-root': No such file or directory

I tried to debug this a bit via bash (set -x and a few helper commands):

+ set -e
+ pwd
/home/ub/ub686081
+ cd /dev/shm/bindexec/overlay
+ pwd
/dev/shm/bindexec/overlay
+ mount --rbind /dev/shm/bindexec/overlay /dev/shm/bindexec/overlay
+ pwd
/dev/shm/bindexec/overlay
+ cd /dev/shm/bindexec/overlay
+ pwd
/dev/shm/bindexec/overlay
+ ls /dev/shm/bindexec/overlay
afs   bin   cvmfs  etc   home  lib64  mnt  proc  run   scratch  sys  usr  xcatpost
apps  boot  dev    gpfs  lib   media  opt  root  sbin  srv      tmp  var
+ mkdir -p .old-root
mkdir: cannot create directory '.old-root': Read-only file system

I tried a few things to see if I could pin it down, but without success.

DrDaveD commented Apr 23, 2025

fuse-overlayfs v1.14 should be the latest, but maybe just to be sure that's not the cause you could install an unprivileged version of apptainer and get fuse-overlayfs from there. What is the HPC operating system? squashfuse_ll should not be used by bindexec.

ocaisa commented Apr 29, 2025

[ub686081@alogin1 ~]$ cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="9.2 (Plow)"

I tried fuse-overlayfs from an unprivileged version of apptainer and had the same output.

The squashfuse_ll was just for my specific use case (I'm interested in bind-mounting a squashfs).

I don't know if it matters, but there are two things about the system to note:

  • It is completely offline for normal users (including login nodes)
  • /cvmfs exists but is actually an rsync to GPFS done on a different node

ocaisa commented Apr 29, 2025

I think I see what is going wrong now, / is actually nfs:

[ub686081@alogin1 ~]$ mount | grep ' on / '
10.2.101.250:/install/netboot/rhels9.2.0/x86_64/login-rhel92/rootimg on / type nfs (ro,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,acregmin=1200,acregmax=1200,acdirmin=1200,acdirmax=1200,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.2.101.250,mountvers=3,mountport=20048,mountproto=udp,fsc,local_lock=all,addr=10.2.101.250)

If I disable the check for nfs, things work:

[ub686081@alogin1 ~]$ diff bindexec bindexec.mod
136c136
<         if [[ \$TYPE = nfs* ]]; then
---
>         if [[ \$TYPE = nfsnnn* ]]; then

[ub686081@alogin1 ~]$ bash ~/bindexec $PWD/chicken:/cvmfake -- ls /cvmfake
mkdir: cannot create directory '.old-root': Read-only file system
pivot_root: failed to change root from `.' to `.old-root': No such file or directory
rmdir: failed to remove '.old-root': Read-only file system
ls: cannot access '/cvmfake': No such file or directory

[ub686081@alogin1 ~]$ bash ~/bindexec.mod $PWD/chicken:/cvmfake -- ls /cvmfake
random_file

[ub686081@alogin1 ~]$ ls chicken/
random_file

[ub686081@alogin1 ~]$ ps ax | grep squash
  84296 ?        Ssl    0:00 squashfuse_ll /home/ub/ub686081/test.sqsh /home/ub/ub686081/chicken
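
For anyone wanting to check their own system for the same situation, here is a small sketch that reports the filesystem type and mount options of a mount point. It assumes a mount table in /proc/mounts format; the function name mount_info is made up for this example.

```shell
# Print the filesystem type and mount options of a mount point, read from a
# mount table in /proc/mounts format (device, mount point, type, options, ...).
# If several entries match, the last one listed wins.
mount_info() {
    awk -v mp="$1" '$2 == mp { type = $3; opts = $4 }
                    END { if (type != "") print type, opts }' "${2:-/proc/mounts}"
}

# Only meaningful on systems that provide /proc/mounts.
if [ -r /proc/mounts ]; then
    mount_info /
fi
```

A type starting with nfs together with ro in the options would match the setup described in this comment.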

ocaisa commented Apr 29, 2025

On the LUMI system I also saw a failure, but the reason is more obvious:

ocaisala@uan04:~> export PATH=$PWD/apptainer/x86_64/libexec/apptainer/bin:$PATH

ocaisala@uan04:~> squashfuse_ll test.sqsh chicken

ocaisala@uan04:~> ls chicken/
random_file

ocaisala@uan04:~> bash bindexec $PWD/chicken:/cvmfake -- ls /cvmfake
unshare: unshare failed: No space left on device

ocaisala@uan04:~> cat /proc/sys/user/max_user_namespaces
0
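
The "No space left on device" message is how unshare reports that the user namespace limit has been reached, and a limit of 0 means none can be created. A sketch of a check that could be run up front follows; the function name is invented, and a positive limit is necessary but may not be sufficient on every system.

```shell
# Hypothetical pre-flight check: succeed only if the per-user namespace
# limit is a positive number. The limit is read from the given file
# (default: the kernel's sysctl file).
userns_available() {
    limit_file=${1:-/proc/sys/user/max_user_namespaces}
    [ -r "$limit_file" ] || return 1
    limit=$(cat "$limit_file")
    [ "$limit" -gt 0 ] 2>/dev/null
}

if userns_available; then
    echo "user namespaces appear to be enabled"
else
    echo "user namespaces are disabled or unavailable; unshare would fail"
fi
```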

DrDaveD commented Apr 29, 2025

> If I disable the check for nfs, things work

I saw it cause other problems though. Maybe it should only be applied for non-/ directories, what do you think?
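
One way to read that suggestion, sketched as a standalone function: keep the nfs special case for every mount point except / itself. The function name and the mount-point argument are hypothetical; the thread only shows that the script compares a TYPE variable against nfs*, so this is not the script's actual code.

```shell
# Hypothetical predicate: succeed when a mount should get the nfs special
# handling, i.e. its type starts with "nfs" and it is not the root itself.
nfs_special_case() {
    fstype=$1
    mountpoint=$2
    case $fstype in
        nfs*) [ "$mountpoint" != / ] ;;
        *)    return 1 ;;
    esac
}

# Made-up examples: an nfs root, an nfs4 mount elsewhere, and a non-nfs mount.
nfs_special_case nfs /      && echo "/ gets nfs handling"
nfs_special_case nfs4 /apps && echo "/apps gets nfs handling"
nfs_special_case gpfs /gpfs && echo "/gpfs gets nfs handling"
```

With this reading, an nfs-mounted / would be treated like any other filesystem while nfs mounts elsewhere would keep their current handling.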

ocaisa commented Apr 30, 2025

You know much better than I do what the consequences might be. Certainly checking for / and allowing it works for me. With the script as it is right now, it needs write access to / since that is critical in any scenario. Can .old-root be created in a guaranteed writable location instead (not sure if that would be enough)?

EDIT: Ah, I see

  • `put_old` must be at or underneath `new_root`; that is, adding some nonnegative number of "/.." suffixes to the pathname pointed to by `put_old` must yield the same directory as `new_root`.

so indeed, / must be allowed through. I wonder if you can be sneaky since /tmp and /dev/shm appear as

[ub686081@alogin1 ~]$ df /tmp /dev/shm
Filesystem     1K-blocks    Used Available Use% Mounted on
rw             263987692 1057648 262930044   1% /tmp
tmpfs          263987692   66704 263920988   1% /dev/shm

so strictly speaking they do seem to meet that requirement.
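
The quoted rule can be spelled out as a small path check. This sketch tests only that one condition by climbing with "/.." from put_old until new_root or / is reached; it does not cover the other requirements pivot_root imposes, and the function name is made up.

```shell
# Succeed if appending zero or more "/.." to put_old reaches new_root,
# which is the path condition quoted from the pivot_root documentation.
# Both arguments must be existing directories.
put_old_is_under() {
    new_root=$(CDPATH= cd -- "$1" && pwd -P) || return 1
    current=$(CDPATH= cd -- "$2" && pwd -P) || return 1
    while :; do
        [ "$current" = "$new_root" ] && return 0
        [ "$current" = / ] && return 1
        current=$(CDPATH= cd -- "$current/.." && pwd -P) || return 1
    done
}

if put_old_is_under / /tmp; then
    echo "/tmp satisfies the path condition relative to /"
fi
```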

Development

Successfully merging this pull request may close these issues.

  • bind mount cvmfs directories with singcvmfs ?
  • Using cvmfsexec with cvmfs_shrinkwrap
