
Commit c4e808b

Add bindexec
1 parent 672ceed

4 files changed, +240 -10 lines changed

ChangeLog

Lines changed: 3 additions & 0 deletions
```diff
@@ -1,5 +1,8 @@
+- Add a bindexec command.
 - Add the variable SINGCVMFS_LOGDIR to override the location of the
   cvmfs logs.
+- Stop using $TMPDIR as a temporary variable name in cvmfsexec because it
+  might be already set and exported.
 
 cvmfsexec-4.42 - 24 September 2024
 - Add rhel9-aarch64 and rhel9-ppc64le machine types.
```
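
The `$TMPDIR` entry is about a real shell pitfall: assigning to a variable that the caller has already exported also changes what every child process sees. If cvmfsexec stored its scratch directory in `TMPDIR`, the user's command could inherit a `TMPDIR` pointing at cvmfsexec's own scratch directory, which gets removed. The following minimal sketch is independent of cvmfsexec and shows the difference. The function names are hypothetical, and the exported `/tmp` value only simulates a caller that had set `TMPDIR`.

```bash
#!/bin/bash
# Hypothetical demo: why reusing an exported TMPDIR as a scratch variable is risky.

# Old pattern: the scratch directory is stored in the already-exported TMPDIR.
child_tmpdir_with_reuse() {
    (
        export TMPDIR=/tmp          # simulate a caller that exported TMPDIR
        TMPDIR=$(mktemp -d)         # overwrites the exported value
        bash -c 'echo "$TMPDIR"'    # a child process now sees the scratch dir
        rm -rf "$TMPDIR"
    )
}

# New pattern: the scratch directory uses a private, unexported name.
child_tmpdir_with_private_name() {
    (
        export TMPDIR=/tmp
        TMPD=$(mktemp -d)           # TMPD is not exported; TMPDIR is untouched
        bash -c 'echo "$TMPDIR"'    # a child process still sees /tmp
        rm -rf "$TMPD"
    )
}
```

`child_tmpdir_with_reuse` prints the path of a freshly created directory under `/tmp`, which no longer exists once the function returns. `child_tmpdir_with_private_name` prints `/tmp`.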

README.md

Lines changed: 41 additions & 0 deletions
```diff
@@ -42,6 +42,10 @@ do this in 4 different ways:
 unprivileged user namespaces enabled,
 this can also be used with unprivileged singularity or apptainer.
 
+In addition, this package contains a related tool called
+[bindexec](#bindexec) which starts a new user namespace with the given
+bind mounts added.
+
 # Supported operating systems
 
 Operating systems currently supported by this package are Red Hat
@@ -356,3 +360,40 @@ $ mkfs.ext3 -F -O ^has_journal -d tmp scratch.img
 By default the cvmfs logs are written to a top-level `log` directory, alongside
 the top-level `dist` directory. The variable `SINGCVMFS_LOGDIR` can be used to
 write them to a different directory, which will be created if it doesn't exist.
+
+# bindexec
+
+As a bonus, this package also includes a separate tool called `bindexec`
+that accepts any set of bind mounts to add into a new unprivileged user
+mount namespace. The usage is much like `cvmfsexec` except that instead
+of cvmfs repository names you give it `src:dest` pairs where `src` is a
+source directory or file and `dest` is a destination path. For example:
+
+```
+$ bindexec /etc/motd:/var/lib/mydir/motd -- ls /var/lib/mydir
+motd
+```
+
+Like `cvmfsexec`, if no command is supplied after `--`, it runs an
+interactive shell.
+
+Bind mounts require target destinations to exist, but if they are
+missing `bindexec` will automatically create them. This requires the
+`fuse-overlayfs` command to be in the PATH; if there is demand, a
+script for making it easily distributable will also be
+supplied (probably through a `makedist` option).
+
+Some system directories (`/proc`, `/sys`, `/dev`, and `/run`) are
+included as-is on top of the overlay, so anything bound into those
+directories will not appear. In addition, any `nfs`-type filesystems
+are automatically added on top of the overlay because they don't work
+properly through fuse-overlayfs, so no bind mounts will appear in those
+paths either.
+
+`bindexec` always creates a new PID namespace because that's the
+easiest way to make sure that the fuse-overlayfs process will exit when
+the command exits. This means that process IDs start over at 1 and no
+process outside of the namespace can be seen. Also, because it uses
+an unprivileged user namespace, any files owned by anyone other than the
+current user will show up as being owned by `nobody` (just as they do in
+`cvmfsexec`).
```
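
The README says `bindexec` creates missing destinations automatically. In the script, this happens in the first namespace. For each `src:dest` pair, it creates an empty stub under an overlay upper directory (a directory for a directory source, an empty file for a file source). It then bind-mounts the source onto that stub and mounts the upper directory over `/` with fuse-overlayfs. The minimal sketch below covers only the stub-creation step, which needs no privileges. `make_stubs` is a hypothetical name, and the mount steps are deliberately left out.

```bash
#!/bin/bash
# Hypothetical sketch of bindexec's stub-creation step. For each src:dest
# pair, create an empty placeholder for dest under the given upper
# directory so that a later bind mount has a target to mount onto.
# The mount --bind and fuse-overlayfs steps are intentionally omitted.
make_stubs() {
    local upper=$1
    shift
    local bind src dst
    for bind in "$@"; do
        src=${bind%:*}      # text before the last colon
        dst=${bind#*:}      # text after the first colon
        if [ -d "$src" ]; then
            # directory source: the mount point is a directory
            mkdir -p "$upper$dst"
        elif [ -f "$src" ]; then
            # file source: create the parent directories, then an empty file
            mkdir -p "$upper${dst%/*}"
            touch "$upper$dst"
        else
            echo "make_stubs: $src not found, skipping" >&2
        fi
    done
}
```

Because the stubs live only in the upper directory, the real `/var/lib/mydir` from the README example is never modified. The new path exists only in the overlay view that the command sees.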

bindexec

Lines changed: 186 additions & 0 deletions
```diff
@@ -0,0 +1,186 @@
+#!/bin/bash
+# Add bind mounts in a user namespace and change to that space.
+# Requires being able to run unshare -rm and the ability to do fuse mounts
+# (kernel >= 4.18) and requires fuse-overlayfs.
+# Written by Dave Dykstra November 2024, based heavily on cvmfsexec.
+
+#set -x
+#PS4='c$$+ '
+
+VERSION=4.42
+
+usage()
+{
+    echo "Usage: bindexec [-v] [src:dest ...] -- [command]" >&2
+    echo " Bind mount each src to dest in new user mount namespace and run command" >&2
+    echo " -v: print current version and exit" >&2
+    exit 1
+}
+
+# needed for pivot_root
+PATH=$PATH:/usr/sbin
+
+TMPD="$(mktemp -d /dev/shm/bindexec.XXXXXXXXXX)"
+trap "rm -rf $TMPD" 0 # note that trap does not carry past exec
+STARTFIFO=$TMPD/start
+WAITFIFO=$TMPD/wait
+mkfifo $STARTFIFO $WAITFIFO
+
+# bash syntax {NAME}<&N doesn't work on older bashes such as the
+# version 3.2.x on macOS Big Sur, and in fact it fails with an error
+# message but not an error code, so test for it first to be able to
+# gracefully die
+
+if [ -n "$({TESTX}<&0 2>&1)" ]; then
+    echo "Cannot assign file descriptors to variables, bash version too old" >&2
+    exit 1
+fi
+
+# make a copy of stdin fd, for sending to the final command
+exec {STDINCOPYFD}<&0
+
+ORIGPWD=$PWD
+
+# can't use OPTIND because it can't distinguish between -- there or missing
+NOPTS=0
+while getopts "v" OPTION; do
+    let NOPTS+=1
+    case $OPTION in
+        v)  echo "$VERSION"
+            exit
+            ;;
+        \?) usage
+            ;;
+    esac
+done
+shift $NOPTS
+
+BINDS=""
+for ARG; do
+    if [ "$ARG" == "--" ]; then
+        break
+    fi
+    if [[ "$ARG" != *:* ]]; then
+        echo "bindexec: $ARG does not contain a colon" >&2
+        usage
+    fi
+    if [[ "$ARG" != /* ]] || [[ "$ARG" != *:/* ]]; then
+        echo "bindexec: source or destination in $ARG does not start with \"/\"" >&2
+        usage
+    fi
+    BINDS="$BINDS $ARG"
+    shift
+done
+
+if [ "$ARG" != "--" ]; then
+    echo "bindexec: no double-hyphen found" >&2
+    usage
+fi
+shift
+
+ORIGUID="$(id -u)"
+ORIGGID="$(id -g)"
+
+UNSHAREOPTS="--propagation unchanged"
+
+# Note that within the HERE document, unprotected $ substitutions are
+# done by the surrounding shell, and \$ substitutions by the unshare shell
+unshare -rm -pf $UNSHAREOPTS /bin/bash /dev/stdin "${@:-$SHELL}" <<!EOF-1!
+#set -x
+#PS4='c\$$+ '
+
+# now in the first "fake root" namespace
+mount -t proc proc /proc
+mkdir -p $TMPD/upper $TMPD/work $TMPD/overlay
+
+# put the bind mounts into the upper dir
+for BIND in $BINDS; do
+    SRC="\${BIND%:*}"
+    DST="\${BIND#*:}"
+    if [ -d "\$SRC" ]; then
+        mkdir -p $TMPD/upper\$DST
+    elif [ -f "\$SRC" ]; then
+        DSTDIR="\${DST%/*}"
+        if [ "\$DST" != "\$DSTDIR" ]; then
+            mkdir -p $TMPD/upper\$DSTDIR
+        fi
+        touch $TMPD/upper\$DST
+    else
+        echo "bindexec: \$SRC not found, skipping" >&2; continue
+    fi
+    mount --bind \$SRC $TMPD/upper\$DST
+done
+
+# Leave this bash running as PID 1, because most other
+# programs won't handle signals & child reaping correctly.
+# Note that all other processes in the namespaces will get
+# a SIGKILL when PID 1 exits.
+trap "" 1 2 3 15 # ignore all ordinary signals
+
+fuse-overlayfs -o lowerdir=/,upperdir=$TMPD/upper,workdir=$TMPD/work $TMPD/overlay 2> >(grep -v lazytime >&2)
+# Put original system dirs on top of the overlay
+mount -t proc proc $TMPD/overlay/proc
+mount --rbind /sys $TMPD/overlay/sys
+mount --rbind /dev $TMPD/overlay/dev
+
+# Add cvmfs on top if it is present
+if [ -d /cvmfs ]; then
+    mkdir -p $TMPD/overlay/cvmfs
+    mount --rbind /cvmfs $TMPD/overlay/cvmfs
+fi
+
+# Also bind on top nfs mounts because they don't work through fuse-overlayfs
+mount|while read FROM X TO X TYPE REST; do
+    if [[ \$TYPE = nfs* ]]; then
+        mkdir -p $TMPD/overlay\$TO
+        # this sometimes fails with weird bind mount combinations
+        # under apptainer so just save the output in a variable so
+        # it can be seen with debugging enabled
+        MSG="\$(mount --rbind \$TO $TMPD/overlay\$TO 2>&1)"
+    fi
+done
+
+# Start a second fake root namespace so we don't interfere with the
+# fuse-overlayfs mount space when we do the pivot_root.
+# Quoting the HERE document's delimiter makes this nested shell not
+# interpret $ substitutions, but the previous one still does so
+# we need to use \$ when we don't want the first shell to expand.
+unshare -rm $UNSHAREOPTS /bin/bash /dev/stdin "\${@:-$SHELL}" <<'!EOF-2!'
+#set -x
+#PS4='c\$$+ '
+
+(
+    # This is a background process for setting up the child's uid map
+    trap "" 1 2 3 15 # ignore ordinary signals
+    read PID
+    # set up uid/gid map
+    echo "$ORIGGID 0 1" >/proc/"\$PID"/gid_map
+    echo "$ORIGUID 0 1" >/proc/"\$PID"/uid_map
+    echo "ready" >$WAITFIFO
+) <$STARTFIFO &
+
+# Change to the new root. Would use chroot but it doesn't work.
+mount --rbind $TMPD/overlay $TMPD/overlay # pivot_root requires this
+cd $TMPD/overlay
+mkdir -p .old-root
+pivot_root . .old-root
+cd $ORIGPWD
+
+# Finally, start the user namespace with the original uid/gid
+# This HERE document is also quoted and so the shell does not expand
+exec unshare -U $UNSHAREOPTS /bin/bash /dev/stdin "\${@:-$SHELL}" <<'!EOF-3!'
+#set -x
+#PS4='c\$$+ '
+
+# now in the user namespace
+
+echo "\$$" >$STARTFIFO
+# wait for the uid/gid maps to be set up
+read X <$WAITFIFO
+
+exec "\$@" <&$STDINCOPYFD $STDINCOPYFD<&-
+!EOF-3!
+
+!EOF-2!
+
+!EOF-1!
```
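
The comments in `bindexec` about which shell expands which `$` cover the trickiest part of the script. The nested HERE documents follow one rule. With an unquoted delimiter (`<<!EOF-1!`), the shell that starts the document expands `$` before the inner bash runs, and `\$` survives as a literal `$`. With a quoted delimiter (`<<'!EOF-2!'`), the body passes through untouched. The minimal sketch below illustrates that rule without any namespaces. It uses `bash -s` in place of the script's `/bin/bash /dev/stdin`, and `nested_demo`, `OUTER`, `INNER` and `INNER2`-style names are hypothetical.

```bash
#!/bin/bash
# Hypothetical demo of nested HERE-document expansion as used in bindexec.
OUTER=outer-value

nested_demo() {
    # Unquoted delimiter: this shell expands $OUTER now; \$INNER becomes
    # $INNER and is expanded later by the first inner bash.
    bash -s <<!EOF-1!
INNER=inner-value
echo "level1: $OUTER \$INNER"
# Quoted delimiter: the first inner bash passes this body through as-is,
# so \$INNER is only expanded by the second inner bash, where it is unset.
bash -s <<'!EOF-2!'
echo "level2: $OUTER [\$INNER]"
!EOF-2!
!EOF-1!
}
```

`nested_demo` prints `level1: outer-value inner-value` and then `level2: outer-value []`. The outermost shell filled in `$OUTER` at both levels. `$INNER` was expanded at level 1, and at level 2 it was left alone until a shell where it is not set. The script works the same way: `$TMPD` is substituted by the original shell, while `\$PID` is only expanded by the shell started by the second `unshare`.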

cvmfsexec

Lines changed: 10 additions & 10 deletions
```diff
@@ -39,13 +39,13 @@ elif [ "$MAJORKERN" -eq 3 -a "$MINORKERN" -eq 10 -a "$REVKERN" -ge 1127 ]; then
 USERFUSE=true
 fi
 
-TMPDIR=$(mktemp -d)
-trap "rm -rf $TMPDIR" 0 # note that trap does not carry past exec
-CMDFIFO1=$TMPDIR/cmd1
-WAITFIFO1=$TMPDIR/wait1
-CMDFIFO2=$TMPDIR/cmd2
-WAITFIFO2=$TMPDIR/wait2
-FUNCS=$TMPDIR/funcs
+TMPD=$(mktemp -d)
+trap "rm -rf $TMPD" 0 # note that trap does not carry past exec
+CMDFIFO1=$TMPD/cmd1
+WAITFIFO1=$TMPD/wait1
+CMDFIFO2=$TMPD/cmd2
+WAITFIFO2=$TMPD/wait2
+FUNCS=$TMPD/funcs
 
 # create the fifos used for interprocess communication
 mkfifo $CMDFIFO1 $WAITFIFO1 $CMDFIFO2 $WAITFIFO2
@@ -238,7 +238,7 @@ else
 fi
 ./umountrepo $REPO >/dev/null
 done
-rm -rf $TMPDIR
+rm -rf $TMPD
 ) &
 fi
 
@@ -252,7 +252,7 @@ unshare -rm $UNSHAREOPTS /bin/bash /dev/stdin "${@:-$SHELL}" <<!EOF-1!
 #set -x
 #PS4='c\$$+ '
 # now in the "fakeroot" namespace
-trap "rm -rf $TMPDIR" 0 # note that this does not carry through "exec"
+trap "rm -rf $TMPD" 0 # note that this does not carry through "exec"
 
 mkdir -p $HERE/mnt
 mount --rbind $HERE/mnt $HERE/mnt # pivot_root requires this mountpoint
@@ -411,7 +411,7 @@ unshare -rm $UNSHAREOPTS /bin/bash /dev/stdin "${@:-$SHELL}" <<!EOF-1!
 # processes in the namespaces will get a SIGKILL when
 # PID 1 exits.
 EXEC=""
-trap "rm -rf $TMPDIR" 0
+trap "rm -rf $TMPD" 0
 trap "" 1 2 3 15 # ignore all ordinary signals
 else
 EXEC=exec
```
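
Both scripts coordinate their processes through FIFOs in `$TMPD`. cvmfsexec creates `CMDFIFO1`, `WAITFIFO1`, `CMDFIFO2` and `WAITFIFO2`. bindexec uses `STARTFIFO` and `WAITFIFO` so that a helper in the parent namespace can write the uid/gid maps of the final `unshare -U` shell before that shell runs the command. The handshake itself does not depend on namespaces, so it can be sketched on its own. In the hypothetical `fifo_handshake` function below, a line written to an ordinary file stands in for writing `/proc/PID/uid_map` and `gid_map`.

```bash
#!/bin/bash
# Hypothetical sketch of a two-FIFO handshake like the one in bindexec:
# a background helper waits for a PID on one FIFO, does setup work on
# behalf of that process, then releases it through a second FIFO.
fifo_handshake() {
    local dir startfifo waitfifo result reply
    dir=$(mktemp -d)
    startfifo=$dir/start
    waitfifo=$dir/wait
    result=$dir/result
    mkfifo "$startfifo" "$waitfifo"

    (
        read -r pid                          # blocks until a PID arrives
        echo "configured $pid" >"$result"    # stand-in for writing the id maps
        echo ready >"$waitfifo"              # let the waiting process continue
    ) <"$startfifo" &

    # The process to be configured announces its PID, then blocks until
    # the helper reports that setup is done.
    echo "$$" >"$startfifo"
    read -r reply <"$waitfifo"

    wait                                     # reap the background helper
    echo "$reply: $(cat "$result")"
    rm -rf "$dir"
}
```

Calling `fifo_handshake` prints `ready: configured ` followed by the shell's PID. The helper finishes its setup before it answers on the second FIFO, so the setup is complete by the time `read` returns. bindexec relies on the same ordering before it `exec`s the user's command.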
