Podman misinterprets the %h specifier in a bind volume source when the container is created/started from a unit (BTRFS file system) #12253
Replies: 33 comments 4 replies
-
Sorry! It may be unrelated, because it happens regardless of the volume type. Simply put, podman run and podman container create return error code 125 when executed by a rootless user from a systemd unit in the user session on my Fedora WS, as in #2197. The CID file is then not created, so stop and rm don't work. It never happens when podman runs in the terminal, only when started from a unit via systemctl --user start, daemon-reexec, etc. Hint: the Podman REST API service is running in the background.
-
@vrothberg PTAL
-
@giuseppe I think this is exactly #2172, because the main difference between the working case (a WSL distro, i.e. an ext4 filesystem) and the failing Fedora 34 WS (i.e. a btrfs filesystem) is the filesystem. But I expected the configuration file to be corrected in 3.1. Please look at the attachment.
-
could you share the systemd service file generated by Podman?
-
The recently working rootless-WSL version of the Theia IDE with the Podman backend enabled, slightly modified to reference a project workspace in the user's home and to start Chrome in kiosk mode. It can be used as a test scenario. I use a pre-pulled image ID to avoid pull delay and interaction with the user, which is impossible inside systemd units.
Everything works in WSL!!!
…On Tue, Sep 14, 2021 at 11:34 AM Giuseppe Scrivano ***@***.***> wrote:
could you share the systemd service file generated by Podman?
-
The root cause is certainly related to the filesystem type: Fedora 34 uses BTRFS by default, mounted at /home. If user data is also located under /home but not necessarily on the btrfs subvolume, the cross-filesystem mount will never work. The generated unit checks that the Podman tree and container storage exist, but it doesn't check or trigger mounting of the subvolume, nor does it validate that the user's data is on the same filesystem as the container storage. The same problem appears in the anonymous volume scenario: nothing enforces that everything is on the same volume, and storage.conf allows anything. The WSL VM with its single ext4 filesystem works perfectly: no btrfs, and host folders are mounted using a Microsoft-specific mechanism. How should bind mounts work on BTRFS and subvolumes?
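The same-filesystem question above can be probed without Podman at all: compare the filesystem type reported for the user's data directory and for the container storage root. A minimal sketch, with `$HOME` and `/tmp` standing in for the two paths (substitute your real storage root, e.g. `~/.local/share/containers/storage`):

```shell
# Compare the filesystem type of two paths (GNU stat's -f reports
# filesystem, not file, metadata). The paths here are illustrative.
fs_type() { stat -f -c %T "$1"; }

data_fs=$(fs_type "$HOME")
storage_fs=$(fs_type /tmp)

echo "user data on: $data_fs, storage on: $storage_fs"
if [ "$data_fs" = "$storage_fs" ]; then
    echo "same filesystem"
else
    echo "cross-filesystem setup: bind source and storage live on different FS types"
fi
```

On a default Fedora 34 install this typically prints `btrfs` for paths under /home and `tmpfs` for /run paths, which makes the mismatch the report describes easy to spot.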
-
@PavelSosin-320, please share the systemd service file.
-
From the Podman-on-WSL instance, with some comments and TODOs
-
Thanks, can you also share the contents of run-r8df511c2cb034c33a1ec70d63b670529.service?
-
Sorry for the long delay due to the holidays. I tried to run the theia image via systemd-run as a transient unit, and the result is:
-
Since all the scenarios I tested, including image volumes and anonymous volumes, worked OK, I suppose the root cause is that Podman parses the -v option value exactly as described in the documentation:
-
@vwbusguy Unfortunately, I can now say definitely that the issue is related to the btrfs filesystem, because exactly the same syntax works correctly on the ext4 filesystem of the WSL Fedora VM instance but doesn't work on Fedora 34 Desktop with its default btrfs FS. Since BTRFS has its own kernel module and mount utilities, it can conflict with FUSE mounts. Docker's documentation addresses this issue explicitly in its BTRFS storage driver doc. Although podman info on Fedora reports that the backing FS is btrfs, all other configuration looks the same as on ext4.
-
After eliminating BTRFS-related issues via Podman with BTRFS, I found a very simple thing: when Podman tries to create a container from a systemd unit run by a rootless user, it can't find the storage configuration, and that causes the "invalid reference" error. Testing with systemd-run results in:
-
I see the dependency on the btrfs storage driver: it creates every container as a subvolume (!!!) in $HOME/.local/share/containers/storage/btrfs/subvolumes/. So the real runroot for rootless containers has to be adjusted. It would be better to use %h in the runroot option, because HOME has to be imported into the systemd environment according to the systemctl --user value; it doesn't happen automatically.
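For context, the options being discussed live in the per-user storage.conf. A hedged sketch of the relevant fragment (the values below are illustrative, not the reporter's actual configuration; on most systems the rootless runroot is expected to stay on tmpfs under /run/user/$UID):

```toml
# ~/.config/containers/storage.conf (illustrative values)
[storage]
driver = "btrfs"
# Rootless runroot normally lives on tmpfs:
runroot = "/run/user/1000/containers"
# Graph root under the user's home, where the btrfs driver
# creates one subvolume per container/layer:
graphroot = "/home/user/.local/share/containers/storage"
```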
-
Indeed it does, and that's generally not a problem for btrfs, unless you want to try to fsck all of them at once for some reason; otherwise, subvolumes in btrfs are cheap. But yeah, the problem is that systemd won't grok the default PID file location and will assume the container isn't healthy and running when it is, and will continuously restart it (depending on the container/service restart policy) after a minute or so. Oddly enough, just commenting out the PIDFile line in the service file seems to make it work just fine, but I haven't tried this with a bunch of different container services on one host.
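The workaround described above amounts to commenting out the PIDFile line in the generated unit. A sketch of what that looks like (the container name and pid-file path are placeholders, not taken from the original report):

```ini
# ~/.config/systemd/user/container-theia.service (illustrative)
[Service]
Type=forking
ExecStart=/usr/bin/podman start theia
ExecStop=/usr/bin/podman stop -t 10 theia
# Commenting this out avoided the restart loop described above:
# PIDFile=%t/containers/overlay-containers/<id>/userdata/conmon.pid
```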
-
How would it know to use the btrfs driver vs overlay if storage.conf is ignored?
-
To eliminate the "Invalid reference" message, after learning lessons from running podman container create using systemd-run,
-
Hint: something went wrong in Docker's BTRFS driver too: moby/moby#42253. It's interesting what https://github.com/AkihiroSuda did there; Podman only describes the failure in the wrong way. Indeed, some operations like subvolume create and show don't need root privileges, but in some cases the /home subvolume is not accessible to the rootless user. Yet simple ls, read, and write as a rootless user into the /home..... subvolume work without a mount? Does mount fail without FUSE outside the user session? Maybe the mount namespaces of conmon and the one created by systemd conflict? A systemd unit created for a pod without conmon and the --new option works OK; the pod provides its own cgroup as a parent for the inner containers.
-
Finally, I suppose that I hit #4678. This is a 1.5-year-old issue without a solution; only a workaround was proposed. But it looks very similar to the Podman REST API zombie process issue at startup. Systemd can't tolerate unorganized packs of processes; otherwise the system would fill up with zombie processes and leaked cgroups. Systemd tends to organize processes into groups, and if a long-living cgroup is needed, a Podman.scope under the user.slice managed by logind can be used. It creates cgroups with predictable names. I played with it to get rid of the zombie REST API server process and it worked well: everything that belongs to the scope disappears.
-
Please open a PR to fix this in containers/storage.
-
@rhatdan Scope creation is purely crun's duty. I don't think that "external" transient scope creation using systemd-run can be used in production. I just upgraded crun on Fedora 34 to the recent version 1.2 and will test it as soon as possible to be sure that it works correctly. But meanwhile, can somebody from the Podman team check that Podman invokes crun correctly with the --rootless and --systemd-cgroup option values, and then processes the exit code properly?
-
Crun has been tested and exposes the same very old issue of the cgroup manager for rootless users: Podman has to follow the containers.conf configuration and use systemd as the cgroup manager for both root and rootless users. To manage a container running as a systemd service, a systemd unit of kind scope is absolutely necessary, created either via the systemd manager DBus API or a manually executed systemd-run. busctl works for rootless users; every user has their own bus socket, so there is no reason to suspect that the API has additional restrictions.
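The containers.conf knob being referred to is presumably the engine's cgroup manager. A sketch of the relevant fragment (this is the documented setting name; the file path shown is the usual rootless location):

```toml
# ~/.config/containers/containers.conf (illustrative)
[engine]
# Have the runtime create container cgroups as systemd scopes,
# for both root and rootless users:
cgroup_manager = "systemd"
```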
-
are you using BTRFS as the storage backend, or is the storage configured to use overlay?
-
@giuseppe 1. Crun is not guilty!!! I reverted the Podman configuration to the old runc and got the same result: error 125/n/a.
-
I am curious to know if this works when using overlay instead of btrfs. Have you tried changing the storage driver?
-
I've also had this happen with overlay.
-
I experienced some strange adverse effects after a Fedora update brought systemd and DBus upgrades along with their utilities. They expect certain environment variables and access rights in the scope of a systemd service:
-
Playing with runc vs crun, I found that runc has a strong requirement that the runroot where the bundle is stored must be on tmpfs, i.e. /run/user/... But Fedora (and a possible future WSL Fedora distro based on the WinBTRFS driver) boxes users inside the distro's /home FS, which is always BTRFS. Does somebody know how to bind mount across different FS types?
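Whether a candidate runroot really sits on tmpfs can be checked without Podman or runc; a minimal sketch (falls back to /run when XDG_RUNTIME_DIR is unset):

```shell
# Print the filesystem type of the runtime dir; runc expects tmpfs here.
runroot="${XDG_RUNTIME_DIR:-/run}"
fstype=$(stat -f -c %T "$runroot")
echo "runroot candidate: $runroot (fstype: $fstype)"
```

On a typical systemd host this reports `tmpfs`; anything else suggests the runroot has drifted onto persistent storage such as the /home btrfs volume.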
-
I converted the issue to a discussion, as I don't think there's one clearly identifiable bug/issue yet.
-
The main point is that systemd doesn't know how to manage services using a container ID; it only knows how to manage processes by process ID. Regardless of how the service was started, it can be shut down only via a kill, i.e. podman container kill. If the Podman-generated unit reports the PID of the top container process (from run, start, init, etc.) to the PID file, it must work: ExecStop=podman container kill $cid.
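The PID-file contract described above is what units in the style of `podman generate systemd --new` implement: conmon writes its PID to a file the unit declares, so systemd tracks the right process, and stop/cleanup go through the CID file. A sketch (container name, image, and timeout are illustrative):

```ini
# Illustrative unit in the style of `podman generate systemd --new`
[Service]
Type=forking
PIDFile=%t/container-myapp.pid
ExecStart=/usr/bin/podman run --conmon-pidfile %t/container-myapp.pid \
    --cidfile %t/container-myapp.ctr-id --cgroups=no-conmon -d myimage
ExecStop=/usr/bin/podman stop --cidfile %t/container-myapp.ctr-id -t 10
ExecStopPost=/usr/bin/podman rm -f --cidfile %t/container-myapp.ctr-id
```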
-
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
When a container that uses bind volume mounting is created or run from a unit generated by the podman generate systemd command, the source path can be expressed as an absolute path or via % specifiers, %h for example. All % "path" specifiers like %t and %h are expanded into an absolute path, i.e. they must be accepted by podman as a valid bind volume source. All subdirectories relative to %h, %v, etc. should be accepted as well. I hope the local driver and fuse mount support this scenario.
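The expected behavior described above would look like this in a unit file: systemd expands %h to the invoking user's home directory before podman ever sees the argument, so the -v source arrives as an ordinary absolute path. A sketch (image and paths are illustrative, not from the report):

```ini
# Illustrative user unit relying on %h expansion
[Service]
# systemd expands %h to the user's home, so podman receives e.g.
# -v /home/user/workspace:/workspace:Z
ExecStart=/usr/bin/podman run --rm -v %h/workspace:/workspace:Z \
    docker.io/library/alpine ls /workspace
```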
Steps to reproduce the issue:
Describe the results you received:
"Invalid reference" error message
Describe the results you expected:
It must work, because a systemd unit must be shareable between users
Additional information you deem important (e.g. issue happens only occasionally):
It works OK in WSL because the workspace is located in the host filesystem
Output of podman version:
Podman Version: 3.3.1
API Version: 3.3.1