run.py: always log the sandbox when a command fails #3992

marc-hb · 2025-11-10T21:08:56Z

The main focus of commit 7e7a3c7 ("Don't log sandbox for every
command") was to reduce verbosity in the --debug case and it did just
that. To avoid losing the sandbox completely, that commit said "... but
still log the full sandbox if a command fails" and it did that too in
__init__.py; for both the --debug and non-debug cases.

However, what happened in run.py was a bit different: there, commit
7e7a3c7 showed the sandox on failure but only in the --debug
case!? Make run.py consistent with __init__.py and always show the
sandbox on failure in run.py, in both --debug and non-debug case.

This helps with issue #3948: hiding that bind mounts exist is especially
confusing on fatal "disk full" failures because that leaves only
misleading paths in the error message! On failure, showing such critical
sandbox information must not require --debug.

Not concealing the sandbox on failure is useful in any case, not
just #3948

Signed-off-by: Marc Herbert [email protected]

The main focus of commit 7e7a3c7 ("Don't log sandbox for every command") was to reduce verbosity in the --debug case and it did just that. To avoid losing the sandbox completely, that commit said "... but still log the full sandbox if a command fails" and it did that too in __init__.py; for both the --debug and non-debug cases. However, what happened in run.py was a bit different: there, commit 7e7a3c7 showed the sandox on failure but only in the --debug case!? Make run.py consistent with __init__.py and always show the sandbox on failure in run.py, in _both_ --debug and non-debug case. This helps with issue systemd#3948: hiding that bind mounts exist is especially confusing on fatal "disk full" failures because that leaves only misleading paths in the error message! On failure, showing such critical sandbox information must not require --debug. Not concealing the sandbox on failure is useful in any case, not just systemd#3948 Signed-off-by: Marc Herbert <[email protected]>

Add v14/v15 test to adjust option based on the mkosi version. No need to change the run_qemu.sh source code anymore. Passing --debug by default is especially useful because hides the sandbox unless --debug is uses, see systemd/mkosi#3992 Signed-off-by: Marc Herbert <[email protected]>

Reformat the previous commit as required by ruff Signed-off-by: Marc Herbert <[email protected]>

behrmann · 2025-11-15T10:49:28Z

It's really not necessary to get CI fully green. mkosi CI is very susceptible to variations on the distro side, e.g. flaky mirrors.

behrmann · 2025-11-15T11:05:55Z

So I'm of two minds of this. The sandbox commands can be really long and looking at the full command it's harder to see the needle in the haystack. For a lot of use cases seeing the actual command that failed is sufficient and showing the sandbox there would then make it harder to see what is going on.

What would you think about a more actionable error message (sandbox command omitted, rerun with --debug to see full command)?

daandemeyer · 2025-11-16T10:44:24Z

I'd prefer if we repurpose --debug-sandbox to always show the full command and then add --trace-sandbox to cover the current behavior of --debug-sandbox which runs mkosi-sandbox under strace

marc-hb · 2025-11-18T01:36:21Z

The sandbox commands can be really long and looking at the full command it's harder to see the needle in the haystack. For a lot of use cases seeing the actual command that failed is sufficient and showing the sandbox there would then make it harder to see what is going on.

That's fair, I focused on systemd-repart failures as described in #3948 but I understand the sandbox may not always matter than much.

Either way, this does print a very long, "haystack" command line. For which I thought of a dead-simple fix: simply split it across 2 lines instead! Like this:

‣ in sandbox: /bin/python3 -SI /tmp/rq_mkosi_wspaces/mkosi-workspace-90ljx_31/tmp/tmprbny4x14/mkosi/sandbox.py --proc /proc --unsetenv TMPDIR --setenv SYSTEMD_OFFLINE 0 --unshare-net --ro-bind /usr /usr --ro-bind /opt /opt --symlink usr/bin /bin --symlink usr/sbin /sbin --symlink usr/lib /lib --symlink usr/lib64 /lib64 --dir /var/tmp --dir /var/log --unshare-ipc --ro-bind /tmp/rq_mkosi_wspaces/mkosi-workspace-90ljx_31/tmp/tmprbny4x14/mkosi/sandbox.py /sandbox.py --dev /dev --ro-bind /etc/alternatives /etc/alternatives --ro-bind /etc/ld.so.cache /etc/ld.so.cache --setenv PATH /usr/bin:/usr/sbin --bind /tmp/rq_mkosi_wspaces/mkosi-workspace-90ljx_31/tmp/mkosi-var-tmp-d1b415e539de44bd /var/tmp --dir /etc --dir /var --dir /tmp --dir /run --become-root --bind /tmp/rq_mkosi_wspaces/mkosi-workspace-90ljx_31/staging /work/tmp/rq_mkosi_wspaces/mkosi-workspace-90ljx_31/staging --bind /tmp/rq_mkosi_wspaces/mkosi-workspace-90ljx_31/root /buildroot --ro-bind /tmp/rq_mkosi_wspaces/mkosi-workspace-90ljx_31/repart-definitions /work/tmp/rq_mkosi_wspaces/mkosi-workspace-90ljx_31/repart-definitions --ro-bind /etc/pki/tls/certs/ca-bundle.crt /proxy.cacert
‣ "systemd-repart --empty=allow --size=auto --dry-run=no --json=pretty --no-pager --root=/buildroot --offline=yes --seed 9457e3c9-bd35-45d8-acd2-1446627bed37 /work/tmp/rq_mkosi_wspaces/mkosi-workspace-90ljx_31/staging/root.img.raw --empty=create --defer-partitions esp,xbootldr --generate-fstab=/etc/fstab --generate-crypttab=/etc/crypttab --definitions /work/tmp/rq_mkosi_wspaces/mkosi-workspace-90ljx_31/repart-definitions" returned non-zero exit code 1.

Haystack problem solved. That data already comes separate, so why concatenate them after all?

Remember: this is still only on failure. When the some command fails, users much, much prefer to have "non-optimal" but complete information about that command. Especially when the command that failed is buried below multiple layers of wrapper scripts in a CI far away, in an environment that is unfortunately not easy to reproduce and that they can't change much. Yes this is the worst possible case but it's not theoretical either; happens regularly.

(I just realized that 0c4a895 has later added to run.py similar code and extended the 7e7a3c7 inconsistency with __init__.py. Next time I push to this PR I will address both instances in run.py)

I'd prefer if we repurpose --debug-sandbox to always show the full command and then add --trace-sandbox to cover the current behavior of --debug-sandbox which runs mkosi-sandbox under strace

I'm not 100% sure how big of a change that would be but it sounds like a much bigger change than this small PR. Probably bigger than the time I have :-(

Value and effort considerations aside, I really think that sandbox information is too important and too relevant to be behind some --debug flag on failure.

daandemeyer · 2025-12-16T10:28:43Z

Sorry but this is just too verbose to do by default. I gave it a try and it's the difference between a two line error message and an error message that takes up half my terminal when running dnf.

I'm going to close this. I'm confident this is not the right solution.

As discussed in systemd#3948 and systemd#3992, logging commands failing in a sandbox without ever mentioning that a sandbox was in use is misleading and very time-consuming; notably because paths look real but they are not. To remove this confusion and cost, simply add one short line mentioning that a sandbox was involved when the failure happened + recommend to re-run with --debug to display bind mounts and other important information. Also separate the command and error message on different lines for better readability and re-usability. Just for the record: before commit 7e7a3c7 ("Don't log sandbox for every command"), `run.py` was logging sandbox commands every time which was too verbose and unreadable. That commit said "... but still log the full sandbox if a command fails" and it did that do in `__init__.py` - but not always in `run.py`. After that commit, the sandbox was logged by `run.py` only in --debug mode and became completely invisible otherwise. Signed-off-by: Marc Herbert <[email protected]>

marc-hb · 2025-12-16T23:17:49Z

Sorry but this is just too verbose to do by default. I gave it a try and it's the difference between a two line error message and an error message that takes up half my terminal when running dnf.

What would you think about a more actionable error message (sandbox command omitted, rerun with --debug to see full command)?

Done in new PR:

run.py: stop hiding sandbox and recommend --debug on sandbox failure #4068

marc-hb marked this pull request as ready for review November 10, 2025 21:53

marc-hb mentioned this pull request Nov 10, 2025

run_qemu.sh: pass --debug to mkosi pmem/run_qemu#230

Merged

This comment was marked as outdated.

Sign in to view

marc-hb force-pushed the log-sandbox branch from 3753002 to 4c5fb1c Compare November 10, 2025 22:38

This comment was marked as off-topic.

Sign in to view

run.py: ruff format

c2aab71

Reformat the previous commit as required by ruff Signed-off-by: Marc Herbert <[email protected]>

marc-hb force-pushed the log-sandbox branch from 4c5fb1c to c2aab71 Compare November 10, 2025 23:52

This was referenced Nov 11, 2025

Better systemd-repart logs on failure #3948

Closed

"Disk full" error when building GPT, default diskpart config #3223

Open

make_image: log systemd-repart *.conf files at the --debug level #3993

Merged

daandemeyer added the reviewed/needs-rework label Nov 16, 2025

This comment was marked as off-topic.

Sign in to view

daandemeyer closed this Dec 16, 2025

marc-hb mentioned this pull request Dec 16, 2025

run.py: stop hiding sandbox and recommend --debug on sandbox failure #4068

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

run.py: always log the sandbox when a command fails #3992

run.py: always log the sandbox when a command fails #3992

Uh oh!

marc-hb commented Nov 10, 2025

Uh oh!

This comment was marked as outdated.

This comment was marked as off-topic.

behrmann commented Nov 15, 2025

Uh oh!

behrmann commented Nov 15, 2025

Uh oh!

daandemeyer commented Nov 16, 2025

Uh oh!

This comment was marked as off-topic.

marc-hb commented Nov 18, 2025

Uh oh!

daandemeyer commented Dec 16, 2025

Uh oh!

marc-hb commented Dec 16, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

run.py: always log the sandbox when a command fails #3992

run.py: always log the sandbox when a command fails #3992

Uh oh!

Conversation

marc-hb commented Nov 10, 2025

Uh oh!

This comment was marked as outdated.

This comment was marked as off-topic.

behrmann commented Nov 15, 2025

Uh oh!

behrmann commented Nov 15, 2025

Uh oh!

daandemeyer commented Nov 16, 2025

Uh oh!

This comment was marked as off-topic.

marc-hb commented Nov 18, 2025

Uh oh!

daandemeyer commented Dec 16, 2025

Uh oh!

marc-hb commented Dec 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

marc-hb commented Dec 16, 2025 •

edited

Loading