Skip to content

Conversation

@moreati
Copy link
Member

@moreati moreati commented Jul 21, 2025

When /etc/sudoers has log_output (or similar) enabled the process spawned by ctx.sudo() via mitogen.parent.Connection.start_child() receives a stdin that is in non-blocking mode. The immediate symptom is that os.openfd(0, ...).read(n) sometimes returns None, causing the first stage to raise an unhandled TypeError.

The fix (for now) is to use select.select() in as while loop to read stdin. This increases the command size slightly, but I think it's a reasonable tradeoff until/unless the cause is more fully understood.

All CI tests are now run with sudoers log_output enabled, in order to catch regressions. first_stage_test.CommandLineTest has been amended, because it relied on implementation details of the bootstrap process that are no longer true.

Fixes #1306, thanks to @Forgetyk for their detailed error reporting, diagnosis, and initial PR #1299

Preamble size

Before

SSH command size: 755
Preamble (mitogen.core + econtext) size: 18227 (17.80KiB)

                        Original           Minimized           Compressed
mitogen.core         152218 148.7KiB  68437 66.8KiB 45.0%  18124 17.7KiB 11.9%
mitogen.parent        98853  96.5KiB  51103 49.9KiB 51.7%  12881 12.6KiB 13.0%
mitogen.fork           8445   8.2KiB   4139  4.0KiB 49.0%   1652  1.6KiB 19.6%
mitogen.ssh           10827  10.6KiB   6893  6.7KiB 63.7%   2099  2.0KiB 19.4%
mitogen.sudo          12089  11.8KiB   5924  5.8KiB 49.0%   2249  2.2KiB 18.6%
mitogen.select        12325  12.0KiB   2929  2.9KiB 23.8%    964  0.9KiB  7.8%
mitogen.service       41581  40.6KiB  22398 21.9KiB 53.9%   5847  5.7KiB 14.1%
mitogen.fakessh       15767  15.4KiB   8149  8.0KiB 51.7%   2676  2.6KiB 17.0%
mitogen.master        55317  54.0KiB  28846 28.2KiB 52.1%   7528  7.4KiB 13.6%

After

SSH command size: 798
Preamble (mitogen.core + econtext) size: 18227 (17.80KiB)

                        Original           Minimized           Compressed     
mitogen.core         152218 148.7KiB  68437 66.8KiB 45.0%  18124 17.7KiB 11.9%
mitogen.parent        98944  96.6KiB  51180 50.0KiB 51.7%  12910 12.6KiB 13.0%
mitogen.fork           8445   8.2KiB   4139  4.0KiB 49.0%   1652  1.6KiB 19.6%
mitogen.ssh           10827  10.6KiB   6893  6.7KiB 63.7%   2099  2.0KiB 19.4%
mitogen.sudo          12089  11.8KiB   5924  5.8KiB 49.0%   2249  2.2KiB 18.6%
mitogen.select        12325  12.0KiB   2929  2.9KiB 23.8%    964  0.9KiB  7.8%
mitogen.service       41581  40.6KiB  22398 21.9KiB 53.9%   5847  5.7KiB 14.1%
mitogen.fakessh       15767  15.4KiB   8149  8.0KiB 51.7%   2676  2.6KiB 17.0%
mitogen.master        55317  54.0KiB  28846 28.2KiB 52.1%   7528  7.4KiB 13.6%

@moreati
Copy link
Member Author

moreati commented Jul 21, 2025

Nothing in the test suite is currently making the first stage stdin non-blocking, based on 0a247a3 passing in https://github.com/mitogen-hq/mitogen/actions/runs/16418222551.

@moreati moreati force-pushed the issue1306-investigate branch from 0a247a3 to 39b78df Compare July 21, 2025 13:57
@moreati
Copy link
Member Author

moreati commented Jul 21, 2025

Adding "Default log_output" to /etc/sudoers didn't reproduce it (based on 39b78df and https://github.com/mitogen-hq/mitogen/actions/runs/16419018956

@moreati moreati force-pushed the issue1306-investigate branch 3 times, most recently from 67c0407 to abaac0b Compare July 21, 2025 17:18
@moreati moreati force-pushed the issue1306-investigate branch 3 times, most recently from 28cfc2b to 20e4be0 Compare August 14, 2025 14:42
@moreati
Copy link
Member Author

moreati commented Aug 14, 2025

Now have a failing test in 20e4be0 to form the basis of a regression test

======================================================================
ERROR: test_local (sudo_test.SudoLogOutputTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/mitogen/mitogen/tests/sudo_test.py", line 73, in test_local
    context = self.router.sudo()
  File "/home/runner/work/mitogen/mitogen/mitogen/parent.py", line 2507, in sudo
    return self.connect(u'sudo', **kwargs)
  File "/home/runner/work/mitogen/mitogen/mitogen/parent.py", line 2452, in connect
    return self._connect(klass, **mitogen.core.Kwargs(kwargs))
  File "/home/runner/work/mitogen/mitogen/mitogen/parent.py", line 2432, in _connect
    conn.connect(context=context)
  File "/home/runner/work/mitogen/mitogen/mitogen/parent.py", line 1714, in connect
    raise self.exception
mitogen.parent.EofError: EOF on stream; last 100 lines received:
MITO000
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "<string>", line 18, in <module>
AssertionError

@moreati moreati force-pushed the issue1306-investigate branch 12 times, most recently from 978f247 to ceb95eb Compare August 15, 2025 09:04
@moreati
Copy link
Member Author

moreati commented Aug 15, 2025

Using select.select() in the first stage is currently causing the tests to hang indefinitely. I believe the culprit is test_first_stage.Commandline, it runs subprocesss.Popen(<bootstrap args>, stdin=open('/dev/null, 'r'), ...). That worked previously because reading from /dev/null returns a zero length string, which is interepreted as EOF by file.read(). It now fails because /dev/null is always ready to read, even though it will return b''. E.g.

>>> import select, time
>>> f = open('/dev/null', 'r')
>>> f.read(1000)
''
>>> import time; time.time(); select.select([f], [], []); time.time()
1755249587.856334
([<_io.TextIOWrapper name='/dev/null' mode='r' encoding='UTF-8'>], [], [])
1755249587.856464

@moreati moreati force-pushed the issue1306-investigate branch 4 times, most recently from 97e4e41 to 4ef8c9c Compare August 15, 2025 12:31
@moreati
Copy link
Member Author

moreati commented Aug 15, 2025

The first attempt passing all tests is 4ef8c9c. I switched the failing test from /dev/null to /dev/zero.

@moreati moreati force-pushed the issue1306-investigate branch 2 times, most recently from 4c0319d to c54d827 Compare August 15, 2025 16:55
@moreati moreati changed the title Investigate non-blocking stdin in first stage mitogen: Fix non-blocking IO errors in first stage of bootstrap Aug 15, 2025
@moreati moreati force-pushed the issue1306-investigate branch from c54d827 to 22c20ab Compare August 15, 2025 17:29
@moreati moreati marked this pull request as ready for review August 15, 2025 18:03
@moreati moreati force-pushed the issue1306-investigate branch from 22c20ab to 4987cf5 Compare August 19, 2025 13:51
We were not raising CalledProcessError when exit status != 0.
Prep for reusing it in non-Ansible tests
Before
./preamble_size.py
SSH command size: 759
Bootstrap (mitogen.core) size: 18227 (17.80KiB)

                              Original          Minimized           Compressed
mitogen.parent            98853 96.5KiB  51103 49.9KiB 51.7%  12881 12.6KiB 13.0%
mitogen.fork               8445  8.2KiB   4139  4.0KiB 49.0%   1652  1.6KiB 19.6%
mitogen.ssh               10827 10.6KiB   6893  6.7KiB 63.7%   2099  2.0KiB 19.4%
mitogen.sudo              12089 11.8KiB   5924  5.8KiB 49.0%   2249  2.2KiB 18.6%
mitogen.select            12325 12.0KiB   2929  2.9KiB 23.8%    964  0.9KiB 7.8%
mitogen.service           41581 40.6KiB  22398 21.9KiB 53.9%   5847  5.7KiB 14.1%
mitogen.fakessh           15767 15.4KiB   8149  8.0KiB 51.7%   2676  2.6KiB 17.0%
mitogen.master            55317 54.0KiB  28846 28.2KiB 52.1%   7528  7.4KiB 13.6%

After:
SSH command size: 759
Bootstrap (mitogen.core) size: 18227 (17.80KiB)

                        Original           Minimized           Compressed
mitogen.parent        98853  96.5KiB  51103 49.9KiB 51.7%  12881 12.6KiB 13.0%
mitogen.fork           8445   8.2KiB   4139  4.0KiB 49.0%   1652  1.6KiB 19.6%
mitogen.ssh           10827  10.6KiB   6893  6.7KiB 63.7%   2099  2.0KiB 19.4%
mitogen.sudo          12089  11.8KiB   5924  5.8KiB 49.0%   2249  2.2KiB 18.6%
mitogen.select        12325  12.0KiB   2929  2.9KiB 23.8%    964  0.9KiB  7.8%
mitogen.service       41581  40.6KiB  22398 21.9KiB 53.9%   5847  5.7KiB 14.1%
mitogen.fakessh       15767  15.4KiB   8149  8.0KiB 51.7%   2676  2.6KiB 17.0%
mitogen.master        55317  54.0KiB  28846 28.2KiB 52.1%   7528  7.4KiB 13.6%
After:
SSH command size: 759
Preamble (mitogen.core + econtext) size: 18227 (17.80KiB)

                        Original           Minimized           Compressed
mitogen.core         152218 148.7KiB  68437 66.8KiB 45.0%  18124 17.7KiB 11.9%
mitogen.parent        98853  96.5KiB  51103 49.9KiB 51.7%  12881 12.6KiB 13.0%
mitogen.fork           8445   8.2KiB   4139  4.0KiB 49.0%   1652  1.6KiB 19.6%
mitogen.ssh           10827  10.6KiB   6893  6.7KiB 63.7%   2099  2.0KiB 19.4%
mitogen.sudo          12089  11.8KiB   5924  5.8KiB 49.0%   2249  2.2KiB 18.6%
mitogen.select        12325  12.0KiB   2929  2.9KiB 23.8%    964  0.9KiB  7.8%
mitogen.service       41581  40.6KiB  22398 21.9KiB 53.9%   5847  5.7KiB 14.1%
mitogen.fakessh       15767  15.4KiB   8149  8.0KiB 51.7%   2676  2.6KiB 17.0%
mitogen.master        55317  54.0KiB  28846 28.2KiB 52.1%   7528  7.4KiB 13.6%
Previously the command size could very depanding on the current username, hostname, and process pid.

Before
```
SSH command size: 759
Preamble (mitogen.core + econtext) size: 18227 (17.80KiB)
...
```

After
SSH command size: 755
Preamble (mitogen.core + econtext) size: 18227 (17.80KiB)
...
```
This is mainly for peace of mind. With all this non-blocking IO investigation
I'm getting a bit paranoid wrt file objects.

refs mitogen-hq#712
When /etc/sudoers has log_output (or similar) enabled the process spawned by
`ctx.sudo()` via `mitogen.parent.Connection.start_child()` receives a stdin
that is in non-blocking mode. The immediate symptom is that `os.openfd(0,
...).read(n)` sometimes returns `None`, causing the first stage to raise an
unhandled TypeError.

The fix (for now) is to use `select.select()` in a while loop to read stdin.
This increases the command size slightly, but I think it's a reasonable
tradeoff until/unless the cause is more fully understood.

All CI tests are now run with sudoers log_output enabled, in order to catch
regressions. `first_stage_test.CommandLineTest` has been amended, because it
relied on implementation details of the bootstrap process that are no longer
true.

Before
```
SSH command size: 755
Preamble (mitogen.core + econtext) size: 18227 (17.80KiB)

                        Original           Minimized           Compressed
mitogen.core         152218 148.7KiB  68437 66.8KiB 45.0%  18124 17.7KiB 11.9%
mitogen.parent        98853  96.5KiB  51103 49.9KiB 51.7%  12881 12.6KiB 13.0%
mitogen.fork           8445   8.2KiB   4139  4.0KiB 49.0%   1652  1.6KiB 19.6%
mitogen.ssh           10827  10.6KiB   6893  6.7KiB 63.7%   2099  2.0KiB 19.4%
mitogen.sudo          12089  11.8KiB   5924  5.8KiB 49.0%   2249  2.2KiB 18.6%
mitogen.select        12325  12.0KiB   2929  2.9KiB 23.8%    964  0.9KiB  7.8%
mitogen.service       41581  40.6KiB  22398 21.9KiB 53.9%   5847  5.7KiB 14.1%
mitogen.fakessh       15767  15.4KiB   8149  8.0KiB 51.7%   2676  2.6KiB 17.0%
mitogen.master        55317  54.0KiB  28846 28.2KiB 52.1%   7528  7.4KiB 13.6%
```

After
```
SSH command size: 798
Preamble (mitogen.core + econtext) size: 18227 (17.80KiB)

                        Original           Minimized           Compressed
mitogen.core         152218 148.7KiB  68437 66.8KiB 45.0%  18124 17.7KiB 11.9%
mitogen.parent        98944  96.6KiB  51180 50.0KiB 51.7%  12910 12.6KiB 13.0%
mitogen.fork           8445   8.2KiB   4139  4.0KiB 49.0%   1652  1.6KiB 19.6%
mitogen.ssh           10827  10.6KiB   6893  6.7KiB 63.7%   2099  2.0KiB 19.4%
mitogen.sudo          12089  11.8KiB   5924  5.8KiB 49.0%   2249  2.2KiB 18.6%
mitogen.select        12325  12.0KiB   2929  2.9KiB 23.8%    964  0.9KiB  7.8%
mitogen.service       41581  40.6KiB  22398 21.9KiB 53.9%   5847  5.7KiB 14.1%
mitogen.fakessh       15767  15.4KiB   8149  8.0KiB 51.7%   2676  2.6KiB 17.0%
mitogen.master        55317  54.0KiB  28846 28.2KiB 52.1%   7528  7.4KiB 13.6%
```
@moreati moreati force-pushed the issue1306-investigate branch from 4987cf5 to 85d6046 Compare August 20, 2025 22:48
@moreati
Copy link
Member Author

moreati commented Aug 20, 2025

Rebased on 0.3.28dev

@moreati moreati merged commit 3656979 into mitogen-hq:master Aug 20, 2025
26 checks passed
@moreati moreati deleted the issue1306-investigate branch August 20, 2025 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Mitogen + sudo + log_stdin/log_stdout leads to preamble read error in first stage (EAGAIN, EWOULDBLOCK, None)

1 participant