-
Notifications
You must be signed in to change notification settings - Fork 203
mitogen: first_stage: Break the while loop in case of EOF #1389
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Note: I did not test whether this actually fixes the problem since I did not reproduce the problem, but it's very likely as I had similar issues in the past. |
|
Question: Should we raise an error if not all data could be read? Update: What speaks against raising an error is that it would be different to the "original code" behavior (before #1307) The original code reads: fp=os.fdopen(0,'rb')
C=zlib.decompress(fp.read(PREAMBLE_COMPRESSED_LEN))
fp.close()
So even with the old version it was possible that less than expected data was read and no exception was raised, so we should probably keep to that behavior for now. [1] https://docs.python.org/3/library/io.html#io.BufferedIOBase.read |
fac1020 to
aa7d95b
Compare
The call to |
mitogen/parent.py
Outdated
| C=''.encode() | ||
| while int(sys.argv[3])-len(C)and select.select([0],[],[]):C+=os.read(0,int(sys.argv[3])-len(C)) | ||
| n=int(sys.argv[3]);C=''.encode();V='V' | ||
| while n>len(C) and V:select.select([0],[],[]);V=os.read(0,n-len(C));C+=V |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fd 0 may be in non-blocking mode, therefore a zero-length read does not necessarily indicate EOF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't we get an BlockingIOError [1] in that case? At least the documentation of os.read [2] reads:
Return a bytestring containing the bytes read. If the end of the file referred to by fd has been reached, an empty bytes object is returned.
[1] https://docs.python.org/3/library/exceptions.html#BlockingIOError
[2] https://docs.python.org/3/library/os.html#os.read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we use select.select(...) in order to avoid that situation, no? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should I have said None. On Python 3 f.read(n)
may either raise BlockingIOError or return None if no data is available. io implementations return None.
-- https://docs.python.org/3/library/io.html#io.BufferedIOBase.read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but we use os.read(...) not the BufferedIO read (e.g. open(...).read()).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And instead of doing the SIGALRM trick we could use the timeout parameter of select [1] and check if a FD is ready to read or not... but that results in a little bit more of code.
Currently, we know that we can read data (even if it's the EOF) from one FD when the select call returns as we only pass one FD to the select function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fp.read() <--- that's the bufferedIO read != os.read(..)
and the original code used BufferedIO as os.fdopen is more or less an alias for open(...) [1] :)
fp=os.fdopen(0,'rb') C=zlib.decompress(fp.read(PREAMBLE_COMPRESSED_LEN))
|
@moreati I've added timeout handling, but I'm not sure which kind of exception should be raised in case of a timeout (currently, I've Edit: Maybe even make the timeout value configurable. |
62dff70 to
ae77274
Compare
I've added some preliminary steps to make it configurable and added a test for the timeout situation. |
|
By all means continue experimenting if you wish to, but be aware I'm unlikely to accept many new features into this chunk of code - particularly if they increase the size. My preference is to keep the first stage as minimal is possible. To measure the size (after encoding and compression) use https://github.com/mitogen-hq/mitogen/blob/master/preamble_size.py. I'm currently doing my experiments to pin down how |
|
Yep, right now I'm experimenting, especially in regard with the tests. Because it would be great if we have minimal tests for blocking and non-blocking STDIO. So the timeout test should be useful in all cases even for your SIGALRM solution :) |
Fine with me :) Reason why I'm thinking about making the timeout configurable is that the halting problem is undecidable and the timeout depends on the system/environment.
Will add the diff to all commits then :) Do we care more about the SSH command size? preamble? Or mitogen.parent? Or all of them?
Good! |
ae77274 to
91d8c74
Compare
One before and after for a complete PR will be fine. SSH command size is what I minimize. |
|
@moreati I've now added two test cases, one that causes an timeout and the other that causes the EOF situation similar to the original bug. |
f5eddd8 to
2dc337d
Compare
|
Based on moreati/nonblock_lab@c4938b9
|
|
@moreati I'm going to add multiple other test cases, e.g. e.g. Do you have other ideas? Not sure if the file case if even needed... Edit: To me it seems like a good idea to add more test cases to |
7f3d46b to
da7b213
Compare
|
@moreati That's the command line I'm using during development: |
dfb4447 to
7bcdd55
Compare
cb54e0b to
ef626af
Compare
|
@moreati
IMO, as soon as the test cases are stable and the small coding style nits are being resolved, it might be a good idea to merge the tests, together with the EOF file fix (but without my timeout fix/changes). What do you think? [1] 85d6046 |
|
I'm still getting my head around things. So I'm not sure how to proceed yet
I think your PR, or something evolved from it, is the best direction - particularly your tests. |
|
@moreati There is still one unstable test left: For 'some reasons' I do not get the error -5. I can change the expectation, but I would like to be able to understand and reproduce the issue on my local system. |
ce370f3 to
efd09f8
Compare
|
I've begun trying variations of your tests in #1392, I'm finishing for the day now but tomorrow or later this week would you like to do a call (e.g. Jitsi, Zoom, Teams, ...) to compare notes and look at failing tests? |
Signed-off-by: Marc Hartmayer <[email protected]>
This makes it easier to add more tests and the test description is now used by the test runner. Signed-off-by: Marc Hartmayer <[email protected]>
3897dd2 to
21ebeea
Compare
Sure, just send me an email. I've added more tests and improved the tests, so most scenarios should now be covered. |
10410be to
3312430
Compare
+ test_stdin_non_blocking + test_stdin_blocking Signed-off-by: Marc Hartmayer <[email protected]>
Signed-off-by: Marc Hartmayer <[email protected]>
The current implementation can cause an infinite loop, leading to a process that hangs and consumes 100% CPU. This occurs because the EOF condition is not handled properly, resulting in repeated select(...) and read(...) calls. The fix is to properly handle the EOF condition and break out of the loop when it occurs. -SSH command size: 822 +SSH command size: 838 -mitogen.parent 98746 96.4KiB 51215 50.0KiB 51.9% 12922 12.6KiB 13.1% +mitogen.parent 98827 96.5KiB 51219 50.0KiB 51.8% 12942 12.6KiB 13.1% Fixes: mitogen-hq#1348 Signed-off-by: Marc Hartmayer <[email protected]>
Signed-off-by: Marc Hartmayer <[email protected]>
Do not wait/block forever for data to be read.
Add a test for this.
The test can be run using the following command:
PYTHONPATH=$(pwd)/tests:$PYTHONPATH python -m unittest -v tests.first_stage_test
-SSH command size: 838
+SSH command size: 894
Original Minimized Compressed
-mitogen.parent 98827 96.5KiB 51219 50.0KiB 51.8% 12942 12.6KiB 13.1%
+mitogen.parent 99034 96.7KiB 51295 50.1KiB 51.8% 12970 12.7KiB 13.1%
Signed-off-by: Marc Hartmayer <[email protected]>
-SSH command size: 894
+SSH command size: 905
Original Minimized Compressed
-mitogen.parent 99034 96.7KiB 51295 50.1KiB 51.8% 12970 12.7KiB 13.1%
+mitogen.parent 99212 96.9KiB 51385 50.2KiB 51.8% 12999 12.7KiB 13.1%
Signed-off-by: Marc Hartmayer <[email protected]>
Signed-off-by: Marc Hartmayer <[email protected]>
Signed-off-by: Marc Hartmayer <[email protected]>
Bail out if STDIN or STDOUT is closed/not available as it is used for the communication with the parent process. Signed-off-by: Marc Hartmayer <[email protected]>
Signed-off-by: Marc Hartmayer <[email protected]>
If STDERR is not available, ignore the OSError since it's a non-critical error.
Note: This change is not necessary as the exception message would be print on
stderr and stderr is already closed and the exit status of the forked child
process is not checked yet.
Signed-off-by: Marc Hartmayer <[email protected]>
3312430 to
2e4821b
Compare
|
@moreati I've polished the commits/tests a little so they should be stable on all environments we support (I did not test with Windows Subsystem for Linux (WSL)). The whole series results in the following size changes: Update: Covered test scenarios:
|
|
Thanks for all this. When you're ready I'll do a full review. |
The current implementation can cause an infinite loop, leading to a process that hangs and consumes 100% CPU. This occurs because the EOF condition is not handled properly, resulting in repeated select(...) and read(...) calls.
The fix is to properly handle the EOF condition and break out of the loop when it occurs.
Fixes: #1348