Skip to content

Conversation

bollwyvl
Copy link
Contributor

Investigating downstream errors:

Seeing if this test fail is spurious/azure/cosmic rays:

=================================== FAILURES ===================================
________________________________ test_shutdown _________________________________

    def test_shutdown():
        """Kernel exits after polite shutdown_request"""
        with new_kernel() as kc:
            km = kc.parent
            execute('a = 1', kc=kc)
            wait_for_idle(kc)
            kc.shutdown()
            for i in range(300): # 30s timeout
                if km.is_alive():
                    time.sleep(.1)
                else:
                    break
>           assert not km.is_alive()
E           assert not True
E            +  where True = <bound method KernelManager.is_alive of <jupyter_client.manager.KernelManager object at 0x000055773df9b0c0>>()
E            +    where <bound method KernelManager.is_alive of <jupyter_client.manager.KernelManager object at 0x000055773df9b0c0>> = <jupyter_client.manager.KernelManager object at 0x000055773df9b0c0>.is_alive

../_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh/site-packages/ipykernel/tests/test_kernel.py:373: AssertionError

from conda-forge/ipykernel-feedstock#84 (comment)

bollwyvl and others added 5 commits August 9, 2021 10:07
the test as-written is sensitive to garbage collection, which is different on pypy

instead, test only that we create a valid tracker
@minrk minrk force-pushed the pypy3-test branch 3 times, most recently from 783882d to 6b04978 Compare August 9, 2021 08:23
wait for the expected output to appear, instead of assuming 1 second is enough

this should be both faster (where sleep(1) was enough) and more reliable
@minrk
Copy link
Member

minrk commented Aug 9, 2021

There does appear to be a real failure and real difference in behavior with respect to signalling child processes of the kernel.

We use killpg to terminate the process group of the kernel. This is what's being tested in the failing test_signal_kernel_subprocesses. On pypy, we are getting the behavior of only signalling the kernel itself, whereas on CPython the child processes also get the signal (as intended).

I managed to reproduce it with this test:

test_signal_subprocesses.py
import os
import signal
import sys
import time
from subprocess import Popen

def child():
    print(f"child waiting: {os.getpid()}")
    try:
        time.sleep(10)
    except KeyboardInterrupt:
        print(f"child interrupted: {os.getpid()}")
        sys.exit(-2)

def parent():
    children = []
    for i in range(2):
        p = Popen(['bash', '-i', '-c', 'sleep 10'])
        children.append(p)

    print(f"parent waiting: {os.getpid()}")
    try:
        time.sleep(10)
    except KeyboardInterrupt:
        print(f"parent interrupted: {os.getpid()}")

    for child in children:
        child.wait()
        print(f"child {child.pid} status: {child.poll()}")


def test():
    p = Popen([sys.executable, __file__, 'parent'], start_new_session=True)
    pgid = os.getpgid(p.pid)
    print(f"parent pid: {p.pid}, pgid: {pgid}")
    time.sleep(2)
    print("signalling parent")
    os.killpg(pgid, signal.SIGINT)
    p.wait()


def main(which):
    if which == 'test':
        test()
    elif which == 'parent':
        parent()
    elif which == 'child':
        child()


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv[1])
    else:
        main("test")

where CPython interrupts the bash children, but PyPy does not. I suspect this has to do with signal handler inheritance, and CPython is clearing signal handlers while PyPy doesn't? That's a guess. Switching the subprocess to use Python instead of bash results in interrupted children, though, so I assume the Python interpreter reinitializes SIGINT handler where bash does not.

@mattip I think this should be considered a bug in PyPy's subprocess module.

Switching the subprocesses to use Python instead of bash causes the test to pass.

PyPy doesn't create subprocesses in the same way as CPython,
resulting in ignored signals when we try to interrupt with `killpg`
@mattip
Copy link

mattip commented Aug 9, 2021

I think this should be considered a bug in PyPy's subprocess module.

Thanks for tracking this down. I will try to figure out what is going on.

wait for processes to exit, not for expected result

otherwise, a timeout is forced for any incorrect result
@minrk
Copy link
Member

minrk commented Aug 9, 2021

The remaining failures (for me) are FD exhaustion, which makes me suspect that we are somewhere relying on garbage collection to clean up some zmq sockets or other. This is usually reliable on CPython, which is how we may have gotten into the situation without noticing, whereas PyPy often needs explicit gc calls. The 'right' fix is to add the necessary explicit closes on whatever resources it is that aren't being cleaned up.

@mattip
Copy link

mattip commented Aug 9, 2021

I am not sure I understand the difference in behaviour. On Ubuntu 20.04, both cpython and pypy3.7-v7.3.5 print (at the end) for test_signal_subprocesses.py

signalling parent
parent interrupted: 245302
child 245303 status: -2
child 245304 status: -2

As for closing resources: if you could add a fixture to always call for i in range(3): gc.collect() at test teardown, that should trigger a ResourceWarning when the gc gets a file or socket that still holds a resource.

@minrk
Copy link
Member

minrk commented Aug 9, 2021

Ah, maybe it's a macOS thing, then. For me, the children run to completion with pypy and exit with status 0 (identical behavior to not signalling them at all). CPython behaves as expected, though.

Good call on the gc. I'm poking around to see where I can find leftover references. I think a fixture to assert that there are no open zmq resources across tests is the right thing to do, which ought to catch this kind of issue. There could also be other issues, like subprocess pipes which might not be closed.

@bollwyvl
Copy link
Contributor Author

bollwyvl commented Aug 9, 2021

You good folk are wizards, thank you for pushing this forward!

@minrk minrk force-pushed the pypy3-test branch 3 times, most recently from db4ee2d to 4d433eb Compare August 9, 2021 12:22
minrk added 2 commits August 9, 2021 15:06
so hangs don't run for hours

pypy needs more than 10 minutes, cpython doesn't
@Julian
Copy link

Julian commented Aug 9, 2021

There does appear to be a real failure and real difference in behavior with respect to signalling child processes of the kernel.

We use killpg to terminate the process group of the kernel. This is what's being tested in the failing test_signal_kernel_subprocesses. On pypy, we are getting the behavior of only signalling the kernel itself, whereas on CPython the child processes also get the signal (as intended).

I managed to reproduce it with this test:
test_signal_subprocesses.py

where CPython interrupts the bash children, but PyPy does not. I suspect this has to do with signal handler inheritance, and CPython is clearing signal handlers while PyPy doesn't? That's a guess. Switching the subprocess to use Python instead of bash results in interrupted children, though, so I assume the Python interpreter reinitializes SIGINT handler where bash does not.

@mattip I think this should be considered a bug in PyPy's subprocess module.

Switching the subprocesses to use Python instead of bash causes the test to pass.

(I'm coming in late here and possibly haven't read the thread carefully enough) -- but what difference should I be observing here? On Big Sur 11.4 (MBA M1 2021) I see no difference between running this on cpython3.8 and pypy3.7 (7.3.4)

@bollwyvl
Copy link
Contributor Author

bollwyvl commented Aug 9, 2021

@Julian we initially discovered some of these challenges when running the full test suite of the downstream ipykernel on conda-forge:

conda-forge/ipykernel-feedstock#84 (comment)

Unfortunately, I know little about the inner guts of pyzmq, and less about the inner guts of pypy... but do try to keep these technologies working as broadly as possible... luckily far wiser folk have appeared!

@blink1073
Copy link
Contributor

I added pypy-3.8 in #757

@mattip
Copy link

mattip commented Mar 29, 2022

I added pypy-3.8 in #757

Cool. It is passing (without coverage) in a similar time to the CPython runs. It seems this issue can be closed? Please do ping me if tests break.

@blink1073
Copy link
Contributor

It seems this issue can be closed? Please do ping me if tests break.

Sounds good, thanks @mattip

@blink1073 blink1073 closed this Mar 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants