
Conversation

rani-pinchuk
Contributor

@rani-pinchuk rani-pinchuk commented Jul 19, 2025

This is a small fix, as suggested in #135427: use PyErr_WriteUnraisable() instead of PyErr_Clear().

The fix includes a test which checks that the DeprecationWarning is emitted when using fork or forkpty within a thread, including when running with -Werror.
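
To illustrate the behaviour the new test exercises, here is a minimal, hypothetical demo (not the test code from this PR); save it as, say, demo.py and run it with `python -W error demo.py` on a POSIX system. Before this fix, the warning turned into an error by -W error was silently cleared inside os.fork(); after it, the warning surfaces (either as an exception or as an unraisable-exception report) instead of being discarded.

import os
import threading

def fork_while_threaded():
    # os.fork() in a process with more than one thread emits a
    # DeprecationWarning ("... may lead to deadlocks in the child").
    pid = os.fork()
    if pid == 0:
        os._exit(0)        # child: exit immediately
    else:
        os.waitpid(pid, 0) # parent: reap the child

# Keep a second thread alive so the process is multi-threaded at fork time.
stop = threading.Event()
keeper = threading.Thread(target=stop.wait)
keeper.start()
try:
    fork_while_threaded()
finally:
    stop.set()
    keeper.join()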

@encukou

@python-cla-bot

python-cla-bot bot commented Jul 19, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@bedevere-app

bedevere-app bot commented Jul 19, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@rani-pinchuk
Contributor Author

I have created a NEWS entry for this fix, but this change actually has little impact on Python users, so maybe the skip news label is more appropriate here.

@rani-pinchuk
Contributor Author

Many tests are now failing with a warning, since the DeprecationWarning is no longer hidden (and, as I understand it, the tests run with the equivalent of -Werror).

Many of the failing tests fail in places where one would not expect more than one thread to be running, for example in test_uuid.testIssue8621.

When printing the number of threads there with threading.active_count() we get 1. However, further investigation shows that there are actually 2 threads; one of them is not a Python thread and is therefore not counted by threading.active_count().

Therefore, we need to find a way to suppress the warning about forking within a thread in all the tests that fail. Petr suggests considering sys.unraisablehook (see https://vstinner.github.io/sys-unraisablehook-python38.html) in those places.

If anyone knows where this extra thread comes from, an explanation would be appreciated :-)
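
For context, the sys.unraisablehook idea mentioned above could look roughly like the sketch below: a test installs a custom hook so that errors reported through PyErr_WriteUnraisable are recorded (or deliberately ignored) instead of being printed to stderr. The hook name and the isinstance check are illustrative, not taken from the actual test suite.

import sys

seen = []
previous_hook = sys.unraisablehook

def recording_hook(unraisable):
    # Record fork-related DeprecationWarnings reported via
    # PyErr_WriteUnraisable; pass everything else to the previous hook.
    if isinstance(unraisable.exc_value, DeprecationWarning):
        seen.append(unraisable)
    else:
        previous_hook(unraisable)

sys.unraisablehook = recording_hook
try:
    ...  # test code that forks while other threads are running
finally:
    sys.unraisablehook = previous_hook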

@gpshead
Member

gpshead commented Jul 19, 2025

If anyone knows where this extra thread comes from, an explanation would be appreciated :-)

That is, ironically, exactly why we have this warning: threads in a process can come from anywhere, including external libraries outside of our control.

I do think you're on the right track in terms of how to make this something -Werror captures. As a potentially disruptive behavior change for some test suites, it isn't a bug fix, but it would make sense in 3.15. Having a news entry is indeed appropriate.

Using sys.unraisablehook in these parts of the test suite is an interesting idea.

@rhettinger rhettinger removed their request for review July 21, 2025 06:06
@rani-pinchuk rani-pinchuk requested a review from vsajip as a code owner July 21, 2025 15:50
message=".*fork.*may lead to deadlocks in the child.*",
category=DeprecationWarning)
with warnings_helper.ignore_fork_in_thread_deprecation_warnings():
super().setUpClass()
Member

Could you do this in the other instances as well? (4 in this file & one in test_concurrent_futures/util.py?)

@encukou
Member

encukou commented Aug 19, 2025

@rani-pinchuk, I think this is close to the finish line. Would you like me to fix up the remaining issue?

@rani-pinchuk
Contributor Author

rani-pinchuk commented Aug 19, 2025 via email

@encukou encukou added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Aug 22, 2025
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @encukou for commit baa8446 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136796%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Aug 22, 2025
@encukou encukou added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Aug 26, 2025
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @encukou for commit b08021d 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F136796%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Aug 26, 2025
@encukou encukou merged commit fd8f42d into python:main Aug 26, 2025
114 checks passed
@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot ARM Raspbian Linux Asan 3.x (no tier) has failed when building commit fd8f42d.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1811/builds/156) and take a look at the build logs.
  4. Check if the failure is related to this commit (fd8f42d) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1811/builds/156

Summary of the results of the build (if available):

Click to see traceback logs
remote: Enumerating objects: 39, done.        
remote: Counting objects: 100% (36/36), done.        
remote: Compressing objects: 100% (13/13), done.        
remote: Total 39 (delta 23), reused 23 (delta 23), pack-reused 3 (from 2)        
From https://github.com/python/cpython
 * branch                    main       -> FETCH_HEAD
Note: switching to 'fd8f42d3d1038a812340c3ec3cbfc995a80c4e13'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at fd8f42d3d10 gh-135427: Fix DeprecationWarning for os.fork when run in threads with -Werror (GH-136796)
Switched to and reset branch 'main'

configure: WARNING: no system libmpdec found; falling back to pure-Python version for the decimal module

In file included from ./Include/internal/pycore_dict.h:11,
                 from Objects/typeobject.c:7:
In function ‘Py_DECREF_MORTAL’,
    inlined from ‘PyStackRef_XCLOSE’ at ./Include/internal/pycore_stackref.h:730:9,
    inlined from ‘_PyThreadState_PopCStackRef’ at ./Include/internal/pycore_stackref.h:810:5,
    inlined from ‘vectorcall_maybe’ at Objects/typeobject.c:3108:9:
./Include/internal/pycore_object.h:481:8: warning: array subscript 0 is outside array bounds of ‘PyObject[0]’ {aka ‘struct _object[]’} [-Warray-bounds]
  481 |     if (--op->ob_refcnt == 0) {
      |        ^

Timeout (0:05:00)!
Thread 0x0000007fad2ef100 [Thread-2] (most recent call first):
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/subprocess.py", line 1252 in _remaining_time
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/subprocess.py", line 2053 in _wait
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/subprocess.py", line 1277 in wait
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/run_workers.py", line 194 in _run_process
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/run_workers.py", line 299 in run_tmp_files
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/run_workers.py", line 363 in _runtest
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/run_workers.py", line 403 in run
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/threading.py", line 1074 in _bootstrap_inner
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/threading.py", line 1036 in _bootstrap

Thread 0x0000007fadaff100 [Thread-1] (most recent call first):
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/subprocess.py", line 2042 in _wait
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/subprocess.py", line 1277 in wait
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/run_workers.py", line 194 in _run_process
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/run_workers.py", line 299 in run_tmp_files
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/run_workers.py", line 363 in _runtest
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/run_workers.py", line 403 in run
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/threading.py", line 1074 in _bootstrap_inner
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/threading.py", line 1036 in _bootstrap

Thread 0x0000007fb98276c0 [python] (most recent call first):
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/logger.py", line 47 in get_load_avg
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/logger.py", line 27 in log
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/run_workers.py", line 553 in _get_result
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/run_workers.py", line 610 in run
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/main.py", line 455 in _run_tests_mp
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/main.py", line 561 in _run_tests
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/main.py", line 598 in run_tests
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/main.py", line 767 in main
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/libregrtest/main.py", line 775 in main
  File "/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/Lib/test/__main__.py", line 2 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main
make: *** [Makefile:2494: buildbottest] Error 1

Cannot open file '/home/buildbot/buildarea/3.x.pablogsal-rasp.asan/build/test-results.xml' for upload

@picnixz
Member

picnixz commented Aug 28, 2025

Shouldn't this be backported?

@duaneg
Contributor

duaneg commented Oct 16, 2025

Hi folks, I ran into an interesting failure with a new ExecutorTest unit test following this change (#140021). After updating, I had not noticed I needed to add @warnings_helper.ignore_fork_in_thread_deprecation_warnings().

When it ran with a "fork" process pool, it hung until it hit a 10-minute timeout and was killed. The output didn't include the deprecation warning or anything related to it, and the hang seems to be caused by the forkserver process not shutting down, which is interesting as it only triggers when running the test with the "fork" start method. You can trigger the same behaviour by removing the decorator from one of the existing tests in that file, e.g. test_swallows_falsey_exceptions.

Is this expected? Removing the decorator from tests elsewhere results in an immediate failure with the warning being clearly indicated as the source of the error, so it looks to me like the executor tests specifically are problematic.
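
For context, the missing decoration looks roughly like this (class and test names below are placeholders, not the actual test from #140021):

import unittest

from test.support import warnings_helper

class ForkingExecutorTest(unittest.TestCase):
    # The decorator suppresses the fork-in-threads DeprecationWarning so it
    # does not become an error (or a hang, as described above) under -Werror.
    @warnings_helper.ignore_fork_in_thread_deprecation_warnings()
    def test_something_that_forks(self):
        ...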

@gpshead gpshead added the needs backport to 3.14 bugs and security fixes label Oct 16, 2025
@miss-islington-app

Thanks @rani-pinchuk for the PR, and @encukou for merging it 🌮🎉. I'm working now to backport this PR to 3.14.
🐍🍒⛏🤖

@miss-islington-app

Sorry, @rani-pinchuk and @encukou, I could not cleanly backport this to 3.14 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker fd8f42d3d1038a812340c3ec3cbfc995a80c4e13 3.14

gpshead pushed a commit to gpshead/cpython that referenced this pull request Oct 16, 2025
…ds with -Werror (pythonGH-136796)

Don't ignore errors raised by `PyErr_WarnFormat` in `warn_about_fork_with_threads`.
Instead, ignore the warnings in all test code that forks. (That's a lot of functions.)

In `test_support`, make `ignore_warnings` a context manager (as well as a decorator),
and add a `message` argument to it.
Also add an `ignore_fork_in_thread_deprecation_warnings` helper for the deadlock-in-fork
warning.
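
A rough sketch of how such a dual-purpose helper could look, assuming it simply wraps warnings.filterwarnings with the message pattern seen in the review context above (the real helper lives in the test-support warnings_helper module and may differ). Because generator-based context managers inherit from contextlib.ContextDecorator, the same object works both in a `with` block and as a test-method decorator.

import contextlib
import warnings

@contextlib.contextmanager
def ignore_fork_in_thread_deprecation_warnings():
    # Ignore only the "fork ... may lead to deadlocks in the child" warning.
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message=".*fork.*may lead to deadlocks in the child.*",
            category=DeprecationWarning,
        )
        yield
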
@bedevere-app

bedevere-app bot commented Oct 16, 2025

GH-140191 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Oct 16, 2025
@duaneg
Contributor

duaneg commented Oct 17, 2025

I've managed to reproduce the hang mentioned above outside of unit tests:

from concurrent.futures import ProcessPoolExecutor
import multiprocessing.forkserver
import threading

def noop():
    pass

def do_fork():
    # Submit a trivial job to a "fork" process pool while another thread is alive.
    ctx = multiprocessing.get_context('fork')
    with ProcessPoolExecutor(mp_context=ctx) as exec:
        f = exec.submit(noop)
        f.result()

def run_with_thread(func):
    # Keep a second thread running for the duration of func(), so the process
    # is multi-threaded when the pool forks.
    barrier = threading.Barrier(2)
    t = threading.Thread(target=barrier.wait)
    t.start()
    try:
        func()
    finally:
        barrier.wait()
        t.join()

def main():
    multiprocessing.forkserver.ensure_running()
    run_with_thread(do_fork)

if __name__ == '__main__':
    main()

Prior to this change, this runs without producing any output. After the change, when run normally, it prints a warning and exits. However, if run with -Werror, it prints the DeprecationWarning with a stack trace and then hangs indefinitely.

The trouble is that os.fork throws in the parent but succeeds in the child. Since the parent exits via an exception, it does not close the pipe it opened to communicate with the child, which in turn means the child hangs while bootstrapping. The child inherits the parent's non-O_CLOEXEC file descriptors, which include the forkserver manager and resource tracker FDs. Since the child hangs indefinitely, it keeps those FDs open indefinitely, which means the main process hangs on exit trying to clean them up.

I'm not sure if this should be considered a regression, or even a problem at all, but it will be a potential change in behaviour for previously "working" code. If folks here think it would be helpful I'll open a separate issue for it.
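
For reference, one way previously "working" code like the reproducer above could keep running under -Werror is to suppress just this warning around the fork. This is only a rough sketch mirroring what the test-suite helper does, not a recommendation or an official workaround:

import warnings

def do_fork_quietly():
    # Suppress only the fork-in-a-multi-threaded-process DeprecationWarning,
    # so -W error does not turn it into an exception inside the pool's fork.
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message=".*fork.*may lead to deadlocks in the child.*",
            category=DeprecationWarning,
        )
        do_fork()  # the do_fork() from the reproducer above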
