@k-artem
Contributor

k-artem commented Dec 22, 2025

The multiprocessing context must be forcibly re-created inside the test case; otherwise a RuntimeError is raised:

    def set_start_method(self, method, force=False):
        if self._actual_context is not None and not force:
>           raise RuntimeError('context has already been set')
E           RuntimeError: context has already been set

Command for verification:
pytest -v tests/unit/runtime/test_ds_initialize.py
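
The diff itself is not shown in this conversation, so the following is only a sketch of the kind of change described, assuming the fix amounts to passing `force=True`. It uses the standard-library `multiprocessing` module instead of `torch.multiprocessing` so that it runs without PyTorch, and the helper name `switch_start_method` is hypothetical.

```python
import multiprocessing as mp


def switch_start_method(method):
    """Set the global start method, replacing any context chosen earlier.

    Hypothetical helper: force=True tells set_start_method to skip the
    'context has already been set' check and rebuild the context.
    """
    mp.set_start_method(method, force=True)
    return mp.get_start_method()


if __name__ == "__main__":
    # Only iterate over the methods this platform actually supports.
    for method in mp.get_all_start_methods():
        print(method, "->", switch_start_method(method))
```

Iterating over `mp.get_all_start_methods()` rather than a fixed list keeps the sketch portable, since not every platform offers `fork` or `forkserver`.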

Signed-off-by: Artem Kuzmitckii <[email protected]>
@k-artem
Contributor Author

k-artem commented Jan 5, 2026

@sfc-gh-truwase please have a look

@sfc-gh-truwase
Collaborator

@k-artem can you share a bit about the error that this fixes? In my experience set_start_method is very flaky and environment-dependent.

@tohtana, @loadams, any thoughts?

@k-artem
Contributor Author

k-artem commented Jan 12, 2026

> @k-artem can you share a bit about the error that this fixes? In my experience set_start_method is very flaky and environment-dependent.
>
> @tohtana, @loadams, any thoughts?

Sure, on my side it's consistently reproducible. torch.multiprocessing is a wrapper around Python's multiprocessing, and the start method can only be set once per process via set_start_method; after that, the multiprocessing context is locked and cannot be changed unless force=True is passed. However, the test calls set_start_method three times with different methods via @pytest.mark.parametrize('method', ['spawn', 'fork', 'forkserver']), which raises an exception on every call after the first. In my fix, I force the multiprocessing context to be re-created between runs.

tests/unit/runtime/test_ds_initialize.py::test_start_method_safety[spawn] PASSED [ 33%]
tests/unit/runtime/test_ds_initialize.py::test_start_method_safety[fork] FAILED [ 66%]
tests/unit/runtime/test_ds_initialize.py::test_start_method_safety[forkserver] FAILED [100%]

=================================================================================================== FAILURES ====================================================================================================
________________________________________________________________________________________ test_start_method_safety[fork] _________________________________________________________________________________________

method = 'fork'

    @pytest.mark.parametrize('method', ['spawn', 'fork', 'forkserver'])
    def test_start_method_safety(method):
        import torch.multiprocessing as mp
        print(os.getpid())
>       mp.set_start_method(method)

tests/unit/runtime/test_ds_initialize.py:33:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <multiprocessing.context.DefaultContext object at 0x7620fe759280>, method = 'fork', force = False

    def set_start_method(self, method, force=False):
        if self._actual_context is not None and not force:
>           raise RuntimeError('context has already been set')
E           RuntimeError: context has already been set

/usr/lib/python3.12/multiprocessing/context.py:247: RuntimeError
--------------------------------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------------------------------
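
The traceback above can be reproduced without pytest or PyTorch. The sketch below assumes the standard-library `multiprocessing` module behaves the same way as the `torch.multiprocessing` wrapper for this call; `try_set` is a hypothetical helper that reports the outcome instead of raising.

```python
import multiprocessing as mp


def try_set(method, force=False):
    """Attempt to set the start method; return 'ok' or the error message."""
    try:
        mp.set_start_method(method, force=force)
        return "ok"
    except RuntimeError as exc:
        return str(exc)


if __name__ == "__main__":
    # Forced call: succeeds whether or not a context already exists.
    print(try_set("spawn", force=True))
    # Unforced call once a context exists: refused, even for the same method.
    print(try_set("spawn"))
    # Forcing again replaces the locked context.
    print(try_set("spawn", force=True))
```

Because all parametrized cases run in the same pytest process, the first case fixes the context and every later unforced call hits the same check, which matches the PASSED, FAILED, FAILED pattern in the log.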

@k-artem
Contributor Author

k-artem commented Jan 15, 2026

Hi @sfc-gh-truwase, please approve if there are no objections.

@sfc-gh-truwase
Collaborator

@k-artem, the approval from @loadams is actually sufficient.

sfc-gh-truwase merged commit 7bbb6c7 into deepspeedai:master Jan 15, 2026
11 checks passed
3 participants