Commit 4e0f607

kebe7jun and youkaichao authored
[Bugfix] Fix failure to launch in Tensor Parallel (TP) mode on macOS. (#14948)
Signed-off-by: Kebe <[email protected]>
Signed-off-by: youkaichao <[email protected]>
Co-authored-by: youkaichao <[email protected]>
1 parent 726efc6 commit 4e0f607

File tree

3 files changed: +17 -4 lines changed

docs/source/design/multiprocessing.md

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ This document describes how vLLM deals with these challenges.
 [Python multiprocessing methods](https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods) include:
 
 - `spawn` - spawn a new Python process. This will be the default as of Python
-  3.14.
+  3.14. On macOS, this is already the default.
 
 - `fork` - Use `os.fork()` to fork the Python interpreter. This is the default
   in Python versions prior to 3.14.
@@ -34,7 +34,7 @@ This document describes how vLLM deals with these challenges.
 ### Tradeoffs
 
 `fork` is the fastest method, but is incompatible with dependencies that use
-threads.
+threads. On macOS, using `fork` may cause the process to crash.
 
 `spawn` is more compatible with dependencies, but can be problematic when vLLM
 is used as a library. If the consuming code does not use a `__main__` guard (`if
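
As an aside, here is a minimal standalone sketch (not vLLM code) of how a start method is chosen with Python's `multiprocessing` module; the `__main__` guard mentioned in the context above is mandatory under `spawn`, because the child process re-imports the parent module:

```python
import multiprocessing as mp


def worker(rank: int) -> None:
    print(f"worker {rank} started")


# Under "spawn" the child re-imports this module, so the guard is required;
# under "fork" the child inherits the parent's memory instead.
if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # "fork" is the pre-3.14 default on Linux
    procs = [ctx.Process(target=worker, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```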

vllm/distributed/device_communicators/shm_broadcast.py

Lines changed: 7 additions & 2 deletions
@@ -125,8 +125,13 @@ def __init__(self,
                        lambda *args, **kwargs: None):
                 try:
                     self.shared_memory = shared_memory.SharedMemory(name=name)
-                    assert (
-                        self.shared_memory.size == self.total_bytes_of_buffer)
+                    # See https://docs.python.org/3/library/multiprocessing.shared_memory.html # noqa
+                    # Some platforms allocate memory based on page size,
+                    # so the shared memory block size may be larger or equal
+                    # to the requested size. The size parameter is ignored
+                    # when attaching to an existing block.
+                    assert (self.shared_memory.size
+                            >= self.total_bytes_of_buffer)
                 except FileNotFoundError:
                     # we might deserialize the object in a different node
                     # in this case, this object is not used,
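
The relaxed assertion can be reproduced in isolation; a minimal sketch, assuming a platform that page-aligns shared-memory allocations (macOS does):

```python
from multiprocessing import shared_memory

requested = 1000  # deliberately not a multiple of the page size
shm = shared_memory.SharedMemory(create=True, size=requested)
try:
    # On page-aligning platforms this may print 4096 rather than 1000,
    # which is exactly why the strict `==` check above could fail.
    print(shm.size)
    assert shm.size >= requested  # mirrors the relaxed assertion
finally:
    shm.close()
    shm.unlink()
```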

vllm/platforms/cpu.py

Lines changed: 8 additions & 0 deletions
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: Apache-2.0
 
 import os
+import sys
 from typing import TYPE_CHECKING, Optional
 
 import psutil
@@ -148,6 +149,13 @@ def check_and_update_config(cls, vllm_config: VllmConfig) -> None:
         # To hint IPEX uses shared memory based AllReduce
         os.environ["LOCAL_WORLD_SIZE"] = str(
             vllm_config.parallel_config.tensor_parallel_size)
+        if sys.platform == "darwin" and \
+                envs.VLLM_WORKER_MULTIPROC_METHOD == "fork":
+            if os.environ.get('VLLM_WORKER_MULTIPROC_METHOD', None) is None:
+                logger.warning(
+                    "Default to spawn method on macOS. If this is not desired,"
+                    " set VLLM_WORKER_MULTIPROC_METHOD to fork explicitly.")
+            os.environ['VLLM_WORKER_MULTIPROC_METHOD'] = 'spawn'
 
     @classmethod
     def is_pin_memory_available(cls) -> bool:
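
Read as a small pure function, the platform hook above looks like the following standalone sketch (the name `resolve_start_method` is hypothetical; the behavior mirrors the diff exactly):

```python
import logging
import os
import sys

logger = logging.getLogger(__name__)


def resolve_start_method(resolved: str) -> str:
    """`resolved` is the start method after vLLM's defaults are applied
    (the default is "fork"); the raw environment variable shows whether
    the user picked a method explicitly."""
    if sys.platform == "darwin" and resolved == "fork":
        if os.environ.get("VLLM_WORKER_MULTIPROC_METHOD") is None:
            logger.warning(
                "Default to spawn method on macOS. If this is not desired,"
                " set VLLM_WORKER_MULTIPROC_METHOD to fork explicitly.")
        # Note: as in the commit, "spawn" is returned even when the user set
        # "fork" explicitly; only the warning is suppressed in that case.
        return "spawn"
    return resolved


if __name__ == "__main__":
    print(resolve_start_method("fork"))  # "spawn" on macOS, "fork" elsewhere
```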
