[PR Wanted] Fix unsafe getenv pointer usage in dlopen interception, causing vLLM startup failures #22

@joeyxzy

While trying to use Neutrino with vLLM, I hit a failure when starting vLLM with the following command:

VLLM_USE_V1=0 neutrino -p block_sched vllm serve /home/joeyxzy/models/Qwen1.5-4B-Chat \
  --tensor-parallel-size 1 \
  --dtype auto \
  --max-model-len 4096 \
  --port 8001 \
  --enforce-eager

ERROR:

(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [fa_utils.py:64] Cannot use FA version 2 is not supported due to FA2 is unavaible due to: dynamic module does not define module export function (PyInit__vllm_fa2_C)
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855] EngineCore failed to start.
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855] Traceback (most recent call last):
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/llvmlite/binding/ffi.py", line 141, in __getattr__
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     return self._fntab[name]
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]            ~~~~~~~~~~~^^^^^^
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855] KeyError: 'LLVMPY_AddSymbol'
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855] During handling of the above exception, another exception occurred:
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855] Traceback (most recent call last):
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/llvmlite/binding/ffi.py", line 124, in _load_lib
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     _ = self._lib_handle.LLVMPY_GetVersionInfo()
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/ctypes/__init__.py", line 389, in __getattr__
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     func = self.__getitem__(name)
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]            ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/ctypes/__init__.py", line 394, in __getitem__
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     func = self._FuncPtr((name_or_ordinal, self))
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855] AttributeError: /home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/neutrino/build/libcuda.so.1: undefined symbol: LLVMPY_GetVersionInfo
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855] The above exception was the direct cause of the following exception:
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855] Traceback (most recent call last):
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/vllm/vllm/v1/engine/core.py", line 846, in run_engine_core
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/vllm/vllm/v1/engine/core.py", line 619, in __init__
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     super().__init__(
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/vllm/vllm/v1/engine/core.py", line 103, in __init__
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/vllm/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     self._init_executor()
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/vllm/vllm/v1/executor/uniproc_executor.py", line 46, in _init_executor
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     self.driver_worker.init_worker(all_kwargs=[kwargs])
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/vllm/vllm/v1/worker/worker_base.py", line 253, in init_worker
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     worker_class = resolve_obj_by_qualname(
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]                    ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/vllm/vllm/utils/import_utils.py", line 89, in resolve_obj_by_qualname
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     module = importlib.import_module(module_name)
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/importlib/__init__.py", line 126, in import_module
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     return _bootstrap._gcd_import(name[level:], package, level)
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "<frozen importlib._bootstrap_external>", line 940, in exec_module
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/vllm/vllm/v1/worker/gpu_worker.py", line 54, in <module>
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     from vllm.v1.worker.gpu_model_runner import GPUModelRunner
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/vllm/vllm/v1/worker/gpu_model_runner.py", line 135, in <module>
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     from vllm.v1.spec_decode.ngram_proposer import NgramProposer
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/vllm/vllm/v1/spec_decode/ngram_proposer.py", line 6, in <module>
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     from numba import get_num_threads, jit, njit, prange, set_num_threads
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/numba/__init__.py", line 73, in <module>
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     from numba.core import config
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/numba/core/config.py", line 17, in <module>
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     import llvmlite.binding as ll
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/llvmlite/binding/__init__.py", line 4, in <module>
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     from .dylib import *
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/llvmlite/binding/dylib.py", line 36, in <module>
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     ffi.lib.LLVMPY_AddSymbol.argtypes = [
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/llvmlite/binding/ffi.py", line 144, in __getattr__
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     cfn = getattr(self._lib, name)
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]                   ^^^^^^^^^
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/llvmlite/binding/ffi.py", line 136, in _lib
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     self._load_lib()
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/llvmlite/binding/ffi.py", line 130, in _load_lib
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855]     raise OSError("Could not find/load shared object file") from e
(EngineCore_DP0 pid=280425) ERROR 03-09 18:21:39 [core.py:855] OSError: Could not find/load shared object file
(EngineCore_DP0 pid=280425) Process EngineCore_DP0:
(EngineCore_DP0 pid=280425) Traceback (most recent call last):
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/llvmlite/binding/ffi.py", line 141, in __getattr__
(EngineCore_DP0 pid=280425)     return self._fntab[name]
(EngineCore_DP0 pid=280425)            ~~~~~~~~~~~^^^^^^
(EngineCore_DP0 pid=280425) KeyError: 'LLVMPY_AddSymbol'
(EngineCore_DP0 pid=280425)
(EngineCore_DP0 pid=280425) During handling of the above exception, another exception occurred:
(EngineCore_DP0 pid=280425)
(EngineCore_DP0 pid=280425) Traceback (most recent call last):
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/llvmlite/binding/ffi.py", line 124, in _load_lib
(EngineCore_DP0 pid=280425)     _ = self._lib_handle.LLVMPY_GetVersionInfo()
(EngineCore_DP0 pid=280425)         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/ctypes/__init__.py", line 389, in __getattr__
(EngineCore_DP0 pid=280425)     func = self.__getitem__(name)
(EngineCore_DP0 pid=280425)            ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/ctypes/__init__.py", line 394, in __getitem__
(EngineCore_DP0 pid=280425)     func = self._FuncPtr((name_or_ordinal, self))
(EngineCore_DP0 pid=280425)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425) AttributeError: /home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/neutrino/build/libcuda.so.1: undefined symbol: LLVMPY_GetVersionInfo
(EngineCore_DP0 pid=280425)
(EngineCore_DP0 pid=280425) The above exception was the direct cause of the following exception:
(EngineCore_DP0 pid=280425)
(EngineCore_DP0 pid=280425) Traceback (most recent call last):
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=280425)     self.run()
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=280425)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/vllm/vllm/v1/engine/core.py", line 859, in run_engine_core
(EngineCore_DP0 pid=280425)     raise e
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/vllm/vllm/v1/engine/core.py", line 846, in run_engine_core
(EngineCore_DP0 pid=280425)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=280425)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/vllm/vllm/v1/engine/core.py", line 619, in __init__
(EngineCore_DP0 pid=280425)     super().__init__(
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/vllm/vllm/v1/engine/core.py", line 103, in __init__
(EngineCore_DP0 pid=280425)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=280425)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/vllm/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=280425)     self._init_executor()
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/vllm/vllm/v1/executor/uniproc_executor.py", line 46, in _init_executor
(EngineCore_DP0 pid=280425)     self.driver_worker.init_worker(all_kwargs=[kwargs])
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/vllm/vllm/v1/worker/worker_base.py", line 253, in init_worker
(EngineCore_DP0 pid=280425)     worker_class = resolve_obj_by_qualname(
(EngineCore_DP0 pid=280425)                    ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/vllm/vllm/utils/import_utils.py", line 89, in resolve_obj_by_qualname
(EngineCore_DP0 pid=280425)     module = importlib.import_module(module_name)
(EngineCore_DP0 pid=280425)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/importlib/__init__.py", line 126, in import_module
(EngineCore_DP0 pid=280425)     return _bootstrap._gcd_import(name[level:], package, level)
(EngineCore_DP0 pid=280425)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425)   File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
(EngineCore_DP0 pid=280425)   File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
(EngineCore_DP0 pid=280425)   File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
(EngineCore_DP0 pid=280425)   File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
(EngineCore_DP0 pid=280425)   File "<frozen importlib._bootstrap_external>", line 940, in exec_module
(EngineCore_DP0 pid=280425)   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/vllm/vllm/v1/worker/gpu_worker.py", line 54, in <module>
(EngineCore_DP0 pid=280425)     from vllm.v1.worker.gpu_model_runner import GPUModelRunner
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/vllm/vllm/v1/worker/gpu_model_runner.py", line 135, in <module>
(EngineCore_DP0 pid=280425)     from vllm.v1.spec_decode.ngram_proposer import NgramProposer
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/vllm/vllm/v1/spec_decode/ngram_proposer.py", line 6, in <module>
(EngineCore_DP0 pid=280425)     from numba import get_num_threads, jit, njit, prange, set_num_threads
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/numba/__init__.py", line 73, in <module>
(EngineCore_DP0 pid=280425)     from numba.core import config
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/numba/core/config.py", line 17, in <module>
(EngineCore_DP0 pid=280425)     import llvmlite.binding as ll
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/llvmlite/binding/__init__.py", line 4, in <module>
(EngineCore_DP0 pid=280425)     from .dylib import *
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/llvmlite/binding/dylib.py", line 36, in <module>
(EngineCore_DP0 pid=280425)     ffi.lib.LLVMPY_AddSymbol.argtypes = [
(EngineCore_DP0 pid=280425)     ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/llvmlite/binding/ffi.py", line 144, in __getattr__
(EngineCore_DP0 pid=280425)     cfn = getattr(self._lib, name)
(EngineCore_DP0 pid=280425)                   ^^^^^^^^^
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/llvmlite/binding/ffi.py", line 136, in _lib
(EngineCore_DP0 pid=280425)     self._load_lib()
(EngineCore_DP0 pid=280425)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/llvmlite/binding/ffi.py", line 130, in _load_lib
(EngineCore_DP0 pid=280425)     raise OSError("Could not find/load shared object file") from e
(EngineCore_DP0 pid=280425) OSError: Could not find/load shared object file
(APIServer pid=280286) Traceback (most recent call last):
(APIServer pid=280286)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/bin/vllm", line 6, in <module>
(APIServer pid=280286)     sys.exit(main())
(APIServer pid=280286)              ^^^^^^
(APIServer pid=280286)   File "/home/joeyxzy/vllm/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=280286)     args.dispatch_function(args)
(APIServer pid=280286)   File "/home/joeyxzy/vllm/vllm/entrypoints/cli/serve.py", line 59, in cmd
(APIServer pid=280286)     uvloop.run(run_server(args))
(APIServer pid=280286)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/uvloop/__init__.py", line 92, in run
(APIServer pid=280286)     return runner.run(wrapper())
(APIServer pid=280286)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=280286)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/asyncio/runners.py", line 118, in run
(APIServer pid=280286)     return self._loop.run_until_complete(task)
(APIServer pid=280286)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=280286)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=280286)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=280286)     return await main
(APIServer pid=280286)            ^^^^^^^^^^
(APIServer pid=280286)   File "/home/joeyxzy/vllm/vllm/entrypoints/openai/api_server.py", line 2028, in run_server
(APIServer pid=280286)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=280286)   File "/home/joeyxzy/vllm/vllm/entrypoints/openai/api_server.py", line 2047, in run_server_worker
(APIServer pid=280286)     async with build_async_engine_client(
(APIServer pid=280286)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=280286)     return await anext(self.gen)
(APIServer pid=280286)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=280286)   File "/home/joeyxzy/vllm/vllm/entrypoints/openai/api_server.py", line 195, in build_async_engine_client
(APIServer pid=280286)     async with build_async_engine_client_from_engine_args(
(APIServer pid=280286)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=280286)     return await anext(self.gen)
(APIServer pid=280286)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=280286)   File "/home/joeyxzy/vllm/vllm/entrypoints/openai/api_server.py", line 236, in build_async_engine_client_from_engine_args
(APIServer pid=280286)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=280286)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=280286)   File "/home/joeyxzy/vllm/vllm/utils/func_utils.py", line 116, in inner
(APIServer pid=280286)     return fn(*args, **kwargs)
(APIServer pid=280286)            ^^^^^^^^^^^^^^^^^^^
(APIServer pid=280286)   File "/home/joeyxzy/vllm/vllm/v1/engine/async_llm.py", line 203, in from_vllm_config
(APIServer pid=280286)     return cls(
(APIServer pid=280286)            ^^^^
(APIServer pid=280286)   File "/home/joeyxzy/vllm/vllm/v1/engine/async_llm.py", line 133, in __init__
(APIServer pid=280286)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=280286)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=280286)   File "/home/joeyxzy/vllm/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client
(APIServer pid=280286)     return AsyncMPClient(*client_args)
(APIServer pid=280286)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=280286)   File "/home/joeyxzy/vllm/vllm/v1/engine/core_client.py", line 808, in __init__
(APIServer pid=280286)     super().__init__(
(APIServer pid=280286)   File "/home/joeyxzy/vllm/vllm/v1/engine/core_client.py", line 469, in __init__
(APIServer pid=280286)     with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=280286)   File "/home/joeyxzy/miniconda3/envs/vllm-cu124/lib/python3.11/contextlib.py", line 144, in __exit__
(APIServer pid=280286)     next(self.gen)
(APIServer pid=280286)   File "/home/joeyxzy/vllm/vllm/v1/engine/utils.py", line 898, in launch_core_engines
(APIServer pid=280286)     wait_for_engine_startup(
(APIServer pid=280286)   File "/home/joeyxzy/vllm/vllm/v1/engine/utils.py", line 955, in wait_for_engine_startup
(APIServer pid=280286)     raise RuntimeError(
(APIServer pid=280286) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

The most critical error message was:

AttributeError: .../site-packages/neutrino/build/libcuda.so.1: undefined symbol: LLVMPY_GetVersionInfo

At first glance, this suggested that llvmlite was trying to load its own dynamic library, but the request was incorrectly redirected to Neutrino’s libcuda.so.1. That led us to suspect an issue in the dlopen interception logic.

After investigating, we confirmed that in many cases the environment variable NEUTRINO_DRIVER_NAME is an empty string here:

// in preload.c
void* dlopen(const char *filename, int flags) {
    // The original (glibc) dlopen still exists in the symbol search space,
    // but it is less preferred because LD_PRELOAD masks it.
    // Using dlsym with RTLD_NEXT, we can retrieve the glibc dlopen.
    if (!real_dlopen) 
        real_dlopen = dlsym(RTLD_NEXT, "dlopen");
    
    if (!NEUTRINO_DRIVER_NAME) {
        NEUTRINO_DRIVER_NAME = getenv("NEUTRINO_DRIVER_NAME");
        // fprintf(stderr, "[info] NEUTRINO_DRIVER_NAME: %s\n", NEUTRINO_DRIVER_NAME);
    }   

    if (filename != NULL && (strstr(filename, NEUTRINO_DRIVER_NAME) != NULL)) {
        ...

The problem is that this string matching is not sufficiently defensive. When NEUTRINO_DRIVER_NAME is an empty string, strstr returns a pointer to the start of filename, so the condition is true for every non-NULL filename and the code enters the interception branch for almost every dlopen call. As a result, many unrelated dynamic library loads are incorrectly redirected to libcuda.so.1, which explains the error above.
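To make the empty-string behavior concrete, here is a minimal sketch. The helper names naive_match, guarded_match, and demo_matches are hypothetical and do not come from preload.c; the first mirrors the shape of the check quoted above, and the second shows one possible guard rather than the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper with the same shape as the check in preload.c. */
int naive_match(const char *filename, const char *needle) {
    return filename != NULL && strstr(filename, needle) != NULL;
}

/* Hypothetical defensive variant: a NULL or empty needle never matches. */
int guarded_match(const char *filename, const char *needle) {
    if (filename == NULL || needle == NULL || needle[0] == '\0')
        return 0;
    return strstr(filename, needle) != NULL;
}

/* Usage: strstr with an empty needle returns the haystack itself,
 * so the naive check accepts an unrelated library name. */
void demo_matches(void) {
    assert(naive_match("libllvmlite.so", "") == 1);
    assert(guarded_match("libllvmlite.so", "") == 0);
    assert(guarded_match("libcuda.so.1", "libcuda.so") == 1);
}
```

With a guard like this, an unset or empty NEUTRINO_DRIVER_NAME would cause the call to fall through to the real dlopen instead of being intercepted.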

After reviewing and patching the code, I found that the root cause is related to the lifetime and safety of the pointer returned by getenv(). In a complex multi-process environment like vLLM, this pointer can become invalid or unsafe to rely on over time. As documented here https://cppreference.net/c/program/getenv.html:

This function is not required to be thread-safe. Another call to getenv, as well as a call to the POSIX functions setenv(), unsetenv(), and putenv() may invalidate the pointer returned by a previous call or modify the string obtained from a previous call.

In other words, if we intend to keep using the value returned by getenv() for an extended period, we should make a persistent copy ourselves instead of storing the raw pointer directly.
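A minimal sketch of making such a persistent copy is shown below. The function names copy_env and demo_copy_env are hypothetical, this is not the code from the PR, and it does not address thread safety; it only illustrates that a private copy stays valid after the environment changes.

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv in the usage example */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap-allocated private copy of the
 * variable's value, or NULL if it is unset, empty, or malloc fails.
 * The caller owns the result and releases it with free(). */
char *copy_env(const char *name) {
    assert(name != NULL);
    const char *value = getenv(name);
    if (value == NULL || value[0] == '\0')
        return NULL;
    size_t len = strlen(value) + 1;  /* include the terminating NUL */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, value, len);
    return copy;
}

/* Usage: the copy keeps its value after a later setenv, which may
 * invalidate the pointer getenv returned earlier. Returns 1 on success. */
int demo_copy_env(void) {
    setenv("NEUTRINO_DRIVER_NAME", "libcuda.so", 1);
    char *saved = copy_env("NEUTRINO_DRIVER_NAME");
    setenv("NEUTRINO_DRIVER_NAME", "", 1);
    int ok = saved != NULL && strcmp(saved, "libcuda.so") == 0;
    free(saved);
    return ok;
}
```

Returning NULL for an empty value lets the caller treat "unset" and "empty" the same way and skip interception in both cases.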

To address this, I made a fix that copies and preserves the environment variable safely, which resolves this instability and prevents the unsafe interception behavior. With this change, I was able to start and run vLLM successfully.

I also noticed an issue with the use of memcpy in the string handling logic, and I included a fix for that as well.

I have already prepared a PR for these changes. Would you be open to reviewing and accepting it?
