Description
root@1cccece5c83c:~# tritonserver --model-repository ./triton/
I1216 02:12:33.621768 179 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x7f8380000000' with size 268435456"
I1216 02:12:33.622377 179 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I1216 02:12:33.624737 179 model_lifecycle.cc:473] "loading: vllm:1"
INFO 12-16 10:12:37 [__init__.py:216] Automatically detected platform cuda.
I1216 02:12:42.654863 179 python_be.cc:2289] "TRITONBACKEND_ModelInstanceInitialize: vllm_0_0 (MODEL device 0)"
INFO 12-16 10:12:47 [__init__.py:216] Automatically detected platform cuda.
E1216 02:12:49.624646 179 model.py:248] "[vllm] Failed to start engine: NoneType: None\n"
I1216 02:12:49.624943 179 pb_stub.cc:388] "Failed to initialize Python stub: TypeError: build_async_engine_client_from_engine_args() got an unexpected keyword argument 'stat_loggers'\n\nAt:\n /usr/lib/python3.12/contextlib.py(105): __init__\n /usr/lib/python3.12/contextlib.py(334): helper\n /opt/tritonserver/backends/vllm/model.py(301): _run_llm_engine\n /usr/lib/python3.12/asyncio/events.py(103): _run\n /usr/lib/python3.12/asyncio/base_events.py(2000): _run_once\n /usr/lib/python3.12/asyncio/base_events.py(653): run_forever\n /usr/lib/python3.12/asyncio/base_events.py(691): run_until_complete\n /usr/lib/python3.12/asyncio/runners.py(126): run\n /usr/lib/python3.12/asyncio/runners.py(194): run\n /usr/lib/python3.12/threading.py(1016): run\n /usr/lib/python3.12/threading.py(1079): _bootstrap_inner\n /usr/lib/python3.12/threading.py(1032): _bootstrap\n"
E1216 02:12:50.651942 179 backend_model.cc:694] "ERROR: Failed to create instance: TypeError: build_async_engine_client_from_engine_args() got an unexpected keyword argument 'stat_loggers'\n\nAt:\n /usr/lib/python3.12/contextlib.py(105): __init__\n /usr/lib/python3.12/contextlib.py(334): helper\n /opt/tritonserver/backends/vllm/model.py(301): _run_llm_engine\n /usr/lib/python3.12/asyncio/events.py(103): _run\n /usr/lib/python3.12/asyncio/base_events.py(2000): _run_once\n /usr/lib/python3.12/asyncio/base_events.py(653): run_forever\n /usr/lib/python3.12/asyncio/base_events.py(691): run_until_complete\n /usr/lib/python3.12/asyncio/runners.py(126): run\n /usr/lib/python3.12/asyncio/runners.py(194): run\n /usr/lib/python3.12/threading.py(1016): run\n /usr/lib/python3.12/threading.py(1079): _bootstrap_inner\n /usr/lib/python3.12/threading.py(1032): _bootstrap\n"
E1216 02:12:50.652184 179 model_lifecycle.cc:654] "failed to load 'vllm' version 1: Internal: TypeError: build_async_engine_client_from_engine_args() got an unexpected keyword argument 'stat_loggers'\n\nAt:\n /usr/lib/python3.12/contextlib.py(105): __init__\n /usr/lib/python3.12/contextlib.py(334): helper\n /opt/tritonserver/backends/vllm/model.py(301): _run_llm_engine\n /usr/lib/python3.12/asyncio/events.py(103): _run\n /usr/lib/python3.12/asyncio/base_events.py(2000): _run_once\n /usr/lib/python3.12/asyncio/base_events.py(653): run_forever\n /usr/lib/python3.12/asyncio/base_events.py(691): run_until_complete\n /usr/lib/python3.12/asyncio/runners.py(126): run\n /usr/lib/python3.12/asyncio/runners.py(194): run\n /usr/lib/python3.12/threading.py(1016): run\n /usr/lib/python3.12/threading.py(1079): _bootstrap_inner\n /usr/lib/python3.12/threading.py(1032): _bootstrap\n"
I1216 02:12:50.652270 179 model_lifecycle.cc:789] "failed to load 'vllm'"
I1216 02:12:50.652435 179 server.cc:611]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I1216 02:12:50.652549 179 server.cc:638]
+---------+-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+---------+-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.0 |
| | | 00000","default-max-batch-size":"4"}} |
| vllm | /opt/tritonserver/backends/vllm/model.py | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.0 |
| | | 00000","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------+
I1216 02:12:50.652715 179 server.cc:681]
+-------+---------+----------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+-------+---------+----------------------------------------------------------------------------------------------------------------------------------+
| vllm | 1 | UNAVAILABLE: Internal: TypeError: build_async_engine_client_from_engine_args() got an unexpected keyword argument 'stat_loggers' |
| | | |
| | | At: |
| | | /usr/lib/python3.12/contextlib.py(105): __init__ |
| | | /usr/lib/python3.12/contextlib.py(334): helper |
| | | /opt/tritonserver/backends/vllm/model.py(301): _run_llm_engine |
| | | /usr/lib/python3.12/asyncio/events.py(103): _run |
| | | /usr/lib/python3.12/asyncio/base_events.py(2000): _run_once |
| | | /usr/lib/python3.12/asyncio/base_events.py(653): run_forever |
| | | /usr/lib/python3.12/asyncio/base_events.py(691): run_until_complete |
| | | /usr/lib/python3.12/asyncio/runners.py(126): run |
| | | /usr/lib/python3.12/asyncio/runners.py(194): run |
| | | /usr/lib/python3.12/threading.py(1016): run |
| | | /usr/lib/python3.12/threading.py(1079): _bootstrap_inner |
| | | /usr/lib/python3.12/threading.py(1032): _bootstrap |
+-------+---------+----------------------------------------------------------------------------------------------------------------------------------+
I1216 02:12:50.892371 179 metrics.cc:890] "Collecting metrics for GPU 0: NVIDIA GeForce RTX 4090"
I1216 02:12:50.904672 179 metrics.cc:783] "Collecting CPU metrics"
I1216 02:12:50.904834 179 tritonserver.cc:2598]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.59.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory |
| | binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | ./triton/ |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| model_config_name | |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
I1216 02:12:50.904928 179 server.cc:312] "Waiting for in-flight requests to complete."
I1216 02:12:50.904936 179 server.cc:328] "Timeout 30: Found 0 model versions that have in-flight inferences"
I1216 02:12:50.904992 179 server.cc:343] "All models are stopped, unloading models"
I1216 02:12:50.904999 179 server.cc:352] "Timeout 30: Found 0 live models and 0 in-flight non-inference requests"
error: creating server: Internal - failed to load all models

Triton Information
tritonserver 2.59.0
Are you using the Triton container or did you build it yourself?: I built it myself, using Python 3.12.
To Reproduce
The contents of the ./triton/ model repository are the same as in the sample.
vllm==0.11.0
CUDA Version: 12.4
Driver Version: 550.163.01
NVIDIA GeForce RTX 4090
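The traceback suggests the Triton vLLM backend's model.py is passing a `stat_loggers` keyword that the installed vLLM (0.11.0) no longer accepts. A quick way to confirm such a mismatch is to inspect the function's signature. The sketch below is self-contained and uses a stand-in function; with vLLM installed you would import the real `build_async_engine_client_from_engine_args` instead (the import path shown in the comment is an assumption and may vary between vLLM releases):

```python
import inspect

# Stand-in mimicking a vLLM release whose signature lacks 'stat_loggers'.
# With vLLM installed, replace this definition with something like
# (import path is an assumption, check your vLLM version):
#   from vllm.entrypoints.openai.api_server import (
#       build_async_engine_client_from_engine_args)
def build_async_engine_client_from_engine_args(
        engine_args, disable_frontend_multiprocessing=False):
    """Dummy entry point without a 'stat_loggers' parameter."""

# Check whether the function accepts the keyword the backend passes.
sig = inspect.signature(build_async_engine_client_from_engine_args)
accepts_stat_loggers = "stat_loggers" in sig.parameters
print(f"accepts stat_loggers: {accepts_stat_loggers}")
```

If this prints `False` for your installed vLLM, the backend and vLLM versions are incompatible, and pinning vLLM to the version the Triton container was built against (or updating the backend) should resolve the load failure.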