dev: cannot replay or record using setup=vllm #3523

@mattf

Description

System Info

n/a

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

$ ./scripts/integration-tests.sh --stack-config server:ci-tests --setup vllm --subdirs inference --inference-mode live --pattern test_text_chat_completion_non_streaming
...
====== 2 passed, 95 deselected, 6 warnings in 2.86s ======
$ ./scripts/integration-tests.sh --stack-config server:ci-tests --setup vllm --subdirs inference --inference-mode record --pattern test_text_chat_completion_non_streaming
...
++ exit 1

Error logs

INFO     2025-09-23 10:29:19,216 llama_stack.core.stack:334 core: Inference recording enabled: mode=record
INFO     2025-09-23 10:29:21,451 llama_stack.providers.utils.inference.inference_store:74 inference_store: Write queue disabled for SQLite to avoid concurrency issues
INFO     2025-09-23 10:29:23,490 llama_stack.providers.utils.inference.model_registry:138 providers::utils: check_model_availability is not implemented for ModelRegistryHelper. Returning False by default.
ERROR    2025-09-23 10:29:23,502 __main__:592 core::server: Error creating app: object async_generator can't be used in 'await' expression
Exception ignored in: <generator object inference_recording at 0x7f52558c3d30>
Traceback (most recent call last):
  File "/home/matt/Documents/Repositories/meta-llama/llama-stack/llama_stack/testing/inference_recorder.py", line 500, in inference_recording
  File "/home/matt/Documents/Repositories/meta-llama/llama-stack/llama_stack/testing/inference_recorder.py", line 454, in unpatch_inference_clients
ImportError: sys.meta_path is None, Python is likely shutting down
++ error_handler 119
++ echo 'Error occurred in script at line: 119'
Error occurred in script at line: 119
++ exit 1
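
The root cause is the TypeError logged at app creation: in record mode, an async generator (e.g. a streaming chat-completion response) is apparently awaited as if it were a coroutine. Async generators must be consumed with `async for`; awaiting one raises exactly this error, which is easy to reproduce in isolation (the names below are illustrative, not from the llama-stack code):

```python
import asyncio


async def stream_completion():
    # an async *generator*: calling it returns an async_generator object,
    # which must be iterated, not awaited
    yield {"delta": "hello"}


async def main():
    # wrong -- raises:
    #   TypeError: object async_generator can't be used in 'await' expression
    # await stream_completion()

    # right: iterate the generator instead
    async for chunk in stream_completion():
        print(chunk)


asyncio.run(main())
```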
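The "Exception ignored" traceback is a downstream symptom rather than a second bug: because app creation fails, the generator behind the inference_recording context manager is never exited, so its cleanup path (unpatch_inference_clients, per the traceback) only runs when the garbage collector finalizes the generator during interpreter shutdown, by which point imports fail because sys.meta_path is already None. A minimal sketch of that lifecycle, assuming a generator-based context manager; the patch/unpatch bodies here are placeholders, not the real implementation:

```python
import contextlib


@contextlib.contextmanager
def inference_recording(mode: str):
    # placeholder for the real context manager in
    # llama_stack/testing/inference_recorder.py
    print(f"patching inference clients, mode={mode}")
    try:
        yield
    finally:
        # the real cleanup (unpatch_inference_clients) performs imports; if
        # this generator is only finalized by the GC during interpreter
        # shutdown, those imports raise "ImportError: sys.meta_path is None,
        # Python is likely shutting down"
        print("unpatching inference clients")


# entering the context but never exiting it leaves the generator suspended
# at the yield; it is then finalized at shutdown, which is what produces the
# "Exception ignored in: <generator object inference_recording ...>" message
cm = inference_recording("record")
cm.__enter__()
# process exits here without cm.__exit__() ever being called
```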

Expected behavior

...
tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_01] 
instantiating llama_stack_client
Port 8321 is already in use, assuming server is already running...
llama_stack_client instantiated in 0.043s
PASSED [ 50%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_02] PASSED [100%]

====================================================== slowest 10 durations =======================================================
1.68s call     tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_01]
1.41s call     tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_02]
1.14s setup    tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_01]

(3 durations < 0.005s hidden.  Use -vv to show these durations.)
========================================== 2 passed, 95 deselected, 6 warnings in 4.32s ===========================================
...

Metadata

Labels

bug (Something isn't working)
