System Info
n/a
Information
- The official example scripts
- My own modified scripts
🐛 Describe the bug
Running the inference integration tests in live mode passes, but the same test selection in record mode fails:
$ ./scripts/integration-tests.sh --stack-config server:ci-tests --setup vllm --subdirs inference --inference-mode live --pattern test_text_chat_completion_non_streaming
...
====== 2 passed, 95 deselected, 6 warnings in 2.86s ======
$ ./scripts/integration-tests.sh --stack-config server:ci-tests --setup vllm --subdirs inference --inference-mode record --pattern test_text_chat_completion_non_streaming
...
++ exit 1
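For context, `object async_generator can't be used in 'await' expression` (see the error log below) is Python's generic TypeError for awaiting an async generator object instead of iterating it with `async for`. A minimal standalone reproduction, unrelated to any llama-stack code:

```python
import asyncio


async def stream_chunks():
    # An async generator: calling it returns an async_generator object,
    # which must be consumed with `async for`, not `await`.
    yield "chunk"


async def main():
    try:
        await stream_chunks()  # wrong: awaiting the async_generator object
    except TypeError as exc:
        print(exc)  # object async_generator can't be used in 'await' expression

    async for chunk in stream_chunks():  # right: iterate it
        print(chunk)


asyncio.run(main())
```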
Error logs
INFO 2025-09-23 10:29:19,216 llama_stack.core.stack:334 core: Inference recording enabled: mode=record
INFO 2025-09-23 10:29:21,451 llama_stack.providers.utils.inference.inference_store:74 inference_store: Write queue disabled for SQLite to avoid concurrency issues
INFO 2025-09-23 10:29:23,490 llama_stack.providers.utils.inference.model_registry:138 providers::utils: check_model_availability is not implemented for ModelRegistryHelper. Returning False by default.
ERROR 2025-09-23 10:29:23,502 __main__:592 core::server: Error creating app: object async_generator can't be used in 'await' expression
Exception ignored in: <generator object inference_recording at 0x7f52558c3d30>
Traceback (most recent call last):
File "/home/matt/Documents/Repositories/meta-llama/llama-stack/llama_stack/testing/inference_recorder.py", line 500, in inference_recording
File "/home/matt/Documents/Repositories/meta-llama/llama-stack/llama_stack/testing/inference_recorder.py", line 454, in unpatch_inference_clients
ImportError: sys.meta_path is None, Python is likely shutting down
++ error_handler 119
++ echo 'Error occurred in script at line: 119'
Error occurred in script at line: 119
++ exit 1
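Since the traceback points at the client patching in llama_stack/testing/inference_recorder.py, the failure pattern looks like a recording wrapper awaiting a method that is actually an async generator (i.e. a streaming endpoint). The sketch below is purely illustrative and is not the actual inference_recorder.py code; `record_calls` and the "record here" hooks are hypothetical. It only shows the branch on inspect.isasyncgenfunction that such a wrapper needs to avoid this TypeError:

```python
import inspect
from typing import Any, Callable


def record_calls(original: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical recording wrapper; not llama-stack code."""
    if inspect.isasyncgenfunction(original):
        # Streaming path: the original is an async generator function, so the
        # wrapper must also be one and re-yield chunks instead of awaiting.
        async def gen_wrapper(*args: Any, **kwargs: Any):
            async for chunk in original(*args, **kwargs):
                # ... record the chunk here ...
                yield chunk

        return gen_wrapper

    # Non-streaming path: a plain coroutine function can be awaited.
    async def coro_wrapper(*args: Any, **kwargs: Any):
        result = await original(*args, **kwargs)
        # ... record the result here ...
        return result

    return coro_wrapper
```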
Expected behavior
...
tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_01]
instantiating llama_stack_client
Port 8321 is already in use, assuming server is already running...
llama_stack_client instantiated in 0.043s
PASSED [ 50%]
tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_02] PASSED [100%]
====================================================== slowest 10 durations =======================================================
1.68s call tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_01]
1.41s call tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_02]
1.14s setup tests/integration/inference/test_text_inference.py::test_text_chat_completion_non_streaming[txt=vllm/Qwen/Qwen3-0.6B-inference:chat_completion:non_streaming_01]
(3 durations < 0.005s hidden. Use -vv to show these durations.)
========================================== 2 passed, 95 deselected, 6 warnings in 4.32s ===========================================
...