Description
Using the OpenAI /chat/completions endpoint to transcribe audio with Voxtral-Mini-3B fails because the vllm[audio] package is missing.
I have deployed a vLLM serverless endpoint with this image: registry.runpod.net/runpod-workers-worker-vllm-main-dockerfile:2becd3534
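For context, the failing request is an OpenAI-style chat completion carrying an audio content part. Below is a minimal sketch of that kind of request; the base URL pattern, endpoint ID, API key, and audio file are placeholders and assumptions, not values taken from the deployment above.

```python
# Minimal reproduction sketch (placeholders throughout): send an audio
# content part through the OpenAI-compatible /chat/completions route.
import base64

from openai import OpenAI

client = OpenAI(
    base_url="https://api.runpod.ai/v2/<ENDPOINT_ID>/openai/v1",  # assumed RunPod route
    api_key="<RUNPOD_API_KEY>",
)

# Encode a local audio file as base64, as expected by an input_audio part.
with open("sample.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="mistralai/Voxtral-Mini-3B-2507",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this audio."},
                {
                    "type": "input_audio",
                    "input_audio": {"data": audio_b64, "format": "wav"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```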
Error
[info]--- Starting Serverless Worker | Version 1.7.13 ---
[info]Jobs in queue: 1
[info]Jobs in progress: 1
[info]INFO 10-09 08:58:50 [chat_utils.py:473] Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
[info]tekken.py :540 2025-10-09 08:58:51,169 Vocab size: 150000
[info]tekken.py :544 2025-10-09 08:58:51,171 Cutting vocab to first 130072 tokens.
[warning]WARNING 10-09 08:58:51 [chat_utils.py:376] 'add_generation_prompt' is not supported for mistral tokenizer, so it will be ignored.
[warning]WARNING 10-09 08:58:51 [chat_utils.py:380] 'continue_final_message' is not supported for mistral tokenizer, so it will be ignored.
[error]Please install vllm[audio] for audio support
[info]Finished.
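The [error] line suggests the image was built without vLLM's optional audio extra, so the model loads but audio inputs cannot be processed. A quick check from inside the container is sketched below; the package names (librosa, soundfile) are an assumption based on what the vllm[audio] extra typically pulls in, and may differ between vLLM versions.

```python
# Probe for the optional audio dependencies inside the worker container.
# librosa/soundfile are assumed members of the vllm[audio] extra; verify
# against the vLLM version baked into the image.
import importlib.util

for pkg in ("librosa", "soundfile"):
    spec = importlib.util.find_spec(pkg)
    print(f"{pkg}: {'installed' if spec else 'MISSING'}")
```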
My environment variables:
MODEL_NAME=mistralai/Voxtral-Mini-3B-2507
TOKENIZER=auto
TOOL_CALL_PARSER=mistral
CONFIG_FORMAT=mistral
TOKENIZER_MODE=mistral
DISABLE_LORA=true
SKIP_TOKENIZER_INIT=false
TRUST_REMOTE_CODE=true
LOAD_FORMAT=mistral
DTYPE=auto
KV_CACHE_DTYPE=auto
MAX_MODEL_LEN=0
GUIDED_DECODING_BACKEND=outlines
DISTRIBUTED_EXECUTOR_BACKEND=ray
WORKER_USE_RAY=false
RAY_WORKERS_USE_NSIGHT=false
PIPELINE_PARALLEL_SIZE=1
TENSOR_PARALLEL_SIZE=1
MAX_PARALLEL_LOADING_WORKERS=0
ENABLE_PREFIX_CACHING=false
DISABLE_SLIDING_WINDOW=false
USE_V2_BLOCK_MANAGER=false
NUM_LOOKAHEAD_SLOTS=0
SEED=0
NUM_GPU_BLOCKS_OVERRIDE=0
MAX_NUM_BATCHED_TOKENS=0
MAX_NUM_SEQS=256
MAX_LOGPROBS=20
DISABLE_LOG_STATS=false
QUANTIZATION=None
ROPE_THETA=0
TOKENIZER_POOL_SIZE=0
TOKENIZER_POOL_TYPE=ray
ENABLE_LORA=false
MAX_LORAS=1
MAX_LORA_RANK=16
LORA_EXTRA_VOCAB_SIZE=256
LORA_DTYPE=auto
MAX_CPU_LORAS=0
FULLY_SHARDED_LORAS=false
DEVICE=auto
SCHEDULER_DELAY_FACTOR=0
ENABLE_CHUNKED_PREFILL=false
NUM_SPECULATIVE_TOKENS=0
SPECULATIVE_DRAFT_TENSOR_PARALLEL_SIZE=0
SPECULATIVE_MAX_MODEL_LEN=0
SPECULATIVE_DISABLE_BY_BATCH_SIZE=0
NGRAM_PROMPT_LOOKUP_MAX=0
NGRAM_PROMPT_LOOKUP_MIN=0
SPEC_DECODING_ACCEPTANCE_METHOD=rejection_sampler
TYPICAL_ACCEPTANCE_SAMPLER_POSTERIOR_THRESHOLD=0
TYPICAL_ACCEPTANCE_SAMPLER_POSTERIOR_ALPHA=0
PREEMPTION_CHECK_PERIOD=1
PREEMPTION_CPU_CAPACITY=2
MAX_LOG_LEN=0
DISABLE_LOGGING_REQUEST=false
TOKENIZER_NAME=mistralai/Voxtral-Mini-3B-2507
GPU_MEMORY_UTILIZATION=0.95
BLOCK_SIZE=16
SWAP_SPACE=4
ENFORCE_EAGER=false
MAX_SEQ_LEN_TO_CAPTURE=8192
DISABLE_CUSTOM_ALL_REDUCE=false
DEFAULT_BATCH_SIZE=50
DEFAULT_MIN_BATCH_SIZE=1
DEFAULT_BATCH_SIZE_GROWTH_FACTOR=3
RAW_OPENAI_OUTPUT=true
OPENAI_RESPONSE_ROLE=assistant
MAX_CONCURRENCY=300
BASE_PATH=/runpod-volume
DISABLE_LOG_REQUESTS=true
ENABLE_AUTO_TOOL_CHOICE=false
HUGGING_FACE_HUB_TOKEN="{{ RUNPOD_SECRET_HF_TOKEN }}"
VLLM_GPU_MEMORY_UTILIZATION=0.6