Issue with Voxtral 3B: "Please install vllm[audio] for audio support" #223

@dalager

Description

Using the OpenAI /chat/completions endpoint to transcribe audio with Voxtral-Mini-3B fails because the vllm[audio] extra is not installed in the worker image.

I have deployed a vLLM serverless endpoint with this image: registry.runpod.net/runpod-workers-worker-vllm-main-dockerfile:2becd3534
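
For reference, a request of roughly this shape triggers the failure. The endpoint URL, API key, and audio file below are placeholders, not my real values; the input_audio content part is the OpenAI-style audio format, which vLLM's OpenAI-compatible server accepts for audio models as far as I know:

```python
import base64
from openai import OpenAI

# Placeholders: swap in the real RunPod endpoint id, API key, and audio file.
client = OpenAI(
    base_url="https://api.runpod.ai/v2/<endpoint-id>/openai/v1",
    api_key="<runpod-api-key>",
)

with open("sample.wav", "rb") as f:  # hypothetical local file
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = client.chat.completions.create(
    model="mistralai/Voxtral-Mini-3B-2507",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe this audio."},
            {"type": "input_audio",
             "input_audio": {"data": audio_b64, "format": "wav"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```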

Error

[info]--- Starting Serverless Worker |  Version 1.7.13 ---
[info]Jobs in queue: 1
[info]Jobs in progress: 1
[info]INFO 10-09 08:58:50 [chat_utils.py:473] Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
[info]tekken.py           :540  2025-10-09 08:58:51,169 Vocab size: 150000
[info]tekken.py           :544  2025-10-09 08:58:51,171 Cutting vocab to first 130072 tokens.
[warning]WARNING 10-09 08:58:51 [chat_utils.py:376] 'add_generation_prompt' is not supported for mistral tokenizer, so it will be ignored.
[warning]WARNING 10-09 08:58:51 [chat_utils.py:380] 'continue_final_message' is not supported for mistral tokenizer, so it will be ignored.
[error]Please install vllm[audio] for audio support
[info]Finished.
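
As far as I can tell, vLLM raises this message when its optional audio dependencies cannot be imported, so the fix would presumably be building the image with `pip install "vllm[audio]"`. A quick sanity check to run inside the container (librosa and soundfile are what I believe the extra pulls in; the exact package set may vary by vLLM version):

```python
# If these imports fail, the image was built without the vllm[audio] extra.
import importlib

for pkg in ("librosa", "soundfile"):
    try:
        importlib.import_module(pkg)
        print(f"{pkg}: OK")
    except ImportError as exc:
        print(f"{pkg}: MISSING ({exc})")
```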

My environment variables:

MODEL_NAME=mistralai/Voxtral-Mini-3B-2507
TOKENIZER=auto
TOOL_CALL_PARSER=mistral
CONFIG_FORMAT=mistral
TOKENIZER_MODE=mistral
DISABLE_LORA=true
SKIP_TOKENIZER_INIT=false
TRUST_REMOTE_CODE=true
LOAD_FORMAT=mistral
DTYPE=auto
KV_CACHE_DTYPE=auto
MAX_MODEL_LEN=0
GUIDED_DECODING_BACKEND=outlines
DISTRIBUTED_EXECUTOR_BACKEND=ray
WORKER_USE_RAY=false
RAY_WORKERS_USE_NSIGHT=false
PIPELINE_PARALLEL_SIZE=1
TENSOR_PARALLEL_SIZE=1
MAX_PARALLEL_LOADING_WORKERS=0
ENABLE_PREFIX_CACHING=false
DISABLE_SLIDING_WINDOW=false
USE_V2_BLOCK_MANAGER=false
NUM_LOOKAHEAD_SLOTS=0
SEED=0
NUM_GPU_BLOCKS_OVERRIDE=0
MAX_NUM_BATCHED_TOKENS=0
MAX_NUM_SEQS=256
MAX_LOGPROBS=20
DISABLE_LOG_STATS=false
QUANTIZATION=None
ROPE_THETA=0
TOKENIZER_POOL_SIZE=0
TOKENIZER_POOL_TYPE=ray
ENABLE_LORA=false
MAX_LORAS=1
MAX_LORA_RANK=16
LORA_EXTRA_VOCAB_SIZE=256
LORA_DTYPE=auto
MAX_CPU_LORAS=0
FULLY_SHARDED_LORAS=false
DEVICE=auto
SCHEDULER_DELAY_FACTOR=0
ENABLE_CHUNKED_PREFILL=false
NUM_SPECULATIVE_TOKENS=0
SPECULATIVE_DRAFT_TENSOR_PARALLEL_SIZE=0
SPECULATIVE_MAX_MODEL_LEN=0
SPECULATIVE_DISABLE_BY_BATCH_SIZE=0
NGRAM_PROMPT_LOOKUP_MAX=0
NGRAM_PROMPT_LOOKUP_MIN=0
SPEC_DECODING_ACCEPTANCE_METHOD=rejection_sampler
TYPICAL_ACCEPTANCE_SAMPLER_POSTERIOR_THRESHOLD=0
TYPICAL_ACCEPTANCE_SAMPLER_POSTERIOR_ALPHA=0
PREEMPTION_CHECK_PERIOD=1
PREEMPTION_CPU_CAPACITY=2
MAX_LOG_LEN=0
DISABLE_LOGGING_REQUEST=false
TOKENIZER_NAME=mistralai/Voxtral-Mini-3B-2507
GPU_MEMORY_UTILIZATION=0.95
BLOCK_SIZE=16
SWAP_SPACE=4
ENFORCE_EAGER=false
MAX_SEQ_LEN_TO_CAPTURE=8192
DISABLE_CUSTOM_ALL_REDUCE=false
DEFAULT_BATCH_SIZE=50
DEFAULT_MIN_BATCH_SIZE=1
DEFAULT_BATCH_SIZE_GROWTH_FACTOR=3
RAW_OPENAI_OUTPUT=true
OPENAI_RESPONSE_ROLE=assistant
MAX_CONCURRENCY=300
BASE_PATH=/runpod-volume
DISABLE_LOG_REQUESTS=true
ENABLE_AUTO_TOOL_CHOICE=false
HUGGING_FACE_HUB_TOKEN="{{ RUNPOD_SECRET_HF_TOKEN }}"
VLLM_GPU_MEMORY_UTILIZATION=0.6
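
For what it's worth, here is a minimal sketch of how I understand the worker translates the Voxtral-relevant variables above into vLLM engine arguments. The actual worker-vllm code may differ, and the "0 means unset" convention is my assumption from the defaults shown above; the string values mirror the vLLM CLI flags:

```python
import os
from vllm.engine.arg_utils import AsyncEngineArgs

def env(name: str, default: str = "") -> str:
    return os.environ.get(name, default)

engine_args = AsyncEngineArgs(
    model=env("MODEL_NAME"),                       # mistralai/Voxtral-Mini-3B-2507
    tokenizer_mode=env("TOKENIZER_MODE", "auto"),  # "mistral" per the config above
    config_format=env("CONFIG_FORMAT", "auto"),    # "mistral"
    load_format=env("LOAD_FORMAT", "auto"),        # "mistral"
    trust_remote_code=env("TRUST_REMOTE_CODE") == "true",
    gpu_memory_utilization=float(env("GPU_MEMORY_UTILIZATION", "0.9")),
    max_model_len=int(env("MAX_MODEL_LEN", "0")) or None,  # 0 -> let vLLM infer
)
```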
