vLLM can load tensorized weights without conversion.
```bash
bash examples/vllm/run_vllm_tensorized.sh s3://my-bucket/models/tiny-gpt2.tensors
```

The script launches a server and performs a smoke-test query.
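A smoke-test query like the one the script performs can be sketched as follows. This assumes the server listens on `localhost:8000` and exposes the OpenAI-compatible `/v1/completions` endpoint; the model name, port, and prompt are illustrative, not taken from the script itself.

```python
import json
import urllib.request

# Assumed: the vLLM server started by the script listens on
# localhost:8000 with the OpenAI-compatible completions endpoint.
payload = {
    "model": "tiny-gpt2",   # model name is an assumption
    "prompt": "Hello, world",
    "max_tokens": 16,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    # A 200 response with generated text confirms the server is up.
    with urllib.request.urlopen(req, timeout=10) as resp:
        print(json.loads(resp.read())["choices"][0]["text"])
except OSError as exc:
    print(f"smoke test failed: {exc}")
```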
- `vllm serve --tensorizer` reads weights from disk, HTTP, or S3.
- Environment variables like `VLLM_WORKER_GPU_MEMORY_UTILIZATION` tune throughput vs. memory usage.
- Prometheus metrics at `/metrics` expose time-to-first-token and tokens/sec.
- Scale out with KServe or plain Deployments using the Helm chart in `helm/tensorizer-vllm`.
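To illustrate reading those Prometheus metrics, here is a minimal sketch that parses a `/metrics` scrape and derives mean time-to-first-token from a histogram's `_sum`/`_count` pair. The sample text and the exact metric names are assumptions for illustration; check your server's actual `/metrics` output for the real names.

```python
import re

# Sample /metrics scrape. The "vllm:" prefix mirrors vLLM's metric
# naming convention, but these specific names are assumed for the sketch.
scrape = """\
vllm:time_to_first_token_seconds_sum 12.5
vllm:time_to_first_token_seconds_count 50
vllm:generation_tokens_total 4096
"""

def metric(name: str, text: str) -> float:
    """Return the value of a single-sample Prometheus metric line."""
    m = re.search(rf"^{re.escape(name)}\s+([0-9.eE+-]+)$", text, re.M)
    if m is None:
        raise KeyError(name)
    return float(m.group(1))

# Mean TTFT = histogram sum / histogram count.
ttft = metric("vllm:time_to_first_token_seconds_sum", scrape) / metric(
    "vllm:time_to_first_token_seconds_count", scrape
)
print(f"mean TTFT: {ttft:.3f}s")
```

In production you would scrape `/metrics` over HTTP (or let Prometheus do it) rather than parse a hard-coded string; the parsing logic is the same.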
Refer to the vLLM documentation for advanced options.