This repository demonstrates a CoreWeave-aligned stack for high-performance model serving. It combines Slurm on Kubernetes (SUNK), Tensorizer, vLLM, and CoreWeave observability to provide fast, reproducible, and secure deployments.
# clone and enter the repo
git clone https://github.com/coreweave/tensorizer
cd tensorizer
# run the Tensorizer demo
python examples/tensorizer/serialize_and_load.py --local-only

- Deploy SUNK to a Kubernetes cluster and schedule a pod from a Slurm job.
- Serialize and host a model using Tensorizer and CoreWeave Object Storage.
- Launch vLLM pointing at the tensorized weights via the provided Helm chart.
- Observe GPU and network metrics in CoreWeave Grafana dashboards.
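
As a rough illustration of the vLLM step, the sketch below builds a `vllm serve` command line that loads tensorized weights. It is not taken from the Helm chart, whose values are not shown here. It assumes vLLM's `--load-format tensorizer` option and the `tensorizer_uri` key of `--model-loader-extra-config`; the model name, bucket path, and the helper `build_vllm_command` are hypothetical placeholders.

```python
import json
import shlex


def build_vllm_command(model: str, tensorizer_uri: str, port: int = 8000) -> list[str]:
    """Return an argv list for `vllm serve` that loads tensorized weights.

    Hypothetical helper for illustration; not part of this repository.
    """
    # Assumption: vLLM's tensorizer loader reads the weights location from
    # the "tensorizer_uri" key of --model-loader-extra-config.
    extra_config = json.dumps({"tensorizer_uri": tensorizer_uri})
    return [
        "vllm", "serve", model,
        "--load-format", "tensorizer",
        "--model-loader-extra-config", extra_config,
        "--port", str(port),
    ]


if __name__ == "__main__":
    argv = build_vllm_command(
        model="facebook/opt-125m",  # hypothetical model name
        tensorizer_uri="s3://example-bucket/opt-125m/model.tensors",  # hypothetical URI
    )
    # Print a shell-safe command string instead of launching the server.
    print(shlex.join(argv))
```

Building the command as a list keeps the JSON argument intact without manual quoting, which is also the shape a container `command`/`args` field in a chart would typically need.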
The following documents provide more detail: