At least 53 of the 117 videos from PyTorch Conference 2025 mention vLLM, roughly 45% of the program. The list below is non-exhaustive. If you find another PyTorch Conference 2025 video that mentions vLLM, feel free to submit a pull request to add it to the list!
- Keynote: Welcome & Opening Remarks Matt White, Executive Director, PyTorch Foundation
- Keynote: vLLM & DeepSpeed Updates Simon Mo & Tunji Ruwase
- Keynote: Ray: A Distributed Compute Engine for AI Robert Nishihara & Ion Stoica
- Keynote: Olmo-Thinking: Training a Fully Open Reasoning Model Nathan Lambert
- Open Source Model Performance Optimization With SGLang Yineng Zhang, Together AI
- PyTorch Symmetric Memory: A New Programming Paradigm for Distributed AI Ke Wen & Chien-Chin Huang
- Sponsored Session: Lightning Talk: Accelerating Experimentation and Unlocking Real-Time Inference on Microcontrollers with Lightning Niall Lyons & Luca Antiga
- Deploying GenAI for Audio Generation on Mobile CPUs With ExecuTorch Gian Marco Iodice, Arm
- Sponsored Session: Beyond the Node: Scaling Inference with Cluster-Wide KVCache Management Alon Yariv
- Lightning Talk: Hardware-Aware Python Packages ~ PyTorch and WheelNext Grab the Wheel! Jonathan Dekhtiar & Eli Uriegas
- Our Journey With TorchTitan Linsong Chu & Garrett Goon, IBM Research
- The Building Blocks of Agentic AI Joe Spisak, Product Director, Meta Superintelligence Labs
- Sponsor Keynote: Build AI Anywhere with ROCm™ Software: AMD and PyTorch Bring Cloud-to-Client Power to Developers Anush Elangovan
- Keynote: PyTorch Technical Deep Dive Alban Desmaison, Peng Wu, Mark Saroufim & Edward Yang, Meta
- Sponsored Session: Lightning Talk: Build and Deploy AI Flows with an Agent Factory Arjun Upadhyay
- Sponsored Session: Everything Everywhere all at Once: vLLM Hardware Optionality with Spotify and Google Brittany Rockwell & Shireen Kheradpey
- An Open Source Post-Training Stack: Kubernetes + Ray + PyTorch + vLLM Robert Nishihara, Anyscale
- PyTorch-Native Stack for Agents Allen Wang & Davide Testuggine, Meta
- Sponsored Session: Lightning Talk: Optimizing Model Inference with PyTorch 2.0 Devansh Ghatak
- Thunder: Distribute and Optimize Your PyTorch Models With Zero Code Changes Luca Antiga & Thomas Viehmann
- vLLM: Easy, Fast, and Cheap LLM Serving for Everyone Simon Mo, vLLM
- Sponsored Session: Amazingly Fast and Incredibly Scalable Inference with NVIDIA's Dynamo and TensorRT-LLM Harry Kim & Laikh Tewari
- Lightning Talk: Unlock the Future of Generative AI: TorchTitan's Latest Breakthroughs Tianyu Liu & Jiani Wang
- Breaking Heterogeneity Barriers: Unified Cloud-to-Robot AI System SW Stack for Embodied Intelligence Yonghua Lin
- Lightning Talk: Flex Attention for Inference Boyuan Feng & Driss Guessous, Meta
- Sponsored Session: Lightning Talk: Improving Drug Discovery with Machine Learning and Molecular Dynamics Simon Axelrod
- Sponsored Session: Lightning Talk: Accelerated Software for a Post-Moore World Jay Dawani
- Sponsored Session: Building the Next Generation of Open Source AI Tooling Travis Oliphant
- No GPU Left Behind: Scaling Online LLM Training With Co-located vLLM in TRL Mert Toslali & Yu Chin Fabian Lim
- Lightning Talk: Challenges and Standardization in PyTorch Ecosystem Accelerators Zesheng Zong & Ashok Emani
- Keynote: Building the Open Agent Ecosystem Together: Introducing OpenEnv Joe Spisak, Lysandre Debut
- Enabling vLLM V1 on AMD GPUs With Triton Thomas Parnell, IBM Research & Aleksandr Malyshev, AMD
- Transformers: Standardizing Model Definitions Across the PyTorch Ecosystem L. Debut & A. Zucker
- How Modern PyTorch Supercharges Multimodal Training and Inference at Luma AI Thomas Neff, Luma AI
- Optimizing Long-Tail and MoE Challenges in Reinforcement Learning with SGLang Chenyang Zhao, UCLA
- Serving PyTorch LLMs at Scale: Disaggregated Inference With Kubernetes and Llm-d M. Ayoub & C. Liu
- Verl: A Flexible and Efficient RL Framework for LLMs Hongpeng Guo & Ziheng Jiang, ByteDance Seed
- Sponsored Session: Lightning Talk: PyTorch and Democratization of AI Accelerators Hong-Seok Kim
- Multi-Accelerator PyTorch Serving With NxD Inference and vLLM Yahav Biran & Liangfu Chen, Amazon
- Scaling KV Caches for LLMs: How LMCache + NIXL Handle Network and Storage Heterogeneity J. Jiang & M. Khazraee
- Sponsored Session: Lightning Talk: PyTorch in Production: Boosting LLM Training and Inferencing on Ascend NPU F. Hua
- Keynote Panel: Hardware & Accelerators D Patel, S Zhou, P Salanki, N Perumbeti, M Saroufim
- Blazing Fast GenAI Inference With Torch.compile Richard Zou, Meta
- Sponsored Session: Empowering AI Everywhere: Democratizing PyTorch with Intel AI PCs & More F. Zhao & E. Wang
- Designing and Building Custom Reinforcement Learning Environments for Fine-tuning LLMs N. Bantilan
- Sponsored Session: Accelerating GenAI Inference: From AWS Deep Learning Containers to Scaling Amazon Rufus on Trainium P. Nguyen & A. Zhao
- Lightning Talk: Vllm-triton-backend: How To Get State-of-the-art Performance on NVIDIA and AMD With Just Triton B. Ringlein
- Sponsored Session: Lightning Talk: Efficient Inference Serving with Kubernetes Gateway API Inference Extension Lin Sun
- Lightning Talk: Improved GEMM and SDPA Performance on ROCm With Composable Kernel Andres Lugo, AMD
- Kimi K2 and Our Contributions to Open Source Yuxin Wu, Moonshot AI
- Lightning Talk: A Stable Limited LibTorch ABI? How?! (and Why?) Jane Xu, Meta
- Sponsored Session: Arctic Inference: Breaking the Speed-Cost Tradeoff in LLM Serving Aurick Qiao
- PyTorch Conference 2025 Recap
- Details: 117-pytorch-conference-2025-recap