At least 53 of the 117 videos from PyTorch Conference 2025 mention vLLM, roughly 45% of the program. The list below is non-exhaustive. If you find another PyTorch Conference 2025 video that mentions vLLM, feel free to submit a pull request to add it to the list!
- Keynote: Welcome & Opening Remarks Matt White, Executive Director, PyTorch Foundation
- Keynote: vLLM & DeepSpeed Updates Simon Mo & Tunji Ruwase
- Keynote: Ray: A Distributed Compute Engine for AI Robert Nishihara & Ion Stoica
- Keynote: Olmo-Thinking: Training a Fully Open Reasoning Model Nathan Lambert
- Open Source Model Performance Optimization With SGLang Yineng Zhang, Together AI
- PyTorch Symmetric Memory: A New Programming Paradigm for Distributed AI Ke Wen & Chien-Chin Huang
- Sponsored Session: Lightning Talk: Accelerating Experimentation and Unlocking Real-Time Inference on Microcontrollers with Lightning Niall Lyons & Luca Antiga
- Deploying GenAI for Audio Generation on Mobile CPUs With ExecuTorch Gian Marco Iodice, Arm
- Sponsored Session: Beyond the Node: Scaling Inference with Cluster-Wide KVCache Management Alon Yariv
- Lightning Talk: Hardware-Aware Python Packages ~ PyTorch and WheelNext Grab the Wheel! Jonathan Dekhtiar & Eli Uriegas
- Our Journey With TorchTitan Linsong Chu & Garrett Goon, IBM Research
- The Building Blocks of Agentic AI Joe Spisak, Product Director, Meta Superintelligence Labs
- Sponsor Keynote: Build AI Anywhere with ROCm™ Software: AMD and PyTorch Bring Cloud-to-Client Power to Developers Anush Elangovan
- Keynote: PyTorch Technical Deep Dive Alban Desmaison, Peng Wu, Mark Saroufim & Edward Yang, Meta
- Sponsored Session: Lightning Talk: Build and Deploy AI Flows with an Agent Factory Arjun Upadhyay
- Sponsored Session: Everything Everywhere all at Once: vLLM Hardware Optionality with Spotify and Google Brittany Rockwell & Shireen Kheradpey
- An Open Source Post-Training Stack: Kubernetes + Ray + PyTorch + vLLM Robert Nishihara, Anyscale
- PyTorch-Native Stack for Agents Allen Wang & Davide Testuggine, Meta
- Sponsored Session: Lightning Talk: Optimizing Model Inference with PyTorch 2.0 Devansh Ghatak
- Thunder: Distribute and Optimize Your PyTorch Models With Zero Code Changes Luca Antiga & Thomas Viehmann
- vLLM: Easy, Fast, and Cheap LLM Serving for Everyone Simon Mo, vLLM
- Sponsored Session: Amazingly Fast and Incredibly Scalable Inference with NVIDIA's Dynamo and TensorRT-LLM Harry Kim & Laikh Tewari
- Lightning Talk: Unlock the Future of Generative AI: TorchTitan's Latest Breakthroughs Tianyu Liu & Jiani Wang
- Breaking Heterogeneity Barriers: Unified Cloud-to-Robot AI System SW Stack for Embodied Intelligence Yonghua Lin
- Lightning Talk: Flex Attention for Inference Boyuan Feng & Driss Guessous, Meta
- Sponsored Session: Lightning Talk: Improving Drug Discovery with Machine Learning and Molecular Dynamics Simon Axelrod
- Sponsored Session: Lightning Talk: Accelerated Software for a Post-Moore World Jay Dawani
- Sponsored Session: Building the Next Generation of Open Source AI Tooling Travis Oliphant
- No GPU Left Behind: Scaling Online LLM Training With Co-located vLLM in TRL Mert Toslali & Yu Chin Fabian Lim
- Lightning Talk: Challenges and Standardization in PyTorch Ecosystem Accelerators Zesheng Zong & Ashok Emani
- Keynote: Building the Open Agent Ecosystem Together: Introducing OpenEnv Joe Spisak, Lysandre Debut
- Enabling vLLM V1 on AMD GPUs With Triton Thomas Parnell, IBM Research & Aleksandr Malyshev, AMD
- Transformers: Standardizing Model Definitions Across the PyTorch Ecosystem L. Debut & A. Zucker
- How Modern PyTorch Supercharges Multimodal Training and Inference at Luma AI Thomas Neff, Luma AI
- Optimizing Long-Tail and MoE Challenges in Reinforcement Learning with SGLang Chenyang Zhao, UCLA
- Serving PyTorch LLMs at Scale: Disaggregated Inference With Kubernetes and Llm-d M. Ayoub & C. Liu
- Verl: A Flexible and Efficient RL Framework for LLMs Hongpeng Guo & Ziheng Jiang, ByteDance Seed
- Sponsored Session: Lightning Talk: PyTorch and Democratization of AI Accelerators Hong-Seok Kim
- Multi-Accelerator PyTorch Serving With NxD Inference and vLLM Yahav Biran & Liangfu Chen, Amazon
- Scaling KV Caches for LLMs: How LMCache + NIXL Handle Network and Storage Heterogeneity J. Jiang & M. Khazraee
- Sponsored Session: Lightning Talk: PyTorch in Production: Boosting LLM Training and Inferencing on Ascend NPU F. Hua
- Keynote Panel: Hardware & Accelerators D Patel, S Zhou, P Salanki, N Perumbeti, M Saroufim
- Blazing Fast GenAI Inference With Torch.compile Richard Zou, Meta
- Sponsored Session: Empowering AI Everywhere: Democratizing PyTorch with Intel AI PCs & More F. Zhao & E. Wang
- Designing and Building Custom Reinforcement Learning Environments for Fine-tuning LLMs N. Bantilan
- Sponsored Session: Accelerating GenAI Inference: From AWS Deep Learning Containers to Scaling Amazon Rufus on Trainium P. Nguyen & A. Zhao
- Lightning Talk: Vllm-triton-backend: How To Get State-of-the-art Performance on NVIDIA and AMD With Just Triton B. Ringlein
- Sponsored Session: Lightning Talk: Efficient Inference Serving with Kubernetes Gateway API Inference Extension Lin Sun
- Lightning Talk: Improved GEMM and SDPA Performance on ROCm With Composable Kernel Andres Lugo, AMD
- Kimi K2 and Our Contributions to Open Source Yuxin Wu, Moonshot AI
- Lightning Talk: A Stable Limited LibTorch ABI? How?! (and Why?) Jane Xu, Meta
- Sponsored Session: Arctic Inference: Breaking the Speed-Cost Tradeoff in LLM Serving Aurick Qiao
- PyTorch Conference 2025 Recap
- Details: 117-pytorch-conference-2025-recap