Skip to content

anindya-saha/ray-summit-2025

Repository files navigation

Ray Summit 2025


Title: Scaling Post-Training Workflows with Ray Data, Ray Data LLM, and vLLM

Abstract
Post-training workflows for vision-language models processing large-scale image datasets present coordination challenges among multiple processes and stages that often lead to complex, difficult-to-maintain systems. Inspired from real-world experience, we will talk and demo through a use case of image captioning to demonstrate how Ray Data, Ray Data LLM and vLLM simplify building production-scale post training workflows while achieving high throughput and resource efficiency.

We'll explore practical patterns on:

  • How do we develop a workflow from prototype to production scale gradually
  • Use Ray Data's distributed processing for efficient data loading, batching and transformation
  • Use Ray Data LLM's Processor abstraction for seamless vLLM integration with preprocessing and postprocessing steps
  • Integrate vLLM and scaling up inference on multiple GPUs for high-throughput
  • Fully customize preprocess, postprocess steps; Manage states with classes.
  • Integrate Prometheus & Grafana for real time performance monitoring
  • Optimize GPU resource utilization across distributed workloads

Target Audience
ML engineers building post-training pipelines, teams scaling language model workflows, and practitioners interested in production-ready distributed offine batch inference systems.

Talk Outline (30 minutes)

Problem Context (8 minutes)

  • Post training workflow; Relevant Use Cases
  • Common patterns: preprocess, postprocess, offline batch inferencing
  • Trade-offs between simplicity and scale in existing approaches

Ray Data Approach (15 minutes)

  • Ray Data fundamentals for batch processing
  • Integration patterns with vLLM for language models
  • Code walkthrough: data loading, batching, inference, result handling, monitoring
  • Scaling up resources for each stage independently

Production Considerations (5 minutes)

  • Performance monitoring through Prometheus & Grafana dashboard; Also Ray dashboard
  • Performance characteristics and optimization techniques

Q&A (2 minutes)
Key Technical Points

  • Practical Ray Data usage patterns for post training workloads
  • vLLM integration for efficient batch inference
  • Resource optimization techniques for GPU-intensive pipelines
  • State management without external coordination systems

About

Demo for the Ray Summit 2025

Resources

Stars

Watchers

Forks

Contributors

Languages