@@ -343,14 +343,34 @@ Data for the following features and attributes is collected to improve Ray Data
 If you would like to opt-out from usage data collection, you can follow :ref:`Ray usage stats <ref-usage-stats>`
 to turn it off.
 
-.. _production_guide:
+.. _faqs:
 
-Production guide
+Frequently Asked Questions (FAQs)
 --------------------------------------------------
 
+.. TODO(#55491): Rewrite this section once the restriction is lifted.
+.. _cross_node_parallelism:
+
+How to configure the LLM stage to parallelize across multiple nodes?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+At the moment, Ray Data LLM doesn't support cross-node parallelism (either
+tensor parallelism or pipeline parallelism).
+
+The processing pipeline is designed to run each replica on a single node. The
+number of GPUs per replica is calculated as the product of the tensor parallel
+size and the pipeline parallel size, and the ``STRICT_PACK``
+`placement group strategy <https://docs.ray.io/en/latest/ray-core/scheduling/placement-group.html#pgroup-strategy>`__
+is applied to ensure that each replica of the LLM stage is executed on a
+single node.
+
+Nevertheless, you can still horizontally scale the LLM stage to multiple nodes
+as long as each replica (TP * PP) fits into a single node. The number of
+replicas is configured by the ``concurrency`` argument in
+:class:`vLLMEngineProcessorConfig <ray.data.llm.vLLMEngineProcessorConfig>`.
+
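+The following is a minimal sketch of such a setup, based on the
+:class:`vLLMEngineProcessorConfig <ray.data.llm.vLLMEngineProcessorConfig>`
+and ``build_llm_processor`` APIs; the model name, batch size, and sampling
+parameters are illustrative placeholders:
+
+.. code-block:: python
+
+    import ray
+    from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor
+
+    config = vLLMEngineProcessorConfig(
+        model_source="unsloth/Llama-3.1-8B-Instruct",  # placeholder model
+        engine_kwargs=dict(
+            tensor_parallel_size=2,    # TP stays within one node
+            pipeline_parallel_size=2,  # PP stays within the same node
+        ),
+        # Each replica needs TP * PP = 4 GPUs packed onto a single node
+        # (STRICT_PACK); the two replicas can land on two different nodes.
+        concurrency=2,
+        batch_size=64,
+    )
+
+    processor = build_llm_processor(
+        config,
+        preprocess=lambda row: dict(
+            messages=[{"role": "user", "content": row["prompt"]}],
+            sampling_params=dict(temperature=0.3, max_tokens=128),
+        ),
+        postprocess=lambda row: dict(answer=row["generated_text"]),
+    )
+
+    ds = processor(ray.data.from_items([{"prompt": "What is Ray?"}]))
+    ds.show()
+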
 .. _model_cache:
 
-Caching model weight to remote object storage
+How to cache model weights to remote object storage
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 While deploying Ray Data LLM to large scale clusters, model loading may be rate