
Commit 4dd7321

[data.llm] Add FAQ to doc, explain STRICT_PACK strategy used in data.llm (ray-project#55505)
Signed-off-by: Linkun <[email protected]>
1 parent 1588700 commit 4dd7321

File tree

1 file changed (+23, −3 lines)


doc/source/data/working-with-llms.rst

Lines changed: 23 additions & 3 deletions
@@ -343,14 +343,34 @@ Data for the following features and attributes is collected to improve Ray Data

 If you would like to opt-out from usage data collection, you can follow :ref:`Ray usage stats <ref-usage-stats>`
 to turn it off.

-.. _production_guide:
+.. _faqs:

-Production guide
+Frequently Asked Questions (FAQs)
 --------------------------------------------------

+.. TODO(#55491): Rewrite this section once the restriction is lifted.
+.. _cross_node_parallelism:
+
+How to configure the LLM stage to parallelize across multiple nodes?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+At the moment, Ray Data LLM doesn't support cross-node parallelism (neither
+tensor parallelism nor pipeline parallelism).
+
+The processing pipeline is designed to run on a single node. The number of
+GPUs is calculated as the product of the tensor parallel size and the pipeline
+parallel size, and the
+`STRICT_PACK strategy <https://docs.ray.io/en/latest/ray-core/scheduling/placement-group.html#pgroup-strategy>`_
+is applied to ensure that each replica of the LLM stage runs on a single node.
+
+Nevertheless, you can still horizontally scale the LLM stage to multiple nodes
+as long as each replica (TP * PP) fits on a single node. The number of
+replicas is configured by the ``concurrency`` argument in
+:class:`vLLMEngineProcessorConfig <ray.data.llm.vLLMEngineProcessorConfig>`.
+
 .. _model_cache:

-Caching model weight to remote object storage
+How to cache model weights to remote object storage
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 While deploying Ray Data LLM to large scale clusters, model loading may be rate