
Commit 0821304

arhamm1 and lbliii authored
Update execution-backends.md (#1263)
Signed-off-by: Arham Mehta <[email protected]>
Signed-off-by: L.B. <[email protected]>
Co-authored-by: L.B. <[email protected]>
1 parent 76765fc commit 0821304

File tree

1 file changed: +31 / -8 lines


docs/reference/infrastructure/execution-backends.md

Lines changed: 31 additions & 8 deletions
@@ -12,11 +12,27 @@ modality: "universal"
 
 # Pipeline Execution Backends
 
-Executors run NeMo Curator `Pipeline` workflows across your compute resources. This reference explains the available backends and how to configure them. It applies to all modalities (text, image, video, and audio).
+Configure and optimize execution backends to run NeMo Curator pipelines efficiently across single machines, multi-GPU systems, and distributed clusters.
 
-## How it Works
+## Overview
 
-Build your pipeline by adding stages, then run it with an executor:
+Execution backends (executors) are the engines that run NeMo Curator `Pipeline` workflows across your compute resources. They handle:
+
+- **Task Distribution**: Distribute pipeline stages across available workers and GPUs
+- **Resource Management**: Allocate CPU, GPU, and memory resources to processing tasks
+- **Scaling**: Automatically or manually scale processing based on workload
+- **Data Movement**: Optimize data transfer between pipeline stages
+
+**Choosing the right executor** impacts:
+- Pipeline performance and throughput
+- Resource utilization efficiency
+- Ease of deployment and monitoring
+
+This guide covers all execution backends available in NeMo Curator and applies to all modalities: text, image, video, and audio curation.
+
+## Basic Usage Pattern
+
+All pipelines follow this standard execution pattern:
 
 ```python
 from nemo_curator.pipeline import Pipeline
@@ -28,6 +44,11 @@ pipeline.add_stage(...)
 results = pipeline.run(executor)
 ```
 
+**Key points:**
+- The same pipeline definition works with any executor
+- Executor choice is independent of pipeline stages
+- Switch executors without changing pipeline code
+
 ## Available Backends
 
 ### `XennaExecutor` (recommended)
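
The usage pattern documented above is deliberately executor-agnostic. For reference, here is a minimal sketch of that pattern with the executor swap spelled out; the `XennaExecutor` import path, the `Pipeline` constructor arguments, and the placeholder stage comment are assumptions rather than part of this change, while the `Pipeline` import, `add_stage`/`run` calls, and the experimental `ray_actor_pool` import path come from the diff itself.

```python
# Minimal sketch of the documented pattern (not part of this commit).
# Confirmed by the diff: the Pipeline import, add_stage/run, and the
# experimental ray_actor_pool import path. Assumed: the XennaExecutor
# import path, its zero-argument construction, and the Pipeline arguments.
from nemo_curator.pipeline import Pipeline
from nemo_curator.backends.xenna import XennaExecutor  # assumed import path
from nemo_curator.backends.experimental.ray_actor_pool import RayActorPoolExecutor


def build_pipeline() -> Pipeline:
    pipeline = Pipeline(name="example_curation")  # constructor args assumed
    # pipeline.add_stage(...)  # add modality-specific stages here
    return pipeline


# The same pipeline definition runs on any backend; only the executor changes.
pipeline = build_pipeline()
results = pipeline.run(XennaExecutor())            # recommended backend
# results = pipeline.run(RayActorPoolExecutor())   # experimental alternative
```

Only the object passed to `pipeline.run()` changes between runs; the pipeline definition stays untouched.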
@@ -87,7 +108,9 @@ results = pipeline.run(executor)
 
 For more details, refer to the official [NVIDIA Cosmos-Xenna project](https://github.com/nvidia-cosmos/cosmos-xenna/tree/main).
 
-### `RayDataExecutor` (experimental)
+### `RayActorPoolExecutor`
+
+Executor using Ray Actor pools for custom distributed processing patterns such as deduplication.
 
 `RayDataExecutor` uses Ray Data, a scalable data processing library built on Ray Core. Ray Data provides a familiar DataFrame-like API for distributed data transformations. This executor is experimental and best suited for large-scale batch processing tasks that benefit from Ray Data's optimized data loading and transformation pipelines.
 
@@ -97,9 +120,9 @@ For more details, refer to the official [NVIDIA Cosmos-Xenna project](https://gi
 - **Experimental status**: API and performance characteristics may change
 
 ```python
-from nemo_curator.backends.experimental.ray_data import RayDataExecutor
+from nemo_curator.backends.experimental.ray_actor_pool import RayActorPoolExecutor
 
-executor = RayDataExecutor()
+executor = RayActorPoolExecutor()
 results = pipeline.run(executor)
 ```
 
@@ -109,9 +132,9 @@ results = pipeline.run(executor)
 ### `RayActorPoolExecutor` (experimental)
 
 ```python
-from nemo_curator.backends.experimental.ray_actor_pool import RayActorPoolExecutor
+from nemo_curator.backends.experimental.ray_data import RayDataExecutor
 
-executor = RayActorPoolExecutor()
+executor = RayDataExecutor()
 results = pipeline.run(executor)
 ```
 
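Since the page documents three interchangeable backends (`XennaExecutor`, `RayActorPoolExecutor`, `RayDataExecutor`), a small sketch of selecting one by name at run time follows; the `CURATOR_EXECUTOR` environment variable is hypothetical and the `XennaExecutor` import path is an assumption, while the two experimental import paths match the diff above.

```python
# Sketch only (not part of this commit): choose a backend by name so the
# pipeline code never changes. The experimental import paths match the diff;
# the XennaExecutor path and the CURATOR_EXECUTOR variable are assumptions.
import os

from nemo_curator.backends.xenna import XennaExecutor  # assumed import path
from nemo_curator.backends.experimental.ray_actor_pool import RayActorPoolExecutor
from nemo_curator.backends.experimental.ray_data import RayDataExecutor

EXECUTORS = {
    "xenna": XennaExecutor,                  # recommended
    "ray_actor_pool": RayActorPoolExecutor,  # experimental
    "ray_data": RayDataExecutor,             # experimental
}


def get_executor(name: str | None = None):
    """Instantiate the requested backend, defaulting to the recommended one."""
    name = name or os.environ.get("CURATOR_EXECUTOR", "xenna")
    return EXECUTORS[name]()


# results = pipeline.run(get_executor())
```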