docs/reference/infrastructure/execution-backends.md
31 additions & 8 deletions
@@ -12,11 +12,27 @@ modality: "universal"
 
 # Pipeline Execution Backends
 
-Executors run NeMo Curator `Pipeline` workflows across your compute resources. This reference explains the available backends and how to configure them. It applies to all modalities (text, image, video, and audio).
+Configure and optimize execution backends to run NeMo Curator pipelines efficiently across single machines, multi-GPU systems, and distributed clusters.
 
-## How it Works
+## Overview
 
-Build your pipeline by adding stages, then run it with an executor:
+Execution backends (executors) are the engines that run NeMo Curator `Pipeline` workflows across your compute resources. They handle:
+
+- **Task Distribution**: Distribute pipeline stages across available workers and GPUs
+- **Resource Management**: Allocate CPU, GPU, and memory resources to processing tasks
+- **Scaling**: Automatically or manually scale processing based on workload
+- **Data Movement**: Optimize data transfer between pipeline stages
+
+**Choosing the right executor** impacts:
+- Pipeline performance and throughput
+- Resource utilization efficiency
+- Ease of deployment and monitoring
+
+This guide covers all execution backends available in NeMo Curator and applies to all modalities: text, image, video, and audio curation.
+
+## Basic Usage Pattern
+
+All pipelines follow this standard execution pattern:
 
 ```python
 from nemo_curator.pipeline import Pipeline
@@ -28,6 +44,11 @@ pipeline.add_stage(...)
 results = pipeline.run(executor)
 ```
 
+**Key points:**
+- The same pipeline definition works with any executor
+- Executor choice is independent of pipeline stages
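
The key points in this hunk (one pipeline definition, interchangeable executors) can be illustrated with a minimal, self-contained sketch. Every class below (`SketchPipeline`, `SequentialExecutor`, `ThreadedExecutor`) is a hypothetical stand-in written for illustration and is not part of the NeMo Curator API. Passing `items` to `run` is also an assumption made to keep the sketch runnable; the documented call is `pipeline.run(executor)`.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SketchPipeline:
    """Hypothetical pipeline: an ordered list of stages with no execution logic."""

    stages: List[Callable[[int], int]] = field(default_factory=list)

    def add_stage(self, stage: Callable[[int], int]) -> "SketchPipeline":
        self.stages.append(stage)
        return self

    def run(self, executor, items: List[int]) -> List[int]:
        # The pipeline only hands its stages to the executor; it does not
        # know or care how the executor schedules the work.
        return executor.execute(self.stages, items)


def _apply_stages(stages: List[Callable[[int], int]], item: int) -> int:
    """Run every stage, in order, on a single item."""
    for stage in stages:
        item = stage(item)
    return item


class SequentialExecutor:
    """Hypothetical executor: processes items one at a time in the caller's thread."""

    def execute(self, stages, items):
        return [_apply_stages(stages, item) for item in items]


class ThreadedExecutor:
    """Hypothetical executor: processes items concurrently on a thread pool."""

    def __init__(self, max_workers: int = 4):
        self.max_workers = max_workers

    def execute(self, stages, items):
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # pool.map returns results in input order.
            return list(pool.map(lambda item: _apply_stages(stages, item), items))


# One pipeline definition, two interchangeable executors.
pipeline = SketchPipeline()
pipeline.add_stage(lambda x: x + 1)
pipeline.add_stage(lambda x: x * 2)

sequential = pipeline.run(SequentialExecutor(), [1, 2, 3])
threaded = pipeline.run(ThreadedExecutor(max_workers=2), [1, 2, 3])
```

Both calls produce the same results because the stages are defined once and the executor only decides how they are scheduled, which is the separation the key points describe.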
 For more details, refer to the official [NVIDIA Cosmos-Xenna project](https://github.com/nvidia-cosmos/cosmos-xenna/tree/main).
 
-### `RayDataExecutor` (experimental)
+### `RayActorPoolExecutor`
+
+Executor using Ray Actor pools for custom distributed processing patterns such as deduplication.
 
 `RayDataExecutor` uses Ray Data, a scalable data processing library built on Ray Core. Ray Data provides a familiar DataFrame-like API for distributed data transformations. This executor is experimental and best suited for large-scale batch processing tasks that benefit from Ray Data's optimized data loading and transformation pipelines.
 
@@ -97,9 +120,9 @@ For more details, refer to the official [NVIDIA Cosmos-Xenna project](https://gi
 - **Experimental status**: API and performance characteristics may change
 
 ```python
-from nemo_curator.backends.experimental.ray_data import RayDataExecutor
+from nemo_curator.backends.experimental.ray_actor_pool import RayActorPoolExecutor