docs/get-started/text.md
Lines changed: 12 additions & 13 deletions
@@ -12,22 +12,22 @@ modality: "text-only"
 
 # Get Started with Text Curation
 
-This guide helps you set up and get started with NeMo Curator's text curation capabilities. Follow these steps to prepare your environment and run your first text curation pipeline.
+This guide provides step-by-step instructions for setting up NeMo Curator’s text curation capabilities. Follow these instructions to prepare your environment and execute your first text curation pipeline.
 
 ## Prerequisites
 
-To use NeMo Curator's text curation modules, ensure you meet the following requirements:
+To use NeMo Curator’s text curation modules, ensure your system meets the following requirements:
 
 * Python 3.10, 3.11, or 3.12
 * packaging >= 22.0
 * uv (for package management and installation)
 * Ubuntu 22.04/20.04
 * NVIDIA GPU (optional for most text modules, required for GPU-accelerated operations)
   * Volta™ or higher (compute capability 7.0+)
-  * CUDA 12 (or above)
+  * CUDA 12 (or later)
 
 :::{tip}
-If you don't have `uv` installed, refer to the [Installation Guide](../admin/installation.md) for setup instructions, or install it quickly with:
+If `uv` is not installed, refer to the [Installation Guide](../admin/installation.md) for setup instructions, or install it quickly using:
 
 ```bash
 curl -LsSf https://astral.sh/uv/0.8.22/install.sh | sh
@@ -40,7 +40,7 @@ source $HOME/.local/bin/env
 
 ## Installation Options
 
-You can install NeMo Curator in three ways:
+You can install NeMo Curator using one of the following methods:
 
 ::::{tab-set}
 
@@ -100,7 +100,7 @@ NeMo Curator uses a pipeline-based architecture for processing text data. Before
 
 ## Set Up Data Directory
 
-Create a directory structure for your text datasets:
+Create the following directories for your text datasets:
 
 ```bash
 mkdir -p ~/nemo_curator/data/sample
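For readers who prefer to script this step, here is a minimal Python sketch that mirrors the `mkdir -p` command and writes a tiny JSONL file to experiment with. The helper name `make_sample_dataset`, the file name `sample.jsonl`, and the example records are assumptions made for illustration; only the directory path and the `text` and `id` field names come from this diff.

```python
from __future__ import annotations

import json
from pathlib import Path


def make_sample_dataset(root: str | Path = "~/nemo_curator/data/sample") -> Path:
    """Hypothetical helper: create the data directory and a two-record JSONL file."""
    data_dir = Path(root).expanduser()
    # parents=True + exist_ok=True has the same effect as `mkdir -p`.
    data_dir.mkdir(parents=True, exist_ok=True)

    # Made-up records; real datasets will have their own content.
    records = [
        {"id": "doc-0", "text": "A short example document."},
        {"id": "doc-1", "text": "Another example document."},
    ]

    sample_file = data_dir / "sample.jsonl"
    with sample_file.open("w", encoding="utf-8") as handle:
        for record in records:
            # JSONL stores one JSON object per line.
            handle.write(json.dumps(record) + "\n")
    return sample_file
```

Calling `make_sample_dataset()` with no arguments targets the directory used in the guide; passing another path keeps experiments out of the home directory.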
@@ -133,34 +133,33 @@ pipeline.add_stage(
     JsonlReader(
         file_paths="~/nemo_curator/data/sample/",
         files_per_partition=4,
-        fields=["text", "id"]  # Only read required columns for efficiency
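The comment on the removed `fields` line says it restricts reading to the required columns. As a rough illustration of that idea only, the standard-library sketch below keeps just the requested keys from each JSONL record. `read_jsonl_fields` is a hypothetical helper and does not reflect how `JsonlReader` is implemented.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterable, Iterator


def read_jsonl_fields(path: str | Path, fields: Iterable[str]) -> Iterator[dict]:
    """Hypothetical helper: yield each JSONL record restricted to `fields`."""
    wanted = list(fields)
    with Path(path).expanduser().open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            # Drop every key that was not requested; skip keys the record lacks.
            yield {key: record[key] for key in wanted if key in record}
```

Projecting early like this keeps unused columns out of memory, which is the efficiency argument the original comment makes.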