
Commit 77e4f4c

George and dsikka authored
[Dataset Performance] Add num workers on dataset processing - labels, tokenization (#1189)
SUMMARY:

* Add `preprocessing_num_workers` to run dataset processing in parallel for the 2:4 example.

Before: Tokenizing: 371.12 examples/s, Adding labels: 1890.18 examples/s, Tokenizing: 333.39 examples/s

```bash
Tokenizing: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12802/12802 [00:34<00:00, 371.12 examples/s]
Adding labels: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12802/12802 [00:06<00:00, 1890.18 examples/s]
Tokenizing: 9%|█████████▌ | 22077/256032 [00:59<11:41, 333.39 examples/s
```

After (num_proc=8): Tokenizing: 2703.93 examples/s, Adding labels: 5524.98 examples/s, Tokenizing: 2925.98 examples/s

```bash
Tokenizing (num_proc=8): 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 12802/12802 [00:04<00:00, 2703.93 examples/s]
Adding labels (num_proc=8): 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 12802/12802 [00:02<00:00, 5524.98 examples/s]
Tokenizing (num_proc=8): 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 256032/256032 [01:27<00:00, 2925.98 examples/s]
```

TEST PLAN:

* Pass existing tests

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
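The speedup comes from sharding the dataset across worker processes during preprocessing, as Hugging Face `datasets` does when `Dataset.map` is given `num_proc`. A minimal stdlib sketch of the same pattern, assuming a stand-in `fake_tokenize` function (illustrative only, not part of this PR):

```python
from multiprocessing import Pool


def fake_tokenize(text):
    # Stand-in for a real tokenizer: split on whitespace.
    return text.split()


def tokenize_parallel(examples, num_workers=8):
    # Distribute examples across worker processes, mirroring
    # datasets.Dataset.map(..., num_proc=num_workers).
    with Pool(processes=num_workers) as pool:
        return pool.map(fake_tokenize, examples)


if __name__ == "__main__":
    examples = ["hello world", "parallel tokenization is faster"]
    print(tokenize_parallel(examples, num_workers=2))
```

With 8 workers the tokenization throughput in the logs above scales roughly 7-8x, consistent with near-linear scaling on a CPU-bound map.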
1 parent d3d2d1d commit 77e4f4c

File tree

1 file changed: 2 additions, 0 deletions


examples/quantization_2of4_sparse_w4a16/llama7b_sparse_w4a16.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -33,6 +33,7 @@
 bf16 = False  # using full precision for training
 lr_scheduler_type = "cosine"
 warmup_ratio = 0.1
+preprocessing_num_workers = 8

 # this will run the recipe stage by stage:
 # oneshot sparsification -> finetuning -> oneshot quantization
@@ -52,6 +53,7 @@
     learning_rate=learning_rate,
     lr_scheduler_type=lr_scheduler_type,
     warmup_ratio=warmup_ratio,
+    preprocessing_num_workers=preprocessing_num_workers,
 )
 logger.info(
     "llmcompressor does not currently support running compressed models in the marlin24 format."  # noqa
```
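The example hardcodes 8 workers. A common heuristic when choosing such a value is to cap it at the machine's CPU count; a small sketch of that idea (the `pick_num_workers` helper is ours, not from the PR):

```python
import os


def pick_num_workers(requested=8):
    # Never spawn more preprocessing workers than available CPU cores.
    # os.cpu_count() can return None, so fall back to 1.
    cpus = os.cpu_count() or 1
    return max(1, min(requested, cpus))


if __name__ == "__main__":
    print(pick_num_workers(8))
```

On machines with fewer than 8 cores, extra worker processes would mostly add scheduling overhead rather than throughput.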

0 commit comments