Commit d413461

Copilot and VibhuJawa authored
Fix image curation tutorial batch sizes for typical GPUs (#1050)
* Initial plan

* Fix image tutorial batch sizes for typical GPUs - reduce from 500 to 32

Co-authored-by: VibhuJawa <[email protected]>

* Move batch size note up and enhance with specific GPU guidance

Co-authored-by: VibhuJawa <[email protected]>

* Update tutorials/image/getting-started/README.md

Signed-off-by: Vibhu Jawa <[email protected]>

---------

Signed-off-by: Vibhu Jawa <[email protected]>
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: VibhuJawa <[email protected]>
Co-authored-by: Vibhu Jawa <[email protected]>
1 parent 12e84d7 commit d413461

1 file changed: +10 -6 lines changed

tutorials/image/getting-started/README.md

Lines changed: 10 additions & 6 deletions
@@ -47,6 +47,10 @@ python -c "from huggingface_hub import snapshot_download; snapshot_download('ttj
 
 ### Run the scripts
 
+**Note on Batch Sizes:** The batch sizes used in both workflows below are conservative limits set for typical GPUs with 24-48 GB of VRAM (e.g., RTX 4090, A6000, RTX A5000). You can tune these based on your available GPU memory:
+- **High-memory GPUs (80 GB+)** like H100, B200, or A100 80GB: Increase batch sizes for better performance (e.g., `--task-batch-size 500 --embedding-batch-size 500 --aesthetic-batch-size 500 --nsfw-batch-size 500`)
+- **Lower-memory GPUs (16 GB or less)**: Reduce batch sizes further (e.g., `--task-batch-size 16 --embedding-batch-size 16`)
+
 Run the image curation pipeline on GPUs (extracting embeddings, NSFW and aesthetics scores, filtering based on thresholds):
 
 ```bash
@@ -56,10 +60,10 @@ python tutorials/image/getting-started/image_curation_example.py \
     --output-dataset-dir ./example_data/results_truncated_100k_mscoco \
     --model-dir ./model_weights \
     --tar-files-per-partition 10 \
-    --task-batch-size 500 \
-    --embedding-batch-size 500 \
-    --aesthetic-batch-size 500 \
-    --nsfw-batch-size 500 \
+    --task-batch-size 32 \
+    --embedding-batch-size 32 \
+    --aesthetic-batch-size 32 \
+    --nsfw-batch-size 32 \
     --aesthetic-threshold 0.9 \
     --nsfw-threshold 0.9 \
     --images-per-tar 1000 \
@@ -75,8 +79,8 @@ python tutorials/image/getting-started/image_dedup_example.py \
     --embeddings-dir ./example_data/dedup/embeddings/truncated_100k_mscoco \
     --removal-parquets-dir ./example_data/dedup/removal_ids/truncated_100k_mscoco \
     --model-dir ./model_weights \
-    --task-batch-size 1000 \
-    --embedding-batch-size 500 \
+    --task-batch-size 32 \
+    --embedding-batch-size 32 \
     --tar-files-per-partition 10 \
     --skip-download \
     --verbose
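
The batch-size guidance added in this commit can be sketched as a small helper. This is an illustration only, not part of the tutorial scripts: the function names `suggest_batch_size` and `batch_size_args` are hypothetical. The README note gives only three tiers, so the cutoffs between them are assumptions: anything above 16 GB and below 80 GB falls back to the new default of 32. Applying the lower-memory value to the aesthetic and NSFW flags is also an assumption, since the note's 16 GB example lists only the task and embedding flags.

```python
# Hypothetical helper: maps GPU memory to the batch sizes suggested in the README note.
# The tier boundaries between 16 GB and 80 GB are an assumption, not from the commit.

CURATION_FLAGS = (
    "--task-batch-size",
    "--embedding-batch-size",
    "--aesthetic-batch-size",
    "--nsfw-batch-size",
)


def suggest_batch_size(vram_gb: float) -> int:
    """Return a batch size for a GPU with `vram_gb` gigabytes of memory."""
    if vram_gb <= 16:   # "Lower-memory GPUs (16 GB or less)"
        return 16
    if vram_gb >= 80:   # "High-memory GPUs (80 GB+)"
        return 500
    return 32           # default chosen for typical 24-48 GB GPUs


def batch_size_args(vram_gb: float, flags=CURATION_FLAGS) -> list[str]:
    """Build the command-line arguments, using one batch size for every flag."""
    size = str(suggest_batch_size(vram_gb))
    args: list[str] = []
    for flag in flags:
        args += [flag, size]
    return args


if __name__ == "__main__":
    # Example: arguments for a 24 GB card
    print(" ".join(batch_size_args(24)))
```

For the deduplication script, which in this diff takes only the task and embedding batch sizes, the same helper could be called with `flags=CURATION_FLAGS[:2]`.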
