Add multi-GPU parallelization for dataset generation #10474

ssubhanjali · 2025-10-01T19:36:48Z

Added process_qa_chunk() function to handle parallel processing on individual GPUs
Modified make_dataset() to distribute QA pairs across available GPUs using multiprocessing
Each GPU process creates its own embedding model and backend directory
Added per-GPU progress bars for real-time monitoring
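
As a rough illustration of the distribution scheme described above, here is a minimal sketch, not the code from this PR. The names `split_into_chunks`, `process_qa_chunk_sketch`, and `make_dataset_sketch` are hypothetical, and the worker only tags each QA pair with a device string instead of loading an embedding model or creating a backend directory. It assumes the QA pairs are split into contiguous, near-equal chunks, one per GPU, and that the "spawn" start method is used, which is the usual choice when child processes need to initialize CUDA.

```python
import multiprocessing as mp
from typing import Any, Dict, List, Sequence, Tuple


def split_into_chunks(items: Sequence[Any], num_chunks: int) -> List[List[Any]]:
    """Split items into num_chunks contiguous chunks of near-equal size."""
    if num_chunks < 1:
        raise ValueError("num_chunks must be >= 1")
    base, extra = divmod(len(items), num_chunks)
    chunks: List[List[Any]] = []
    start = 0
    for i in range(num_chunks):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


def process_qa_chunk_sketch(
        args: Tuple[int, List[Tuple[str, str]]]) -> List[Dict[str, str]]:
    """Hypothetical per-GPU worker: handles one chunk of QA pairs."""
    gpu_id, qa_chunk = args
    device = f"cuda:{gpu_id}"
    # Assumption: a real worker would load its own embedding model onto
    # `device` and create a per-GPU backend directory here. This sketch
    # only records which device each pair was assigned to.
    return [{"question": q, "answer": a, "device": device}
            for q, a in qa_chunk]


def make_dataset_sketch(qa_pairs: Sequence[Tuple[str, str]],
                        num_gpus: int) -> List[Dict[str, str]]:
    """Distribute QA pairs over num_gpus worker processes."""
    chunks = split_into_chunks(qa_pairs, num_gpus)
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=num_gpus) as pool:
        # pool.map returns results in the same order as the inputs.
        per_gpu = pool.map(process_qa_chunk_sketch, list(enumerate(chunks)))
    return [item for chunk in per_gpu for item in chunk]
```

Because `pool.map` preserves input order and the chunks are contiguous, the flattened result keeps the original QA order, which makes it easier to compare outputs between master and this branch. When run as a script, call `make_dataset_sketch` under an `if __name__ == "__main__":` guard, as the "spawn" start method requires.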

puririshi98 · 2025-10-02T17:07:35Z

Please fix the pre-commit failures and ensure that this incurs no accuracy change between master and this branch when tested on Ralph's data.

puririshi98 · 2025-10-02T19:41:22Z

Please also update ChangeLog.MD to mention that you have accelerated the retrieval in examples/llm/txt2kg_rag.py using multiprocessing.


ssubhanjali requested a review from puririshi98 as a code owner October 1, 2025 19:36

puririshi98 mentioned this pull request Oct 2, 2025

Add multi-GPU parallelization for dataset generation #10471

Closed

ssubhanjali requested review from akihironitta, rusty1s and wsad1 as code owners October 28, 2025 05:16

ssubhanjali force-pushed the ss/multigpu branch 5 times, most recently from d8a9435 to 43ca4f0 Compare October 28, 2025 07:07
