[recipe, data] feat: TransferQueue - Support managing multiple data partitions for Train/Val/Test in controller#45

Merged
0oshowero0 merged 1 commit into TransferQueue:main_tq from LLLLxmmm:tq-verl-data_partitions_main_tq
Nov 17, 2025

@LLLLxmmm

What does this PR do?

Add a concise overview of what this PR aims to achieve. Reference related GitHub issues and PRs that help with the review.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with commas, as in [megatron, fsdp, doc]
    • {type} is one of feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching
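
The title rules above can be sketched as a small validator. This is an illustration only, not the project's actual CI check: the function name `check_pr_title` and the regular expression are assumptions, and only the module and type lists are taken from the checklist.

```python
import re

# Module and type names copied from the checklist above.
ALLOWED_MODULES = {
    "fsdp", "megatron", "sglang", "vllm", "rollout", "trainer", "ci",
    "training_utils", "recipe", "hardware", "deployment", "ray", "worker",
    "single_controller", "misc", "perf", "model", "algo", "env", "tool",
    "ckpt", "doc", "data",
}
ALLOWED_TYPES = {"feat", "fix", "refactor", "chore", "test"}

# Hypothetical pattern: optional [BREAKING], then [modules], then "type: description".
TITLE_RE = re.compile(
    r"^(?P<breaking>\[BREAKING\])?"
    r"\[(?P<modules>[^\]]+)\]\s+"
    r"(?P<type>[a-z]+):\s+"
    r"(?P<description>\S.*)$"
)


def check_pr_title(title: str) -> list[str]:
    """Return a list of problems; an empty list means the title passes."""
    match = TITLE_RE.match(title.strip())
    if match is None:
        return ["title does not match [{modules}] {type}: {description}"]

    problems = []
    # Modules are comma-separated inside the brackets, e.g. "megatron, fsdp, doc".
    for module in (m.strip() for m in match.group("modules").split(",")):
        if module not in ALLOWED_MODULES:
            problems.append(f"unknown module: {module!r}")
    if match.group("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type: {match.group('type')!r}")
    return problems
```

Under these assumptions, calling `check_pr_title` on this PR's own title returns an empty list, since `recipe` and `data` are listed modules and `feat` is a listed type.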

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate with experiments and show results such as training curve plots and evaluation results.

API and Usage Example

Show how the API changes, if at all, and provide usage examples where possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Describe the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all of the following items before requesting a review; otherwise the reviewer might deprioritize this PR.

…artitions for Train/Val/Test in controller
@coderabbitai

coderabbitai bot commented Nov 17, 2025

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.

@0oshowero0 0oshowero0 merged commit b5be2ad into TransferQueue:main_tq Nov 17, 2025
5 checks passed
ji-huazhong added a commit that referenced this pull request Nov 18, 2025
…oller

* Support storage unit in TransferQueue

* Fix importance error

* Support controller in TransferQueue (#2)

* Support controller in TransferQueue

* Fix import

* Fix comments

---------

Co-authored-by: liuximeng <13073314+liuximeng18772102439@user.noreply.gitee.com>

* expose TransferQueueClient (#3)

* Add copyright and license information

Added copyright and licensing information to the controller.py file.

* update client docstring (#5)

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

* merge TransferQueue utils (#4)

* [fix] Fix n_sample related problems (#8)

* update client docstring

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

* fix n_sample related problems

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

---------

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

* expose TransferQueue client/controller UT (#6)

* Add metadata.py and test_simple_storage_unit.py (#9)

* Add metadata.py and test_simple_storage_unit.py

* Add copyright and license information to test_simple_storage_unit.py

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Han Zhenyu 韩振宇 <o0shower0o@outlook.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Add reorder function to BatchMeta (#13)

Co-authored-by: liuximeng <13073314+liuximeng18772102439@user.noreply.gitee.com>

* [recipe, data] feat: TransferQueue - Support managing multiple data partitions for Train/Val/Test in controller (#45)

Co-authored-by: liuximeng <13073314+liuximeng18772102439@user.noreply.gitee.com>

* delete TQ source codes

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

* update docs

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

* update performance

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

* fix

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>

---------

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
Co-authored-by: FightingZhen <295632982@qq.com>
Co-authored-by: Han Zhenyu 韩振宇 <o0shower0o@outlook.com>
Co-authored-by: LLLLxmmm <130739718+LLLLxmmm@users.noreply.github.com>
Co-authored-by: liuximeng <13073314+liuximeng18772102439@user.noreply.gitee.com>
Co-authored-by: Han Zhenyu 韩振宇 <hanzy19@tsinghua.org.cn>
Co-authored-by: zhabuye <74179177+zhabuye@users.noreply.github.com>
Co-authored-by: Jianjun Zhong <87791082+jianjunzhong@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

2 participants