[FEAT] Enhance DAPO for full dynamic sampling #465
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Quick note:
Boundary impact
/gemini review
This pull request has been automatically marked as stale because it has not had recent activity within the last 14 days. Please add a comment or push new commits to keep it active. Thank you for your contribution!
This pull request refactors the codebase to use Python's built-in generic types (e.g., `list`, `dict`) instead of those from the `typing` module (e.g., `List`, `Dict`) across many files. It also introduces a more flexible batch filtering mechanism for dynamic sampling in PPO training, replacing the previous hardcoded approach. Additionally, it adds a new utility for truncating batched dictionaries.

Type annotation modernization

* Replaced `List` and `Dict` from the `typing` module with built-in `list` and `dict` type annotations throughout `areal/api/cli_args.py`, `areal/api/workflow_api.py`, and related files.

Dynamic sampling and batch filtering improvements

* Replaced the `dynamic_sampling` function in `areal/utils/functional.py` with a more flexible `filter_batch` mechanism and a specific implementation, `filter_batch_fn_DAPO`, so that batch filtering strategies are easier to extend and customize.
* Updated `areal/engine/ppo/actor.py` to remove the direct use of `dynamic_sampling`, reflecting the new filtering approach.
* Changed `dynamic_sampling` to a string-based `dynamic_sampling_strategy`, so that a sampling strategy can be selected by name.

Utility additions

* Added `truncate_dict_to_batch_size` in `areal/utils/data.py` to truncate batched dictionaries to a specified batch size.

Minor API and formatting updates

* Updated type annotations to use union syntax (`|`) and built-in generics, and made minor formatting improvements for readability and consistency.

Parser and assertion formatting

* Reformatted parser and assertion code in `areal/api/cli_args.py`.

These changes collectively modernize the codebase and make it easier to maintain and extend.
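
To make the filtering and truncation steps concrete, here is a minimal, self-contained sketch of how DAPO-style dynamic sampling is commonly described: groups of responses to the same prompt are dropped when every reward in the group is identical (such groups carry no learning signal), and the surviving samples are then cut to a target batch size. This is not the code in this PR. The names `keep_mask_nonuniform_groups`, `filter_batch_sketch`, and `truncate_batch_sketch`, the plain-list batch layout, and the assumption that each group occupies `group_size` consecutive entries are all hypothetical choices for illustration. The annotations use the built-in generics the PR migrates to.

```python
from collections.abc import Callable

# Hypothetical batch layout: every value is a list with one entry per sample.
Batch = dict[str, list]


def keep_mask_nonuniform_groups(rewards: list[float], group_size: int) -> list[bool]:
    """Per-sample keep mask that drops groups whose rewards are all identical.

    Assumes each group occupies `group_size` consecutive entries.
    """
    if group_size <= 0 or len(rewards) % group_size != 0:
        raise ValueError("rewards must split evenly into groups of group_size")
    mask: list[bool] = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start : start + group_size]
        keep = max(group) != min(group)  # False when the group has zero spread
        mask.extend([keep] * group_size)
    return mask


def filter_batch_sketch(batch: Batch, mask_fn: Callable[[Batch], list[bool]]) -> Batch:
    """Apply a pluggable mask function to every field of the batch."""
    mask = mask_fn(batch)
    return {key: [x for x, m in zip(values, mask) if m] for key, values in batch.items()}


def truncate_batch_sketch(batch: Batch, batch_size: int) -> Batch:
    """Keep at most `batch_size` leading samples in every field."""
    return {key: values[:batch_size] for key, values in batch.items()}


# Four groups of two responses each; groups 0 and 2 have uniform rewards.
batch: Batch = {
    "rewards": [1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0],
    "ids": [0, 1, 2, 3, 4, 5, 6, 7],
}
filtered = filter_batch_sketch(
    batch, lambda b: keep_mask_nonuniform_groups(b["rewards"], group_size=2)
)
truncated = truncate_batch_sketch(filtered, batch_size=2)
print(filtered["ids"])   # [2, 3, 6, 7]
print(truncated["ids"])  # [2, 3]
```

Passing the mask function as an argument mirrors the idea of a pluggable filter selected by strategy name: a different strategy would only need a different `mask_fn`, while the per-field filtering and truncation stay unchanged.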