Skip to content

Conversation

@ZiyiTsang
Copy link
Collaborator

@ZiyiTsang ZiyiTsang commented Oct 18, 2025

This pull request refactors the codebase to use Python's built-in generic types (e.g., list, dict) instead of those from the typing module (e.g., List, Dict) across many files. It also introduces a more flexible batch filtering mechanism for dynamic sampling in PPO training, replacing the previous hardcoded approach. Additionally, a new utility for truncating batched dictionaries is added. These changes improve code readability, maintainability, and type consistency.

Type annotation modernization

  • Replaced all instances of List and Dict from the typing module with built-in list and dict type annotations throughout areal/api/cli_args.py, areal/api/workflow_api.py, and related files, resulting in cleaner and more modern Python code. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16]

Dynamic sampling and batch filtering improvements

  • Replaced the dynamic_sampling function in areal/utils/functional.py with a more flexible filter_batch mechanism and a specific implementation filter_batch_fn_DAPO, allowing for easier extension and customization of batch filtering strategies.
  • Updated the PPO actor logic in areal/engine/ppo/actor.py to remove the direct use of dynamic_sampling, reflecting the new filtering approach. [1] [2] [3]
  • Changed the PPO configuration option from a boolean dynamic_sampling to a string-based dynamic_sampling_strategy for greater flexibility in selecting sampling strategies.

Utility additions

  • Added truncate_dict_to_batch_size in areal/utils/data.py to support truncating batched dictionaries to a specified batch size, improving data handling robustness.

Minor API and formatting updates

  • Updated type hints for several functions to use new style union (|) and built-in generics, and made minor formatting improvements for readability and consistency. [1] [2] [3] [4] [5] [6]

Parser and assertion formatting

  • Minor formatting changes in parser and assertion logic for improved readability in areal/api/cli_args.py. [1] [2]

These changes collectively modernize the codebase and make it easier to maintain and extend.

ZiyiTsang and others added 16 commits October 14, 2025 00:06
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@ZiyiTsang
Copy link
Collaborator Author

ZiyiTsang commented Oct 18, 2025

Quick note:

  • moved the logic of DAPO to the main dapo.py, as disscussed
  • support 3 types of dynamic sampling strategy
    • none: Turn off dynamic sampling.
    • static: Only one rollout turn and apply the filter function on it, this may result in variable batch size.
    • dynamic: Enable Multi-turn rollout to keep the constant batch size.
  • The should_accept is not used in this PR, since may cause Inconsistent behavior between synchronous and asynchronous

Boundary impact

  • Because the logic of DAPO has been elevated to within the main script, the dynamic sampling can only be used with gsm8k_dapo.py.
  • The dynamic sampling should be refactored in prepare_batch and fixed in future PRs. (But MUST after refactor rollout code),
  • I will open a new PR soon to disucss this

@ZiyiTsang

This comment was marked as resolved.

gemini-code-assist[bot]

This comment was marked as resolved.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@ZiyiTsang
Copy link
Collaborator Author

/gemini review

@ZiyiTsang ZiyiTsang marked this pull request as ready for review October 24, 2025 07:31
gemini-code-assist[bot]

This comment was marked as resolved.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity within the last 14 days.

Please add a comment or push new commits to keep it active.

Thank you for your contribution!

@github-actions github-actions bot added stale and removed stale labels Nov 10, 2025
@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity within the last 14 days.

Please add a comment or push new commits to keep it active.

Thank you for your contribution!

@github-actions github-actions bot added the stale label Nov 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant