Support Experience Pipeline #105
Merged
27 commits
- `b52809f` * prepare the initial config files for exp pipeline (HYLcool)
- `1430035` + add basic reward shaping func (HYLcool)
- `061407c` Merge branch 'main' into feat/exp_pipeline (HYLcool)
- `d8e9331` Merge branch 'main' into feat/exp_pipeline (HYLcool)
- `78da769` - remove common.schema (HYLcool)
- `04f64aa` * allow async exp pipeline (HYLcool)
- `fe0407f` Merge branch 'main' into feat/exp_pipeline (HYLcool)
- `56dd112` + add more logs (HYLcool)
- `510b2af` + add buffer check and sync for experience pipeline (HYLcool)
- `f1f6ba0` * set several default values for format config (HYLcool)
- `f78b6e7` * convert experience to dict before converting to dataset (HYLcool)
- `979ab5a` * fix conversion bugs in dataset (HYLcool)
- `e359179` * fix bugs (HYLcool)
- `d9d4773` * update configs of exp_pipeline (HYLcool)
- `d9501cf` + init ray in the same namespace for data processor (HYLcool)
- `d16f0a8` * update example docs for experience pipeline (HYLcool)
- `d5e46f3` * after pre-commit (HYLcool)
- `a1cdc7f` Merge branch 'main' into feat/exp_pipeline (HYLcool)
- `a1b3b01` Merge branch 'main' into feat/exp_pipeline (HYLcool)
- `55800f6` * fix dataset buffer logics and tests (HYLcool)
- `c10dd93` * update ray init method (HYLcool)
- `17c91aa` * ignore dj configs when checking example validation (HYLcool)
- `062722f` * move data processor related funcs to data/utils.py (HYLcool)
- `974f3ab` * after pre-commit (HYLcool)
- `60abb01` + add missing docs (HYLcool)
- `266ba19` + fix typo and add infos about how to set api keys. (HYLcool)
- `471a93d` * after pre-commit (HYLcool)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,7 @@ |
| | 1 | # GRPO on GSM8K dataset with Experience Pipeline |
| | 2 | |
| | 3 | This example shows the usage of GRPO on the GSM8K dataset, with an experience pipeline that reshapes the rewards of experiences during training. |
| | 4 | |
| | 5 | For more detailed information, please refer to the [documentation](../../docs/sphinx_doc/source/tutorial/example_data_functionalities.md). |
| | 6 | |
| | 7 | The config files are located in [`gsm8k.yaml`](gsm8k.yaml) and [`train_gsm8k.yaml`](train_gsm8k.yaml). |
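The README describes reshaping experience rewards during training. In the experiment config further down, this is driven by a `reward_shaping` entry (`stats_key: 'llm_quality_score'`, `op_type: ADD`, `weight: 1.0`) applied to the field named by `format.reward_key`. Below is a minimal Python sketch of what an `ADD` rule plausibly does, assuming each experience is a dict whose computed statistics sit under a hypothetical `"stats"` key. `ShapingRule`, `reshape_reward` and the `MUL` branch are illustrative assumptions, not Trinity-RFT's actual code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ShapingRule:
    # Mirrors one hypothetical entry under `reward_shaping` in the config.
    stats_key: str       # name of a per-sample statistic, e.g. 'llm_quality_score'
    op_type: str         # 'ADD' as in the config; 'MUL' is an illustrative extra
    weight: float = 1.0


def reshape_reward(exp: Dict[str, Any], rules: List[ShapingRule],
                   reward_key: str = "reward") -> Dict[str, Any]:
    """Return a copy of `exp` whose reward has been adjusted by each rule in order."""
    shaped = dict(exp)  # shallow copy so the caller's experience is untouched
    reward = float(shaped[reward_key])
    stats = shaped.get("stats", {})  # assumed location of computed statistics
    for rule in rules:
        if rule.stats_key not in stats:
            continue  # leave the reward unchanged when the statistic is missing
        value = float(stats[rule.stats_key])
        if rule.op_type == "ADD":
            reward += rule.weight * value
        elif rule.op_type == "MUL":
            reward *= rule.weight * value
        else:
            raise ValueError(f"unsupported op_type: {rule.op_type}")
    shaped[reward_key] = reward
    return shaped


rules = [ShapingRule(stats_key="llm_quality_score", op_type="ADD", weight=1.0)]
exp = {"reward": 1.0, "stats": {"llm_quality_score": 0.5}}
print(reshape_reward(exp, rules)["reward"])  # 1.5
```

Under this reading, `weight: 1.0` with `ADD` simply adds the quality score on top of the workflow's original reward; a smaller weight would make the quality signal a gentler bonus.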
examples/grpo_gsm8k_experience_pipeline/dj_scoring_exp.yaml

11 changes: 11 additions & 0 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,11 @@ |
| | 1 | `# This is a Data-Juicer data processing recipe` |
| | 2 | `project_name: 'gsm-8k-experience-quality'` |
| | 3 | |
| | 4 | `np: 32` |
| | 5 | |
| | 6 | `process:` |
| | 7 | `  - llm_quality_score_filter:` |
| | 8 | `      api_or_hf_model: "qwen2.5-32b-instruct" # use "qwen2.5-32b-instruct" to calculate the quality scores.` |
| | 9 | `      min_score: 0.0` |
| | 10 | `      input_keys: ["prompt_text", "prompt_text"] # set input_keys and field_names to the existing key names in gsm-8k. Here the quality scores are calculated from both questions and answers.` |
| | 11 | `      field_names: ["prompt", "response"]` |
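The recipe above asks Data-Juicer's `llm_quality_score_filter` to score each experience with an LLM, pairing each entry of `input_keys` with the label at the same position in `field_names`, and keeping samples whose score is at least `min_score`. The sketch below is a simplified, hypothetical model of that mapping and threshold in plain Python: the real operator prompts the configured API model, whereas here `score_fn` is a stub, and `build_scoring_input`, `quality_filter` and the `"stats"` field are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List


def build_scoring_input(sample: Dict[str, Any], input_keys: List[str],
                        field_names: List[str]) -> str:
    """Render sample[input_keys[i]] under the label field_names[i], one per line."""
    if len(input_keys) != len(field_names):
        raise ValueError("input_keys and field_names must have the same length")
    return "\n".join(f"{name}: {sample[key]}"
                     for key, name in zip(input_keys, field_names))


def quality_filter(samples: List[Dict[str, Any]], input_keys: List[str],
                   field_names: List[str], score_fn: Callable[[str], float],
                   min_score: float = 0.0,
                   stats_key: str = "llm_quality_score") -> List[Dict[str, Any]]:
    """Score every sample, record the score in its stats, keep those >= min_score."""
    kept = []
    for sample in samples:
        score = score_fn(build_scoring_input(sample, input_keys, field_names))
        # Build a new dict so the input sample is not mutated.
        scored = {**sample, "stats": {**sample.get("stats", {}), stats_key: score}}
        if score >= min_score:
            kept.append(scored)
    return kept


# Keys copied from the recipe above.
input_keys = ["prompt_text", "prompt_text"]
field_names = ["prompt", "response"]
samples = [{"prompt_text": "What is 2 + 3?"}]

print(build_scoring_input(samples[0], input_keys, field_names))
# prompt: What is 2 + 3?
# response: What is 2 + 3?
kept = quality_filter(samples, input_keys, field_names, score_fn=lambda text: 0.8)
print(kept[0]["stats"]["llm_quality_score"])  # 0.8
```

Two observations follow from this reading. With `min_score: 0.0` and non-negative scores, nothing is filtered out, so the operator effectively only annotates each experience with `llm_quality_score` for the later reward-shaping step. And because both `input_keys` entries are `prompt_text`, the "response" field would carry the prompt again; if the intent is to score answers as well, as the recipe's comment says, the second key presumably needs to name the experience's response field.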
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,89 @@ |
| | 1 | `project: "Trinity-RFT-gsm8k-experience-pipeline"` |
| | 2 | `name: "qwen2.5-1.5B-gsm8k-experience-pipeline"` |
| | 3 | `checkpoint_root_dir: /PATH/TO/CHECKPOINT/` |
| | 4 | `algorithm:` |
| | 5 | `  algorithm_type: grpo` |
| | 6 | `  repeat_times: 8` |
| | 7 | `data_processor:` |
| | 8 | `  data_processor_url: 'http://127.0.0.1:5005/data_processor'` |
| | 9 | `  # experience pipeline related` |
| | 10 | `  experience_pipeline:` |
| | 11 | `    # I/O buffers` |
| | 12 | `    input_buffers:` |
| | 13 | `      - name: gsm8k_exp_output` |
| | 14 | `    output_buffer:` |
| | 15 | `      name: reshaped_gsm8k_exp_input` |
| | 16 | `    # format mapping` |
| | 17 | `    format:` |
| | 18 | `      reward_key: 'reward' # the key name of the reward in the experience` |
| | 19 | `    # data active iterator related` |
| | 20 | `    dj_config_path: 'examples/grpo_gsm8k_experience_pipeline/dj_scoring_exp.yaml'` |
| | 21 | `    clean_strategy: 'iterative'` |
| | 22 | `    # reward shaping` |
| | 23 | `    reward_shaping:` |
| | 24 | `      - stats_key: 'llm_quality_score'` |
| | 25 | `        op_type: ADD` |
| | 26 | `        weight: 1.0` |
| | 27 | |
| | 28 | `model:` |
| | 29 | `  model_path: /PATH/TO/MODEL/` |
| | 30 | `  max_prompt_tokens: 256` |
| | 31 | `  max_response_tokens: 1024` |
| | 32 | `cluster:` |
| | 33 | `  node_num: 1` |
| | 34 | `  gpu_per_node: 8` |
| | 35 | `buffer:` |
| | 36 | `  total_epochs: 1` |
| | 37 | `  batch_size: 96` |
| | 38 | `  max_retry_times: 3` |
| | 39 | `  max_retry_interval: 1` |
| | 40 | `  explorer_input:` |
| | 41 | `    taskset:` |
| | 42 | `      name: gsm8k` |
| | 43 | `      storage_type: file` |
| | 44 | `      path: 'openai/gsm8k'` |
| | 45 | `      subset_name: 'main'` |
| | 46 | `      split: 'train'` |
| | 47 | `      format:` |
| | 48 | `        prompt_key: 'question'` |
| | 49 | `        response_key: 'answer'` |
| | 50 | `      rollout_args:` |
| | 51 | `        temperature: 1.0` |
| | 52 | `    eval_tasksets:` |
| | 53 | `      - name: gsm8k-eval` |
| | 54 | `        storage_type: file` |
| | 55 | `        path: 'openai/gsm8k'` |
| | 56 | `        subset_name: 'main'` |
| | 57 | `        split: 'test'` |
| | 58 | `        format:` |
| | 59 | `          prompt_key: 'question'` |
| | 60 | `          response_key: 'answer'` |
| | 61 | `    default_workflow_type: 'math_workflow'` |
| | 62 | `  explorer_output:` |
| | 63 | `    name: gsm8k_exp_output` |
| | 64 | `    storage_type: queue` |
| | 65 | `    path: 'sqlite:///gsm8k_exp_output.db'` |
| | 66 | `  trainer_input:` |
| | 67 | `    experience_buffer:` |
| | 68 | `      name: reshaped_gsm8k_exp_input` |
| | 69 | `      storage_type: queue` |
| | 70 | `      path: 'sqlite:///reshaped_gsm8k_exp_input.db'` |
| | 71 | `explorer:` |
| | 72 | `  eval_interval: 50` |
| | 73 | `  runner_num: 32` |
| | 74 | `  rollout_model:` |
| | 75 | `    engine_type: vllm_async` |
| | 76 | `    engine_num: 2` |
| | 77 | `    tensor_parallel_size: 1` |
| | 78 | `    enable_prefix_caching: false` |
| | 79 | `    enforce_eager: true` |
| | 80 | `    dtype: bfloat16` |
| | 81 | `    seed: 42` |
| | 82 | `synchronizer:` |
| | 83 | `  sync_method: 'nccl'` |
| | 84 | `  sync_interval: 1` |
| | 85 | `  sync_timeout: 1200` |
| | 86 | `trainer:` |
| | 87 | `  trainer_type: 'verl'` |
| | 88 | `  trainer_config_path: 'examples/grpo_gsm8k_experience_pipeline/train_gsm8k.yaml'` |
| | 89 | `  save_interval: 100` |
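In this config the explorer writes experiences to the `gsm8k_exp_output` queue (`explorer_output`); the experience pipeline reads that buffer (`input_buffers`), processes the experiences, and writes them to `reshaped_gsm8k_exp_input` (`output_buffer`), which is the same buffer the trainer reads through `trainer_input.experience_buffer`. The sketch below models that hand-off in-process, with Python's `queue.Queue` standing in for the SQLite-backed queue buffers. `run_experience_pipeline`, its batching and its timeout-based stop are hypothetical and only illustrate the data flow, not Trinity-RFT's implementation.

```python
import queue
from typing import Any, Callable, Dict, List

Experience = Dict[str, Any]


def run_experience_pipeline(
    input_buffer: queue.Queue,
    output_buffer: queue.Queue,
    process_batch: Callable[[List[Experience]], List[Experience]],
    batch_size: int,
    timeout: float = 1.0,
) -> int:
    """Move experiences from input_buffer to output_buffer in batches.

    Each batch is transformed by process_batch before being written.
    Returns the number of experiences written; stops once a get() waits
    longer than `timeout` seconds, i.e. the input buffer has gone quiet.
    """
    written = 0
    while True:
        batch: List[Experience] = []
        try:
            while len(batch) < batch_size:
                batch.append(input_buffer.get(timeout=timeout))
        except queue.Empty:
            pass  # flush whatever partial batch was collected
        if batch:
            for exp in process_batch(batch):
                output_buffer.put(exp)
                written += 1
        if len(batch) < batch_size:
            return written


explorer_output = queue.Queue()  # stands in for gsm8k_exp_output
trainer_input = queue.Queue()    # stands in for reshaped_gsm8k_exp_input
for reward in [0.0, 1.0, 1.0]:
    explorer_output.put({"reward": reward, "stats": {"llm_quality_score": 0.5}})


def add_quality(batch: List[Experience]) -> List[Experience]:
    # Hypothetical processing step: add the quality score to each reward.
    return [{**e, "reward": e["reward"] + e["stats"]["llm_quality_score"]} for e in batch]


written = run_experience_pipeline(explorer_output, trainer_input, add_quality,
                                  batch_size=2, timeout=0.1)
print(written)  # 3
```

The key design point the config expresses is decoupling: the explorer and trainer never exchange experiences directly, so the pipeline can be inserted (or removed) by renaming buffers, without touching either side.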
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,50 @@ |
| | 1 | `actor_rollout_ref:` |
| | 2 | `  hybrid_engine: True` |
| | 3 | `  model:` |
| | 4 | `    external_lib: null` |
| | 5 | `    override_config: { }` |
| | 6 | `    enable_gradient_checkpointing: True` |
| | 7 | `    use_remove_padding: True # False` |
| | 8 | `  actor:` |
| | 9 | `    strategy: fsdp # This is for backward-compatibility` |
| | 10 | `    ppo_mini_batch_size: 128` |
| | 11 | `    ppo_micro_batch_size_per_gpu: 4` |
| | 12 | `    use_dynamic_bsz: True # False` |
| | 13 | `    ppo_max_token_len_per_gpu: 16384 # n * ${data.max_prompt_length} + ${data.max_response_length}` |
| | 14 | `    grad_clip: 1.0` |
| | 15 | `    ppo_epochs: 1` |
| | 16 | `    shuffle: False` |
| | 17 | `    ulysses_sequence_parallel_size: 1 # sp size` |
| | 18 | `    optim:` |
| | 19 | `      lr: 1e-5` |
| | 20 | `      lr_warmup_steps_ratio: 0. # the total steps will be injected during runtime` |
| | 21 | `      # min_lr_ratio: null # only useful for warmup with cosine` |
| | 22 | `      warmup_style: constant # select from constant/cosine` |
| | 23 | `      total_training_steps: -1 # must be overridden by the program` |
| | 24 | `    fsdp_config:` |
| | 25 | `      wrap_policy:` |
| | 26 | `        # transformer_layer_cls_to_wrap: None` |
| | 27 | `        min_num_params: 0` |
| | 28 | `      param_offload: False` |
| | 29 | `      optimizer_offload: False` |
| | 30 | `      fsdp_size: -1` |
| | 31 | `  ref:` |
| | 32 | `    fsdp_config:` |
| | 33 | `      param_offload: False` |
| | 34 | `      wrap_policy:` |
| | 35 | `        # transformer_layer_cls_to_wrap: None` |
| | 36 | `        min_num_params: 0` |
| | 37 | `    log_prob_micro_batch_size_per_gpu: 16` |
| | 38 | `    log_prob_use_dynamic_bsz: ${actor_rollout_ref.actor.use_dynamic_bsz}` |
| | 39 | `    log_prob_max_token_len_per_gpu: ${actor_rollout_ref.actor.ppo_max_token_len_per_gpu}` |
| | 40 | `    ulysses_sequence_parallel_size: ${actor_rollout_ref.actor.ulysses_sequence_parallel_size} # sp size` |
| | 41 | |
| | 42 | `trainer:` |
| | 43 | `  balance_batch: True` |
| | 44 | `  # total_training_steps: null` |
| | 45 | `  # auto: find the last ckpt to resume. If none is found, start from scratch` |
| | 46 | `  resume_mode: auto # or auto or resume_path if` |
| | 47 | `  default_hdfs_dir: null` |
| | 48 | `  remove_previous_ckpt_in_save: False` |
| | 49 | `  del_local_ckpt_after_load: False` |
| | 50 | `  val_before_train: False` |
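With `use_dynamic_bsz: True`, micro-batches are formed by a token budget (`ppo_max_token_len_per_gpu: 16384`) rather than by a fixed sample count. The following is a simple next-fit greedy sketch of that idea, not verl's actual packing algorithm (which may balance partitions differently); `pack_by_token_budget` is a hypothetical helper, and a sequence's length is assumed to be its prompt plus response tokens.

```python
from typing import List


def pack_by_token_budget(seq_lens: List[int], max_tokens: int) -> List[List[int]]:
    """Group sequence indices, in order, into micro-batches of at most max_tokens tokens."""
    batches: List[List[int]] = []
    current: List[int] = []
    current_tokens = 0
    for idx, length in enumerate(seq_lens):
        if length > max_tokens:
            raise ValueError(f"sequence {idx} has {length} tokens, over the budget of {max_tokens}")
        if current and current_tokens + length > max_tokens:
            batches.append(current)  # close the micro-batch before it overflows
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += length
    if current:
        batches.append(current)
    return batches


max_len = 256 + 1024  # max_prompt_tokens + max_response_tokens from the experiment config
batches = pack_by_token_budget([max_len] * 13, max_tokens=16384)
print([len(b) for b in batches])  # [12, 1]
```

With the experiment config's `max_prompt_tokens: 256` and `max_response_tokens: 1024`, a sequence is at most 1280 tokens, so every micro-batch except possibly the last holds at least 12 sequences under this scheme (12 × 1280 = 15360 ≤ 16384), and more when responses are shorter.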