Dcagent #795
base: main
Changes from all commits: f18c460, 891916a, ca85618, dae1a02, 39e42b3, 7fdef1a, d68c31d, e44bc9d, 1fe6c5a, f22fbc4, a086077, 996b6a9, 9948063, c493316, e162dd4, 01c5a91, fb0728b, e0eb1d5, 27d61e8, d7472ac, 090f325
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -2,7 +2,11 @@ set -x` |
| | | |
| | | `# Running on policy distillation for Math on the DAPO math dataset, with eval on AIME 2024.` |
| | | `# Uses Qwen-3-1.7B-Base as the student model and an RL trained Qwen-3-4B as the teacher model` |
| | | `<<<<<<< HEAD` |
| | | `# uv run examples/algorithms/dapo/prepare_dapo_data.sh` |
| | | `=======` |
| | | `# bash examples/algorithms/dapo/prepare_dapo_data.sh` |
| | | `>>>>>>> main` |
| | | `# bash examples/on_policy_distillation/run_on_policy_distill_math_qwen3_1.7b.sh` |
| | | |
| | | `DATA_DIR="$HOME/data/dapo"` |

Comment on lines +5 to +9

Contributor

This file contains unresolved merge conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`). Please resolve them.

Suggested change
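Leftover markers like the ones flagged in this review are easy to catch before pushing; Git's own `git diff --check` also warns about them. As an illustration only (not part of this PR or of Git), here is a minimal sketch of a hypothetical `find_conflict_markers` helper. It assumes Git's default marker format: seven `<` or `>` characters, optionally followed by a space and a label such as `HEAD` or `main`, or exactly seven `=` on a line by themselves.

```python
import re
from pathlib import Path

# Assumed Git default conflict markers: seven '<' or '>' (optionally followed by a
# space and a label such as "HEAD" or "main"), or exactly seven '=' alone on a line.
MARKER = re.compile(r"^(?:<{7}(?: .*)?|={7}|>{7}(?: .*)?)$")


def find_conflict_markers(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, line) for each conflict-marker line in text."""
    return [
        (lineno, line)
        for lineno, line in enumerate(text.splitlines(), start=1)
        if MARKER.match(line)
    ]


def check_files(paths: list[str]) -> int:
    """Print every marker found and return how many files contain at least one."""
    dirty = 0
    for path in paths:
        hits = find_conflict_markers(Path(path).read_text(encoding="utf-8"))
        for lineno, line in hits:
            print(f"{path}:{lineno}: {line}")
        if hits:
            dirty += 1
    return dirty
```

A non-zero return from `check_files` can be turned into a failing exit status in a pre-commit hook. The pattern does not recognize the diff3-style `|||||||` base marker, and a Markdown or reStructuredText underline of exactly seven `=` would be reported as a false positive.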
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -2,7 +2,11 @@ set -x` |
| | | |
| | | `# Running on policy distillation for Math on the DAPO math dataset, with eval on AIME 2024.` |
| | | `# Uses Qwen-3-4B-Base as the student model and an RL trained Qwen-3-4B as the teacher model` |
| | | `<<<<<<< HEAD` |
| | | `# uv run examples/algorithms/dapo/prepare_dapo_data.sh` |
| | | `=======` |
| | | `# bash examples/algorithms/dapo/prepare_dapo_data.sh` |
| | | `>>>>>>> main` |
| | | `# bash examples/on_policy_distillation/run_on_policy_distill_math_qwen3_4b.sh` |
| | | |
| | | `DATA_DIR="$HOME/data/dapo"` |

Comment on lines +5 to +9

Contributor

This file contains unresolved merge conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`). Please resolve them.

Suggested change
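Resolving each conflict here means keeping one of the two comment lines, `uv run` from `HEAD` or `bash` from `main`; which one is correct is the PR author's call. For whole files, `git checkout --ours` / `--theirs` does this. As a hedged sketch of what per-hunk resolution involves, the hypothetical `resolve_conflicts` helper below keeps one side of every hunk. It assumes Git's default two-way conflict style (no diff3 `|||||||` base section).

```python
def _is_marker(line: str, char: str) -> bool:
    """True if line is a conflict marker made of seven `char` characters (plus optional label)."""
    marker = char * 7
    return line == marker or line.startswith(marker + " ")


def resolve_conflicts(text: str, keep: str) -> str:
    """Resolve every two-way conflict hunk in text by keeping one side.

    keep="ours" keeps the lines between "<<<<<<<" and "=======" (the HEAD side);
    keep="theirs" keeps the lines between "=======" and ">>>>>>>" (the merged-in side).
    Lines outside conflict hunks are copied unchanged.
    """
    if keep not in ("ours", "theirs"):
        raise ValueError("keep must be 'ours' or 'theirs'")
    out = []
    side = None  # None outside a hunk; "ours" or "theirs" inside one
    for line in text.splitlines(keepends=True):
        bare = line.rstrip("\r\n")
        if side is None and _is_marker(bare, "<"):
            side = "ours"
        elif side == "ours" and bare == "=" * 7:
            side = "theirs"
        elif side == "theirs" and _is_marker(bare, ">"):
            side = None
        elif side is None or side == keep:
            out.append(line)
    if side is not None:
        raise ValueError("unterminated conflict hunk")
    return "".join(out)
```

Raising on an unterminated hunk, rather than silently dropping the tail of the file, keeps a malformed merge from being "resolved" into a truncated script.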
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | `@@ -0,0 +1,36 @@` |
| | | `"""` |
| | | `Main entrypoint for training on terminal bench tasks.` |
| | | `"""` |
| | | `import ray` |
| | | `import hydra` |
| | | `from omegaconf import DictConfig` |
| | | `from skyrl_train.entrypoints.main_base import BasePPOExp, config_dir` |
| | | `from skyrl_train.utils import validate_cfg` |
| | | `from skyrl_train.utils.utils import initialize_ray` |
| | | `from examples.terminal_bench.terminal_bench_generator import TerminalBenchGenerator` |
| | | `from examples.terminal_bench.dataset import TerminalBenchTaskDataset` |
| | | `from examples.terminal_bench.entrypoints.main_tbench import TerminalBenchExp` |
| | | `from examples.on_policy_distillation.main_on_policy_distill import OnPolicyDistillationTrainer` |
| | | |
| | | `class OnPolicyDistillationTerminalBenchExp(TerminalBenchExp):` |
| | | `    def get_trainer(self, *args, **kwargs):` |
| | | `        return OnPolicyDistillationTrainer(*args, **kwargs)` |
| | | |
| | | |
| | | `@ray.remote(num_cpus=1)` |
| | | `def skyrl_entrypoint(cfg: DictConfig):` |
| | | `    # make sure that the training loop is not run on the head node.` |
| | | `    exp = OnPolicyDistillationTerminalBenchExp(cfg)` |
| | | `    exp.run()` |
| | | |
| | | `@hydra.main(config_path=config_dir, config_name="ppo_base_config", version_base=None)` |
| | | `def main(cfg: DictConfig) -> None:` |
| | | `    # validate the arguments` |
| | | `    validate_cfg(cfg)` |
| | | |
| | | `    initialize_ray(cfg)` |
| | | `    ray.get(skyrl_entrypoint.remote(cfg))` |
| | | |
| | | |
| | | `if __name__ == "__main__":` |
| | | `    main()` |
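The new entrypoint reuses all of `TerminalBenchExp` and overrides only `get_trainer`, so the experiment builds an `OnPolicyDistillationTrainer` instead of its default trainer. The sketch below illustrates that factory-hook pattern with hypothetical stand-in classes (`Trainer`, `DistillationTrainer`, `BaseExp`, `TaskExp`, `DistillTaskExp`). It assumes, without confirmation from this diff, that the base experiment's `run()` obtains its trainer through `get_trainer`. Ray and Hydra are left out so the sketch runs on its own.

```python
class Trainer:
    """Hypothetical stand-in for the default PPO trainer."""

    def __init__(self, cfg: dict):
        self.cfg = cfg

    def train(self) -> str:
        # Report which trainer class actually ran.
        return f"{type(self).__name__}.train"


class DistillationTrainer(Trainer):
    """Hypothetical stand-in for OnPolicyDistillationTrainer."""


class BaseExp:
    """Hypothetical stand-in for BasePPOExp: run() asks a factory hook for its trainer."""

    def __init__(self, cfg: dict):
        self.cfg = cfg

    def get_trainer(self, *args, **kwargs) -> Trainer:
        return Trainer(*args, **kwargs)

    def run(self) -> str:
        trainer = self.get_trainer(self.cfg)
        return trainer.train()


class TaskExp(BaseExp):
    """Hypothetical stand-in for TerminalBenchExp (task-specific setup omitted)."""


class DistillTaskExp(TaskExp):
    # Same shape as OnPolicyDistillationTerminalBenchExp: override only the hook.
    def get_trainer(self, *args, **kwargs) -> Trainer:
        return DistillationTrainer(*args, **kwargs)


print(TaskExp({}).run())         # Trainer.train
print(DistillTaskExp({}).run())  # DistillationTrainer.train
```

Overriding a single factory method keeps the task-specific dataset and generator setup inherited unchanged, which is why the new file needs no more than a three-line subclass.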
This file was deleted.
This file contains unresolved merge conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`). Please resolve them.