Commit a777b15

ShenQianli committed
examples/bots
1 parent 2750722 commit a777b15

6 files changed: +125 -106 lines changed

examples/bots/README.md

Lines changed: 7 additions & 7 deletions
@@ -10,10 +10,10 @@
 
 <img src="https://gw.alicdn.com/imgextra/i2/O1CN01MO34b71y4VQnD3WRp_!!6000000006525-2-tps-1247-567.png" alt="Agentic workflows" width="700" />
 
-BOTS operates in a continuous loop of task selection, model training, and posterior updating.
-(1) **Selection**: Thompson sampling from the posterior beliefs selects a batch of tasks whose estimated success probabilities are near a target difficulty (e.g., $p^*=0.5$).
-(2) **Training \& Evidence Collection**: The LLM is finetuned, yielding direct success/failure counts (_explicit evidence_) for the selected batch.
-For unselected tasks, predicted counts (_implicit evidence_) are produced by a plug-in; We introduce an ultra-lightweight interpolation-based variant with negligible overhead.
+BOTS operates in a continuous loop of task selection, model training, and posterior updating.
+(1) **Selection**: Thompson sampling from the posterior beliefs selects a batch of tasks whose estimated success probabilities are near a target difficulty (e.g., $p^*=0.5$).
+(2) **Training \& Evidence Collection**: The LLM is finetuned, yielding direct success/failure counts (_explicit evidence_) for the selected batch.
+For unselected tasks, predicted counts (_implicit evidence_) are produced by a plug-in; We introduce an ultra-lightweight interpolation-based variant with negligible overhead.
 (3) **Posterior Updating**: Explicit and implicit evidence are fused using our generalized Bayesian update rule.
 
 ### Usage
@@ -58,12 +58,12 @@ If you find the repo helpful, please cite:
 }
 
 @misc{BOTS,
-title={BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning},
+title={BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning},
 author={Qianli Shen and Daoyuan Chen and Yilun Huang and Zhenqing Ling and Yaliang Li and Bolin Ding and Jingren Zhou},
 year={2025},
 eprint={2510.26374},
 archivePrefix={arXiv},
 primaryClass={cs.AI},
-url={https://arxiv.org/abs/2510.26374},
+url={https://arxiv.org/abs/2510.26374},
 }
-```
+```
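
The README lines touched above describe a three-step loop: Thompson sampling against a target difficulty, evidence collection, and a posterior update. A minimal sketch of that loop's bookkeeping, assuming one Beta posterior per task, might look like the following. The function names, the `implicit_weight` and `decay` parameters, and the fusion formula are all hypothetical stand-ins; the README's "generalized Bayesian update rule" is not shown in this diff and may differ.

```python
import random


def thompson_select(alpha, beta, batch_size, target=0.5, rng=random):
    """Draw one success-probability sample per task from Beta(alpha_i, beta_i)
    and return the indices of the `batch_size` tasks closest to `target`."""
    samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    ranked = sorted(range(len(samples)), key=lambda i: abs(samples[i] - target))
    return ranked[:batch_size]


def update_posterior(alpha, beta, explicit, implicit, implicit_weight=0.5, decay=1.0):
    """Fuse evidence into the Beta parameters in place.

    explicit: {task index: (successes, failures)} observed on the selected batch.
    implicit: {task index: (predicted successes, predicted failures)} for the rest.
    ASSUMPTION: decayed pseudo-counts plus down-weighted implicit counts; this is
    an illustrative rule, not the one used by BOTS.
    """
    for i in range(len(alpha)):
        if i in explicit:
            successes, failures = explicit[i]
            weight = 1.0  # direct observations count in full
        elif i in implicit:
            successes, failures = implicit[i]
            weight = implicit_weight  # predicted counts are trusted less
        else:
            continue  # no evidence for this task in this round
        alpha[i] = decay * alpha[i] + weight * successes
        beta[i] = decay * beta[i] + weight * failures
```

With `decay=1.0` and `implicit_weight=1.0` this reduces to the ordinary conjugate Beta-Bernoulli update; lowering either value discounts stale or predicted evidence.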

examples/bots/bots.yaml

Lines changed: 1 addition & 1 deletion
@@ -76,4 +76,4 @@ trainer:
 grad_clip: 1.0
 use_dynamic_bsz: true
 max_token_len_per_gpu: 24576
-ulysses_sequence_parallel_size: 1
+ulysses_sequence_parallel_size: 1

examples/bots/plugins/bots_math_boxed_reward.py

Lines changed: 2 additions & 1 deletion
@@ -5,6 +5,7 @@
 
 from .bots_reward import compute_score
 
+
 @REWARD_FUNCTIONS.register_module("bots_math_boxed_reward")
 class BOTSMathBoxedRewardFn(RewardFn):
     """A reward function that rewards for math task for BOTS."""
@@ -29,4 +30,4 @@ def __call__( # type: ignore
         if with_think and not validate_think_pattern(response):
             format_score = (format_score_coef or 0.1) * -1.0
 
-        return {"accuracy": accuracy_score, "format_score": format_score}
+        return {"accuracy": accuracy_score, "format_score": format_score}
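
A note on the context line `(format_score_coef or 0.1) * -1.0`: because Python's `or` falls back on any falsy value, a coefficient of `0.0` is replaced by `0.1` just as `None` is. Whether a zero coefficient is meant to disable the penalty is not stated in this diff, so the sketch below only illustrates the difference. Both helper names are hypothetical and are not part of the plugin.

```python
from typing import Optional


def penalty_with_or(format_score_coef: Optional[float]) -> float:
    """Mirrors the expression in the diff: any falsy value falls back to 0.1."""
    return (format_score_coef or 0.1) * -1.0


def penalty_with_none_check(format_score_coef: Optional[float]) -> float:
    """Hypothetical alternative: only None falls back, so 0.0 means no penalty."""
    coef = 0.1 if format_score_coef is None else format_score_coef
    return coef * -1.0
```

The two helpers agree for `None` and for any non-zero coefficient; they differ only when the coefficient is exactly `0.0`.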

examples/bots/plugins/bots_math_boxed_workflow.py

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 
 from .bots_math_boxed_reward import BOTSMathBoxedRewardFn
 
+
 @WORKFLOWS.register_module("bots_math_boxed_workflow")
 class BOTSMathBoxedWorkflow(MathBoxedWorkflow):
     """A workflow for math tasks that give answers in boxed format for BOTS."""
