<img src="https://gw.alicdn.com/imgextra/i2/O1CN01MO34b71y4VQnD3WRp_!!6000000006525-2-tps-1247-567.png" alt="Agentic workflows" width="700" />

BOTS operates in a continuous loop of task selection, model training, and posterior updating.
(1) **Selection**: Thompson sampling from the posterior beliefs selects a batch of tasks whose estimated success probabilities are near a target difficulty (e.g., $p^*=0.5$).
(2) **Training \& Evidence Collection**: The LLM is finetuned, yielding direct success/failure counts (_explicit evidence_) for the selected batch.
For unselected tasks, predicted counts (_implicit evidence_) are produced by a plug-in; we introduce an ultra-lightweight interpolation-based variant with negligible overhead.
(3) **Posterior Updating**: Explicit and implicit evidence are fused using our generalized Bayesian update rule; a minimal sketch of this loop is given below.

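As a toy illustration of the loop above, the Python sketch below keeps a Beta posterior per task, Thompson-samples toward a target difficulty $p^*$, and folds new success/failure counts back into the posterior. It is not the repository's actual API: every name (`select_batch`, `update_posterior`, `weight`, the mocked rollout outcomes) is a placeholder, and the interpolation plug-in for implicit evidence is only indicated in comments.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_batch(alpha, beta, target_p=0.5, batch_size=8):
    """Thompson sampling: draw one success-probability sample per task from its
    Beta posterior, then pick the tasks whose draws lie closest to target_p."""
    draws = rng.beta(alpha, beta)
    return np.argsort(np.abs(draws - target_p))[:batch_size]

def update_posterior(alpha, beta, succ, fail, weight=1.0):
    """Fold success/failure counts into the Beta parameters; weight < 1 would
    temper predicted (implicit) counts relative to observed (explicit) ones."""
    return alpha + weight * succ, beta + weight * fail

# Toy pool of 100 tasks, each starting from a flat Beta(1, 1) prior.
alpha = np.ones(100)
beta = np.ones(100)

for step in range(10):
    batch = select_batch(alpha, beta)

    # Stand-in for RFT rollouts: 4 hypothetical rollouts per selected task,
    # yielding explicit success/failure counts.
    succ = rng.integers(0, 5, size=batch.size)
    fail = 4 - succ
    alpha[batch], beta[batch] = update_posterior(alpha[batch], beta[batch], succ, fail)

    # In BOTS, unselected tasks would also receive implicit (predicted) counts
    # from the interpolation plug-in before the generalized Bayesian update;
    # that step is omitted here.
```

Here `weight` only hints at one way predicted counts could be tempered relative to observed rollouts; the actual fusion in BOTS is the generalized Bayesian update rule described in step (3).
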
### Usage
If you find the repo helpful, please cite:

}

@misc{BOTS,
      title={BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning},
      author={Qianli Shen and Daoyuan Chen and Yilun Huang and Zhenqing Ling and Yaliang Li and Bolin Ding and Jingren Zhou},
      year={2025},
      eprint={2510.26374},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2510.26374},
}
```