examples/bots

ShenQianli · ShenQianli · commit a777b158cdf4 · 2025-11-05T18:22:18.000+08:00
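
The README context in this commit describes the BOTS loop: Thompson sampling selects tasks near a target difficulty, training yields explicit success/failure counts, a plug-in predicts implicit counts for unselected tasks, and both are fused into the posterior. The sketch below illustrates that loop under loudly stated assumptions: each task carries a Beta posterior, and implicit evidence is added as down-weighted pseudo-counts. The names `select_tasks`, `update_posterior`, and `implicit_weight` are hypothetical and are not taken from the repository, and the weighted-count update is a stand-in rather than the generalized Bayesian update rule the README refers to.

```python
import random


def select_tasks(alpha, beta, batch_size, target=0.5, rng=random):
    """Thompson sampling: draw p_i ~ Beta(alpha_i, beta_i) for every task
    and return the indices whose sampled success rate is closest to target."""
    samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    ranked = sorted(range(len(samples)), key=lambda i: abs(samples[i] - target))
    return ranked[:batch_size]


def update_posterior(alpha, beta, explicit, implicit, implicit_weight=0.1):
    """Update Beta parameters in place.

    explicit / implicit map task index -> (successes, failures).
    Assumption: implicit (predicted) counts are simply scaled by
    implicit_weight before being added; this is an illustrative rule only.
    """
    for i, (s, f) in explicit.items():
        alpha[i] += s
        beta[i] += f
    for i, (s, f) in implicit.items():
        alpha[i] += implicit_weight * s
        beta[i] += implicit_weight * f
```

A typical round would call `select_tasks`, train on the returned batch, then pass the observed counts as `explicit` and the plug-in's predictions for the remaining tasks as `implicit`.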
diff --git a/examples/bots/README.md b/examples/bots/README.md
@@ -10,10 +10,10 @@
 
 <img src="https://gw.alicdn.com/imgextra/i2/O1CN01MO34b71y4VQnD3WRp_!!6000000006525-2-tps-1247-567.png" alt="Agentic workflows" width="700" />
 
-BOTS operates in a continuous loop of task selection, model training, and posterior updating.  
-(1) **Selection**: Thompson sampling from the posterior beliefs selects a batch of tasks whose estimated success probabilities are near a target difficulty (e.g., $p^*=0.5$).  
-(2) **Training \& Evidence Collection**: The LLM is finetuned, yielding direct success/failure counts (_explicit evidence_) for the selected batch.  
-For unselected tasks, predicted counts (_implicit evidence_) are produced by a plug-in; We introduce an ultra-lightweight interpolation-based variant with negligible overhead.  
+BOTS operates in a continuous loop of task selection, model training, and posterior updating.
+(1) **Selection**: Thompson sampling from the posterior beliefs selects a batch of tasks whose estimated success probabilities are near a target difficulty (e.g., $p^*=0.5$).
+(2) **Training \& Evidence Collection**: The LLM is finetuned, yielding direct success/failure counts (_explicit evidence_) for the selected batch.
+For unselected tasks, predicted counts (_implicit evidence_) are produced by a plug-in; We introduce an ultra-lightweight interpolation-based variant with negligible overhead.
 (3) **Posterior Updating**: Explicit and implicit evidence are fused using our generalized Bayesian update rule.
 
 ### Usage
@@ -58,12 +58,12 @@ If you find the repo helpful, please cite:
 }
 
 @misc{BOTS,
-      title={BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning}, 
+      title={BOTS: A Unified Framework for Bayesian Online Task Selection in LLM Reinforcement Finetuning},
       author={Qianli Shen and Daoyuan Chen and Yilun Huang and Zhenqing Ling and Yaliang Li and Bolin Ding and Jingren Zhou},
       year={2025},
       eprint={2510.26374},
       archivePrefix={arXiv},
       primaryClass={cs.AI},
-      url={https://arxiv.org/abs/2510.26374}, 
+      url={https://arxiv.org/abs/2510.26374},
 }
-```
+```
diff --git a/examples/bots/bots.yaml b/examples/bots/bots.yaml
@@ -76,4 +76,4 @@ trainer:
   grad_clip: 1.0
   use_dynamic_bsz: true
   max_token_len_per_gpu: 24576
-  ulysses_sequence_parallel_size: 1
+  ulysses_sequence_parallel_size: 1
diff --git a/examples/bots/plugins/bots_math_boxed_reward.py b/examples/bots/plugins/bots_math_boxed_reward.py
@@ -5,6 +5,7 @@
 
 from .bots_reward import compute_score
 
+
 @REWARD_FUNCTIONS.register_module("bots_math_boxed_reward")
 class BOTSMathBoxedRewardFn(RewardFn):
     """A reward function that rewards for math task for BOTS."""
@@ -29,4 +30,4 @@ def __call__(  # type: ignore
         if with_think and not validate_think_pattern(response):
             format_score = (format_score_coef or 0.1) * -1.0
 
-        return {"accuracy": accuracy_score, "format_score": format_score}
+        return {"accuracy": accuracy_score, "format_score": format_score}
diff --git a/examples/bots/plugins/bots_math_boxed_workflow.py b/examples/bots/plugins/bots_math_boxed_workflow.py
@@ -3,6 +3,7 @@
 
 from .bots_math_boxed_reward import BOTSMathBoxedRewardFn
 
+
 @WORKFLOWS.register_module("bots_math_boxed_workflow")
 class BOTSMathBoxedWorkflow(MathBoxedWorkflow):
     """A workflow for math tasks that give answers in boxed format for BOTS."""
diff --git a/examples/bots/plugins/bots_reward.py b/examples/bots/plugins/bots_reward.py
diff --git a/examples/bots/random.yaml b/examples/bots/random.yaml