
Commit 06233cb

peteryang1 and Xu authored
feat: enhance timeout handling in CoSTEER and DataScience scenarios (#1150)
* add prev loops to runner history
* fix evolving history
* fix bug on initializing feedback without final decision
* reformat
* refine
* add comments
* feat: enhance timeout handling in CoSTEER and DataScience scenarios

---------

Co-authored-by: Xu <[email protected]>
1 parent 8de9f75 commit 06233cb

File tree

6 files changed: +24 -6 lines


rdagent/components/coder/CoSTEER/__init__.py

Lines changed: 5 additions & 2 deletions
@@ -92,6 +92,7 @@ def develop(self, exp: Experiment) -> Experiment:
         # Evolving the solution
         start_datetime = datetime.now()
         fallback_evo_exp = None
+        reached_max_seconds = False
         for evo_exp in self.evolve_agent.multistep_evolve(evo_exp, self.evaluator):
             assert isinstance(evo_exp, Experiment)  # multiple inheritance
             if self._get_last_fb().is_acceptable():
@@ -103,6 +104,7 @@ def develop(self, exp: Experiment) -> Experiment:
                 logger.info(f"evolving workspace: {sw}")
             if self.max_seconds is not None and (datetime.now() - start_datetime).seconds > self.max_seconds:
                 logger.info(f"Reached max time limit {self.max_seconds} seconds, stop evolving")
+                reached_max_seconds = True
                 break
             if RD_Agent_TIMER_wrapper.timer.started and RD_Agent_TIMER_wrapper.timer.is_timeout():
                 logger.info("Global timer is timeout, stop evolving")
@@ -111,13 +113,14 @@ def develop(self, exp: Experiment) -> Experiment:
         # if the final feedback is not finished(therefore acceptable), we will use the fallback solution.
         try:
             evo_exp = self._exp_postprocess_by_feedback(evo_exp, self._get_last_fb())
-        except CoderError:
+        except CoderError as e:
             if fallback_evo_exp is not None:
                 logger.info("Fallback to the fallback solution.")
                 evo_exp = fallback_evo_exp
                 evo_exp.recover_ws_ckp()  # NOTE: recovering checkpoints for restoring files in the workspace to prevent inplace mutation.
             else:
-                raise
+                e.caused_by_timeout = reached_max_seconds
+                raise e

         exp.sub_workspace_list = evo_exp.sub_workspace_list
         exp.experiment_workspace = evo_exp.experiment_workspace
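The pattern this hunk introduces: remember whether the evolving loop stopped because it exhausted its per-develop time budget, and, if no acceptable solution or fallback exists, tag the raised `CoderError` so callers can tell a timeout-driven failure from an ordinary coding failure. A minimal standalone sketch of the same pattern with simplified names (this is not the actual `CoSTEER.develop` body):

```python
from datetime import datetime


class CoderError(Exception):
    # Mirrors rdagent.core.exception.CoderError after this commit.
    caused_by_timeout: bool = False


def develop_with_deadline(steps, max_seconds=None):
    """Run evolution steps until one succeeds or the time budget runs out."""
    start = datetime.now()
    reached_max_seconds = False
    for step in steps:
        result = step()
        if result is not None:          # an acceptable solution was produced
            return result
        if max_seconds is not None and (datetime.now() - start).total_seconds() > max_seconds:
            reached_max_seconds = True  # remember *why* we stopped evolving
            break
    err = CoderError("no acceptable solution produced")
    err.caused_by_timeout = reached_max_seconds  # let callers branch on the cause
    raise err
```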

rdagent/components/coder/data_science/pipeline/prompts.yaml

Lines changed: 7 additions & 1 deletion
@@ -112,7 +112,13 @@ pipeline_coder:
    ```
    In debug mode, your code should run faster, so the environment will set a shorter time limit than the standard time limit for your code.
    For example, you can sample ten percent of the training data and run for one epoch, then the full run with ten epochs will take one hundred times the time taken for the debug run. The scale is calculated by yourself depending on the data sampling and epoch number you choose. If your full run enables early stopping, the scale should be smaller considering the early stopping will stop the training earlier than the full epochs.
-    Be careful about the train-valid split strategy. StratifiedShuffleSplit is highly risk since the data has some categories with only one sample. If you use StratifiedShuffleSplit, you should consider using a try-except block to catch the error and use a different split strategy if the error occurs.
+    Be careful about the train-valid split strategy. StratifiedShuffleSplit is highly risk since the data has some categories with only one sample. If you use StratifiedShuffleSplit, you should consider using a try-except block to catch the error and use a different split strategy if the error occurs. Example code:
+    ```python
+    try:
+        fold_indices = StratifiedKFold(...).split(train_X, train_y) or StratifiedShuffleSplit(...).split(train_X, train_y)
+    except Exception as e:
+        fold_indices = KFold(...).split(train_X, train_y) or other split strategy
+    ```
    You should sample the data after train valid split. When you split the data after sampling, you might get a class with only one sample which might cause the split strategy to fail.
    Your debug code should run exactly the same as the full run, except for the data sampling and epoch number, to ensure the correctness of the code.
    You should print total time and estimated time in standard output using print function in the following schema:
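The snippet added to the prompt above is deliberately pseudocode. A runnable version of the same fallback, assuming scikit-learn and a toy dataset with a singleton class (the `train_X`/`train_y` values here are placeholders, not from the repository):

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedShuffleSplit

train_X = np.random.rand(20, 3)
train_y = np.array([0] * 10 + [1] * 9 + [2])  # class "2" has only one sample

try:
    # Stratified splitting raises ValueError when a class has fewer than 2 members.
    splitter = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
    fold_indices = list(splitter.split(train_X, train_y))
except ValueError:
    # Fall back to a plain KFold split that ignores class balance.
    fold_indices = list(KFold(n_splits=5, shuffle=True, random_state=0).split(train_X, train_y))

for train_idx, valid_idx in fold_indices:
    pass  # train / validate on each fold here
```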

rdagent/core/exception.py

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ class CoderError(WorkflowError):
     """

     # NOTE: it corresponds to the error of **component**
+    caused_by_timeout: bool = False  # whether the error is caused by timeout


 class CodeFormatError(CoderError):
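`caused_by_timeout` is declared as a class-level default, so every existing `raise CoderError(...)` site keeps its current behaviour (the flag reads as `False`); only callers that know the failure came from a deadline overwrite it on the instance, as the CoSTEER change above does. A small illustrative sketch, not taken from the repository:

```python
class WorkflowError(Exception):
    pass


class CoderError(WorkflowError):
    # NOTE: it corresponds to the error of **component**
    caused_by_timeout: bool = False  # class-level default; whether the error is caused by timeout


err = CoderError("develop failed")
assert err.caused_by_timeout is False   # plain raise sites see the class default
err.caused_by_timeout = True            # a caller that hit its deadline tags the instance
assert err.caused_by_timeout is True
```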

rdagent/scenarios/data_science/loop.py

Lines changed: 9 additions & 1 deletion
@@ -221,7 +221,15 @@ def record(self, prev_out: dict[str, Any]):
         else:
             exp: DSExperiment = prev_out["direct_exp_gen"] if isinstance(e, CoderError) else prev_out["coding"]
             # TODO: distinguish timeout error & other exception.
-            if isinstance(self.trace.scen, DataScienceScen) and DS_RD_SETTING.allow_longer_timeout:
+            if (
+                isinstance(self.trace.scen, DataScienceScen)
+                and DS_RD_SETTING.allow_longer_timeout
+                and isinstance(e, CoderError)
+                and e.caused_by_timeout
+            ):
+                logger.info(
+                    f"Timeout error occurred: {e}. Increasing timeout for the current scenario from {self.trace.scen.timeout_increase_count} to {self.trace.scen.timeout_increase_count + 1}."
+                )
                 self.trace.scen.increase_timeout()

         # set the local selection to the trace as global selection, then set the DAG parent for the trace
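Read together with the CoSTEER change, the tightened guard means the scenario's time budget only grows when the coder actually ran out of time; other `CoderError`s fall through unchanged, since more time would not fix a logic bug. A simplified sketch of that decision (not the actual `record` implementation; `scen` and `allow_longer_timeout` stand in for `self.trace.scen` and `DS_RD_SETTING.allow_longer_timeout`):

```python
from rdagent.core.exception import CoderError


def maybe_extend_timeout(e: Exception, scen, allow_longer_timeout: bool) -> None:
    """Grow the scenario's time budget only for timeout-driven coder failures."""
    if allow_longer_timeout and isinstance(e, CoderError) and getattr(e, "caused_by_timeout", False):
        scen.increase_timeout()  # the next attempt gets a longer budget
    # any other failure: leave the timeout alone
```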

rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml

Lines changed: 1 addition & 1 deletion
@@ -430,7 +430,7 @@ task_gen:
    9. **Preferred Packages Notes**:
      - You can choose the most proper packages for the task to best achieve the hypothesis.
      - When facing a choice between two packages which both can achieve the same goal, you should choose the one which is more commonly used and less likely to cause bugs in coding. Especially those you are not familiar with.
-      - For GBDT models, prefer XGBoost or RandomForest over LightGBM unless the SOTA or hypothesis dictates otherwise.
+      - For GBDT models, prefer XGBoost or RandomForest over LightGBM unless the SOTA or hypothesis dictates otherwise. Prefer not using GPU for GBDT models unless the SOTA or hypothesis dictates otherwise.
      - For neural networks, prefer PyTorch or PyTorch based library (over TensorFlow) unless the SOTA or hypothesis dictates otherwise.
      - For neural networks, prefer fine-tuning pre-trained models over training from scratch.
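For illustration of the added GBDT guidance only, a minimal CPU-only XGBoost setup (assuming xgboost >= 2.0, where `device` is a constructor parameter; this example is not part of the commit):

```python
import xgboost as xgb

# Histogram-based training on CPU; no GPU requested, per the updated guideline.
model = xgb.XGBClassifier(
    n_estimators=500,
    learning_rate=0.05,
    tree_method="hist",
    device="cpu",
    random_state=0,
)
# model.fit(train_X, train_y)  # train_X / train_y come from the chosen split strategy
```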

rdagent/scenarios/data_science/scen/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -60,6 +60,7 @@ def __init__(self, competition: str) -> None:
         self.metric_direction: bool = (
             self._get_direction()
         )  # True indicates higher is better, False indicates lower is better
+        self.timeout_increase_count = 0

     def reanalyze_competition_description(self):
         self.raw_description = self._get_description()
@@ -114,7 +115,6 @@ def _analysis_competition_description(self):
         self.longer_time_limit_required = response_json_analysis.get(
             "Longer time limit required", False
         )  # True or False, whether the competition scenario requires a longer time limit to the code.
-        self.timeout_increase_count = 0

     def real_debug_timeout(self):
         return (
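Moving `timeout_increase_count = 0` from `_analysis_competition_description` into `__init__` makes the counter part of the scenario's initial state, so re-analyzing the competition description no longer resets it and `increase_timeout()` can always assume it exists. The `increase_timeout()` and `real_debug_timeout()` bodies are not shown in this diff; the following is a purely hypothetical sketch of how such a counter is typically consumed, with made-up constants:

```python
# Hypothetical sketch only: BASE_TIMEOUT and INCREASE_FACTOR are invented here,
# not taken from DataScienceScen.
BASE_TIMEOUT = 3600.0     # seconds
INCREASE_FACTOR = 1.5     # growth per recorded timeout


class TimeoutBudget:
    def __init__(self) -> None:
        self.timeout_increase_count = 0  # initialised with the object, as in this commit

    def increase_timeout(self) -> None:
        self.timeout_increase_count += 1

    def effective_timeout(self) -> float:
        # each recorded timeout multiplies the budget by INCREASE_FACTOR
        return BASE_TIMEOUT * (INCREASE_FACTOR ** self.timeout_increase_count)
```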
