feat: add previous runner loops to runner history (#1142)
* add prev loops to runner history
* fix evolving history
* fix bug on initializing feedback without final decision
* reformat
* refine
* add comments
* fix ci
* a little refinement
* fix CI
---------
Co-authored-by: Xu <[email protected]>
Co-authored-by: Xu Yang <[email protected]>
rdagent/scenarios/data_science/dev/runner/prompts.yaml
Lines changed: 36 additions & 23 deletions
@@ -25,13 +25,10 @@ DSCoSTEER_eval:
 3. Confirm that the prediction file (`submission.csv`) is generated using only the test dataset, and its format matches the sample submission.
 If the code does not satisfy the requirements:
 - Set "acceptable" to false.
-- Set "final_decision" to false.
-{% if enable_hyperparameter_tuning_check %}- set "hyperparameter_tuning_decision" to false.
-- Set "hyperparameter_tuning_suggestion" to an empty string.
 If the code satisfy the requirements:
 - Set "acceptable" to true.
-- Proceed to the next evaluation.
 
+{% if enable_hyperparameter_tuning_check %}
 # Evaluation 2: Hyperparameter
 ## Evaluation Description
 The user will provide you the time spent on the whole code execution and the timeout of the code execution. You should decide whether the hyperparameter is reasonable based on the time.
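Evaluation 2 asks the evaluator to judge hyperparameters by how much of the timeout a run consumed. The minimal Python sketch below computes that ratio. The helper name `format_percent_of_timeout_used` and the two-decimal percentage format are assumptions for illustration only. This diff shows the template's `{{ percent_of_timeout_used }}` placeholder, not the code that fills it.

```python
def format_percent_of_timeout_used(time_spent: float, timeout: float) -> str:
    """Return the share of the timeout a run consumed, e.g. "15.00%".

    Hypothetical helper: the real code behind {{ percent_of_timeout_used }}
    is not part of this diff.
    """
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    # Ratio of elapsed time to the allowed budget, expressed in percent.
    return f"{100 * time_spent / timeout:.2f}%"


if __name__ == "__main__":
    print(format_percent_of_timeout_used(540, 3600))  # 15.00%
```

For example, a 540-second run against a 3600-second timeout yields `15.00%`. That matches the "Only 15% of the allowed time was used" observation in the suggestion example.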
@@ -45,8 +42,7 @@ DSCoSTEER_eval:
 3. Your suggestion should have a strong chance of improving the model's performance. Focus on the most obvious and impactful opportunities for quick improvement by leveraging more training time. Don't explore hyperparameters with low confidence. If there are no obvious and impactful opportunities and the code runs well, please accept it.
 If the code satisfy the requirements:
 - Set "hyperparameter_tuning_decision" to true.
-- Set "final_decision" to false.
-- Provide a reasonable suggestion in "hyperparameter_tuning_suggestion". The "hyperparameter_tuning_suggestion" should begin with a clear observation, followed by your suggestion. For example: "[Observation] The maximum number of epochs was reached, but the validation loss is still going down and early stopping was not activated. Only 15% of the allowed time was used. [Suggestion] We recommend increasing epochs to 100 to avoid underfitting and further improve model performance."
+- In "hyperparameter_tuning_suggestion", provide a clear, specific, and actionable suggestion. Begin with a concrete observation, then state a direct action to take. Do not use vague language, options, or uncertainty (avoid words like "A or B"). For example: "[Observation] The maximum number of epochs was reached, but the validation loss is still decreasing and early stopping was not activated. Only 15% of the allowed time was used. [Suggestion] Increase epochs to 100 to avoid underfitting and further improve model performance."
 If the code does not satisfy the requirements:
 - Set "hyperparameter_tuning_decision" to false.
 - Set "hyperparameter_tuning_suggestion" to an empty string.
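With `final_decision` dropped from the evaluator instructions, a caller that still needs a single pass/fail flag has to derive it. The following sketch is hypothetical: `EvalFeedback` and `parse_eval_response` are not RD-Agent names. It parses the JSON reply and checks only the decision keys the prompt requests, which depend on `enable_hyperparameter_tuning_check`. It then recomputes the removed rule, assuming a run was final only when it was acceptable and received no tuning suggestion.

```python
import json
from dataclasses import dataclass


@dataclass
class EvalFeedback:
    """Hypothetical container whose fields mirror the JSON keys in the prompt."""

    acceptable: bool
    hyperparameter_tuning_decision: bool = False
    hyperparameter_tuning_suggestion: str = ""

    @property
    def final_decision(self) -> bool:
        # Reconstructs the removed rule: an unacceptable run is never final,
        # and a tuning suggestion also kept final_decision false.
        return self.acceptable and not self.hyperparameter_tuning_decision


def parse_eval_response(raw: str, enable_hyperparameter_tuning_check: bool) -> EvalFeedback:
    """Parse the evaluator's JSON reply and check the decision keys the prompt asks for."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    required = {"acceptable"}
    if enable_hyperparameter_tuning_check:
        required |= {"hyperparameter_tuning_decision", "hyperparameter_tuning_suggestion"}
    missing = required - set(data)
    if missing:
        raise KeyError(f"missing keys: {sorted(missing)}")
    # A stray "final_decision" key left over from the older prompt is ignored.
    return EvalFeedback(
        acceptable=bool(data["acceptable"]),
        hyperparameter_tuning_decision=bool(data.get("hyperparameter_tuning_decision", False)),
        hyperparameter_tuning_suggestion=str(data.get("hyperparameter_tuning_suggestion", "")),
    )


if __name__ == "__main__":
    reply = (
        '{"acceptable": true, "hyperparameter_tuning_decision": true, '
        '"hyperparameter_tuning_suggestion": "[Observation] ... [Suggestion] Increase epochs to 100."}'
    )
    print(parse_eval_response(reply, enable_hyperparameter_tuning_check=True).final_decision)  # False
```

Deriving the flag from `acceptable` and `hyperparameter_tuning_decision` avoids asking the model for a third, possibly inconsistent boolean.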
@@ -59,10 +55,11 @@ DSCoSTEER_eval:
 "execution": "Describe whether the whole code base executed successfully and generating the final submission. Include any errors or issues encountered, and retain all error messages and traceback details.",
 "return_checking": "Verify the generated files, particularly the submission file. Ensure that its format matches the sample submission",
 "code": "Provide feedback on code quality, readability, and adherence to the given specifications.",
-"acceptable": <true/false: if the solution has paased execution, return_checking, and code verification, then it is a valid solution and acceptable. Otherwise it is not acceptable.>,{% if enable_hyperparameter_tuning_check %}
+"acceptable": <true/false: if the solution has passed execution, return_checking, and code verification, then it is a valid solution and acceptable. Otherwise it is not acceptable.>,
+{% if enable_hyperparameter_tuning_check %}
 "hyperparameter_tuning_decision": <true/false>,
-"hyperparameter_tuning_suggestion": <suggestion in plain text for hyperparameter tuning>,{% endif %}
-"final_decision": <true/false>,
+"hyperparameter_tuning_suggestion": <suggestion in plain text for hyperparameter tuning>,
+{% endif %}
 }
 ```
 {% else %}
@@ -101,28 +98,35 @@ DSCoSTEER_eval:
 "acceptable": <true/false: if the solution has paased execution, return_checking, and code verification, then it is a valid solution and acceptable. Otherwise it is not acceptable.>,
 {% if enable_hyperparameter_tuning_check %}"hyperparameter_tuning_decision": <true/false>,
 "hyperparameter_tuning_suggestion": <suggestion in plain text for hyperparameter tuning>,{% endif %}
-"final_decision": <true/false>,
 }
 ```
 {% endif %}
 # NOTE: when is_sub_enabled == False, we don't have any checking about the return. So it is just placeholder currently
 
 user: |-
-# Code base
+# Current Code base
 {{ code }}
+{% if change_summary is not none %}
+# Current Code Change Summary
+{{ change_summary }}{% endif %}
 
 ## Stdout of code execution and testing
 {{ stdout }}
 
-# The time spend on code execution and timeout
-{{ time_spent }}
-
-## The timeout of code execution
-{{ timeout }}
-
-## The percent of timeout used
-{{ percent_of_timeout_used }}
-
+## Execution time and timeout
+The execution time for current code base: {{ time_spent }}.
+The total timeout: {{ timeout }}.
+The percent of timeout used: {{ percent_of_timeout_used }}.
+
+{% if queried_former_failed_knowledge|length != 0 %}
+# Evolving History
+{% for former_failed_knowledge in queried_former_failed_knowledge %}## Attempt {{ loop.index }}:
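The new `user` template makes three changes:
- The change summary becomes optional.
- The timing numbers move into one "Execution time and timeout" section.
- Former failed attempts are appended, numbered from 1, as Jinja's `loop.index` is.

The plain-Python sketch below mirrors that assembly for inspecting a prompt without a Jinja environment. The function name and parameters are hypothetical. The body of each `## Attempt` entry is cut off in this diff, so each attempt is assumed to be pre-rendered text. Blank-line placement only approximates Jinja's whitespace handling.

```python
from typing import Optional, Sequence


def render_runner_user_prompt(
    code: str,
    stdout: str,
    time_spent: str,
    timeout: str,
    percent_of_timeout_used: str,
    change_summary: Optional[str] = None,
    former_failed_attempts: Sequence[str] = (),
) -> str:
    """Hypothetical plain-Python mirror of the `user` prompt template."""
    parts = ["# Current Code base", code]
    # Mirrors `{% if change_summary is not none %}`: an empty string still renders.
    if change_summary is not None:
        parts += ["# Current Code Change Summary", change_summary]
    parts += [
        "",
        "## Stdout of code execution and testing",
        stdout,
        "",
        "## Execution time and timeout",
        f"The execution time for current code base: {time_spent}.",
        f"The total timeout: {timeout}.",
        f"The percent of timeout used: {percent_of_timeout_used}.",
    ]
    # Mirrors `{% if queried_former_failed_knowledge|length != 0 %}`.
    if len(former_failed_attempts) != 0:
        parts += ["", "# Evolving History"]
        # Jinja's loop.index is 1-based, so attempts are numbered from 1.
        for index, attempt in enumerate(former_failed_attempts, start=1):
            parts += [f"## Attempt {index}:", attempt]
    return "\n".join(parts)
```

The `is not none` test means an empty summary string still produces the heading. Only `None` suppresses the section.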