stdout+=f"\n### Submission check:\n{submission_check_out}\nIf Submission check returns a 'Submission is valid' or similar message, despite some warning messages, you should still consider the submission as valid and give a positive final decision. "
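The added line appends the submission checker's output to the evaluator's `stdout` under a `### Submission check:` heading. This is the "Submission check section" that the prompt refers to. The sketch below shows the same pattern as a standalone function. The helper name `append_submission_check` and the constant `SUBMISSION_NOTE` are hypothetical and do not come from the rdagent code base, since the diff does not show the code around this line.

```python
# Hypothetical constant: the leniency note carried by the f-string in the diff.
SUBMISSION_NOTE = (
    "If Submission check returns a 'Submission is valid' or similar message, "
    "despite some warning messages, you should still consider the submission "
    "as valid and give a positive final decision. "
)


def append_submission_check(stdout: str, submission_check_out: str) -> str:
    """Append the submission-check output and the leniency note to stdout.

    Hypothetical helper that mirrors the f-string in the diff; not an rdagent API.
    """
    return stdout + f"\n### Submission check:\n{submission_check_out}\n{SUBMISSION_NOTE}"


if __name__ == "__main__":
    print(append_submission_check("Training finished.", "Submission is valid."))
```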
rdagent/scenarios/data_science/dev/runner/prompts.yaml
Lines changed: 12 additions & 31 deletions
@@ -18,52 +18,35 @@ DSCoSTEER_eval:
 The code is focusing on the following task
 {{ task_desc }}
 
-## Evaluation Criteria
+## Evaluation Guidelines
 1. Evaluate the code base based on several aspects, including execution correctness, return checking, and code quality.
 2. Ensure the code does not contain any incorrect, fabricated, or deceptive operations, such as mocking data, scores, or results.
 3. Confirm that the prediction file (`submission.csv`) is generated using only the test dataset, and its format matches the sample submission. Please refer to Submission check section including the format check to the submission.
-If the code does not satisfy any of the criteria:
+If the code does not satisfy the requirements:
 - Set "acceptable" to false.
-If the code satisfy all the criteria:
+If the code satisfy the requirements:
 - Set "acceptable" to true.
 
 {% if enable_hyperparameter_tuning_check %}
 # Evaluation 2: Hyperparameter
+## Evaluation Description
 The user will provide you the time spent on the whole code execution and the timeout of the code execution. You should decide whether the hyperparameter is reasonable based on the time.
 For example, if the code uses only a very small portion of the allowed time, and hyperparameters like `n_estimators` or `epochs` have low values, with early stopping not being triggered and possible signs of underfitting, you should suggest increasing these hyperparameters.
 You should also notice other resources utilization hyper-parameters.
 For example, if you are using a GPU with large memory, and the batch size is set very low, you should suggest increasing the batch size if it is not reasonable.
 
-## Evaluation Criteria
-1. The code execution time or resource utilization is under-utilized, which suggests that there is room for improvement in the hyperparameter
-2. The code must already applied early stopping strategy to prevent overfitting and the early stopping was not triggered (otherwise, increasing epochs will be wasted).
+## Evaluation Guidelines
+1. The code execution time or resource utilization suggest that there is room for improvement in the hyperparameters.
+2. The code must apply early stopping strategy already (in order to prevent overfitting).
 3. Your suggestion should have a strong chance of improving the model's performance. Focus on the most obvious and impactful opportunities for quick improvement by leveraging more training time. Don't explore hyperparameters with low confidence. If there are no obvious and impactful opportunities and the code runs well, please accept it.
 4. Only include the suggestions in your response without leak any time limit information because the user might over-fit the model to the time limit.
 5. Never make your judgment only based on the time spent, you should also consider the code and the stdout.
-
-In the "reasoning", provide clear, step-by-step reasoning for your hyperparameter tuning evaluation. Explicitly reference the code, stdout, and resource usage to justify your assessment. Ensure your reasoning checks whether all evaluation criteria are satisfied, and highlight any specific observations that support your decision.
-If the code does not satisfy any of the criteria:
-- Set "hyperparameter_tuning_decision" to false.
-- Set "hyperparameter_tuning_suggestion" to an empty string.
-If the code satisfy all the criteria:
+If the code satisfy the requirements:
 - Set "hyperparameter_tuning_decision" to true.
 - In "hyperparameter_tuning_suggestion", provide a clear, specific, and actionable suggestion. Begin with a concrete observation, then state a direct action to take. Do not use vague language, options, or uncertainty (avoid words like "A or B"). For example: "[Observation] The maximum number of epochs was reached, but the validation loss is still decreasing and early stopping was not activated. Only small portion of the allowed time was used. [Suggestion] Increase epochs to 100 to avoid underfitting and further improve model performance."
-
-## Hyperparameter Tuning Guidelines
-1. Task-specific Hyperparameters
-- NLP: Check `max_len`, model size, learning rate, batch size. Suggest increases only if underfitting or low resource usage.
-- CV: Check `image_size`, backbone size, batch size, learning rate, augmentation. Suggest increases if results are poor and resources under-used.
-- If validation accuracy is low or loss is high, suggest increasing model size or layers if resources allow. Add regularization if overfitting.
-3. Epochs
-- If early stopping triggered, do not increase epochs. If not triggered and validation improves, suggest more epochs.
-4. Batch Size
-- If memory allows and batch size is low, suggest increasing. If OOM errors, suggest reducing.
-5. Learning Rate
-- If training is slow/underfitting, suggest increasing. If unstable, suggest decreasing.
-6. Data Augmentation
-- For CV/NLP, suggest tuning augmentation if overfitting or poor generalization.
+If the code does not satisfy the requirements:
+- Set "hyperparameter_tuning_decision" to false.
+- Set "hyperparameter_tuning_suggestion" to an empty string.
 {% endif %}
 
 ## Output format
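Taken together, the new guidelines ask the evaluator to propose tuning only under three conditions: the run leaves its time budget under-used, early stopping is already in place, and there is an obvious opportunity such as a validation loss that is still decreasing. The evaluator is an LLM making a judgment call, so the code below is only a rough, deterministic illustration of that decision under stated assumptions. `RunStats`, its fields, and the 0.5 utilization threshold are hypothetical. The boolean flags stand in for what the evaluator would read from the code and stdout, following guideline 5.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Hypothetical summary of one run; field names are illustrative only."""
    time_spent_s: float
    timeout_s: float
    uses_early_stopping: bool
    early_stopping_triggered: bool
    val_loss_still_decreasing: bool


# Assumed threshold: a run using under half of the allowed time counts as under-utilized.
UNDER_UTILIZED_FRACTION = 0.5


def hyperparameter_tuning_check(stats: RunStats) -> tuple[bool, str]:
    """Return (hyperparameter_tuning_decision, hyperparameter_tuning_suggestion)."""
    under_utilized = stats.time_spent_s < UNDER_UTILIZED_FRACTION * stats.timeout_s
    if not under_utilized or not stats.uses_early_stopping:
        # Guidelines 1-2 not met: decision is false and the suggestion stays empty.
        return False, ""
    if stats.early_stopping_triggered or not stats.val_loss_still_decreasing:
        # No obvious, high-confidence opportunity (guideline 3): accept as-is.
        return False, ""
    # Observation first, then one direct action; no timing figures (guideline 4).
    suggestion = (
        "[Observation] The maximum number of epochs was reached, but the validation "
        "loss is still decreasing and early stopping was not activated. "
        "[Suggestion] Increase the number of epochs."
    )
    return True, suggestion
```

The suggestion text deliberately leaves out the time spent and the timeout. This keeps it in line with guideline 4, which forbids leaking time-limit information.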
@@ -74,9 +57,7 @@ DSCoSTEER_eval:
 "return_checking": "Verify the generated files, particularly the submission file. Ensure that its format is valid",
 "code": "Provide feedback on code quality, readability, and adherence to the given specifications.",
 "acceptable": <true/false: if the solution has passed execution, return_checking, and code verification, then it is a valid solution and acceptable. Otherwise it is not acceptable.>,
-{% if enable_hyperparameter_tuning_check %}
-"reasoning": "Provide step-by-step reasoning for hyperparameter tuning evaluation.",
-"hyperparameter_tuning_suggestion": <suggestion in plain text for hyperparameter tuning>,
+{% if enable_hyperparameter_tuning_check %}"hyperparameter_tuning_suggestion": <suggestion in plain text for hyperparameter tuning>,
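The output format asks for a JSON object with a boolean `acceptable` field. When `enable_hyperparameter_tuning_check` is on, it also asks for a plain-text `hyperparameter_tuning_suggestion`. Below is a minimal sketch of a consumer-side check. It assumes the reply also carries the `hyperparameter_tuning_decision` boolean that the guidelines mention, because this truncated hunk does not show the full key list. `parse_eval_response` is a hypothetical helper, not an rdagent function.

```python
import json


def parse_eval_response(raw: str, hyperparameter_check: bool) -> dict:
    """Parse an evaluator JSON reply and check the fields visible in this diff.

    Hypothetical helper; the real key set of the output format may be larger.
    """
    data = json.loads(raw)
    for key in ("return_checking", "code"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"missing or non-string field: {key}")
    if not isinstance(data.get("acceptable"), bool):
        raise ValueError("'acceptable' must be a JSON boolean")
    if hyperparameter_check:
        # Assumed key: the guidelines set it, but the output format shown here is truncated.
        decision = data.get("hyperparameter_tuning_decision")
        suggestion = data.get("hyperparameter_tuning_suggestion", "")
        if not isinstance(decision, bool) or not isinstance(suggestion, str):
            raise ValueError("invalid hyperparameter tuning fields")
        # The prompt requires an empty suggestion when the decision is false.
        if not decision and suggestion:
            raise ValueError("suggestion must be empty when the decision is false")
    return data
```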