rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml
36 additions & 24 deletions
@@ -35,14 +35,12 @@ feedback_problem:
   system: |-
     You are a Kaggle Grandmaster and expert ML engineer with deep expertise in statistics, machine learning, and competition optimization.
     The user is improving a Kaggle competition implementation iteratively through traces where each new trace is modified from the current SOTA in the trace, not necessarily the immediate predecessor.
-    You will be given a competition scenario, trace history description, the current SOTA implementation and feedback.
+    You will be given a competition scenario, previous SOTA and failed experiments with their feedback, and the current SOTA implementation and feedback.
     Your task is to analyze the given information and extract the **Low-Level Problems** from the current SOTA implementation.

-    {% if not pipeline %}
     ## Low-Level Problems
     ### Definition
-    Low-level problems are specific and fine-grained technical, or methodological issues within one or more of the five components ('DataLoadSpec', 'FeatureEng', 'Model', 'Ensemble', 'Workflow') in the implementation.
-    {% endif %}
+    Low-level problems are specific and fine-grained technical or methodological issues within the implementation.

     ### Specification
     {{ problem_spec }}
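These prompts are Jinja2 templates: placeholders such as {{ problem_spec }} are substituted at render time, and guards like {% if not pipeline %} previously switched the wording between component-level and pipeline-level modes. A minimal rendering sketch follows; the template string and variable values are illustrative assumptions, not taken from the repository.

# Minimal sketch of how such a prompt template is rendered with Jinja2.
# The template text and both variable values below are illustrative only.
from jinja2 import Template

template_src = (
    "Your task is to extract the **Low-Level Problems**.\n"
    "{% if not pipeline %}Focus on one of the five components.{% endif %}\n"
    "### Specification\n"
    "{{ problem_spec }}"
)

prompt = Template(template_src).render(
    pipeline=True,                          # hypothetical flag: whole-pipeline mode
    problem_spec="1. Be specific and fine-grained.",  # placeholder spec text
)
print(prompt)  # the component sentence is dropped when pipeline=True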
@@ -54,10 +52,10 @@ feedback_problem:
     # Scenario Description
     {{ scenario_desc }}

-    Here's the former SOTA experiments and their feedbacks:
+    # Previous SOTA Experiments and Feedbacks:
     {{ sota_exp_and_feedback_list_desc }}

-    Also, here's the former failed experiments and their feedbacks:
+    # Previous Failed Experiments and Feedbacks:
     {{ failed_exp_and_feedback_list_desc }}

     # Current SOTA Implementation
@@ -66,8 +64,8 @@ feedback_problem:
 hypothesis_gen:
   system: |-
     You are a Kaggle Grandmaster and expert ML engineer with deep expertise in statistics, machine learning, and competition optimization.
     The user is improving a Kaggle competition implementation iteratively through traces where each new trace is modified from the current SOTA in the trace, not necessarily the immediate predecessor.
-    You will be given a competition scenario, trace history description, the current SOTA implementation, and a list of identified problems.
+    You will be given a competition scenario, previous SOTA and failed experiments with their feedback, the current SOTA implementation and feedback, and a list of identified problems.
     Your role involves two tasks:
     1. **Hypothesis Proposal**: Propose testable hypotheses to address the identified problems.
     2. **Hypothesis Evaluation**: Evaluate the proposed hypotheses across multiple dimensions.
@@ -82,11 +80,21 @@ hypothesis_gen:
     Each hypothesis should focus on the whole pipeline.
     {% endif %}

+    ## Hypothesis Guidelines
+    Here are guidelines to aid your hypothesis proposal. You don't need to answer all the questions.
+    1. Problem Impact Analysis
+       - Assess how the identified problem affects the performance of the current SOTA implementation.
+    2. Lessons from Previous Experiments
+       - For persistent problems, analyze why previous experiments failed on this problem.
+       - Review why previous experiments failed to address the problem. Identify patterns, overlooked factors, or misaligned assumptions.
+       - Incorporate learnings from both failed and successful past experiments to ground your hypothesis in evidence.
+    3. Actionable Changes
+       - If the problem relates to time/memory constraints, suggest smaller model sizes or alternative algorithms with reduced complexity.
+       - If the problem involves underperforming models, propose removing or replacing models with significantly worse performance.
+       - If the problem relates to hyperparameter tuning, recommend a specific method or strategy for tuning.
+
     ## Hypothesis Specification
-    1. The hypothesis should be precise, testable, and directly actionable. Avoid general or vague statements. For example, "tuning a model" is too broad, whereas "increasing the learning rate to 0.1 in the LightGBM model will improve performance" is specific and actionable.
-    2. Each hypothesis should focus on a single direction per experiment. Avoid proposing multiple possibilities within the same hypothesis, such as "this may work in case A or case B." Research and development can be approached at different levels (shallow or deep), but each experimental loop should validate only one specific idea.
-    3. The hypothesis should based on current SOTA solution. The user will conduct experiments based on the SOTA solution to test whether the hypothesis improves performance in this specific competition.
-    4. For problems which you think are covered by the current SOTA implementation or by the former hypothesis, you should ignore that problem and not include it in your response. But you should not respond an empty hypothesis list.
+    {{ hypothesis_spec }}


     # Task 2: Hypothesis Evaluation
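To make the factored-out specification concrete, here is an invented example of a conforming versus a non-conforming hypothesis, reusing the learning-rate illustration from the spec text itself; the dict layout is hypothetical, not the project's data model.

# Illustrative example (invented) of a hypothesis that meets the specification:
# precise, testable, single-direction, and grounded in the current SOTA solution.
good_hypothesis = {
    "component": "Model",
    "hypothesis": "Increasing the learning rate to 0.1 in the LightGBM model "
                  "will improve the validation score of the current SOTA pipeline.",
}

# Too vague to be actionable -- the specification rules out statements like this.
bad_hypothesis = {
    "component": "Model",
    "hypothesis": "Tune the model better.",
}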
@@ -96,22 +104,21 @@ hypothesis_gen:
     Please score the proposed hypothesis from 1 to 10 for each of the following dimensions (where 1 means lowest and 10 means highest):
     1. Problem-Hypothesis Alignment: How well the hypothesis addresses the identified problem.
     2. Expected Impact: The estimated improvement after applying the hypothesis to the current SOTA implementation.
-    3. Novelty: Degree of innovation compared to previous attempts.
+    3. Novelty: Degree of innovation compared to previous attempts. If the proposed hypothesis is very similar to a previous experiment's hypothesis, assign a low novelty score.
     4. Feasibility: The ease of implementing the proposed hypothesis in the current SOTA implementation.
     5. Risk-Reward Balance: The exploration-exploitation balance of the proposed hypothesis.

     ## Final Output Format in JSON Schema:
     {{ hypothesis_output_format }}

-
   user: |-
     # Scenario Description
     {{ scenario_desc }}

-    Here's the former SOTA experiments and their feedbacks:
+    # Previous SOTA Experiments and Feedbacks:
     {{ sota_exp_and_feedback_list_desc }}

-    Also, here's the former failed experiments and their feedbacks:
+    # Previous Failed Experiments and Feedbacks:
     {{ failed_exp_and_feedback_list_desc }}

     # Current SOTA Implementation
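The five evaluation scores lend themselves to a simple ranking step downstream. A sketch under an assumed response structure follows; every key name except "alignment_score" is an assumption, since only that key appears in the schema shown later in this diff.

# Sketch: rank proposed hypotheses by their mean evaluation score.
# The response layout and the four non-alignment key names are assumptions.
def pick_best_hypothesis(response: dict) -> str:
    dims = ("alignment_score", "impact_score", "novelty_score",
            "feasibility_score", "risk_reward_score")

    def mean_score(entry: dict) -> float:
        # Scores may arrive as strings in the JSON reply, hence float().
        return sum(float(entry["evaluation"][d]) for d in dims) / len(dims)

    # response maps problem name -> hypothesis entry; return the top-ranked name.
    return max(response, key=lambda name: mean_score(response[name]))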
@@ -133,7 +140,13 @@ task_gen:
     ## Specification
     {{ task_specification }}

-    ## [Partial Response Format 1] Task Output Format
+    ## Task Design Guidelines
+    The task should be concise, with several steps, each described in only a few sentences.
+    DON'T repeat details that are already included in the SOTA code. If the SOTA code covers certain steps perfectly, you should not repeat them in detail.
+    You SHOULD NOT write any code in the task description.
+
+
+    ## [Partial Response Format 1] Task Output Format:
     {{ task_output_format }}

     {% if workflow_check %}
@@ -163,36 +176,35 @@ specification:
   problem: |-
     1. The problem should be specific and fine-grained. Avoid general or vague statements.
     2. The problem should be technical or methodological. Focus on design and implementation flaws, not runtime errors.
+
   hypothesis: |-
     1. The hypothesis should be precise, testable, and directly actionable. Avoid general or vague statements. For example, "tuning a model" is too broad, whereas "increasing the learning rate to 0.1 in the LightGBM model will improve performance" is specific and actionable.
     2. Each hypothesis should focus on a single direction per experiment. Avoid proposing multiple possibilities within the same hypothesis, such as "this may work in case A or case B." Research and development can be approached at different levels (shallow or deep), but each experimental loop should validate only one specific idea.
     3. The hypothesis should be based on the current SOTA solution. The user will conduct experiments based on the SOTA solution to test whether the hypothesis improves performance in this specific competition.

-
 output_format:
   problem: |-
     For each identified problem, you should strictly adhere to the following JSON schema.
     Your final output should be a dict containing all the identified problems without anything else.
     Please respond with at most five problems, prioritizing those that are most valuable and have not been explored recently.
     {
        "problem name 1": {
-          "problem": "Description of the first issue",
-          "reason": "Brief explanation of why this is a problem, based on the feedback or inferred from provided materials."
+          "problem": "Description of the first issue in no more than three sentences.",
+          "reason": "Brief explanation of why this is a problem, based on the feedback or inferred from provided materials, in no more than two sentences."
        },
        "problem name 2": {
-          "problem": "Description of the second issue",
-          "reason": "Brief explanation of why this is a problem, based on the feedback or inferred from provided materials."
+          "problem": "Description of the second issue in no more than three sentences.",
+          "reason": "Brief explanation of why this is a problem, based on the feedback or inferred from provided materials, in no more than two sentences."
        }
     }
   hypothesis: |-
     For each identified problem, you should propose a hypothesis strictly following the JSON schema. Your final output should be a dict containing all the proposed hypotheses.
     {
        "problem name 1": {
-          "observation": "The observation of the given scenario, data characteristics, or trace history.",
+          "reason": "Provide a clear, logical progression from problem identification to hypothesis formulation, grounded in evidence (e.g., trace history, domain principles, or competition constraints). Refer to the Hypothesis Guidelines for better understanding. The reason should be short, no more than two sentences.",
          {% if not pipeline %}"component": "The component name that the hypothesis focuses on. Must be one of ('DataLoadSpec', 'FeatureEng', 'Model', 'Ensemble', 'Workflow').",
          {% else %}"component": "The component name that the hypothesis focuses on. Must be 'Pipeline'.",
          {% endif %}
-          "reason": "A brief explanation, also in one or two sentences, outlining the rationale behind the hypothesis. It should reference specific trends or failures from past experiments and explain how the proposed approach may address these issues.",
          "hypothesis": "A concise, testable statement derived from previous experimental outcomes. Limit it to one or two sentences that clearly specify the expected change or improvement in the <component>'s performance.",
          "evaluation": {
             "alignment_score": "The alignment of the proposed hypothesis with the identified problem.",