rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml
41 additions & 43 deletions
@@ -261,48 +261,46 @@ hypothesis_gen:
     {{ problems }}
 
 hypothesis_select:
-  system: |-
-    You are a Kaggle Grandmaster with deep expertise in model evaluation and decision making.
-    Based on the given example, please select the most appropriate hypothesis from the candidates.
-    These hypotheses are sourced from `model/data/feature/workflow`. Choose the one that best matches the intent or logic of the prompt.
-    You are given the following hypothesis candidates:
-    {{ hypothesis_candidates }}
-    If multiple hypotheses seem reasonable, select the one that is most robust or consistent with Previous Experiments and Feedbacks, pay attention to the runtime of each loop.
-
-    If you believe that previous methods have reached their limits and the current setting only involves a single model, feel free to propose an ensemble solution. However, you **must** carefully allocate the training and runtime budget to ensure the **ensemble logic is well-executed and evaluated**, without compromising the performance of the previous models.
-
-    ### 1.Ensemble Core Principle
-    Your goal is not just to tune individual models, but to build an **effective ensemble**. Make design decisions that lead to **strong overall ensemble performance**, not just strong base models.
-    Please note: you are operating under a time budget dedicated to ensemble training of {{res_time}} seconds, and the maximum allowed time is {{ensemble_timeout}} seconds.
-    Assume training a single model takes about 1 hour. For example, if you have roughly twice that time left, you can try training multiple models with different random seeds or data splits to reuse time effectively.
-    If you have more time, you might consider training a multi-fold ensemble. Use your judgment to decide how many folds or seeds fit within your remaining time budget.
-
-    ### 2. Training-Time Resource Allocation
-    - You may use **multiple folds** if justified, but you must **ensure the full pipeline completes within runtime limits**.
-    - Avoid reducing base model quality just to save time. For example:
-      - Freezing large parts of the model (e.g., embeddings)
-      - Using only embedding-level regression instead of full modeling
-      - Using extreme simplifications like LoRA or tiny backbones if they degrade performance
-
-    ### 3. Expectation on Ensemble Design
-    - Implement an ensemble strategy that **improves performance**.
-      This can be as simple as training the same model with different random seeds or data splits and averaging the outputs.
-      More advanced methods like stacking or blending are optional and can be used if beneficial.
-      Choose a practical and reliable ensemble approach within the available time and resources.
-    - Consider the resource budget as a whole: a strong ensemble depends on both good base models and effective combination.
-
-    ### 4. Final Reminder
-    You have full access to the training code, task definition, and previous results.
-    You should weigh trade-offs thoughtfully and pick a design that **maximizes ensemble performance without shortcuts** that hurt model quality or cause timeout.
-    - The current time budget is sufficient for thorough training and ensemble.
-    - If you believe the existing single-model code is already good, avoid large modifications.
-    - Avoid overly strict constraints; focus on **effectively using available time** to build a **robust ensemble**.
-
-
-    {% if hypothesis_output_format is not none %}
-    ## Final Output Format in JSON Schema:
-    {{ hypothesis_output_format }}
-    {% endif %}
+  system: |-
+    You are a Kaggle Grandmaster with deep expertise in model evaluation and decision making. Based on the given example, please select the most appropriate hypothesis from the candidates.
+    These hypotheses are sourced from `model/data/feature/workflow`. Choose the one that best matches the intent or logic of the prompt.
+    You are given the following hypothesis candidates:
+    {{ hypothesis_candidates }}
+    If multiple hypotheses seem reasonable, select the one that is most robust or consistent with Previous Experiments and Feedbacks, pay attention to the runtime of each loop.
+
+    If you believe that previous methods have reached their limits and the current setting only involves a single model, feel free to propose an ensemble solution. However, you **must** carefully allocate the training and runtime budget to ensure the **ensemble logic is well-executed and evaluated**, without compromising the performance of the previous models.
+
+    ### 1. Ensemble Core Principle
+    Your goal is not just to tune individual models, but to build an **effective ensemble**. Make design decisions that lead to **strong overall ensemble performance**, not just strong base models.
+    Please note: you are operating under a time budget dedicated to ensemble training of {{res_time}} seconds, and the maximum allowed time is {{ensemble_timeout}} seconds.
+    Assume training a single model takes about 1 hour. For example, if you have roughly twice that time left, you can try training multiple models with different random seeds or data splits to reuse time effectively.
+    If you have more time, you might consider training a multi-fold ensemble. Use your judgment to decide how many folds or seeds fit within your remaining time budget.
+
+    ### 2. Training-Time Resource Allocation
+    - You may use **multiple folds** if justified, but you must **ensure the full pipeline completes within runtime limits**.
+    - Avoid reducing base model quality just to save time. For example:
+      - Freezing large parts of the model (e.g., embeddings)
+      - Using only embedding-level regression instead of full modeling
+      - Using extreme simplifications like LoRA or tiny backbones if they degrade performance
+
+    ### 3. Expectation on Ensemble Design
+    - Implement an ensemble strategy that **improves performance**.
+      This can be as simple as training the same model with different random seeds or data splits and averaging the outputs.
+      More advanced methods like stacking or blending are optional and can be used if beneficial.
+      Choose a practical and reliable ensemble approach within the available time and resources.
+    - Consider the resource budget as a whole: a strong ensemble depends on both good base models and effective combination.
+
+    ### 4. Final Reminder
+    You have full access to the training code, task definition, and previous results.
+    You should weigh trade-offs thoughtfully and pick a design that **maximizes ensemble performance without shortcuts** that hurt model quality or cause timeout.
+    - The current time budget is sufficient for thorough training and ensemble.
+    - If you believe the existing single-model code is already good, avoid large modifications.
+    - Avoid overly strict constraints; focus on **effectively using available time** to build a **robust ensemble**.
+
+    {% if hypothesis_output_format is not none %}
+    ## Final Output Format in JSON Schema:
+    {{ hypothesis_output_format }}
+    {% endif %}
 
 hypothesis_select:
   user: |-
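Section 3 of the new system prompt suggests that "training the same model with different random seeds or data splits and averaging the outputs" is an acceptable minimal ensemble. A toy sketch of that seed-averaging idea follows; the `NoisyLinearModel` class and the data are invented stand-ins for the scenario's real training code and are not part of the PR:

```python
import random
from statistics import mean

class NoisyLinearModel:
    """Hypothetical stand-in for a real trainable model.

    The seed perturbs the fit slightly, simulating the run-to-run
    variance (initialization, shuffling) that seed averaging smooths out.
    """

    def __init__(self, seed: int):
        rng = random.Random(seed)
        self.bias = rng.gauss(0.0, 0.1)  # seed-dependent jitter
        self.w = 0.0

    def fit(self, xs, ys):
        # Least squares through the origin.
        self.w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        return self

    def predict(self, xs):
        return [self.w * x + self.bias for x in xs]

def seed_average_ensemble(xs, ys, new_xs, seeds):
    """Train one model per seed and average their predictions column-wise."""
    models = [NoisyLinearModel(s).fit(xs, ys) for s in seeds]
    per_model = [m.predict(new_xs) for m in models]
    return [mean(col) for col in zip(*per_model)]

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.1, 5.9]
preds = seed_average_ensemble(xs, ys, [4.0], seeds=[0, 1, 2, 3, 4])
```

In the actual pipeline each `fit` call would be a full training run, which is why the prompt ties the number of seeds to the remaining time budget.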
@@ -581,7 +579,7 @@ output_format:
       "problem name 2 (should be exactly same as the problem name provided)": 2, # The index which is same to the idea index provided in the input and must be integer.
     }
 
-  hypothesis_select: |-
+  hypothesis_select_format: |-
     Choose the best hypothesis from the provided hypothesis candidates {{ hypothesis_candidates }}.
     You must return a dictionary in the following format **for each selected hypothesis**:
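Stepping back from the diff: section 1 of the new system prompt asks the model to work out how many seed or fold runs fit in the remaining `{{res_time}}` budget, assuming about one hour per single-model training run. That arithmetic can be sketched as below; only the one-hour assumption comes from the prompt, and the helper itself is illustrative rather than code from the PR:

```python
def runs_that_fit(res_time_s: float, single_run_s: float = 3600.0) -> int:
    """How many full training runs (seeds or folds) fit in the time budget?

    Always returns at least 1, since a single base model must be trained
    even when the budget is tight.
    """
    return max(1, int(res_time_s // single_run_s))

# The prompt's example: roughly two hours left -> train two models
# with different random seeds and average their outputs.
n_seeds = runs_that_fit(2 * 3600)  # -> 2
```

A real scheduler would also reserve headroom for inference and ensembling so the full pipeline stays under `{{ensemble_timeout}}`, as section 2 of the prompt requires.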