Commit 54b2491

debug
1 parent f9ad85a commit 54b2491

File tree: 2 files changed, +43 −45 lines


rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml

Lines changed: 41 additions & 43 deletions
@@ -261,48 +261,46 @@ hypothesis_gen:
     {{ problems }}
 
 hypothesis_select:
-  system: |-
-    You are a Kaggle Grandmaster with deep expertise in model evaluation and decision making.
-    Based on the given example, please select the most appropriate hypothesis from the candidates.
-    These hypotheses are sourced from `model/data/feature/workflow`. Choose the one that best matches the intent or logic of the prompt.
-    You are given the following hypothesis candidates:
-    {{ hypothesis_candidates }}
-    If multiple hypotheses seem reasonable, select the one that is most robust or consistent with Previous Experiments and Feedbacks, pay attention to the runtime of each loop.
-
-    If you believe that previous methods have reached their limits and the current setting only involves a single model, feel free to propose an ensemble solution. However, you **must** carefully allocate the training and runtime budget to ensure the **ensemble logic is well-executed and evaluated**, without compromising the performance of the previous models.
-
-    ### 1.Ensemble Core Principle
-    Your goal is not just to tune individual models, but to build an **effective ensemble**. Make design decisions that lead to **strong overall ensemble performance**, not just strong base models.
-    Please note: you are operating under a time budget dedicated to ensemble training of {{res_time}} seconds, and the maximum allowed time is {{ensemble_timeout}} seconds.
-    Assume training a single model takes about 1 hour. For example, if you have roughly twice that time left, you can try training multiple models with different random seeds or data splits to reuse time effectively.
-    If you have more time, you might consider training a multi-fold ensemble. Use your judgment to decide how many folds or seeds fit within your remaining time budget.
-
-    ### 2. Training-Time Resource Allocation
-    - You may use **multiple folds** if justified, but you must **ensure the full pipeline completes within runtime limits**.
-    - Avoid reducing base model quality just to save time. For example:
-      - Freezing large parts of the model (e.g., embeddings)
-      - Using only embedding-level regression instead of full modeling
-      - Using extreme simplifications like LoRA or tiny backbones if they degrade performance
-
-    ### 3. Expectation on Ensemble Design
-    - Implement an ensemble strategy that **improves performance**.
-      This can be as simple as training the same model with different random seeds or data splits and averaging the outputs.
-      More advanced methods like stacking or blending are optional and can be used if beneficial.
-      Choose a practical and reliable ensemble approach within the available time and resources.
-    - Consider the resource budget as a whole: a strong ensemble depends on both good base models and effective combination.
-
-    ### 4. Final Reminder
-    You have full access to the training code, task definition, and previous results.
-    You should weigh trade-offs thoughtfully and pick a design that **maximizes ensemble performance without shortcuts** that hurt model quality or cause timeout.
-    - The current time budget is sufficient for thorough training and ensemble.
-    - If you believe the existing single-model code is already good, avoid large modifications.
-    - Avoid overly strict constraints; focus on **effectively using available time** to build a **robust ensemble**.
-
-
-    {% if hypothesis_output_format is not none %}
-    ## Final Output Format in JSON Schema:
-    {{ hypothesis_output_format }}
-    {% endif %}
+  system: |-
+    You are a Kaggle Grandmaster with deep expertise in model evaluation and decision making. Based on the given example, please select the most appropriate hypothesis from the candidates.
+    These hypotheses are sourced from `model/data/feature/workflow`. Choose the one that best matches the intent or logic of the prompt.
+    You are given the following hypothesis candidates:
+    {{ hypothesis_candidates }}
+    If multiple hypotheses seem reasonable, select the one that is most robust or consistent with Previous Experiments and Feedbacks, pay attention to the runtime of each loop.
+
+    If you believe that previous methods have reached their limits and the current setting only involves a single model, feel free to propose an ensemble solution. However, you **must** carefully allocate the training and runtime budget to ensure the **ensemble logic is well-executed and evaluated**, without compromising the performance of the previous models.
+
+    ### 1. Ensemble Core Principle
+    Your goal is not just to tune individual models, but to build an **effective ensemble**. Make design decisions that lead to **strong overall ensemble performance**, not just strong base models.
+    Please note: you are operating under a time budget dedicated to ensemble training of {{res_time}} seconds, and the maximum allowed time is {{ensemble_timeout}} seconds.
+    Assume training a single model takes about 1 hour. For example, if you have roughly twice that time left, you can try training multiple models with different random seeds or data splits to reuse time effectively.
+    If you have more time, you might consider training a multi-fold ensemble. Use your judgment to decide how many folds or seeds fit within your remaining time budget.
+
+    ### 2. Training-Time Resource Allocation
+    - You may use **multiple folds** if justified, but you must **ensure the full pipeline completes within runtime limits**.
+    - Avoid reducing base model quality just to save time. For example:
+      - Freezing large parts of the model (e.g., embeddings)
+      - Using only embedding-level regression instead of full modeling
+      - Using extreme simplifications like LoRA or tiny backbones if they degrade performance
+
+    ### 3. Expectation on Ensemble Design
+    - Implement an ensemble strategy that **improves performance**.
+      This can be as simple as training the same model with different random seeds or data splits and averaging the outputs.
+      More advanced methods like stacking or blending are optional and can be used if beneficial.
+      Choose a practical and reliable ensemble approach within the available time and resources.
+    - Consider the resource budget as a whole: a strong ensemble depends on both good base models and effective combination.
+
+    ### 4. Final Reminder
+    You have full access to the training code, task definition, and previous results.
+    You should weigh trade-offs thoughtfully and pick a design that **maximizes ensemble performance without shortcuts** that hurt model quality or cause timeout.
+    - The current time budget is sufficient for thorough training and ensemble.
+    - If you believe the existing single-model code is already good, avoid large modifications.
+    - Avoid overly strict constraints; focus on **effectively using available time** to build a **robust ensemble**.
+
+    {% if hypothesis_output_format is not none %}
+    ## Final Output Format in JSON Schema:
+    {{ hypothesis_output_format }}
+    {% endif %}
 
 hypothesis_select:
   user: |-
@@ -581,7 +579,7 @@ output_format:
     "problem name 2 (should be exactly same as the problem name provided)": 2, # The index which is same to the idea index provided in the input and must be integer.
     }
 
-  hypothesis_select: |-
+  hypothesis_select_format: |-
     Choose the best hypothesis from the provided hypothesis candidates {{ hypothesis_candidates }}.
     You must return a dictionary in the following format **for each selected hypothesis**:
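The prompt changes above keep the same ensemble guidance: average the same model trained under different seeds, and size the ensemble to the remaining `res_time` budget (assuming roughly one hour per model). That strategy can be sketched with toy stand-ins; `train_model`, `seeds_that_fit`, and `seed_ensemble` are invented for illustration and are not RD-Agent code:

```python
import random

def train_model(seed, data):
    """Hypothetical training run: returns a 'model' that predicts the
    data mean plus a small seed-dependent perturbation."""
    rng = random.Random(seed)
    bias = rng.uniform(-0.5, 0.5)
    mean = sum(data) / len(data)
    return lambda: mean + bias

def seeds_that_fit(res_time_s, per_model_s=3600):
    # Budget arithmetic from the prompt: ~1 hour per model, so
    # ~2 hours of remaining time allows 2 seeds.
    return max(1, int(res_time_s // per_model_s))

def seed_ensemble(data, n_seeds):
    # Train one model per seed and average their predictions --
    # the simplest ensemble the prompt describes.
    preds = [train_model(seed, data)() for seed in range(n_seeds)]
    return sum(preds) / len(preds)

n = seeds_that_fit(res_time_s=7200)  # -> 2
prediction = seed_ensemble([1.0, 2.0, 3.0], n_seeds=n)
```

Averaging cancels the seed-dependent noise, which is why the ensembled prediction sits closer to the true mean than a typical single seed.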

rdagent/scenarios/data_science/proposal/exp_gen/proposal.py

Lines changed: 2 additions & 2 deletions
@@ -751,7 +751,7 @@ def hypothesis_select_with_llm(self,
         hypothesis_candidates = hypothesis_candidates,
         res_time = res_time,
         ensemble_timeout = ensemble_timeout,
-        hypothesis_output_format = T(".prompts_v2:output_format.hypothesis_select").r(hypothesis_candidates = hypothesis_candidates)
+        hypothesis_output_format = T(".prompts_v2:output_format.hypothesis_select_format").r(hypothesis_candidates = hypothesis_candidates)
     )
 
     user_prompt = T(".prompts_v2:hypothesis_select.user").r(
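This hunk updates the call site to match the key renamed in the YAML (`hypothesis_select` → `hypothesis_select_format`). RD-Agent's `T(...).r(...)` helper is repo-specific, but the failure mode a stale path causes can be sketched with a minimal stand-in; `prompts` and `render` here are hypothetical, not the actual API:

```python
# Minimal stand-in for a dotted-path template lookup plus render.
prompts = {
    "output_format": {
        # key renamed in this commit: was "hypothesis_select"
        "hypothesis_select_format": "Choose the best hypothesis from {candidates}.",
    }
}

def render(path, **kwargs):
    node = prompts
    for part in path.split("."):
        node = node[part]  # raises KeyError if a caller still uses the old key
    return node.format(**kwargs)

text = render("output_format.hypothesis_select_format", candidates="[h1, h2]")
```

Renaming a template key without updating every lookup path would surface as a lookup error at render time, which is why the YAML and Python changes land in the same commit.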
@@ -988,7 +988,7 @@ def gen(
         timer=timer)
 
         if response_dict["component"] != "Ensemble":
-            new_hypothesis = DSHypothesis(component=hypothesis_dict[response_dict["hypothesis"]]["component"].get("component", "Model"),hypothesis=response_dict["hypothesis"])
+            new_hypothesis = DSHypothesis(component=hypothesis_dict[response_dict["hypothesis"]].get("component", "Model"),hypothesis=response_dict["hypothesis"])
         else:
             new_hypothesis = DSHypothesis(component=HypothesisComponent.Ensemble,hypothesis=response_dict["hypothesis"])
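The one-line change in `gen` removes an extra level of indexing, which suggests each `hypothesis_dict` value is a flat dict of fields. A reduced illustration of the shape involved (the dict contents below are invented for the example):

```python
# hypothesis_dict maps a hypothesis string to a flat dict of fields, so
# .get("component", ...) belongs on that dict directly.
hypothesis_dict = {
    "use target encoding": {"component": "FeatureEng"},
    "try a deeper backbone": {},  # no component recorded
}

# Before the fix, hypothesis_dict[h]["component"].get("component", "Model")
# indexed one level too deep: it raises KeyError when "component" is absent,
# and otherwise calls .get on a plain string (AttributeError).

def component_of(h):
    # After the fix: read the field off the flat dict, defaulting to "Model".
    return hypothesis_dict[h].get("component", "Model")
```

`dict.get` with a default makes the missing-field case a silent fallback instead of a crash, matching the `"Model"` default in the commit.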