fix: give rewrite the authority to delete hypotheses
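
The new validation in `hypothesis_rewrite` can be read as three steps: reject keys that were never in the original set, reject an empty result, and log whatever was dropped. Below is a minimal standalone sketch of that logic, not the code from `proposal.py`. The function name `validate_rewrite`, its parameters, and the module-level `logger` are hypothetical and introduced only for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def validate_rewrite(original: dict, rewritten: dict) -> set:
    """Check a rewrite result against the original hypotheses.

    Hypothetical helper: returns the set of deleted problem keys, or raises
    ValueError so that a caller's retry mechanism can run again.
    """
    expected = set(original.keys())
    available = set(rewritten.keys())

    # The rewrite may drop hypotheses, but it must not introduce new keys.
    unexpected = available - expected
    if unexpected:
        raise ValueError(f"Rewrite response contains unexpected problems. Unexpected: {unexpected}")

    # Deleting every hypothesis is not allowed.
    if not available:
        raise ValueError("Rewrite response deleted all hypotheses. At least one hypothesis must remain.")

    # Whatever is left over was deleted on purpose; record it for later inspection.
    deleted = expected - available
    if deleted:
        logger.info("Deleted %d hypotheses during rewrite: %s", len(deleted), deleted)
    return deleted
```

With `original = {"p1": ..., "p2": ...}`, a rewrite that returns only `p1` passes and reports `p2` as deleted, while an empty rewrite or one that contains an unknown key such as `p3` raises `ValueError`.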

Hoder-zyf · Hoder-zyf · commit f46135651d5d · 2025-08-08T08:42:59.000Z
diff --git a/rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml b/rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml
@@ -300,13 +300,16 @@ hypothesis_rewrite:
     ## Task
     Transform each **original hypothesis and its critique** into a **single, specific, testable technical hypothesis** that can be implemented immediately.
     
+    You have the authority to delete hypotheses that you judge to be completely infeasible or unsuitable. Use this authority carefully and judiciously and ensure at least one hypothesis remains in your output.
+    
     ## Core Principles
     1. **Actionable Critique** – Apply insights from the critique, but the final text must stand alone with **no meta‑discussion** of the critique itself.
     2. **Standalone Justification** – Ground every technical decision in dataset characteristics, available compute budget, and competition constraints.
     3. **Decisive Specificity** – Remove all ambiguity; propose one clear action.
     4. **Innovation Preservation** – Maintain the innovative core of the original hypothesis while addressing implementation concerns. Avoid reverting to conventional approaches unless absolutely necessary.
     5. **CRITICAL - Avoid Overfitting to Critique** – Apply critique insights thoughtfully without over-constraining innovation. Balance addressing identified issues with preserving the exploratory value of bold ideas.
-    {% if enable_scale_check %}6. The user is currently working on a continuous exploration on the task. It's typical that we first try in small scale and in some certain point we will scale up the solution. 
+    6. **Hypothesis Deletion Authority** – You have the authority to delete hypotheses that you judge to be completely infeasible or unsuitable. Use your judgment, but ensure at least one hypothesis remains.
+    {% if enable_scale_check %}7. The user is currently working on a continuous exploration on the task. It's typical that we first try in small scale and in some certain point we will scale up the solution. 
     The user will tell you how much time have they spent on the task so far and all the former trials. You should consider whether to scale up the solution based on the current situation. You should put this conclusion in each hypothesis's appendix section.
     Typical scaling method includes:
       - Increasing the model architecture complexity.
diff --git a/rdagent/scenarios/data_science/proposal/exp_gen/proposal.py b/rdagent/scenarios/data_science/proposal/exp_gen/proposal.py
@@ -780,19 +780,25 @@ def hypothesis_rewrite(
 
         improved_hypotheses_dict = json.loads(response)
 
-        # Validate that we have rewritten hypotheses for all original hypotheses
+        # Validate rewritten hypotheses (now allows deletion of hypotheses)
         expected_problems = set(hypothesis_dict.keys())
-        available_problems = set(  # The code snippet provided is a comment in Python. It appears to be
-            # a placeholder for a function or variable named
-            # `improved_hypotheses_dict`. The actual implementation of this
-            # function or variable is not provided in the code snippet.
-            improved_hypotheses_dict.keys()
-        )
+        available_problems = set(improved_hypotheses_dict.keys())
+
+        # Check if all available problems are valid (subset of expected)
+        if not available_problems.issubset(expected_problems):
+            unexpected_problems = available_problems - expected_problems
+            # Raise exception to trigger retry mechanism
+            raise ValueError(f"Rewrite response contains unexpected problems. Unexpected: {unexpected_problems}")
 
-        if not expected_problems.issubset(available_problems):
-            missing_problems = expected_problems - available_problems
+        # Check if at least one hypothesis remains
+        if len(available_problems) == 0:
             # Raise exception to trigger retry mechanism
-            raise ValueError(f"Rewrite response missing expected problems. Missing: {missing_problems}")
+            raise ValueError("Rewrite response deleted all hypotheses. At least one hypothesis must remain.")
+
+        # Log deleted hypotheses if any
+        deleted_problems = expected_problems - available_problems
+        if deleted_problems:
+            logger.info(f"Deleted {len(deleted_problems)} hypotheses during rewrite: {deleted_problems}")
 
         # Note: We don't preserve 'inspired' field from original hypotheses
         # because after critique and rewrite, the hypothesis may have changed significantly