Commit 3265fe7

fix when score is a string (#750)
1 parent c077b82 commit 3265fe7

2 files changed: 15 additions, 10 deletions

rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml

Lines changed: 1 addition & 1 deletion
@@ -200,7 +200,7 @@ output_format:
   hypothesis: |-
     For each of the identified problem, you should propose a hypothesis strictly following to the JSON schema. Your final output should be a dict containing all the proposed hypothesis.
     {
-      "problem name 1": {
+      "problem name 1 (Should be exactly same as the problem name provided)": {
         "reason": "Provide a clear, logical progression from problem identification to hypothesis formulation, grounded in evidence (e.g., trace history, domain principles, or competition constraints). Refer to the Hypothesis Guidelines for better understanding. Reason should be short with no more than two sentences.",
         {% if not pipeline %}"component": "The component name that the hypothesis focus on. Must be one of ('DataLoadSpec', 'FeatureEng', 'Model', 'Ensemble', 'Workflow').",
         {% else %}"component": "The component name that the hypothesis focus on. Must be 'Pipeline'.",

rdagent/scenarios/data_science/proposal/exp_gen/proposal.py

Lines changed: 14 additions & 9 deletions
@@ -326,15 +326,20 @@ def hypothesis_rank(self, hypothesis_dict: dict, problem_dict: dict, pipeline: b
             "feasibility_score": 0.1,
             "risk_reward_balance_score": 0.1,
         }
-        scores = pd.DataFrame(
-            {
-                problem_name: {
-                    score_key: hypothesis_dict[problem_name]["evaluation"].get(score_key, 0) * weight
-                    for score_key, weight in weights.items()
-                }
-                for problem_name in hypothesis_dict
-            }
-        )
+        scores_dict = {}
+        for problem_name in hypothesis_dict:
+            scores_dict[problem_name] = {}
+            for score_key in weights:
+                if score_key not in hypothesis_dict[problem_name]["evaluation"]:
+                    scores_dict[problem_name][score_key] = 0
+                else:
+                    try:
+                        scores_dict[problem_name][score_key] = (
+                            float(hypothesis_dict[problem_name]["evaluation"][score_key]) * weights[score_key]
+                        )
+                    except (ValueError, TypeError):
+                        scores_dict[problem_name][score_key] = 0
+        scores = pd.DataFrame(scores_dict)
         scores_sorted = scores.sum().sort_values(ascending=False)
         if len(scores_sorted) > 5:
             scores_sorted = scores_sorted[: len(scores_sorted) // 2]
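
For context on why the change is needed: when an evaluation score arrives as a string such as "8", the old expression multiplies a str by a float weight, which raises TypeError in Python. The following is a minimal standalone sketch of the coercion pattern this commit introduces, not the project's actual code. The helper name `weighted_scores`, the sample weights, and the sample `hypothesis_dict` are hypothetical, and plain dicts stand in for the `pd.DataFrame` used in `proposal.py`.

```python
def weighted_scores(hypothesis_dict, weights):
    """Return {problem_name: {score_key: weighted score}}, tolerating string scores."""
    scores = {}
    for problem_name, hypothesis in hypothesis_dict.items():
        evaluation = hypothesis["evaluation"]
        scores[problem_name] = {}
        for score_key, weight in weights.items():
            try:
                # float() accepts numbers and numeric strings such as "8".
                scores[problem_name][score_key] = float(evaluation[score_key]) * weight
            except (KeyError, ValueError, TypeError):
                # Missing key, non-numeric string, or None: fall back to 0.
                scores[problem_name][score_key] = 0
    return scores


# Hypothetical weights and LLM output, for illustration only.
weights = {"feasibility_score": 0.5, "risk_reward_balance_score": 0.5}
hypothesis_dict = {
    "problem A": {"evaluation": {"feasibility_score": "8", "risk_reward_balance_score": 6}},
    "problem B": {"evaluation": {"feasibility_score": "high"}},
}

scores = weighted_scores(hypothesis_dict, weights)
# Sum the weighted scores per problem, analogous to scores.sum() on the DataFrame.
totals = {name: sum(per_key.values()) for name, per_key in scores.items()}
print(totals)
```

Unlike the committed code, this sketch folds the missing-key case into the same `except` clause by also catching `KeyError`; the outcome is the same, a score of 0 for that key.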
