
Commit fc0df6e

Jensen246 and Hoder-zyf authored

fix: ignore case when checking metric name (#1160)

* fix: ignore case when checking metric name
* add case-sensitive to prompts

Co-authored-by: amstrongzyf <[email protected]>
1 parent ed04ba6 commit fc0df6e

File tree:

rdagent/components/coder/data_science/pipeline/eval.py
rdagent/components/coder/data_science/workflow/eval.py
rdagent/scenarios/data_science/dev/runner/eval.py
rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml
rdagent/scenarios/data_science/scen/prompts.yaml
rdagent/scenarios/data_science/share.yaml

6 files changed: 11 additions & 12 deletions


rdagent/components/coder/data_science/pipeline/eval.py

Lines changed: 2 additions & 2 deletions
@@ -140,8 +140,8 @@ def evaluate(
         if score_ret_code != 0:
             score_check_text += f"The dataframe in file 'scores.csv' is:\n{score_df}"

-        # Check metric name (columns)
-        if score_df.columns.tolist() != [self.scen.metric_name]:
+        # Check metric name (columns) - case insensitive
+        if [col.lower() for col in score_df.columns.tolist()] != [self.scen.metric_name.lower()]:
             score_check_text += f"\n[Error] The scores dataframe does not contain the correct column names.\nCorrect columns is: ['{self.scen.metric_name}']\nBut got: {score_df.columns.tolist()}"
             score_ret_code = 1
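The fix swaps an exact string comparison for a case-insensitive one: both the DataFrame's column names and the scenario's metric name are lowercased before comparing, so a `scores.csv` whose column differs from `self.scen.metric_name` only in casing no longer trips the error. The same two-line change is applied in the two eval modules below. A minimal sketch of the before/after behavior, using a made-up DataFrame and metric name:

    import pandas as pd

    # Hypothetical inputs for illustration only.
    score_df = pd.DataFrame({"AUC": [0.91, 0.93]}, index=["model_a", "ensemble"])
    metric_name = "auc"  # stand-in for self.scen.metric_name

    # Old check: exact match, fails on a casing mismatch.
    print(score_df.columns.tolist() == [metric_name])  # False

    # New check: lowercase both sides, so "AUC" vs "auc" passes.
    print([col.lower() for col in score_df.columns.tolist()] == [metric_name.lower()])  # True

Note that the error message still reports the names in their original casing, so failure feedback stays faithful to the actual file contents.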

rdagent/components/coder/data_science/workflow/eval.py

Lines changed: 2 additions & 2 deletions
@@ -105,8 +105,8 @@ def evaluate(
             score_check_text += f"\n[Error] The scores dataframe does not contain the correct model names as index.\ncorrect model names are: {model_set_in_folder.union({'ensemble'})}\nscore_df is:\n{score_df}"
             score_ret_code = 1

-        # Check metric name (columns)
-        if score_df.columns.tolist() != [self.scen.metric_name]:
+        # Check metric name (columns) - case insensitive
+        if [col.lower() for col in score_df.columns.tolist()] != [self.scen.metric_name.lower()]:
             score_check_text += f"\n[Error] The scores dataframe does not contain the correct column names.\nCorrect columns is: ['{self.scen.metric_name}']\nBut got: {score_df.columns.tolist()}"
             score_ret_code = 1

rdagent/scenarios/data_science/dev/runner/eval.py

Lines changed: 2 additions & 2 deletions
@@ -141,8 +141,8 @@ def evaluate(
             score_check_text += f"\n[Error] The scores dataframe does not contain the correct model names as index.\ncorrect model names are: {model_set_in_folder.union({'ensemble'})}\nscore_df is:\n{score_df}"
             score_ret_code = 1

-        # Check metric name (columns)
-        if score_df.columns.tolist() != [self.scen.metric_name]:
+        # Check metric name (columns) - case insensitive
+        if [col.lower() for col in score_df.columns.tolist()] != [self.scen.metric_name.lower()]:
             score_check_text += f"\n[Error] The scores dataframe does not contain the correct column names.\nCorrect columns is: ['{self.scen.metric_name}']\nBut got: {score_df.columns.tolist()}"
             score_ret_code = 1

rdagent/scenarios/data_science/proposal/exp_gen/prompts_v2.yaml

Lines changed: 2 additions & 2 deletions
@@ -424,7 +424,7 @@ task_gen:
     7. **Metric Calculation and Storage (`scores.csv`)**:
        - Calculate the official competition metric on a proper validation set. Save results to `scores.csv`.
        - The sketch must ensure this step is included. A successful run should always produce scores.
-       - `scores.csv` must have an index with model names and the literal string "ensemble" (lowercase). **Columns should be a single column with exact metric name: "{{ metric_name }}".**
+       - `scores.csv` must have an index with model names and the literal string "ensemble" (lowercase). **Columns should be a single column with exact metric name: "{{ metric_name }}".** (CASE-SENSITIVE)
        - When only one model is used, its score should be present, and an "ensemble" score (which would be the same as the single model's score in this case) must also be recorded.
        - Ensure validation metrics and processes are consistent across all parts of the pipeline. Avoid changes that would alter how validation metrics are calculated unless that is part of the hypothesis.
     8. **Submission File (`submission.csv`)**: Generate `submission.csv` in the **exact format** required (column names, order, data types), as detailed in the '====== Submission Format ======' section of the Competition Scenario Description (DO NOT read the sample_submission.csv file directly in the code). This is a critical step.
@@ -446,7 +446,7 @@ task_gen:

    ## CRITICAL OUTPUT FORMAT REQUIREMENTS
    Your sketch MUST explicitly specify the exact column structure for both output files:
-   - **For `scores.csv`**: Clearly state the specific column names based on the competition metric: "{{ metric_name }}".
+   - **For `scores.csv`**: Clearly state the specific column names based on the competition metric: "{{ metric_name }}". (CASE-SENSITIVE)
    - **For `submission.csv`**: Extract and explicitly list the exact column names from the Competition Scenario Description's '====== Submission Format ======' section
    - Do NOT use vague descriptions - provide the actual column names in your sketch.

rdagent/scenarios/data_science/scen/prompts.yaml

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ scenario_description: |-
   - Do not manipulate data or return values solely to pass preliminary tests, as this will not lead to successful final evaluation.

   ====== Evaluation ======
-  {% if metric_name %}The primary evaluation metric for this task is: **{{ metric_name }}**, **which should be the column name in `scores.csv`**.{% endif %}
+  {% if metric_name %}The primary evaluation metric for this task is: **{{ metric_name }}**, **which should be the column name in `scores.csv` and the column name should be exactly the same as "{{ metric_name }}" (CASE-SENSITIVE)**.{% endif %}
   This metric is considered better when it is **{% if metric_direction %}larger{% else %}smaller{% endif %}**.

   {% if evaluation is not none %}

rdagent/scenarios/data_science/share.yaml

Lines changed: 2 additions & 3 deletions
@@ -340,9 +340,8 @@ component_spec:
       - Calculate the metric (mentioned in the evaluation section of the competition information) for each model and ensemble strategy on valid, and save the results in `scores.csv`
       - The evaluation should be based on k-fold cross-validation but only if that's an appropriate evaluation for the task at hand. Store the mean validation score of k-fold cross-validation in `scores.csv` on each model. Refer to the hyperparameter specification for rules to set the CV folds.
       - Even if only one model is present, compute the ensemble score and store it under `"ensemble"`.
-      - The index of `scores.csv` should include the model name and the "ensemble" strategy. "ensemble" should be exactly in the index with all lower case letters. Ensemble is the result from several models. If only one model is present, the ensemble score should be the same as the model score.
-      - The column names in `scores.csv` should be ["{{ metric_name }}"] where metric_name is the name of the metric used for evaluation. Only one column is required.
-      - The column name should be exactly the same to "{{ metric_name }}" since user will use it to pick the result.
+      - The index of `scores.csv` should include the model name and the "ensemble" strategy. "ensemble" should be exactly in the index with all lower case letters (CASE-SENSITIVE). Ensemble is the result from several models. If only one model is present, the ensemble score should be the same as the model score.
+      - The column names in `scores.csv` should be ["{{ metric_name }}"] where metric_name is the name of the metric used for evaluation. Only one column is required. The column name should be exactly the same as "{{ metric_name }}" (CASE-SENSITIVE) since user will use it to pick the result.
       - Validation metrics should be aligned across all ideas and implementations. Avoid proposing ideas that might affect the validation metrics and modifying the related code.

    9. Submission File:
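Taken together, these spec rules pin down a very small file: one row per model plus an all-lowercase "ensemble" row, and a single column named after the metric. A sketch of a conforming `scores.csv`, assuming a hypothetical metric name "AUC" and two made-up models:

    import pandas as pd

    metric_name = "AUC"  # hypothetical; in practice taken from the competition's evaluation section

    # One row per model plus the mandatory all-lowercase "ensemble" row,
    # and exactly one column named like the metric.
    scores = pd.DataFrame(
        {metric_name: [0.91, 0.89, 0.93]},
        index=["lightgbm", "xgboost", "ensemble"],
    )
    scores.to_csv("scores.csv")

A file written this way passes the index check and, after this commit, also passes the metric-name check in the evaluators above even if the column's casing drifts from `self.scen.metric_name`.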
