Evaluation bug: formulas written by openpyxl/pandas aren’t calculated, causing false negatives #31

@MichaelYang-lyx

Description

We found a tricky issue in the evaluation pipeline related to Excel formula calculation.

What’s happening
In some tasks, the model outputs correct Excel formulas. However, when the generated .xlsx file is created or modified via openpyxl or pandas, these libraries write only the workbook structure and the formula strings; they do not evaluate formulas the way Excel does. As a result, the produced file often has no cached formula results.

During evaluation, our script reads cell values from the processed workbook. If the workbook hasn’t been recalculated by Excel, the evaluator may read empty/None (or stale) values even though the formula itself is correct, leading to incorrect “value mismatch” errors.

Impact
This causes systematic false negatives and makes the model’s final score look lower than it should be.

How to reproduce (typical scenario)

  1. Generate an .xlsx with formulas using openpyxl or pandas.
  2. Run the evaluator that reads computed values from the output file.
  3. The evaluator reports mismatches because the formula results were never calculated/cached.
  4. If you open the file in Excel (GUI) and save it (triggering calculation), the values become available and the evaluator passes.
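The first three steps can be reproduced without the full evaluator. The following is a minimal sketch, assuming openpyxl is installed; the helper names `build_workbook_bytes` and `read_a3` are hypothetical and only illustrate the behavior. It writes a formula, then reads the same cell back both as a formula and as a cached value.

```python
import io

from openpyxl import Workbook, load_workbook


def build_workbook_bytes() -> bytes:
    """Write two numbers and a SUM formula, as a model-generated script might."""
    wb = Workbook()
    ws = wb.active
    ws["A1"] = 2
    ws["A2"] = 3
    # Stored as a formula string; openpyxl does not compute the result.
    ws["A3"] = "=SUM(A1:A2)"
    buffer = io.BytesIO()
    wb.save(buffer)
    return buffer.getvalue()


def read_a3(data: bytes, data_only: bool):
    """Read A3 as a formula (data_only=False) or as its cached value (data_only=True)."""
    wb = load_workbook(io.BytesIO(data), data_only=data_only)
    return wb.active["A3"].value


if __name__ == "__main__":
    data = build_workbook_bytes()
    print("formula:     ", read_a3(data, data_only=False))
    print("cached value:", read_a3(data, data_only=True))
```

With `data_only=True` the cell comes back as `None` because no application has ever computed and stored the result, which is what an evaluator comparing values would then see.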

Expected behavior
If a model outputs correct formulas, evaluation should not fail just because the workbook hasn’t been recalculated by Excel.
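One way to approach this is for the evaluator to detect uncached formulas up front and report "needs recalculation" instead of a value mismatch. The sketch below uses only the standard library and inspects the worksheet XML inside the .xlsx archive; the function name `uncached_formula_cells` is hypothetical, and the check assumes that a formula cell (`<f>`) with a missing or empty `<v>` element has no cached result.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet parts in .xlsx files.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def uncached_formula_cells(xlsx_path_or_file):
    """Return (worksheet part, cell reference) pairs for formula cells without a cached value."""
    missing = []
    with zipfile.ZipFile(xlsx_path_or_file) as archive:
        for name in sorted(archive.namelist()):
            # Only worksheet parts; relationship files end in .rels and are skipped.
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(name))
            for cell in root.iterfind(".//m:c", NS):
                if cell.find("m:f", NS) is None:
                    continue  # plain value cell, nothing to recalculate
                value = cell.find("m:v", NS)
                if value is None or not (value.text or "").strip():
                    missing.append((name, cell.get("r")))
    return missing
```

A non-empty result would mean the workbook should be recalculated before values are compared. One known limitation of this heuristic: a formula that legitimately evaluates to an empty string may also be flagged.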

Suggested fixes / options

  • Ensure formula evaluation before reading values (e.g., enforce recalculation via Excel automation, LibreOffice, or another calculation engine).
  • Alternatively, change evaluation logic to validate formulas themselves (when appropriate) instead of relying only on cached computed values.
  • At minimum, document this limitation and standardize a “recalculate then evaluate” step.
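As an illustration of the "recalculate then evaluate" option, the sketch below re-saves a workbook through LibreOffice in headless mode. It is not this repository's method: the function names are hypothetical, it assumes a `soffice` or `libreoffice` binary on PATH, and whether LibreOffice actually recomputes formulas on load depends on its "Recalculation on File Load" setting, so the output should be checked for real values before it is trusted.

```python
import shutil
import subprocess
from pathlib import Path


def build_recalc_command(src: Path, out_dir: Path, soffice: str = "soffice") -> list[str]:
    """Build the headless LibreOffice command that re-saves src as .xlsx into out_dir."""
    return [soffice, "--headless", "--convert-to", "xlsx", "--outdir", str(out_dir), str(src)]


def recalculate(src: Path, out_dir: Path, timeout: float = 120.0) -> Path:
    """Re-save src via LibreOffice and return the path of the converted copy.

    out_dir should differ from src's directory so the original is not overwritten.
    """
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError("LibreOffice binary not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_recalc_command(src, out_dir, soffice), check=True, timeout=timeout)
    result = out_dir / (src.stem + ".xlsx")
    if not result.exists():
        raise RuntimeError(f"LibreOffice did not produce {result}")
    return result
```

The evaluator would then read values from the returned copy rather than from the model's original output. Results may differ slightly from Excel for functions that the two engines implement differently, which is worth documenting alongside the step.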
