Evaluation bug: formulas written by openpyxl/pandas aren’t calculated, causing false negatives #31

**Description**
The evaluation pipeline can mark correct Excel formulas as wrong because the output workbook contains formulas that were never calculated.

**What’s happening**
In some tasks, the model outputs correct Excel formulas. However, when the generated `.xlsx` file is created or modified via `openpyxl` or `pandas`, these libraries write only the workbook structure and the formula strings; they do **not** evaluate formulas the way Excel does. As a result, the produced file often has no cached results for its formula cells.

During evaluation, our script reads cell values from the processed workbook. If the workbook hasn’t been recalculated by Excel, the evaluator may read empty/None (or stale) values even though the formula itself is correct, leading to incorrect “value mismatch” errors.
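
A quick way to confirm that a given output file is affected is to look inside the `.xlsx` archive for formula cells (`<f>`) that have no cached value (`<v>`). The sketch below uses only the standard library; the helper name `uncached_formula_cells` is hypothetical, and it assumes worksheet parts live under `xl/worksheets/`, which is the usual layout but not something the evaluator itself guarantees.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet parts inside an .xlsx archive.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def uncached_formula_cells(xlsx):
    """Return (sheet part, cell ref) pairs for formula cells with no cached value.

    `xlsx` may be a file path or a binary file-like object.
    Caveat: a formula whose real result is an empty string is also reported,
    so treat the output as a hint rather than proof.
    """
    missing = []
    with zipfile.ZipFile(xlsx) as archive:
        for name in sorted(archive.namelist()):
            # Assumption: worksheets are stored as xl/worksheets/*.xml.
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(name))
            for cell in root.iterfind(".//m:c", NS):
                has_formula = cell.find("m:f", NS) is not None
                value = cell.find("m:v", NS)
                has_value = value is not None and (value.text or "").strip() != ""
                if has_formula and not has_value:
                    missing.append((name, cell.get("r")))
    return missing
```

If this returns a non-empty list for a model output, a value-based comparison on those cells is likely to fail regardless of whether the formulas are right.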

**Impact**
This causes systematic false negatives and makes the model’s final score look lower than it should be.

**How to reproduce (typical scenario)**

1. Generate an `.xlsx` with formulas using `openpyxl` or `pandas`.
2. Run the evaluator that reads computed values from the output file.
3. The evaluator reports mismatches because the formula results were never calculated/cached.
4. If you open the file in Excel (GUI) and save it (triggering calculation), the values become available and the evaluator passes.
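
The steps above can be reproduced in a few lines with `openpyxl` alone. This is a minimal sketch, not the project's evaluator: the function name `reproduce_missing_cache` is hypothetical, and it assumes the evaluator reads computed values in a way comparable to `load_workbook(..., data_only=True)`.

```python
import os
import tempfile


def reproduce_missing_cache(path):
    """Write a formula with openpyxl, then read it back as formula and as value."""
    # openpyxl is a third-party dependency; it is imported here so that a module
    # containing this helper still loads where openpyxl is not installed.
    from openpyxl import Workbook, load_workbook

    wb = Workbook()
    ws = wb.active
    ws["A1"] = 2
    ws["A2"] = 3
    ws["A3"] = "=SUM(A1:A2)"  # stored as a formula string, never evaluated
    wb.save(path)

    # Default mode returns the formula text.
    formula = load_workbook(path).active["A3"].value
    # data_only=True returns the cached result, which does not exist yet.
    cached = load_workbook(path, data_only=True).active["A3"].value
    return formula, cached


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        formula, cached = reproduce_missing_cache(os.path.join(tmp, "repro.xlsx"))
        print("formula:", formula)
        print("cached value:", cached)
```

The formula is intact in the file, but the cached value is `None` until some calculation engine opens and re-saves the workbook.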

**Expected behavior**
If a model outputs correct formulas, evaluation should not fail just because the workbook hasn’t been recalculated by Excel.

**Suggested fixes / options**

* Ensure formula evaluation before reading values (e.g., enforce recalculation via Excel automation, LibreOffice, or another calculation engine).
* Alternatively, change evaluation logic to validate formulas themselves (when appropriate) instead of relying only on cached computed values.
* At minimum, document this limitation and standardize a “recalculate then evaluate” step.
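
As one possible shape for the "recalculate then evaluate" step, the sketch below re-saves a workbook through headless LibreOffice before the evaluator reads it. The function names are hypothetical, and several things are assumptions: that `soffice` is on `PATH`, that LibreOffice computes formula cells lacking a cached result when it loads the file (this depends on its recalculation settings), and that the functions used in the task behave the same in LibreOffice as in Excel. Verify all three on your setup before relying on it.

```python
import shutil
import subprocess
from pathlib import Path


def build_recalc_command(src, out_dir, soffice="soffice"):
    """Build a headless LibreOffice command that re-saves `src` as .xlsx in `out_dir`."""
    return [
        soffice,
        "--headless",
        "--convert-to", "xlsx",
        "--outdir", str(out_dir),
        str(src),
    ]


def recalculate(src, out_dir, soffice="soffice", timeout=120):
    """Re-save a workbook through LibreOffice and return the path of the new copy.

    Use an `out_dir` different from the source directory so the original
    model output is not overwritten.
    """
    if shutil.which(soffice) is None:
        raise FileNotFoundError(f"{soffice!r} not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        build_recalc_command(src, out_dir, soffice),
        check=True,
        timeout=timeout,
    )
    return out_dir / (Path(src).stem + ".xlsx")
```

The evaluator would then read values from the returned path instead of the raw model output. Keeping the command construction separate from execution makes it easy to log or swap in a different engine, such as Excel automation.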
