Description
We found a tricky issue in the evaluation pipeline related to Excel formula calculation.
What’s happening
In some tasks, the model outputs correct Excel formulas. However, when the generated .xlsx file is created/modified via openpyxl or pandas, these libraries only write the workbook structure and the formula strings — they do not evaluate formulas like Excel does. As a result, the produced file often has no cached formula results.
During evaluation, our script reads cell values from the processed workbook. If the workbook hasn’t been recalculated by Excel, the evaluator may read empty/None (or stale) values even though the formula itself is correct, leading to incorrect “value mismatch” errors.
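The behavior described above can be shown with a short sketch. It assumes the evaluator reads values roughly the way `openpyxl.load_workbook(..., data_only=True)` does; the actual evaluation script may differ. The file name `demo.xlsx` and the helper `write_and_read_back` are illustrative only.

```python
import tempfile
from pathlib import Path

from openpyxl import Workbook, load_workbook


def write_and_read_back(path):
    """Write a formula with openpyxl, then read the same cell back two ways."""
    wb = Workbook()
    ws = wb.active
    ws["A1"] = 2
    ws["A2"] = 3
    ws["A3"] = "=SUM(A1:A2)"  # stored as a formula string; openpyxl never evaluates it
    wb.save(path)

    # Default load: a formula cell returns the formula text.
    formula = load_workbook(path).active["A3"].value
    # data_only=True: a formula cell returns its cached result, if the file has one.
    cached = load_workbook(path, data_only=True).active["A3"].value
    return formula, cached


with tempfile.TemporaryDirectory() as tmp:
    formula, cached = write_and_read_back(str(Path(tmp) / "demo.xlsx"))

print(formula)  # =SUM(A1:A2)
print(cached)   # None, because no spreadsheet application has calculated the file
```

The formula is correct and present in the file, but a value-based comparison against the expected result `5` would still fail.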
Impact
This causes systematic false negatives and makes the model’s final score look lower than it should be.
How to reproduce (typical scenario)
- Generate an .xlsx with formulas using openpyxl or pandas.
- Run the evaluator that reads computed values from the output file.
- The evaluator reports mismatches because the formula results were never calculated/cached.
- If you open the file in Excel (GUI) and save it (triggering calculation), the values become available and the evaluator passes.
Expected behavior
If a model outputs correct formulas, evaluation should not fail just because the workbook hasn’t been recalculated by Excel.
Suggested fixes / options
- Ensure formula evaluation before reading values (e.g., enforce recalculation via Excel automation, LibreOffice, or another calculation engine).
- Alternatively, change evaluation logic to validate formulas themselves (when appropriate) instead of relying only on cached computed values.
- At minimum, document this limitation and standardize a “recalculate then evaluate” step.
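One possible shape for a "recalculate then evaluate" step is sketched below, using only the Python standard library plus a LibreOffice install. Everything here is an assumption rather than part of the existing pipeline: the function names are hypothetical, the check for missing cached values is a heuristic (a formula that legitimately evaluates to an empty string looks the same as an uncached one), it assumes the standard (non-Strict) SpreadsheetML namespace, and whether a headless LibreOffice conversion actually recalculates can depend on its recalculation-on-load settings. For that last reason the sketch re-checks the converted file instead of trusting the conversion.

```python
import shutil
import subprocess
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

# Standard (non-Strict) SpreadsheetML namespace; Strict OOXML files use a different one.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def uncached_formula_cells(xlsx_path):
    """Return 'sheet-part:cell' references for formula cells with no cached value.

    Heuristic: a cell counts as uncached when it has an <f> element and its
    <v> element is missing or empty.
    """
    missing = []
    with zipfile.ZipFile(xlsx_path) as zf:
        for name in zf.namelist():
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(name))
            for cell in root.iterfind(".//m:c", NS):
                has_formula = cell.find("m:f", NS) is not None
                value = cell.find("m:v", NS)
                if has_formula and (value is None or not value.text):
                    missing.append(f"{name}:{cell.get('r')}")
    return missing


def recalculate_with_libreoffice(xlsx_path, out_dir):
    """Round-trip the workbook through headless LibreOffice into out_dir.

    out_dir should differ from the input's directory so the original is kept.
    """
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError("LibreOffice executable not found on PATH")
    subprocess.run(
        [soffice, "--headless", "--convert-to", "xlsx",
         "--outdir", str(out_dir), str(xlsx_path)],
        check=True,
        timeout=120,
    )
    return Path(out_dir) / Path(xlsx_path).name


def ensure_calculated(xlsx_path, out_dir):
    """Return a path to a workbook whose formula cells all have cached values."""
    if not uncached_formula_cells(xlsx_path):
        return Path(xlsx_path)  # already calculated, nothing to do
    recalculated = recalculate_with_libreoffice(xlsx_path, out_dir)
    still_missing = uncached_formula_cells(recalculated)
    if still_missing:
        raise RuntimeError(f"Formulas still uncached after recalculation: {still_missing}")
    return recalculated
```

The evaluator would call `ensure_calculated(output_file, scratch_dir)` and read values from the returned path. Failing loudly when cached values are still missing keeps an uncalculated workbook from being scored as a "value mismatch".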