Commit f4270d1

fix(file-based): Switch Excel parser from calamine to openpyxl engine
Switch the Excel parser engine from calamine to openpyxl to prevent crashes when parsing Excel files with invalid date values.

The calamine engine (Rust-based) panics when encountering date values that result in years outside Python's datetime range (1-9999), causing the entire sync to fail. The openpyxl engine (pure Python) handles these edge cases more gracefully, allowing syncs to complete even with data quality issues.

This fixes crashes like:

pyo3_runtime.PanicException: failed to construct date: PyErr { type: <class 'ValueError'>, value: ValueError('year 20225 is out of range') }

Trade-off: openpyxl is slower than calamine, but reliability is more important than speed for production syncs.

Fixes: airbytehq/oncall#10097

Co-Authored-By: unknown <>
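The range limit named in the message can be reproduced with the standard library alone. The following is a minimal sketch, not the conversion code used by calamine or openpyxl: it assumes the cell holds a day-count serial in Excel's 1900 date system, and the names `EXCEL_EPOCH` and `excel_serial_to_datetime` are hypothetical helpers introduced for illustration.

```python
from datetime import MAXYEAR, MINYEAR, datetime, timedelta
from typing import Optional

# Assumption: Excel's 1900 date system, where day serials count from
# 1899-12-30 (this offset absorbs Excel's phantom 1900-02-29).
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: float) -> Optional[datetime]:
    """Convert a day serial to a datetime, or return None if it cannot be represented.

    Python's datetime only covers years MINYEAR..MAXYEAR (1..9999), so a
    corrupt serial that lands beyond that range raises OverflowError here.
    """
    try:
        return EXCEL_EPOCH + timedelta(days=serial)
    except OverflowError:
        return None


if __name__ == "__main__":
    print(MINYEAR, MAXYEAR)                     # supported year bounds
    print(excel_serial_to_datetime(45000))      # an ordinary date
    print(excel_serial_to_datetime(6_700_000))  # far beyond year 9999
    try:
        datetime(20225, 1, 1)                   # the year from the crash log
    except ValueError as exc:
        print(exc)
```

Returning `None` for an unrepresentable value is only one possible policy; a parser could equally keep the raw number or raise a descriptive error.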
1 parent 5d9125f commit f4270d1

1 file changed: +1 -1 lines changed


airbyte_cdk/sources/file_based/file_types/excel_parser.py

Lines changed: 1 addition & 1 deletion
@@ -191,4 +191,4 @@ def open_and_parse_file(fp: Union[IOBase, str, Path]) -> pd.DataFrame:
         Returns:
             pd.DataFrame: Parsed data from the Excel file.
         """
-        return pd.ExcelFile(fp, engine="calamine").parse()  # type: ignore [arg-type, call-overload, no-any-return]
+        return pd.ExcelFile(fp, engine="openpyxl").parse()  # type: ignore [arg-type, call-overload, no-any-return]
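The commit replaces the engine outright. For comparison, the speed trade-off it mentions could in principle be softened by trying the fast engine first and falling back to the slower one. The sketch below is an assumption-laden alternative, not what this commit does: `parse_with_fallback` is a hypothetical helper, and the parsers are passed in as plain callables so the example does not depend on pandas. Note that PyO3's `PanicException` derives from `BaseException`, so an `except Exception` clause would not catch it.

```python
from typing import Any, Callable, Optional, Sequence


def parse_with_fallback(source: Any, parsers: Sequence[Callable[[Any], Any]]) -> Any:
    """Try each parser in order and return the first successful result.

    Hypothetical helper: `parsers` are ordered from preferred to fallback.
    """
    if not parsers:
        raise ValueError("at least one parser is required")

    last_error: Optional[BaseException] = None
    for parser in parsers:
        try:
            return parser(source)
        except (KeyboardInterrupt, SystemExit):
            # Never swallow interpreter shutdown or a user interrupt.
            raise
        except BaseException as exc:
            # BaseException on purpose: a Rust panic surfaced through PyO3
            # is not an Exception subclass.
            last_error = exc

    raise RuntimeError("all parsers failed") from last_error
```

With pandas, the callables might wrap `pd.ExcelFile(fp, engine="calamine").parse()` and `pd.ExcelFile(fp, engine="openpyxl").parse()`. If `fp` is a stream, it would need to be rewound before the second attempt, which is one reason a plain engine switch is the simpler fix.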
