Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
1. Create a workbook in Microsoft Excel. Change the file's extension from .xlsx to .zip and extract the archive. Its contents will be extracted into a new directory. Edit xl/styles.xml to remove the name attribute from every cellStyle element inside the cellStyles tag, so that it looks something like this:
```xml
<cellStyles count="1"><cellStyle xfId="0" builtinId="0" /></cellStyles></styleSheet>
```
2. Select all the files inside the extracted directory (not the directory itself), zip them back up, and change the resulting archive's extension from .zip to .xlsx.
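3. Try to load the file as an `ExcelFile` in pandas, first with the openpyxl engine and then with the calamine engine:
```python
import pandas as pd

path = "broken.xlsx"  # placeholder: the modified workbook from step 2

try:
    e = pd.ExcelFile(path, engine="openpyxl")
except Exception as exc:
    print(exc)

try:
    e = pd.ExcelFile(path, engine="calamine")  # requires python-calamine
except Exception as exc:
    print(exc)
```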
You should get something like:
```pytb
Traceback (most recent call last):
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/platform_flow_common/tasks/flows/preliminary/excel.py", line 367, in create_data_frame_sheet_file_object
    excel = pd.ExcelFile(path_or_buffer=excel_file, engine=pandas_engine)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1567, in __init__
    self._reader = self._engines[engine](
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 553, in __init__
    super().__init__(
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 573, in __init__
    self.book = self.load_workbook(self.handles.handle, engine_kwargs)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 572, in load_workbook
    return load_workbook(
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 346, in load_workbook
    reader.read()
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 299, in read
    apply_stylesheet(self.archive, self.wb)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
    stylesheet = Stylesheet.from_tree(node)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
    return super(Stylesheet, cls).from_tree(node)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
    return cls(**attrib)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/named_styles.py", line 229, in __init__
    self.name = name
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/base.py", line 46, in __set__
    raise TypeError(msg)
TypeError: <class 'openpyxl.styles.named_styles._NamedCellStyle'>.name should be <class 'str'> but value is <class 'NoneType'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/platform_flow_common/tasks/flows/preliminary/excel.py", line 371, in create_data_frame_sheet_file_object
    excel = pd.ExcelFile(path_or_buffer=excel_file)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1567, in __init__
    self._reader = self._engines[engine](
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 553, in __init__
    super().__init__(
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 573, in __init__
    self.book = self.load_workbook(self.handles.handle, engine_kwargs)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 572, in load_workbook
    return load_workbook(
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 346, in load_workbook
    reader.read()
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 299, in read
    apply_stylesheet(self.archive, self.wb)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
    stylesheet = Stylesheet.from_tree(node)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
    return super(Stylesheet, cls).from_tree(node)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
    return cls(**attrib)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/named_styles.py", line 229, in __init__
    self.name = name
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/base.py", line 46, in __set__
    raise TypeError(msg)
TypeError: <class 'openpyxl.styles.named_styles._NamedCellStyle'>.name should be <class 'str'> but value is <class 'NoneType'>
```
And then you get an uncaught Rust panic:
```
PanicException: index out of bounds: the len is 0 but the index is 0
```
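The manual unzip, edit, and rezip in steps 1 and 2 can also be scripted. The following is a minimal sketch using only the standard library. It assumes the stylesheet lives at the usual `xl/styles.xml` part and uses double-quoted attributes, as Excel writes them. The helper names `strip_cellstyle_names` and `make_broken_workbook` are hypothetical. The regex removes only the `name` attribute of `<cellStyle>` elements; every other archive member is copied unchanged.

```python
import re
import zipfile

STYLES_PART = "xl/styles.xml"

# Matches the name="..." attribute on <cellStyle> elements only; the \b stops
# the pattern from matching the enclosing <cellStyles> tag.
_CELLSTYLE_NAME = re.compile(r'(<cellStyle\b[^>]*?)\s+name="[^"]*"')


def strip_cellstyle_names(xml: str) -> str:
    """Remove name="..." from every <cellStyle> element in a styles.xml string."""
    return _CELLSTYLE_NAME.sub(r"\1", xml)


def make_broken_workbook(src: str, dst: str) -> None:
    """Copy the .xlsx at `src` to `dst`, stripping cellStyle names in styles.xml."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == STYLES_PART:
                data = strip_cellstyle_names(data.decode("utf-8")).encode("utf-8")
            # Reusing each entry's ZipInfo keeps its name, order, timestamp and
            # compression method, so only styles.xml differs from the source.
            zout.writestr(item, data)
```

For example, `make_broken_workbook("good.xlsx", "broken.xlsx")` should produce a file equivalent to the one built by hand above.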
Issue Description
My company processes Excel files, and we often encounter files that are constructed incorrectly. We raise specific errors when a file is encrypted, when it cannot be opened, when it has formatting errors, and so on, so that we can notify other team members or clients that something is wrong with their data source. I observed a Rust panic that is not caught when using the calamine engine to load an Excel sheet that has structural formatting errors. This is a concern because there doesn't appear to be any way to handle the exception in Python, so we cannot surface the right kind of error.
I'm not sure whether this issue belongs in pandas or in PyO3, since the Rust bindings are managed through that library.
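As a stopgap on the caller's side, here is a minimal sketch. It assumes the panic surfaces as PyO3's `pyo3_runtime.PanicException`, which derives from `BaseException` rather than `Exception`, so the `except Exception` blocks in the reproducer never see it. The names `open_workbook`, `is_rust_panic`, and `ExcelStructureError` are hypothetical. The sketch matches the panic by class name instead of importing `pyo3_runtime`, and it re-raises any other `BaseException`, such as `KeyboardInterrupt`, unchanged.

```python
from typing import Any, Callable, Optional


class ExcelStructureError(Exception):
    """Hypothetical domain error for workbooks that an engine cannot parse."""


def is_rust_panic(exc: BaseException) -> bool:
    # PyO3 raises pyo3_runtime.PanicException when Rust code panics. It derives
    # from BaseException, so `except Exception` does not catch it. Matching by
    # class name avoids importing pyo3_runtime directly.
    return type(exc).__name__ == "PanicException"


def open_workbook(
    path: str,
    engine: str = "calamine",
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Open `path`, converting engine errors and Rust panics to ExcelStructureError."""
    if loader is None:
        import pandas as pd  # imported lazily; `loader` lets tests inject a stub

        loader = pd.ExcelFile
    try:
        return loader(path, engine=engine)
    except Exception as exc:
        # e.g. openpyxl's TypeError about _NamedCellStyle.name being None
        raise ExcelStructureError(f"{engine} could not parse {path!r}: {exc}") from exc
    except BaseException as exc:
        if is_rust_panic(exc):
            raise ExcelStructureError(f"{engine} panicked on {path!r}: {exc}") from exc
        raise  # KeyboardInterrupt, SystemExit, etc. propagate unchanged
```

Assuming calamine still panics on the workbook from the reproducer, `open_workbook("broken.xlsx")` would then raise `ExcelStructureError` instead of letting the panic escape. This only works around the symptom; it does not replace a proper Python exception from the engine.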
Expected Behavior
I would expect a regular Python exception to be raised instead, one that an ordinary `except Exception` handler can catch. The specific exception type doesn't matter.
Installed Versions
pd.show_versions()
INSTALLED VERSIONS
commit : 0691c5c
python : 3.9.16
python-bits : 64
OS : Darwin
OS-release : 22.6.0
Version : Darwin Kernel Version 22.6.0: Wed Jul 5 22:22:05 PDT 2023; root:xnu-8796.141.3~6/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.3
numpy : 2.0.2
pytz : 2025.1
dateutil : 2.9.0.post0
pip : 24.0
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.2.0
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.5
lxml.etree : 5.3.0
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.0
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : 1.0.10
s3fs : None
scipy : None
sqlalchemy : 2.0.38
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlsxwriter : None
zstandard : None
tzdata : 2025.1
qtpy : None
pyqt5 : None