BUG: Unhandled Rust panic when processing sheets with missing formatting data #60881

@kevinmccraneybp

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

1. Create an Excel document using Microsoft Excel. Change the document's extension from .xlsx to .zip and extract the archive; its contents will be extracted into a new directory. Edit the xl/styles.xml file to remove the name attribute from a cellStyle element inside the cellStyles tag, so it looks something like this:

<cellStyles count="1"><cellStyle xfId="0" builtinId="0" /></cellStyles></styleSheet>

2. Select all the files within the directory and zip them back up, then rename the resulting zip file so it has the .xlsx extension.

3. Attempt to load the file as an ExcelFile object in pandas using the following code:

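import pandas as pd

file_name = "YOUR_FILE_NAME.xlsx"  # path to the modified workbook

try:
    e = pd.ExcelFile(file_name, engine="openpyxl")
except Exception as exc:
    print(exc)
    # Retry inside the handler, mirroring the nested tracebacks shown below.
    try:
        e = pd.ExcelFile(file_name, engine="openpyxl")
    except Exception as ex:
        print(ex)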

You should get something like:

Traceback (most recent call last):
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/platform_flow_common/tasks/flows/preliminary/excel.py", line 367, in create_data_frame_sheet_file_object
    excel = pd.ExcelFile(path_or_buffer=excel_file, engine=pandas_engine)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1567, in __init__
    self._reader = self._engines[engine](
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 553, in __init__
    super().__init__(
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 573, in __init__
    self.book = self.load_workbook(self.handles.handle, engine_kwargs)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 572, in load_workbook
    return load_workbook(
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 346, in load_workbook
    reader.read()
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 299, in read
    apply_stylesheet(self.archive, self.wb)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
    stylesheet = Stylesheet.from_tree(node)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
    return super(Stylesheet, cls).from_tree(node)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
    return cls(**attrib)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/named_styles.py", line 229, in __init__
    self.name = name
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/base.py", line 46, in __set__
    raise TypeError(msg)
TypeError: <class 'openpyxl.styles.named_styles._NamedCellStyle'>.name should be <class 'str'> but value is <class 'NoneType'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/platform_flow_common/tasks/flows/preliminary/excel.py", line 371, in create_data_frame_sheet_file_object
    excel = pd.ExcelFile(path_or_buffer=excel_file)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 1567, in __init__
    self._reader = self._engines[engine](
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 553, in __init__
    super().__init__(
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_base.py", line 573, in __init__
    self.book = self.load_workbook(self.handles.handle, engine_kwargs)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/pandas/io/excel/_openpyxl.py", line 572, in load_workbook
    return load_workbook(
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 346, in load_workbook
    reader.read()
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/reader/excel.py", line 299, in read
    apply_stylesheet(self.archive, self.wb)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 198, in apply_stylesheet
    stylesheet = Stylesheet.from_tree(node)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/stylesheet.py", line 103, in from_tree
    return super(Stylesheet, cls).from_tree(node)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 87, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/serialisable.py", line 103, in from_tree
    return cls(**attrib)
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/styles/named_styles.py", line 229, in __init__
    self.name = name
  File "/Users/username/.local/share/virtualenvs/platform-excel-flow-v1_0-xHjD71YU/lib/python3.9/site-packages/openpyxl/descriptors/base.py", line 46, in __set__
    raise TypeError(msg)
TypeError: <class 'openpyxl.styles.named_styles._NamedCellStyle'>.name should be <class 'str'> but value is <class 'NoneType'>

And then you get:

PanicException: index out of bounds: the len is 0 but the index is 0
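
Steps 1 and 2 can also be scripted instead of edited by hand. The following is a minimal sketch using only the standard library. The helper name `strip_cell_style_names` and the file names in the usage note are hypothetical, and it assumes the workbook keeps its styles in xl/styles.xml with the name attribute written in double quotes.

```python
import re
import zipfile

# Matches a name="..." attribute inside a <cellStyle ...> element only.
# The \b keeps <cellStyles> and <cellStyleXfs> from matching.
_CELL_STYLE_NAME = re.compile(rb'(<cellStyle\b[^>]*?)\s+name="[^"]*"')


def strip_cell_style_names(src_path, dst_path):
    """Copy an .xlsx archive, dropping cellStyle name attributes.

    Hypothetical helper: every archive member is copied unchanged
    except xl/styles.xml, where the name attribute is removed.
    """
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(
        dst_path, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "xl/styles.xml":
                data = _CELL_STYLE_NAME.sub(rb"\1", data)
            dst.writestr(item, data)
```

For example, `strip_cell_style_names("original.xlsx", "broken.xlsx")` would write a modified copy that can then be loaded as in step 3.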

Issue Description

My company processes Excel files, and we often encounter errors in the way a file is constructed. We raise specific errors when the file is encrypted, when the file cannot be opened, when there are formatting errors, and so on, so that we can notify other team members or clients that something is wrong with their data source. I observed a Rust panic that is not caught when using the calamine engine in pandas to load an Excel sheet that has structural formatting errors. This is a concern because there doesn't appear to be any way to handle the exception in Python, and thus we cannot surface the right kind of error.

Not sure if this belongs as an issue on pandas or on PyO3, since the Rust bindings are managed through that library...
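
A possible workaround, sketched under assumptions: PyO3 reports a Rust panic as `pyo3_runtime.PanicException`, which derives from `BaseException` rather than `Exception`, so an `except Exception` clause does not catch it. The wrapper below converts it into an ordinary exception. `WorkbookFormatError` and `call_with_panic_guard` are hypothetical names, and matching on the class name is an assumption made to avoid importing anything from the extension module.

```python
class WorkbookFormatError(ValueError):
    """Hypothetical application-level error for malformed workbooks."""


def call_with_panic_guard(func, *args, **kwargs):
    """Call func, converting a PyO3 panic into WorkbookFormatError."""
    try:
        return func(*args, **kwargs)
    except BaseException as exc:
        # Assumption: the panic surfaces as a class named PanicException
        # that subclasses BaseException, so `except Exception` misses it.
        if type(exc).__name__ == "PanicException":
            raise WorkbookFormatError(f"reader panicked: {exc}") from exc
        # Everything else (ordinary errors, KeyboardInterrupt,
        # SystemExit) propagates unchanged.
        raise
```

It could be used as `call_with_panic_guard(pd.ExcelFile, file_name, engine="calamine")`, after which the caller only needs to handle regular `Exception` subclasses.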

Expected Behavior

I would expect a different kind of exception to be raised, one native to the Python environment. It doesn't matter what.

Installed Versions

pd.show_versions()

INSTALLED VERSIONS

commit : 0691c5c
python : 3.9.16
python-bits : 64
OS : Darwin
OS-release : 22.6.0
Version : Darwin Kernel Version 22.6.0: Wed Jul 5 22:22:05 PDT 2023; root:xnu-8796.141.3~6/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.3
numpy : 2.0.2
pytz : 2025.1
dateutil : 2.9.0.post0
pip : 24.0
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.2.0
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.5
lxml.etree : 5.3.0
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.0
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : 1.0.10
s3fs : None
scipy : None
sqlalchemy : 2.0.38
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlsxwriter : None
zstandard : None
tzdata : 2025.1
qtpy : None
pyqt5 : None
