BUG: HDF5 file stores hardcoded pandas version "0.15.2" instead of actual pandas version #62792

@grzanka

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# Create a simple DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Write to HDF5
df.to_hdf('data.h5', key='df', mode='w')


Then check the installed pandas version and inspect the attribute stored in the file:


$ pip freeze | grep pandas
pandas==2.3.3

$ h5dump -a /df/pandas_version data.h5
HDF5 "data.h5" {
ATTRIBUTE "pandas_version" {
   DATATYPE  H5T_STRING {
      STRSIZE 6;
      STRPAD H5T_STR_NULLTERM;
      CSET H5T_CSET_UTF8;
      CTYPE H5T_C_S1;
   }
   DATASPACE  SCALAR
   DATA {
   (0): "0.15.2"
   }
}
}
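The same check can be done from Python without h5dump. The following is a minimal sketch, assuming PyTables (the tables package) is installed; the helper name stored_pandas_version is made up for illustration, and a temporary directory is used so no file is left behind.

```python
import os
import tempfile

import pandas as pd


def stored_pandas_version(path: str, key: str) -> str:
    """Return the pandas_version attribute stored on the HDF5 group for `key`.

    Hypothetical helper for this report, not part of pandas.
    """
    with pd.HDFStore(path, mode="r") as store:
        # get_node returns the PyTables group; _v_attrs exposes its HDF5 attributes,
        # the same access pattern the pandas test suite uses.
        node = store.get_node(key)
        return str(node._v_attrs.pandas_version)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.h5")

    # Same DataFrame and write call as in the reproducible example
    df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
    df.to_hdf(path, key="df", mode="w")

    stored = stored_pandas_version(path, "df")
    print("installed pandas:", pd.__version__)
    print("stored attribute:", stored)
```

On the setup described in this report, the two printed values differ: the installed version is 2.3.3 while the stored attribute is "0.15.2".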

Issue Description

When writing DataFrames to HDF5 files using to_hdf(), the pandas_version attribute stored in the file is hardcoded to "0.15.2" regardless of the actual pandas version being used.

The version string is hardcoded in the pandas source (pandas/io/pytables.py) as _version = '0.15.2'. The test suite pins the same value: test_versioning() explicitly asserts that pandas_version equals "0.15.2":

assert store.root.a._v_attrs.pandas_version == "0.15.2"
assert store.root.b._v_attrs.pandas_version == "0.15.2"
assert store.root.df1._v_attrs.pandas_version == "0.15.2"

See: test_store.py, lines 225-228.

This mismatch can cause confusion when debugging compatibility issues with HDF5 files, as the stored version information is misleading.

Expected Behavior

The pandas_version attribute should reflect the actual pandas version used to create the file (e.g., "2.3.3" in this case).
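Until the stored value changes, one possible workaround is to record the writer's version in a separate, user-defined attribute next to pandas_version. This is only a sketch under stated assumptions: the attribute name writer_pandas_version and both helper functions are hypothetical and not defined by pandas, and the approach relies on setting custom attributes through get_storer(key).attrs, which requires PyTables.

```python
import os
import tempfile

import pandas as pd


def write_with_writer_version(df: pd.DataFrame, path: str, key: str) -> None:
    """Write `df` and tag its group with the pandas version that wrote it.

    `writer_pandas_version` is a made-up attribute name for this sketch.
    """
    with pd.HDFStore(path, mode="w") as store:
        store.put(key, df)
        # Custom metadata lives alongside the attributes pandas writes itself
        store.get_storer(key).attrs.writer_pandas_version = pd.__version__


def read_writer_version(path: str, key: str) -> str:
    """Read back the custom attribute written by write_with_writer_version."""
    with pd.HDFStore(path, mode="r") as store:
        return str(store.get_storer(key).attrs.writer_pandas_version)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.h5")
    df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

    write_with_writer_version(df, path, "df")
    recorded = read_writer_version(path, "df")
    print("recorded writer version:", recorded)
```

This leaves the built-in pandas_version attribute untouched, so it does not change how pandas itself reads the file; it only adds the information this report expects to find.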

Installed Versions

INSTALLED VERSIONS

commit : 9c8bc3e
python : 3.12.3
python-bits : 64
OS : Linux
OS-release : 6.8.0-86-generic
Version : #87-Ubuntu SMP PREEMPT_DYNAMIC Mon Sep 22 18:03:36 UTC 2025
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.3.3
numpy : 2.3.4
pytz : 2025.2
dateutil : 2.9.0.post0
pip : 24.0
Cython : None
sphinx : None
IPython : 9.6.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.14.2
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.6
lxml.etree : None
matplotlib : 3.10.7
numba : 0.62.1
numexpr : 2.14.1
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.16.2
sqlalchemy : None
tables : 3.10.2
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.2
qtpy : None
pyqt5 : None
