Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
# Create a simple DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# Write to HDF5
df.to_hdf('data.h5', key='df', mode='w')
Then inspect the stored attribute:
$ pip freeze | grep pandas
pandas==2.3.3
$ h5dump -a /df/pandas_version data.h5
HDF5 "data.h5" {
ATTRIBUTE "pandas_version" {
DATATYPE H5T_STRING {
STRSIZE 6;
STRPAD H5T_STR_NULLTERM;
CSET H5T_CSET_UTF8;
CTYPE H5T_C_S1;
}
DATASPACE SCALAR
DATA {
(0): "0.15.2"
}
}
}
Issue Description
When writing DataFrames to HDF5 files using to_hdf(), the pandas_version attribute stored in the file is hardcoded to "0.15.2" regardless of the actual pandas version being used.
The version string is hardcoded in pandas/io/pytables.py as _version = '0.15.2'. The test suite pins the same value: test_versioning() explicitly asserts that pandas_version equals "0.15.2":
assert store.root.a._v_attrs.pandas_version == "0.15.2"
assert store.root.b._v_attrs.pandas_version == "0.15.2"
assert store.root.df1._v_attrs.pandas_version == "0.15.2"
See pandas/pandas/tests/io/pytables/test_store.py, lines 225 to 227 in d81171b.
This mismatch is confusing when debugging compatibility issues with HDF5 files, because the stored version does not identify the pandas release that actually wrote the file.
Expected Behavior
The pandas_version attribute should reflect the actual pandas version used to create the file (e.g., "2.3.3" in this case).
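The expectation can also be checked from Python without h5dump. The following is a minimal sketch, assuming the optional PyTables dependency (tables) is installed; stored_pandas_version, matches_running_pandas, and demo are hypothetical helper names introduced here for illustration and are not part of pandas.

```python
import os
import tempfile

import pandas as pd  # HDF5 support needs the optional "tables" package


def stored_pandas_version(path: str, key: str) -> str:
    """Return the pandas_version attribute stored on the HDF5 group for `key`.

    Hypothetical helper for illustration; not a pandas API.
    """
    with pd.HDFStore(path, mode="r") as store:
        node = store.get_node(key)
        if node is None:
            raise KeyError(f"no group named {key!r} in {path}")
        # _v_attrs is the PyTables attribute set of the group, the same
        # object the pandas test suite inspects in test_versioning().
        return str(node._v_attrs.pandas_version)


def matches_running_pandas(path: str, key: str) -> bool:
    """True if the stored attribute equals the running pandas version."""
    return stored_pandas_version(path, key) == pd.__version__


def demo() -> None:
    """Write a small DataFrame and print the stored and running versions."""
    df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.h5")
        df.to_hdf(path, key="df", mode="w")
        print("stored :", stored_pandas_version(path, "df"))
        print("running:", pd.__version__)
        print("match  :", matches_running_pandas(path, "df"))
```

Calling demo() prints both values side by side. Under the behavior reported in this issue, the stored value would be "0.15.2" and the match would be False. Under the expected behavior, the two values would be equal.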
Installed Versions
INSTALLED VERSIONS
commit : 9c8bc3e
python : 3.12.3
python-bits : 64
OS : Linux
OS-release : 6.8.0-86-generic
Version : #87-Ubuntu SMP PREEMPT_DYNAMIC Mon Sep 22 18:03:36 UTC 2025
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.3.3
numpy : 2.3.4
pytz : 2025.2
dateutil : 2.9.0.post0
pip : 24.0
Cython : None
sphinx : None
IPython : 9.6.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.14.2
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.6
lxml.etree : None
matplotlib : 3.10.7
numba : 0.62.1
numexpr : 2.14.1
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.16.2
sqlalchemy : None
tables : 3.10.2
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.2
qtpy : None
pyqt5 : None