
BUG: TypeError: object of type 'int' has no len() when saving DataFrame with object dtype column #34645

@Honzys

Description

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

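import pandas as pd

df = pd.DataFrame({"a": [None, None]})  # column "a" starts as object dtype
df.loc[0, "a"] = float(1)
df.loc[1, "a"] = float(2)               # still object dtype, now holding floats

# HDFStore takes `mode`, not `write_mode`; requires PyTables (the `tables` package)
with pd.HDFStore("test.h5", mode="w") as hdf:
    hdf.put("table", df, format="table")  # fails with the traceback below (pandas 1.0.4)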

This causes the following error:

  ...
  File "/opt/venv/lib64/python3.6/site-packages/pandas/io/pytables.py", line 1042, in put
    errors=errors,
  File "/opt/venv/lib64/python3.6/site-packages/pandas/io/pytables.py", line 1709, in _write_to_group
    data_columns=data_columns,
  File "/opt/venv/lib64/python3.6/site-packages/pandas/io/pytables.py", line 4143, in write
    data_columns=data_columns,
  File "/opt/venv/lib64/python3.6/site-packages/pandas/io/pytables.py", line 3813, in _create_axes
    errors=self.errors,
  File "/opt/venv/lib64/python3.6/site-packages/pandas/io/pytables.py", line 4800, in _maybe_convert_for_string_atom
    for i in range(len(block.shape[0])):
TypeError: object of type 'int' has no len()
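The last frame shows the immediate cause: block.shape[0] is already an int (in pandas' internals, the number of columns held by the block), so calling len() on it fails. Below is a minimal sketch of the same pattern. It uses a plain NumPy array as a hypothetical stand-in for the block and assumes the loop was meant to run over range(block.shape[0]):

```python
import numpy as np

# Hypothetical stand-in for a pandas block: a 2-D object array whose
# first axis plays the role of the block's columns.
values = np.empty((1, 2), dtype=object)
n_items = values.shape[0]  # a plain Python int (1), not a sequence

# The failing pattern from the traceback: len() applied to an int.
try:
    for i in range(len(n_items)):
        pass
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Presumably intended: iterate directly over range(n_items).
visited = [i for i in range(n_items)]

print(error_message)  # object of type 'int' has no len()
print(visited)        # [0]
```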

Problem description

When the DataFrame is created, column a has object dtype because it contains only None. After assigning floats to column a, I would expect its dtype to change to float64, but it remains object. So when the DataFrame is saved, column a still has object dtype while its values (e.g. df.loc[0, "a"]) are Python floats, and this causes the error pasted above.
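The dtype behaviour described above can be checked without HDF5. The sketch below reproduces the object column. It assumes pandas keeps object dtype when a float is set into an object column. It also shows one possible workaround: re-inferring dtypes with DataFrame.infer_objects() before calling put().

```python
import pandas as pd

df = pd.DataFrame({"a": [None, None]})
df.loc[0, "a"] = float(1)
df.loc[1, "a"] = float(2)

# The column keeps object dtype even though every value is now a float.
before = df["a"].dtype
inferred = pd.api.types.infer_dtype(df["a"])  # what the values actually are

# Possible workaround: re-infer dtypes before saving to HDF5.
converted = df.infer_objects()
after = converted["a"].dtype

print(before, inferred, after)  # object floating float64
```

An explicit df["a"].astype("float64") would work equally well when the target dtype is known in advance.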

Expected Output

I would expect one of the following:

  • Implicit conversion of the column to float dtype
  • Conversion during hdf.put()
  • A proper exception saying that I am saving a mixed-type column
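The second and third options can be approximated on the user side before saving. The sketch below defines a hypothetical helper, prepare_for_hdf_table, which is not a pandas API. It converts whatever infer_objects() can convert and otherwise raises a TypeError that names the offending column. The rule that remaining object columns must infer as "string" is an assumption, suggested by the name _maybe_convert_for_string_atom in the traceback.

```python
import pandas as pd


def prepare_for_hdf_table(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pre-save step (not part of pandas)."""
    # Option 2: convert object columns to a better dtype where possible.
    out = df.infer_objects()
    # Option 3: reject any object column that still does not hold only strings.
    for name in out.columns:
        col = out[name]
        if col.dtype == object:
            kind = pd.api.types.infer_dtype(col, skipna=False)
            if kind != "string":
                raise TypeError(
                    f"Cannot serialize column {name!r}: its object dtype "
                    f"contents are inferred as {kind!r}"
                )
    return out


# Usage with the frame from the report: column "a" becomes float64.
df = pd.DataFrame({"a": [None, None]})
df.loc[0, "a"] = 1.0
df.loc[1, "a"] = 2.0
ready = prepare_for_hdf_table(df)
print(ready["a"].dtype)

# A genuinely mixed column is rejected with a readable message.
try:
    prepare_for_hdf_table(pd.DataFrame({"b": ["x", 1]}))
except TypeError as exc:
    print(exc)
```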

There's a good chance that I am wrong and this is expected behaviour. If so, could you explain why, or point me to somewhere I can read about it?

This may be related to issue #34274.

Output of pd.show_versions()

commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.0-147.5.1.el8_1.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.0.4
numpy : 1.16.4
pytz : 2018.7
dateutil : 2.8.1
pip : 19.3.1
setuptools : 46.4.0
Cython : 0.29.2
pytest : 5.1.2
hypothesis : None
sphinx : 1.8.4
blosc : None
feather : None
xlsxwriter : 1.1.2
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 2.0.0
numexpr : 2.6.8
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.1.2
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.1.2
numba : None
