Skip to content

"invalid dtype determination in get_concat_dtype" when concating dfs with certain columns #20597

@Mofef

Description

@Mofef

Code Sample, a copy-pastable example if possible

There might be a simpler minimal example, but I was already really struggeling to identify this problem and to find this example. The problem seems to be related to strings reappearing in different positions of the tuples, different length tuples and unequal sets of columns.

(btw. I'm aware of MultiIndex, I would like to convert the Index to MultiIndex after the concatenation)

items_a = [("b","e","c","a","b"),
         ("e","e","c","a","c"),
         ("e","a","c","a","d"),
         ("b","a","b","e"),
         ("e","b","a"),
         ("e","c","c","a")]
items_b = [("b","e","c","a","b"),
         ("a","a","d","b","d"),
         ("a","b","d","b","e"),
         ("c","b","c","a"),
         ("a","c","b"),
         ("a","d","d","b")]
df1=pd.DataFrame([range(6)], columns=items_a)
df2=pd.DataFrame([range(6)], columns=items_b)
pd.concat([df1, df2])

Problem description

This yields

AssertionError: invalid dtype determination in get_concat_dtype

Expected Output

Something similar to

df1.columns = [str(c) for c in df1.columns]
df2.columns = [str(c) for c in df2.columns]
pd.concat([df1, df2])

Output of pd.show_versions()

(same result with pandas=0.17.1)

INSTALLED VERSIONS ------------------ commit: None python: 2.7.12.final.0 python-bits: 64 OS: Linux OS-release: 4.4.0-116-generic machine: x86_64 processor: x86_64 byteorder: little LC_ALL: en_US.utf8 LANG: en_US.UTF-8 LOCALE: None.None

pandas: 0.23.0.dev0+38.g6552718
pytest: 2.8.7
pip: 9.0.1
setuptools: 20.7.0
Cython: 0.23.4
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.5.0
sphinx: 1.3.6
patsy: 0.4.1
dateutil: 2.7.2
pytz: 2018.3
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: 0.7.3
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.9999999
sqlalchemy: 1.0.11
pymysql: None
psycopg2: 2.6.1 (dt dec mx pq3 ext lo64)
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugReshapingConcat, Merge/Join, Stack/Unstack, Explode

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions