BUG: pd.read_csv does not work with nullable_dtype coercion #52594

@MCRE-BE

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from io import StringIO
from textwrap import dedent

import pandas as pd

csv = dedent("""
    Site;Weight MD;CodeWeight;MAX SM;Pallet Equ.;Crate Equ.
    BW08;0,24;2;999;0,03125;0,14286
    BW08;0,24;2;999;0,03125;0,14286
    BW08;0,24;2;999;0,03125;0,14286
    BW01;0;0;999;0,00625;1
""")[1:]
csv_param = {
    'decimal' : ',',
    'sep' : ';',
    'encoding' : "utf-8"
}

# Examples that fail
layout = {
    'Site': 'string',
    'Weight MD': "Float64",
    'CodeWeight': "UInt8",
    'MAX SM': "Float64",
}
pd.read_csv(StringIO(csv), usecols=layout.keys(), dtype=layout, **csv_param)

layout = {
    'Site': 'string',
    'Weight MD': pd.Float64Dtype,
    'CodeWeight': pd.UInt8Dtype,
    'MAX SM': pd.Int64Dtype,
}
pd.read_csv(StringIO(csv), usecols=layout.keys(), dtype=layout, **csv_param)

layout = {
    'Site': 'string[pyarrow]',
    'Weight MD': 'double[pyarrow]',
    'CodeWeight': 'int64[pyarrow]',
    'MAX SM': 'int64[pyarrow]',
}
pd.read_csv(StringIO(csv), usecols=layout.keys(), dtype=layout, **csv_param)

Error thrown:

ValueError: Unable to parse string "0,24" at position 0

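But the same data can be read when the nullable dtypes are inferred through dtype_backend instead of being passed explicitly: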

data = pd.read_csv(StringIO(csv), usecols=layout.keys(), dtype_backend="numpy_nullable", **csv_param)
data.dtypes

Site          string[pyarrow]
Weight MD             Float64
CodeWeight              Int64
MAX SM                  Int64
dtype: object
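
A possible workaround, as a minimal sketch: let read_csv infer the default NumPy dtypes with decimal=',' and cast to the nullable dtypes afterwards with astype. This assumes the default inference path applies the decimal separator (as the dtype_backend call above suggests) and that the inferred values fit the target dtypes. The shortened CSV and the names csv_small and converted are illustrative only and are not part of the original report.

```python
from io import StringIO

import pandas as pd

# Shortened copy of the sample data (illustrative, not the full file).
csv_small = (
    "Site;Weight MD;CodeWeight;MAX SM\n"
    "BW08;0,24;2;999\n"
    "BW01;0;0;999\n"
)

layout = {
    "Site": "string",
    "Weight MD": "Float64",
    "CodeWeight": "UInt8",
    "MAX SM": "Float64",
}

# Step 1: parse without an explicit dtype so that decimal=',' is applied
# during the default type inference.
raw = pd.read_csv(StringIO(csv_small), sep=";", decimal=",", usecols=list(layout))

# Step 2: cast the already-parsed columns to the requested nullable dtypes.
converted = raw.astype(layout)

print(converted.dtypes)
```

The cost is a second pass over the data and a temporary frame in the default dtypes, so this is a stopgap rather than a substitute for passing dtype directly.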

Issue Description

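I think this is related to #49146: dtype coercion does not work when a nullable dtype is passed explicitly. Judging by the error message, the value "0,24" reaches the numeric conversion with its comma still in place, so the decimal=',' setting does not appear to be applied on that path.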

Expected Behavior

read_csv should be able to read the columns with the requested dtypes.

Installed Versions

INSTALLED VERSIONS

commit : 478d340
python : 3.10.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : AMD64 Family 23 Model 24 Stepping 1, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Netherlands.1252

pandas : 2.0.0
numpy : 1.23.5
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.6.1
pip : 23.0.1
Cython : None
pytest : 7.2.2
hypothesis : None
sphinx : 6.1.3
blosc : None
feather : None
xlsxwriter : 3.0.9
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: 0.10.0
bs4 : 4.12.1
bottleneck : None
brotli :
fastparquet : None
fsspec : 2023.3.0
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.0
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : 1.0.10
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : 2023.3
qtpy : 2.3.1
pyqt5 : None
