Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
from io import StringIO
from textwrap import dedent

import pandas as pd

csv = dedent("""
Site;Weight MD;CodeWeight;MAX SM;Pallet Equ.;Crate Equ.
BW08;0,24;2;999;0,03125;0,14286
BW08;0,24;2;999;0,03125;0,14286
BW08;0,24;2;999;0,03125;0,14286
BW01;0;0;999;0,00625;1
""")[1:]

# Parser options shared by every call below
csv_param = {
    'decimal': ',',
    'sep': ';',
    'encoding': "utf-8",
}

# Examples that fail

# 1. Nullable dtypes given as strings
layout = {
    'Site': 'string',
    'Weight MD': "Float64",
    'CodeWeight': "UInt8",
    'MAX SM': "Float64",
}
pd.read_csv(StringIO(csv), usecols=layout.keys(), dtype=layout, **csv_param)

# 2. Nullable dtypes given as pandas dtype classes
layout = {
    'Site': 'string',
    'Weight MD': pd.Float64Dtype,
    'CodeWeight': pd.UInt8Dtype,
    'MAX SM': pd.Int64Dtype,
}
pd.read_csv(StringIO(csv), usecols=layout.keys(), dtype=layout, **csv_param)

# 3. PyArrow-backed dtypes
layout = {
    'Site': 'string[pyarrow]',
    'Weight MD': 'double[pyarrow]',
    'CodeWeight': 'int64[pyarrow]',
    'MAX SM': 'int64[pyarrow]',
}
pd.read_csv(StringIO(csv), usecols=layout.keys(), dtype=layout, **csv_param)
Error thrown:
ValueError: Unable to parse string "0,24" at position 0
But the same data can be read when dtype_backend is passed instead of dtype:
data = pd.read_csv(StringIO(csv), usecols=layout.keys(), dtype_backend="numpy_nullable", **csv_param)
data.dtypes
Site          string[pyarrow]
Weight MD             Float64
CodeWeight              Int64
MAX SM                  Int64
dtype: object
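
Since reading with dtype_backend succeeds, one possible workaround is to parse first and cast afterwards. The following is only a sketch under that assumption, not a fix taken from pandas itself: the two-step approach and the names `raw` and `df` are illustrative, and it assumes pandas 2.0 or later, where `dtype_backend` is available.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the reproducible example
csv = (
    "Site;Weight MD;CodeWeight;MAX SM;Pallet Equ.;Crate Equ.\n"
    "BW08;0,24;2;999;0,03125;0,14286\n"
    "BW08;0,24;2;999;0,03125;0,14286\n"
    "BW08;0,24;2;999;0,03125;0,14286\n"
    "BW01;0;0;999;0,00625;1\n"
)

# Requested dtypes (first failing layout from the report)
layout = {
    "Site": "string",
    "Weight MD": "Float64",
    "CodeWeight": "UInt8",
    "MAX SM": "Float64",
}

# Step 1: let the parser handle the decimal comma without a dtype mapping
raw = pd.read_csv(
    StringIO(csv),
    sep=";",
    decimal=",",
    usecols=list(layout),
    dtype_backend="numpy_nullable",
)

# Step 2: cast the already-parsed columns to the requested nullable dtypes
df = raw.astype(layout)
print(df.dtypes)
```

This keeps the decimal handling in the parser and defers the dtype choice to `astype`, at the cost of one extra conversion pass over the data.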
Issue Description
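I think this is related to #49146: dtype coercion does not work with nullable dtypes. When one is requested through dtype, the decimal=',' setting appears to be ignored, and the raw string "0,24" reaches the numeric conversion unchanged.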
Expected Behavior
read_csv should be able to read these columns with the requested dtypes.
Installed Versions
INSTALLED VERSIONS
commit : 478d340
python : 3.10.10.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19042
machine : AMD64
processor : AMD64 Family 23 Model 24 Stepping 1, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_Netherlands.1252
pandas : 2.0.0
numpy : 1.23.5
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.6.1
pip : 23.0.1
Cython : None
pytest : 7.2.2
hypothesis : None
sphinx : 6.1.3
blosc : None
feather : None
xlsxwriter : 3.0.9
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: 0.10.0
bs4 : 4.12.1
bottleneck : None
brotli :
fastparquet : None
fsspec : 2023.3.0
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.0
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : 1.0.10
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : 2023.3
qtpy : 2.3.1
pyqt5 : None