
BUG: "Field type has incompatible types" in loading GCS parquet with read_parquet  #62451

@legiondean

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import gcsfs
fs = gcsfs.GCSFileSystem(project="***")
path = 'gs://***/2025-09-01T00_15_30_events.parquet'
df = pd.read_parquet(path, filesystem=fs)

Issue Description

When a parquet file contains a logical type such as map, read_parquet behaves differently for a local file versus a GCS file.
The same parquet file loads fine as a pandas DataFrame from local disk, but raises an error when read from GCS:
ArrowTypeError: Unable to merge: Field type has incompatible types: binary vs dictionary<values=string, indices=int32, ordered=0>
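For reference, a parquet file with a map column can be produced with pyarrow along these lines (a minimal sketch only; the column name and values are hypothetical and just illustrate the map logical type, not the actual events schema):

import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical map<string, string> column named "attributes".
map_type = pa.map_(pa.string(), pa.string())
table = pa.table(
    {"attributes": pa.array([[("k1", "v1")], []], type=map_type)}
)
pq.write_table(table, "map_example.parquet")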

Load from a local file:

import pandas as pd
path = '2025-09-01T00_15_30_events.parquet'
df = pd.read_parquet(path)
print("Row count:", len(df))
---------------------------------------------------------------------------
Row count: 327827

Load from GCS:

import pandas as pd
import gcsfs
fs = gcsfs.GCSFileSystem(project="***")
path = 'gs://***/2025-09-01T00_15_30_events.parquet'
df = pd.read_parquet(path, filesystem=fs)
---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
Cell In[6], line 5
      3 fs = gcsfs.GCSFileSystem(project="***")
      4 path = 'gs://****/2025-09-01T00_15_30_events.parquet'
----> 5 df = pd.read_parquet(path, filesystem=fs)

File ~/Apps/python_virtual_envs/cursor/lib/python3.13/site-packages/pandas/io/parquet.py:669, in read_parquet(path, engine, columns, storage_options, use_nullable_dtypes, dtype_backend, filesystem, filters, **kwargs)
    666     use_nullable_dtypes = False
    667 check_dtype_backend(dtype_backend)
--> 669 return impl.read(
    670     path,
    671     columns=columns,
    672     filters=filters,
    673     storage_options=storage_options,
    674     use_nullable_dtypes=use_nullable_dtypes,
    675     dtype_backend=dtype_backend,
    676     filesystem=filesystem,
    677     **kwargs,
    678 )

File ~/Apps/python_virtual_envs/cursor/lib/python3.13/site-packages/pandas/io/parquet.py:265, in PyArrowImpl.read(self, path, columns, filters, use_nullable_dtypes, dtype_backend, storage_options, filesystem, **kwargs)
    258 path_or_handle, handles, filesystem = _get_path_or_handle(
    259     path,
    260     filesystem,
...
File ~/Apps/python_virtual_envs/cursor/lib/python3.13/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()

File ~/Apps/python_virtual_envs/cursor/lib/python3.13/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()

ArrowTypeError: Unable to merge: Field type has incompatible types: binary vs dictionary<values=string, indices=int32, ordered=0>

Expected Behavior

read_parquet should load the same parquet file from GCS just as it does from a local file.
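
One possible workaround to try in the meantime (an untested sketch, assuming an fsspec/gcsfs file handle works for this file): open the GCS object explicitly and pass the file-like handle to read_parquet instead of the path:

import pandas as pd
import gcsfs

fs = gcsfs.GCSFileSystem(project="***")
# Open the object as a binary handle and let pandas read from it directly;
# this is a sketch of a possible workaround, not a confirmed fix.
with fs.open("gs://***/2025-09-01T00_15_30_events.parquet", "rb") as f:
    df = pd.read_parquet(f)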

Installed Versions

INSTALLED VERSIONS
------------------
commit : c888af6
python : 3.13.3
python-bits : 64
OS : Darwin
OS-release : 24.6.0
Version : Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:40 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T6041
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8

pandas : 2.3.1
numpy : 2.3.2
pytz : 2025.2
dateutil : 2.9.0.post0
pip : 25.2
Cython : None
sphinx : None
IPython : 9.5.0
adbc-driver-postgresql: None
...
zstandard : 0.23.0
tzdata : 2025.2
qtpy : None
pyqt5 : None

Labels

Bug, IO Network (Local or Cloud (AWS, GCS, etc.) IO Issues), IO Parquet (parquet, feather)