Closed
Description
Describe the bug, including details regarding any error messages, version, and platform.
Something about this Parquet file from https://github.com/Parquet/parquet-compatibility/ causes an exception when read with pyarrow 22.0.0:
import urllib.request
import pathlib
import pyarrow.parquet as pq
p = pathlib.Path("nation.impala.parquet")
if not p.exists():
    urllib.request.urlretrieve(
        "https://github.com/Parquet/parquet-compatibility/raw/master/parquet-testdata/impala/1.1.1-NONE/nation.impala.parquet",
        p
    )
pq.read_table(p)

which raises with
Traceback (most recent call last):
  File "/Users/toaugspurger/gh/dask/dask/bug.py", line 12, in <module>
    pq.read_table(p)
  File "/Users/toaugspurger/gh/dask/.venv/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1899, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/toaugspurger/gh/dask/.venv/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1538, in read
    table = self._dataset.to_table(
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_dataset.pyx", line 589, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 3969, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Invalid number of indices: 0

pyarrow 21.0.0 was able to read that file.
Component(s)
Parquet, Python