
[C++][Python][Parquet] pyarrow.lib.ArrowInvalid: Invalid number of indices: 0 when reading a parquet file #47981

@TomAugspurger

Description

Describe the bug, including details regarding any error messages, version, and platform.

Something about this Parquet file from https://github.com/Parquet/parquet-compatibility/ causes an exception when reading it with pyarrow 22.0.0:

import urllib.request
import pathlib
import pyarrow.parquet as pq

p = pathlib.Path("nation.impala.parquet")
if not p.exists():
    urllib.request.urlretrieve(
        "https://github.com/Parquet/parquet-compatibility/raw/master/parquet-testdata/impala/1.1.1-NONE/nation.impala.parquet",
        p
    )

pq.read_table(p)

which raises with

Traceback (most recent call last):
  File "/Users/toaugspurger/gh/dask/dask/bug.py", line 12, in <module>
    pq.read_table(p)
  File "/Users/toaugspurger/gh/dask/.venv/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1899, in read_table
    return dataset.read(columns=columns, use_threads=use_threads,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/toaugspurger/gh/dask/.venv/lib/python3.12/site-packages/pyarrow/parquet/core.py", line 1538, in read
    table = self._dataset.to_table(
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_dataset.pyx", line 589, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 3969, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Invalid number of indices: 0

pyarrow 21.0.0 was able to read that file.

Component(s)

Parquet, Python
