awswrangler.s3.read_parquet_metadata does not support large_list datatype #3085

@ashrielbrian

Description

Describe the bug

Calling wr.redshift.copy(df) on a pandas DataFrame containing a column of lists of strings throws the following error:

awswrangler.exceptions.UnsupportedType: Unsupported Pyarrow type: large_list<element: large_string>

Similar to that earlier issue, awswrangler should support PyArrow's large_list dtype.

When the original pandas column has object dtype, the type is coerced to large_list even though the largest list in the column is far smaller than 2**31 elements. It would make sense to map large_list to the regular Athena/Glue array type, the same way list is handled.
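As a workaround until large_list is supported natively, the large types can be cast down to their 32-bit-offset counterparts before the data is handed to awswrangler. A minimal sketch using the public PyArrow API (downcast_large_types is a hypothetical helper, not part of awswrangler):

import pyarrow as pa

def downcast_large_types(table: pa.Table) -> pa.Table:
    # Replace large_list/large_string with plain list/string so the
    # schema only contains types awswrangler's type mapping accepts.
    fields = []
    for field in table.schema:
        t = field.type
        if pa.types.is_large_list(t):
            value = t.value_type
            if pa.types.is_large_string(value):
                value = pa.string()
            t = pa.list_(value)
        elif pa.types.is_large_string(t):
            t = pa.string()
        fields.append(pa.field(field.name, t, nullable=field.nullable))
    return table.cast(pa.schema(fields))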

How to Reproduce

import awswrangler as wr

wr.s3.read_parquet_metadata("s3://path/to/parquet/with/large_list/dtype.parquet")

Simply use a parquet file with a large_list dtype in one of the columns.
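If no such file is at hand, one can be generated directly with PyArrow. A minimal sketch (the local filename is illustrative; the file still needs to be uploaded to S3 to reproduce the call above):

import pyarrow as pa
import pyarrow.parquet as pq

# Write a parquet file whose column is explicitly large_list<large_string>,
# mirroring the schema that triggers UnsupportedType.
schema = pa.schema([("tags", pa.large_list(pa.large_string()))])
table = pa.table({"tags": [["a", "b"], ["c"]]}, schema=schema)
pq.write_table(table, "large_list.parquet")

print(pq.read_schema("large_list.parquet"))  # shows the large_list column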

Expected behavior

It should be possible to copy a DataFrame containing a column of large_list dtype into Redshift. To do this, large_list should be mapped to array.
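The fix would presumably amount to handling large_list wherever list is already handled in the PyArrow-to-Athena type mapping. An illustrative sketch of the rule (not awswrangler's actual code; the function name is hypothetical):

import pyarrow as pa

def pyarrow2athena_sketch(dtype: pa.DataType) -> str:
    # Treat large_list like list and large_string like string when
    # producing the Athena/Glue type string.
    if pa.types.is_list(dtype) or pa.types.is_large_list(dtype):
        return f"array<{pyarrow2athena_sketch(dtype.value_type)}>"
    if pa.types.is_string(dtype) or pa.types.is_large_string(dtype):
        return "string"
    raise NotImplementedError(f"Unsupported Pyarrow type: {dtype}")

print(pyarrow2athena_sketch(pa.large_list(pa.large_string())))  # array<string>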

Your project

No response

Screenshots

No response

OS

Amazon Linux 2

Python version

3.10

AWS SDK for pandas version

3.11.0

Additional context

No response
