Closed
Labels
bug: Something isn't working
Description
Describe the bug
Calling wr.redshift.copy(df) on a pandas DataFrame that has a column containing lists of strings raises the following error:
awswrangler.exceptions.UnsupportedType: Unsupported Pyarrow type: large_list<element: large_string>
Similar to this issue, we should be able to support pyarrow's large_list dtype.
When the column in the original pandas DataFrame has object dtype, its type is coerced to large_list, even though the longest list in the column has far fewer than 2**31 elements. I think it makes sense to map large_list to the regular Athena/Glue array type.
How to Reproduce
import awswrangler as wr
wr.read_parquet_metadata("s3://path/to/parquet/with/large_list/dtype.parquet")
Simply use a parquet file with a large_list dtype in one of the columns.
Expected behavior
It should be possible to copy a DataFrame containing a column of large_list dtype into Redshift. To do this, large_list should be mapped to array.
Your project
No response
Screenshots
No response
OS
Amazon Linux 2
Python version
3.10
AWS SDK for pandas version
3.11.0
Additional context
No response