Releases: aws/aws-sdk-pandas
AWS Data Wrangler 1.9.0
Breaking changes
- Global configuration `s3fs_block_size` was replaced by `s3_block_size`. #370
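For illustration, a minimal sketch of adjusting the renamed option, assuming it is exposed on `wr.config` like the other global settings (the value below is purely illustrative, not a recommendation):

```python
import awswrangler as wr

# Tune the S3 block size (in bytes) used by the new built-in S3 I/O layer.
# 33_554_432 (32 MiB) is an illustrative value only.
wr.config.s3_block_size = 33_554_432
```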
New Functionalities
- Automatic recovery of Pandas indexes from Parquet files. #366
- Automatic recovery of Pandas time zones from Parquet files. #366
- Schema evolution can now optionally be disabled through the new `schema_evolution` argument. #353
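A minimal sketch of the new argument (the bucket, database and table names below are hypothetical):

```python
import awswrangler as wr
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})

# With schema_evolution=False, appending a DataFrame whose schema differs
# from the existing Glue table should fail instead of evolving the schema.
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/my-dataset/",  # hypothetical path
    dataset=True,
    mode="append",
    database="my_db",                   # hypothetical Glue database
    table="my_table",                   # hypothetical Glue table
    schema_evolution=False,
)
```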
Enhancements
- `s3fs` dependency was replaced by built-in code. #370
- Significant Amazon S3 I/O speed-up for high-latency environments (e.g. local, on-premises). #370
Bug Fix
Docs
- A few updates.
Thanks
We thank the following contributors/users for their work on this release:
@isrsal, @bppont, @weishao-aws, @alexifm, @Digma, @samcon, @TerrellV, @msantino, @alvaropc, @luigift, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel/egg files are available below. Just upload them and run!
AWS Data Wrangler 1.8.1
Bug Fix
- Fix handling of NaN values for `wr.athena.read_sql_*()`. #351
Docs
- Instructions for installation in AWS Glue PySpark Jobs. #46
Thanks
We thank the following contributors/users for their work on this release:
@czagoni, @josecw, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel file are available below. Just upload them and run!
AWS Data Wrangler 1.8.0
New Functionalities
- `wr.s3.to_parquet()` now has a `max_rows_by_file` argument. #283 (example below)
- Support for Unix path pattern matching (`*`, `?`, `[seq]`, `[!seq]`) for any list/read/delete/copy function on S3. #322
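A minimal sketch combining both features (the bucket name and row counts are hypothetical):

```python
import awswrangler as wr
import pandas as pd

df = pd.DataFrame({"id": range(1_000_000)})

# Split the output into multiple Parquet files of at most 100,000 rows each.
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/dataset/",  # hypothetical path
    dataset=True,
    max_rows_by_file=100_000,
)

# Unix-style patterns are now accepted by the list/read/delete/copy functions.
df2 = wr.s3.read_parquet("s3://my-bucket/dataset/*.parquet")
```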
Enhancements
- Mypy applied in strict mode.
Bug Fix
- Fix unnecessary table version creation (Glue Catalog) for `wr.s3.to_parquet()` during appends. #342
- Fix lack of sanitisation in index names for `wr.s3.to_parquet()`/`wr.s3.to_csv()`. #343
Docs
- New "Who uses AWS Data Wrangler?" section!!!
Thanks
We thank the following contributors/users for their work on this release:
@Thiago-Dantas, @andre-marcos-perez, @ericct, @marcelo-vilela, @edvorkin, @nicholas-miles, @chrispruitt, @rparthas, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel file are available below. Just upload them and run!
AWS Data Wrangler 1.7.0
Breaking changes
- Partitioned Parquet reading now takes a different approach to push-down filters. For details, check the tutorial.
New Functionalities
- Global configuration module - TUTORIAL
- Concurrent partitions write - TUTORIAL
- Flexible partition filters (push-down) - TUTORIAL (example below)
- Add Athena query metadata to Pandas DataFrames returned by `wr.athena.read_sql_*()` - TUTORIAL #331
- `wr.athena.describe_table()` #329
- `wr.athena.show_create_table()` #334
- Add `path_ignore_suffix` argument to all read functions. #326
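For example, the push-down partition filter referenced above might look like this (a minimal sketch; the path, partition names and the query_metadata attribute access are assumptions based on the linked tutorials):

```python
import awswrangler as wr

# The callable receives each partition as a dict of string values and
# returns True only for the partitions that should actually be read.
df = wr.s3.read_parquet(
    path="s3://my-bucket/dataset/",  # hypothetical path
    dataset=True,
    partition_filter=lambda x: x["year"] == "2020" and x["month"] in ("01", "02"),
)

# Athena query metadata now travels with the returned DataFrame
# (attribute name assumed from the release notes/tutorial).
df2 = wr.athena.read_sql_query("SELECT 1 AS col", database="my_db")  # hypothetical database
print(df2.query_metadata)
```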
Enhancements
- Support for PyArrow 1.0.0. #337
- Support for Pandas 1.1.0.
- Support writing encrypted Redshift COPY manifests to S3. #327
- `wr.athena.read_sql_*()` now accepts empty results. #299
- Allow `connect_args` to be passed when creating an SQL engine from a Glue connection. #309
- Add `skip_header_line_count` argument to `wr.catalog.create_csv_table()`. #338
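A minimal sketch of the new `skip_header_line_count` argument (database, table, path and column names are hypothetical):

```python
import awswrangler as wr

# Register an external CSV table whose files start with one header line,
# so Athena skips that line when querying.
wr.catalog.create_csv_table(
    database="my_db",                    # hypothetical
    table="my_csv_table",                # hypothetical
    path="s3://my-bucket/csv-dataset/",  # hypothetical
    columns_types={"id": "int", "name": "string"},
    skip_header_line_count=1,
)
```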
Bug Fix
- Add missing type annotations and fix types in docstrings. #321
- Fix `KeyError: 'StatementType'` with Athena when using `max_cache_seconds`. #323
- Fix `wr.s3.read_csv()` being slow with `chunksize`. #324
- Fix `wr.s3.read_csv()` with `chunksize` not forwarding the `pandas_kwargs` `encoding`. #330
- Ensure DataFrame mutability for `wr.athena.read_sql_*()` with `ctas_approach=True`. #335
Docs
- Several small updates.
Thanks
We thank the following contributors/users for their work on this release:
@kylepierce, @davidszotten, @meganburger, @erikcw, @JPFrancoia, @zacharycarter, @DavideBossoli88, @c-line, @anand086, @jasadams, @mrtns, @schot, @koiker, @flaviomax, @bryanyang0528, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel file are available below. Just upload them and run!
AWS Data Wrangler 1.6.3
New Functionalities
- Add `wr.catalog.get_partitions()`. #305
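A minimal sketch (database and table names are hypothetical); the function appears to return a mapping from each partition's S3 location to its partition values:

```python
import awswrangler as wr

partitions = wr.catalog.get_partitions(database="my_db", table="my_table")
# Expected shape (assumption): {"s3://bucket/path/year=2020/": ["2020"], ...}
for location, values in partitions.items():
    print(location, values)
```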
Enhancements
- Improved Decimal casting.
Bug Fix
- Fix support for boto3 >= 1.14.18. 🐞 #315
Docs
- Add Spark Table Interoperability tutorial.
- General small updates.
Thanks
We thank the following contributors/users for their work on this release:
@jasadams, @bryanyang0528, @qemtek, @igorborgest.
P.S. Lambda Layer zip file and Glue wheel file are available below. Just upload them and run!
AWS Data Wrangler 1.6.2
Enhancements
- Now casting columns before appending to an existing table only when necessary (`wr.s3.to_parquet()`).
- Add a retry mechanism for InternalError on S3 object deletion.
- Add handling of immutable NumPy arrays (`flags.writeable == False`).
P.S. Lambda Layer's zip-file and Glue's wheel/egg are available below. Just upload them and run!
P.P.S. AWS Data Wrangler relies on compiled dependencies (C/C++), so there is no Glue PySpark support for now (only Glue Python Shell).
AWS Data Wrangler 1.6.1
Enhancements
- Support casting any column type to string using the `dtype` argument on `wr.s3.to_parquet()`.
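A minimal sketch of casting on write (the path and column names are hypothetical):

```python
import awswrangler as wr
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3], "created_at": pd.to_datetime(["2020-08-01"] * 3)})

# Force both columns to be stored as strings, regardless of their Pandas dtypes.
wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/dataset/",  # hypothetical path
    dataset=True,
    dtype={"id": "string", "created_at": "string"},
)
```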
Bug Fix
- Fix general bugs related to the Athena cache. 🐞
Docs
- General small updates.
P.S. Lambda Layer's zip-file and Glue's wheel/egg are available below. Just upload them and run!
P.P.S. AWS Data Wrangler relies on compiled dependencies (C/C++), so there is no Glue PySpark support for now (only Glue Python Shell).
AWS Data Wrangler 1.6.0
New Functionalities
- Amazon Athena CACHE 🚀 #285 (example below)
- Initial AWS STS module
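For example, the Athena cache mentioned above can be enabled per call (a minimal sketch; the query and database name are hypothetical):

```python
import awswrangler as wr

# If the exact same query succeeded within the last 900 seconds,
# its results are reused instead of re-running the query on Athena.
df = wr.athena.read_sql_query(
    sql="SELECT * FROM my_table LIMIT 10",
    database="my_db",  # hypothetical
    max_cache_seconds=900,
)
```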
Enhancements
- Support for NumPy 1.19.0.
- Add `auto_create` and `db_groups` arguments to `get_redshift_temp_engine()`. #288
- Add `validate_schema` argument to `wr.s3.read_parquet_table()`.
- Add `safe` argument to `read_parquet()`. #296
- Refactor naming of Pandas kwargs. #291
- Allow providing a suffix to `s3.store_parquet_metadata()`. #295
- Add `last_modified_begin` and `last_modified_end` arguments to `list_objects`, `read_csv`, `read_json`, `read_fwf` and `read_parquet`.
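A minimal sketch of filtering by object modification time (path and dates are hypothetical; the bounds should be timezone-aware datetimes):

```python
from datetime import datetime, timezone

import awswrangler as wr

# Only read objects last modified inside the given window.
df = wr.s3.read_csv(
    path="s3://my-bucket/csv-dataset/",  # hypothetical path
    last_modified_begin=datetime(2020, 6, 1, tzinfo=timezone.utc),
    last_modified_end=datetime(2020, 7, 1, tzinfo=timezone.utc),
)
```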
Bug Fix
- Fix bug in `get_table_description()` for tables without a description. #294
Docs
- Add Athena cache tutorial.
Thanks
We thank the following contributors/users for their work on this release:
@koiker, @patrick-muller, @flaviomax, @acere, @jarretg, @bryanyang0528, @schrobot, @kinghuang, @igorborgest.
P.S. Lambda Layer's zip-file and Glue's wheel/egg are available below. Just upload them and run!
P.P.S. AWS Data Wrangler relies on compiled dependencies (C/C++), so there is no Glue PySpark support for now (only Glue Python Shell).
AWS Data Wrangler 1.5.0
New Functionalities
- Amazon QuickSight support! 🎉
- Add create/delete database operations for the Glue Catalog.
Enhancements
- General improvements in the tutorials
- New Amazon S3 path check
- Add `sanitize_columns` argument to `s3.to_parquet` and `s3.to_csv`. #278 #279 (example below)
- Avoid an in-memory copy of the DataFrame in `to_parquet` and `to_csv`.
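A minimal sketch of the new `sanitize_columns` argument (path and column name are hypothetical, and the exact normalization rules may vary between releases):

```python
import awswrangler as wr
import pandas as pd

df = pd.DataFrame({"Camel Case Column": [1, 2]})

# With sanitize_columns=True the column name is normalized
# (e.g. to something like "camel_case_column") before writing.
wr.s3.to_csv(
    df=df,
    path="s3://my-bucket/dataset/file.csv",  # hypothetical path
    sanitize_columns=True,
)
```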
Bug Fix
- Force `index=False` for `wr.db.to_sql()` with Redshift.
Thanks
We thank the following contributors/users for their work on this release:
@ywang103, @patrick-muller, @tuliocasagrande, @sarojdongol, @sdknij, @ilyanoskov, @igorborgest.
P.S. Lambda Layer's zip-file and Glue's wheel/egg are available below. Just upload them and run!
P.P.S. AWS Data Wrangler relies on compiled dependencies (C/C++), so there is no Glue PySpark support for now (only Glue Python Shell).
AWS Data Wrangler 1.4.0
New Functionalities
- Add support for reading CSV, JSON and FWF partitions. #265
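A minimal sketch of reading a partitioned CSV dataset (path is hypothetical); with `dataset=True` the partition columns encoded in the key prefixes are recovered into the DataFrame:

```python
import awswrangler as wr

# Hive-style partitions (e.g. .../year=2020/month=01/file.csv)
# come back as regular "year"/"month" columns.
df = wr.s3.read_csv(
    path="s3://my-bucket/csv-dataset/",  # hypothetical path
    dataset=True,
)
```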
Enhancements
- General improvement of moto tests
Bug Fix
- Fix `encoding` argument support for reading CSV, JSON and FWF. #271
Thanks
We thank the following contributors/users for their work on this release:
@bryanyang0528, @dwbelliston, @patrick-muller, @sdknij, @igorborgest.
P.S. Lambda Layer's zip-file and Glue's wheel/egg are available below. Just upload them and run!
P.P.S. AWS Data Wrangler relies on compiled dependencies (C/C++), so there is no Glue PySpark support for now (only Glue Python Shell).