2 changes: 1 addition & 1 deletion .bumpversion.toml
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
[tool.bumpversion]
current_version = "3.12.1"
current_version = "3.13.0"
commit = false
tag = false
tag_name = "{new_version}"
Expand Down
80 changes: 40 additions & 40 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -94,25 +94,25 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
## At scale
AWS SDK for pandas can also run your workflows at scale by leveraging [Modin](https://modin.readthedocs.io/en/stable/) and [Ray](https://www.ray.io/). Both projects aim to speed up data workloads by distributing processing over a cluster of workers.

Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/scale.html) or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to learn more.
Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/scale.html) or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to learn more.

## [Read The Docs](https://aws-sdk-pandas.readthedocs.io/)

- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/about.html)
- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/install.html)
- [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/install.html#pypi-pip)
- [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/install.html#conda)
- [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/install.html#aws-lambda-layer)
- [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/install.html#aws-glue-python-shell-jobs)
- [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/install.html#aws-glue-pyspark-jobs)
- [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/install.html#amazon-sagemaker-notebook)
- [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/install.html#amazon-sagemaker-notebook-lifecycle)
- [EMR](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/install.html#emr)
- [From source](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/install.html#from-source)
- [**At scale**](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/scale.html)
- [Getting Started](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/scale.html#getting-started)
- [Supported APIs](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/scale.html#supported-apis)
- [Resources](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/scale.html#resources)
- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/about.html)
- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/install.html)
- [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/install.html#pypi-pip)
- [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/install.html#conda)
- [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/install.html#aws-lambda-layer)
- [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/install.html#aws-glue-python-shell-jobs)
- [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/install.html#aws-glue-pyspark-jobs)
- [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/install.html#amazon-sagemaker-notebook)
- [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/install.html#amazon-sagemaker-notebook-lifecycle)
- [EMR](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/install.html#emr)
- [From source](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/install.html#from-source)
- [**At scale**](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/scale.html)
- [Getting Started](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/scale.html#getting-started)
- [Supported APIs](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/scale.html#supported-apis)
- [Resources](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/scale.html#resources)
- [**Tutorials**](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials)
- [001 - Introduction](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/001%20-%20Introduction.ipynb)
- [002 - Sessions](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/002%20-%20Sessions.ipynb)
Expand Down Expand Up @@ -153,30 +153,30 @@ Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/scale.html) or h
- [039 - Athena Iceberg](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/039%20-%20Athena%20Iceberg.ipynb)
- [040 - EMR Serverless](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/040%20-%20EMR%20Serverless.ipynb)
- [041 - Apache Spark on Amazon Athena](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/041%20-%20Apache%20Spark%20on%20Amazon%20Athena.ipynb)
- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html)
- [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#amazon-s3)
- [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#aws-glue-catalog)
- [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#amazon-athena)
- [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#amazon-redshift)
- [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#postgresql)
- [MySQL](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#mysql)
- [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#sqlserver)
- [Oracle](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#oracle)
- [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#data-api-redshift)
- [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#data-api-rds)
- [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#opensearch)
- [AWS Glue Data Quality](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#aws-glue-data-quality)
- [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#amazon-neptune)
- [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#dynamodb)
- [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#amazon-timestream)
- [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#amazon-emr)
- [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#amazon-cloudwatch-logs)
- [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#amazon-chime)
- [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#amazon-quicksight)
- [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#aws-sts)
- [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#aws-secrets-manager)
- [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#global-configurations)
- [Distributed - Ray](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/api.html#distributed-ray)
- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html)
- [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#amazon-s3)
- [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#aws-glue-catalog)
- [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#amazon-athena)
- [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#amazon-redshift)
- [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#postgresql)
- [MySQL](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#mysql)
- [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#sqlserver)
- [Oracle](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#oracle)
- [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#data-api-redshift)
- [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#data-api-rds)
- [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#opensearch)
- [AWS Glue Data Quality](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#aws-glue-data-quality)
- [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#amazon-neptune)
- [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#dynamodb)
- [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#amazon-timestream)
- [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#amazon-emr)
- [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#amazon-cloudwatch-logs)
- [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#amazon-chime)
- [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#amazon-quicksight)
- [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#aws-sts)
- [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#aws-secrets-manager)
- [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#global-configurations)
- [Distributed - Ray](https://aws-sdk-pandas.readthedocs.io/en/3.13.0/api.html#distributed-ray)
- [**License**](https://github.com/aws/aws-sdk-pandas/blob/main/LICENSE.txt)
- [**Contributing**](https://github.com/aws/aws-sdk-pandas/blob/main/CONTRIBUTING.md)

Expand Down
2 changes: 1 addition & 1 deletion VERSION
Original file line number Diff line number Diff line change
@@ -1 +1 @@
3.12.1
3.13.0
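The same version string is updated in several places in this change (the bumpversion config, README links, docstring URLs, `VERSION`, and `__metadata__.py`). As a rough illustration of what such a bump amounts to, here is a minimal sketch that assumes a plain text substitution; `bump_version_in_text` is a hypothetical helper written for this example and is not part of the project or of any bump tool.

```python
import re


def bump_version_in_text(text: str, old: str, new: str) -> str:
    """Replace whole-version occurrences of `old` with `new` (hypothetical helper)."""
    # Guard against partial matches such as "13.12.1" or "3.12.10".
    pattern = re.compile(rf"(?<![\d.]){re.escape(old)}(?!\d)")
    return pattern.sub(new, text)


readme_line = "Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.12.1/scale.html)"
print(bump_version_in_text(readme_line, "3.12.1", "3.13.0"))
```

The lookaround guards are a design choice for this sketch: they keep a bump of `3.12.1` from touching an unrelated string such as `3.12.10`.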
2 changes: 1 addition & 1 deletion awswrangler/__metadata__.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,5 +7,5 @@

__title__: str = "awswrangler"
__description__: str = "Pandas on AWS."
__version__: str = "3.12.1"
__version__: str = "3.13.0"
__license__: str = "Apache License 2.0"
6 changes: 3 additions & 3 deletions awswrangler/_distributed.py
Original file line number Diff line number Diff line change
Expand Up @@ -117,7 +117,7 @@ def register(cls, name: EngineLiteral | None = None) -> None:
cls._registry.clear()

if engine_name == EngineEnum.RAY.value:
from awswrangler.distributed.ray._register import register_ray
from awswrangler.distributed.ray._register import register_ray # noqa: PLC0415

register_ray()

Expand All @@ -127,7 +127,7 @@ def initialize(cls, name: EngineLiteral | None = None) -> None:
with cls._lock:
engine_name = name or cls.get_installed().value
if engine_name == EngineEnum.RAY.value:
from awswrangler.distributed.ray import initialize_ray
from awswrangler.distributed.ray import initialize_ray # noqa: PLC0415

initialize_ray()
cls._initialized_engine = EngineEnum[engine_name.upper()]
Expand Down Expand Up @@ -187,7 +187,7 @@ def set(cls, name: EngineLiteral) -> None:

def _reload() -> None:
"""Reload Pandas proxy module."""
import awswrangler.pandas
import awswrangler.pandas # noqa: PLC0415

reload(awswrangler.pandas)
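The `# noqa: PLC0415` comments added in this file suppress Ruff's `import-outside-top-level` rule for imports that are deliberately deferred until they are needed. A minimal sketch of that pattern, using only the standard library: `load_optional_backend` is a hypothetical name, and `json` merely stands in for a heavy optional dependency such as Ray.

```python
def load_optional_backend(enabled: bool) -> str:
    """Import a module only when the feature is actually requested."""
    if not enabled:
        return "backend disabled"

    # Deferred import: the module is loaded on first use, not when the
    # enclosing package is imported. The noqa comment silences PLC0415
    # for this line only.
    import json  # noqa: PLC0415

    return json.dumps({"backend": "ready"})


print(load_optional_backend(False))
print(load_optional_backend(True))
```

For modules where almost every import is deferred, a single file-level `# ruff: noqa: PLC0415` comment, as used in the Ray modules of this change, avoids repeating the per-line suppression.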

Expand Down
16 changes: 8 additions & 8 deletions awswrangler/athena/_read.py
Original file line number Diff line number Diff line change
Expand Up @@ -793,11 +793,11 @@ def read_sql_query(

**Related tutorial:**

- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.12.1/
- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.13.0/
tutorials/006%20-%20Amazon%20Athena.html>`_
- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.12.1/
- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.13.0/
tutorials/019%20-%20Athena%20Cache.html>`_
- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.12.1/
- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.13.0/
tutorials/021%20-%20Global%20Configurations.html>`_

**There are three approaches available through ctas_approach and unload_approach parameters:**
Expand Down Expand Up @@ -861,7 +861,7 @@ def read_sql_query(
/athena.html#Athena.Client.get_query_execution>`_ .

For a practical example check out the
`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.12.1/
`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.13.0/
tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!


Expand Down Expand Up @@ -1140,11 +1140,11 @@ def read_sql_table(

**Related tutorial:**

- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.12.1/
- `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.13.0/
tutorials/006%20-%20Amazon%20Athena.html>`_
- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.12.1/
- `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.13.0/
tutorials/019%20-%20Athena%20Cache.html>`_
- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.12.1/
- `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.13.0/
tutorials/021%20-%20Global%20Configurations.html>`_

**There are three approaches available through ctas_approach and unload_approach parameters:**
Expand Down Expand Up @@ -1208,7 +1208,7 @@ def read_sql_table(
/athena.html#Athena.Client.get_query_execution>`_ .

For a practical example check out the
`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.12.1/
`related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.13.0/
tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!


Expand Down
4 changes: 2 additions & 2 deletions awswrangler/catalog/_create.py
Original file line number Diff line number Diff line change
Expand Up @@ -1100,7 +1100,7 @@ def create_csv_table(
If True allows schema evolution (new or missing columns), otherwise an exception will be raised.
(Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
Related tutorial:
https://aws-sdk-pandas.readthedocs.io/en/3.12.1/tutorials/014%20-%20Schema%20Evolution.html
https://aws-sdk-pandas.readthedocs.io/en/3.13.0/tutorials/014%20-%20Schema%20Evolution.html
sep
String of length 1. Field delimiter for the output file.
skip_header_line_count
Expand Down Expand Up @@ -1280,7 +1280,7 @@ def create_json_table(
If True allows schema evolution (new or missing columns), otherwise an exception will be raised.
(Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
Related tutorial:
https://aws-sdk-pandas.readthedocs.io/en/3.12.1/tutorials/014%20-%20Schema%20Evolution.html
https://aws-sdk-pandas.readthedocs.io/en/3.13.0/tutorials/014%20-%20Schema%20Evolution.html
serde_library
Specifies the SerDe Serialization library which will be used. You need to provide the Class library name
as a string.
Expand Down
1 change: 1 addition & 0 deletions awswrangler/distributed/ray/_register.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
"""Ray and Modin registered methods (PRIVATE)."""
# ruff: noqa: PLC0415

from awswrangler._data_types import pyarrow_types_from_pandas
from awswrangler._distributed import MemoryFormatEnum, engine, memory_format
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
"""Ray ArrowCSVDatasource Module."""
# ruff: noqa: PLC0415

from __future__ import annotations

Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
"""Ray PandasTextDatasink Module."""
# ruff: noqa: PLC0415

from __future__ import annotations

Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
"""Ray ArrowCSVDatasource Module."""
# ruff: noqa: PLC0415

from __future__ import annotations

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
and customized to ensure compatibility with AWS SDK for pandas behavior. Changes from the original implementation,
are documented in the comments and marked with (AWS SDK for pandas) prefix.
"""
# ruff: noqa: PLC0415

from __future__ import annotations

Expand Down
6 changes: 3 additions & 3 deletions awswrangler/opensearch/_write.py
Original file line number Diff line number Diff line change
Expand Up @@ -117,7 +117,7 @@ def _file_line_generator(path: str, is_json: bool = False) -> Generator[Any, Non

@_utils.check_optional_dependency(jsonpath_ng, "jsonpath_ng")
def _get_documents_w_json_path(documents: list[Mapping[str, Any]], json_path: str) -> list[Any]:
from jsonpath_ng.exceptions import JsonPathParserError
from jsonpath_ng.exceptions import JsonPathParserError # noqa: PLC0415

try:
jsonpath_expression = jsonpath_ng.parse(json_path)
Expand Down Expand Up @@ -232,7 +232,7 @@ def create_index(
body = None # type: ignore[assignment]

# ignore 400 caused by IndexAlreadyExistsException when creating an index
response: dict[str, Any] = client.indices.create(index, body=body, ignore=400)
response: dict[str, Any] = client.indices.create(index=index, body=body, ignore=400)
if "error" in response:
_logger.warning(response)
if str(response["error"]).startswith("MapperParsingException"):
Expand Down Expand Up @@ -268,7 +268,7 @@ def delete_index(client: "opensearchpy.OpenSearch", index: str) -> dict[str, Any

"""
# ignore 400/404 IndexNotFoundError exception
response: dict[str, Any] = client.indices.delete(index, ignore=[400, 404])
response: dict[str, Any] = client.indices.delete(index=index, ignore=[400, 404])
Review comment (Contributor Author): turns out this was a hallucination... the real issue was that the cluster ran into the shard limit

if "error" in response:
_logger.warning(response)
return response
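The calls in this file pass `index` by keyword and use `ignore=` so that selected HTTP statuses come back as a response body with an `error` key instead of raising. A minimal sketch of that handling pattern follows. It uses a hypothetical in-memory `FakeIndices` class rather than a real OpenSearch client, and both the keyword-only signature and the error payload shape are assumptions made for illustration.

```python
from __future__ import annotations

import logging
from typing import Any

_logger = logging.getLogger(__name__)


class FakeIndices:
    """Hypothetical stand-in for a client's index API (not opensearch-py)."""

    def __init__(self) -> None:
        self._existing: set[str] = set()

    def create(self, *, index: str, body: dict[str, Any] | None = None) -> dict[str, Any]:
        if index in self._existing:
            # Mimics an ignored 400: an error payload is returned, nothing is raised.
            return {"error": "resource_already_exists_exception", "status": 400}
        self._existing.add(index)
        return {"acknowledged": True, "index": index}


def create_index(indices: FakeIndices, index: str) -> dict[str, Any]:
    """Create an index and log, rather than raise, on an error payload."""
    response = indices.create(index=index)
    if "error" in response:
        _logger.warning(response)
    return response


indices = FakeIndices()
first = create_index(indices, "movies")
second = create_index(indices, "movies")
print(first)
print(second)
```

Because the caller inspects the payload instead of catching an exception, a second `create` for the same index is reported as a warning and the response is still returned to the caller.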
Expand Down
6 changes: 3 additions & 3 deletions awswrangler/s3/_read_orc.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@


def _pyarrow_orc_file_wrapper(source: Any) -> "ORCFile":
from pyarrow.orc import ORCFile
from pyarrow.orc import ORCFile # noqa: PLC0415

try:
return ORCFile(source=source)
Expand Down Expand Up @@ -225,7 +225,7 @@ def read_orc(
must return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-sdk-pandas.readthedocs.io/en/3.12.1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-sdk-pandas.readthedocs.io/en/3.13.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
columns
List of columns to read from the file(s).
validate_schema
Expand Down Expand Up @@ -384,7 +384,7 @@ def read_orc_table(
must return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-sdk-pandas.readthedocs.io/en/3.12.1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-sdk-pandas.readthedocs.io/en/3.13.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
columns
List of columns to read from the file(s).
validate_schema
Expand Down
4 changes: 2 additions & 2 deletions awswrangler/s3/_read_parquet.py
Original file line number Diff line number Diff line change
Expand Up @@ -410,7 +410,7 @@ def read_parquet(
must return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-sdk-pandas.readthedocs.io/en/3.12.1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-sdk-pandas.readthedocs.io/en/3.13.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
columns
List of columns to read from the file(s).
validate_schema
Expand Down Expand Up @@ -651,7 +651,7 @@ def read_parquet_table(
must return a bool, True to read the partition or False to ignore it.
Ignored if `dataset=False`.
E.g ``lambda x: True if x["year"] == "2020" and x["month"] == "1" else False``
https://aws-sdk-pandas.readthedocs.io/en/3.12.1/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
https://aws-sdk-pandas.readthedocs.io/en/3.13.0/tutorials/023%20-%20Flexible%20Partitions%20Filter.html
columns
List of columns to read from the file(s).
validate_schema
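The `partition_filter` docstrings in these hunks describe a callable that receives one partition's values and returns a bool. A minimal sketch of how such a filter selects partitions, applied to a hand-written list of partition dicts instead of S3; the sample values are made up, and the string comparisons follow the docstring example, which compares against `"2020"` and `"1"` rather than integers.

```python
from typing import Dict, List


def only_jan_2020(partition: Dict[str, str]) -> bool:
    """Keep only the year=2020/month=1 partition."""
    # Equivalent to the docstring's `True if ... else False` form:
    # the condition already evaluates to a bool.
    return partition["year"] == "2020" and partition["month"] == "1"


partitions: List[Dict[str, str]] = [
    {"year": "2019", "month": "12"},
    {"year": "2020", "month": "1"},
    {"year": "2020", "month": "2"},
]

selected = [p for p in partitions if only_jan_2020(p)]
print(selected)
```

A named function is used here only for readability; a lambda with the same body, as in the docstring, behaves identically.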
Expand Down