Commit e3a7542

Merge branch 'main' into bug/fix-dataset-empty-table
2 parents 4996373 + 6fd1b3c commit e3a7542

35 files changed: +2407 -2495 lines changed

.bumpversion.toml

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [tool.bumpversion]
-current_version = "3.10.1"
+current_version = "3.11.0"
 commit = false
 tag = false
 tag_name = "{new_version}"

.github/workflows/cfn-nag.yml

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ jobs:
           poetry env use python
           poetry env info
           source $(poetry env info --path)/bin/activate
-          poetry install -vvv
+          poetry install -vvv --no-root
       - name: Set up cdk.json
         run: |
           cd test_infra

.github/workflows/minimal-tests.yml

Lines changed: 4 additions & 1 deletion
@@ -17,8 +17,11 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.8", "3.11", "3.12"]
+        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
         platform: [ubuntu-latest, macos-latest, windows-latest]
+        exclude:
+          - python-version: 3.13
+            platform: windows-latest
 
     env:
       AWS_DEFAULT_REGION: us-east-1
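
Review note: the effect of the new matrix and its `exclude` entry can be checked with a short sketch. It assumes GitHub Actions expands `matrix` as the Cartesian product of its lists and then drops any combination matching an `exclude` entry, and that the unquoted `3.13` under `exclude` matches the quoted `"3.13"` in the version list. The variable names are illustrative only and do not appear in the workflow.

```python
from itertools import product

# Values copied from the updated workflow matrix.
python_versions = ["3.9", "3.10", "3.11", "3.12", "3.13"]
platforms = ["ubuntu-latest", "macos-latest", "windows-latest"]

# Assumption: the exclude entry removes exactly this one combination.
excluded = {("3.13", "windows-latest")}

# Cartesian product of both lists, minus the excluded combinations.
jobs = [
    (version, platform)
    for version, platform in product(python_versions, platforms)
    if (version, platform) not in excluded
]

print(len(jobs))  # 14
```

Under these assumptions the workflow grows from 9 jobs (3 versions on 3 platforms) to 14.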

README.md

Lines changed: 40 additions & 42 deletions
@@ -94,27 +94,25 @@ FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
 ## At scale
 AWS SDK for pandas can also run your workflows at scale by leveraging [Modin](https://modin.readthedocs.io/en/stable/) and [Ray](https://www.ray.io/). Both projects aim to speed up data workloads by distributing processing over a cluster of workers.
 
-Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/scale.html) or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to learn more.
-
-> ⚠️ **Ray is currently not available for Python 3.12. While AWS SDK for pandas supports Python 3.12, it cannot be used at scale.**
+Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/scale.html) or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to learn more.
 
 ## [Read The Docs](https://aws-sdk-pandas.readthedocs.io/)
 
-- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/about.html)
-- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/install.html)
-  - [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/install.html#pypi-pip)
-  - [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/install.html#conda)
-  - [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/install.html#aws-lambda-layer)
-  - [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/install.html#aws-glue-python-shell-jobs)
-  - [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/install.html#aws-glue-pyspark-jobs)
-  - [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/install.html#amazon-sagemaker-notebook)
-  - [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/install.html#amazon-sagemaker-notebook-lifecycle)
-  - [EMR](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/install.html#emr)
-  - [From source](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/install.html#from-source)
-- [**At scale**](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/scale.html)
-  - [Getting Started](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/scale.html#getting-started)
-  - [Supported APIs](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/scale.html#supported-apis)
-  - [Resources](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/scale.html#resources)
+- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/about.html)
+- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/install.html)
+  - [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/install.html#pypi-pip)
+  - [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/install.html#conda)
+  - [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/install.html#aws-lambda-layer)
+  - [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/install.html#aws-glue-python-shell-jobs)
+  - [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/install.html#aws-glue-pyspark-jobs)
+  - [Amazon SageMaker Notebook](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/install.html#amazon-sagemaker-notebook)
+  - [Amazon SageMaker Notebook Lifecycle](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/install.html#amazon-sagemaker-notebook-lifecycle)
+  - [EMR](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/install.html#emr)
+  - [From source](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/install.html#from-source)
+- [**At scale**](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/scale.html)
+  - [Getting Started](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/scale.html#getting-started)
+  - [Supported APIs](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/scale.html#supported-apis)
+  - [Resources](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/scale.html#resources)
 - [**Tutorials**](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials)
   - [001 - Introduction](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/001%20-%20Introduction.ipynb)
   - [002 - Sessions](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/002%20-%20Sessions.ipynb)
@@ -155,30 +153,30 @@ Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/scale.html) or h
   - [039 - Athena Iceberg](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/039%20-%20Athena%20Iceberg.ipynb)
   - [040 - EMR Serverless](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/040%20-%20EMR%20Serverless.ipynb)
   - [041 - Apache Spark on Amazon Athena](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/041%20-%20Apache%20Spark%20on%20Amazon%20Athena.ipynb)
-- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html)
-  - [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#amazon-s3)
-  - [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#aws-glue-catalog)
-  - [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#amazon-athena)
-  - [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#amazon-redshift)
-  - [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#postgresql)
-  - [MySQL](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#mysql)
-  - [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#sqlserver)
-  - [Oracle](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#oracle)
-  - [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#data-api-redshift)
-  - [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#data-api-rds)
-  - [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#opensearch)
-  - [AWS Glue Data Quality](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#aws-glue-data-quality)
-  - [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#amazon-neptune)
-  - [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#dynamodb)
-  - [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#amazon-timestream)
-  - [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#amazon-emr)
-  - [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#amazon-cloudwatch-logs)
-  - [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#amazon-chime)
-  - [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#amazon-quicksight)
-  - [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#aws-sts)
-  - [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#aws-secrets-manager)
-  - [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#global-configurations)
-  - [Distributed - Ray](https://aws-sdk-pandas.readthedocs.io/en/3.10.1/api.html#distributed-ray)
+- [**API Reference**](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html)
+  - [Amazon S3](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#amazon-s3)
+  - [AWS Glue Catalog](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#aws-glue-catalog)
+  - [Amazon Athena](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#amazon-athena)
+  - [Amazon Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#amazon-redshift)
+  - [PostgreSQL](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#postgresql)
+  - [MySQL](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#mysql)
+  - [SQL Server](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#sqlserver)
+  - [Oracle](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#oracle)
+  - [Data API Redshift](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#data-api-redshift)
+  - [Data API RDS](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#data-api-rds)
+  - [OpenSearch](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#opensearch)
+  - [AWS Glue Data Quality](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#aws-glue-data-quality)
+  - [Amazon Neptune](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#amazon-neptune)
+  - [DynamoDB](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#dynamodb)
+  - [Amazon Timestream](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#amazon-timestream)
+  - [Amazon EMR](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#amazon-emr)
+  - [Amazon CloudWatch Logs](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#amazon-cloudwatch-logs)
+  - [Amazon Chime](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#amazon-chime)
+  - [Amazon QuickSight](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#amazon-quicksight)
+  - [AWS STS](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#aws-sts)
+  - [AWS Secrets Manager](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#aws-secrets-manager)
+  - [Global Configurations](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#global-configurations)
+  - [Distributed - Ray](https://aws-sdk-pandas.readthedocs.io/en/3.11.0/api.html#distributed-ray)
 - [**License**](https://github.com/aws/aws-sdk-pandas/blob/main/LICENSE.txt)
 - [**Contributing**](https://github.com/aws/aws-sdk-pandas/blob/main/CONTRIBUTING.md)

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-3.10.1
+3.11.0

awswrangler/__metadata__.py

Lines changed: 1 addition & 1 deletion
@@ -7,5 +7,5 @@
 
 __title__: str = "awswrangler"
 __description__: str = "Pandas on AWS."
-__version__: str = "3.10.1"
+__version__: str = "3.11.0"
 __license__: str = "Apache License 2.0"

awswrangler/athena/_read.py

Lines changed: 8 additions & 8 deletions
@@ -793,11 +793,11 @@ def read_sql_query(
 
     **Related tutorial:**
 
-    - `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.10.1/
+    - `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.11.0/
       tutorials/006%20-%20Amazon%20Athena.html>`_
-    - `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.10.1/
+    - `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.11.0/
       tutorials/019%20-%20Athena%20Cache.html>`_
-    - `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.10.1/
+    - `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.11.0/
       tutorials/021%20-%20Global%20Configurations.html>`_
 
     **There are three approaches available through ctas_approach and unload_approach parameters:**
@@ -861,7 +861,7 @@ def read_sql_query(
     /athena.html#Athena.Client.get_query_execution>`_ .
 
     For a practical example check out the
-    `related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.10.1/
+    `related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.11.0/
     tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!
 
 
@@ -1140,11 +1140,11 @@ def read_sql_table(
 
     **Related tutorial:**
 
-    - `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.10.1/
+    - `Amazon Athena <https://aws-sdk-pandas.readthedocs.io/en/3.11.0/
       tutorials/006%20-%20Amazon%20Athena.html>`_
-    - `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.10.1/
+    - `Athena Cache <https://aws-sdk-pandas.readthedocs.io/en/3.11.0/
       tutorials/019%20-%20Athena%20Cache.html>`_
-    - `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.10.1/
+    - `Global Configurations <https://aws-sdk-pandas.readthedocs.io/en/3.11.0/
       tutorials/021%20-%20Global%20Configurations.html>`_
 
     **There are three approaches available through ctas_approach and unload_approach parameters:**
@@ -1208,7 +1208,7 @@ def read_sql_table(
     /athena.html#Athena.Client.get_query_execution>`_ .
 
     For a practical example check out the
-    `related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.10.1/
+    `related tutorial <https://aws-sdk-pandas.readthedocs.io/en/3.11.0/
     tutorials/024%20-%20Athena%20Query%20Metadata.html>`_!
 
 

awswrangler/athena/_utils.py

Lines changed: 2 additions & 2 deletions
@@ -457,8 +457,8 @@ def create_athena_bucket(boto3_session: boto3.Session | None = None) -> str:
     args = {} if region_name == "us-east-1" else {"CreateBucketConfiguration": {"LocationConstraint": region_name}}
     try:
         client_s3.create_bucket(Bucket=bucket_name, **args)  # type: ignore[arg-type]
-    except (client_s3.exceptions.BucketAlreadyExists, client_s3.exceptions.BucketAlreadyOwnedByYou) as err:
-        _logger.debug("Bucket %s already exists.", err.response["Error"]["BucketName"])
+    except (client_s3.exceptions.BucketAlreadyExists, client_s3.exceptions.BucketAlreadyOwnedByYou):
+        _logger.debug("Bucket %s already exists.", bucket_name)
     except botocore.exceptions.ClientError as err:
         if err.response["Error"]["Code"] == "OperationAborted":
             _logger.debug("A conflicting conditional operation is currently in progress against this resource.")

awswrangler/catalog/_create.py

Lines changed: 2 additions & 2 deletions
@@ -1100,7 +1100,7 @@ def create_csv_table(
         If True allows schema evolution (new or missing columns), otherwise a exception will be raised.
         (Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
         Related tutorial:
-        https://aws-sdk-pandas.readthedocs.io/en/3.10.1/tutorials/014%20-%20Schema%20Evolution.html
+        https://aws-sdk-pandas.readthedocs.io/en/3.11.0/tutorials/014%20-%20Schema%20Evolution.html
     sep
         String of length 1. Field delimiter for the output file.
     skip_header_line_count
@@ -1280,7 +1280,7 @@ def create_json_table(
         If True allows schema evolution (new or missing columns), otherwise a exception will be raised.
         (Only considered if dataset=True and mode in ("append", "overwrite_partitions"))
         Related tutorial:
-        https://aws-sdk-pandas.readthedocs.io/en/3.10.1/tutorials/014%20-%20Schema%20Evolution.html
+        https://aws-sdk-pandas.readthedocs.io/en/3.11.0/tutorials/014%20-%20Schema%20Evolution.html
     serde_library
         Specifies the SerDe Serialization library which will be used. You need to provide the Class library name
         as a string.

awswrangler/opensearch/_read.py

Lines changed: 27 additions & 7 deletions
@@ -41,12 +41,24 @@ def _hit_to_row(hit: Mapping[str, Any]) -> Mapping[str, Any]:
     return row
 
 
-def _search_response_to_documents(response: Mapping[str, Any]) -> list[Mapping[str, Any]]:
-    return [_hit_to_row(hit) for hit in response.get("hits", {}).get("hits", [])]
-
-
-def _search_response_to_df(response: Mapping[str, Any] | Any) -> pd.DataFrame:
-    return pd.DataFrame(_search_response_to_documents(response))
+def _search_response_to_documents(
+    response: Mapping[str, Any], aggregations: list[str] | None = None
+) -> list[Mapping[str, Any]]:
+    hits = response.get("hits", {}).get("hits", [])
+    if not hits and aggregations:
+        hits = [
+            dict(aggregation_hit, _aggregation_name=aggregation_name)
+            for aggregation_name in aggregations
+            for aggregation_hit in response.get("aggregations", {})
+            .get(aggregation_name, {})
+            .get("hits", {})
+            .get("hits", [])
+        ]
+    return [_hit_to_row(hit) for hit in hits]
+
+
+def _search_response_to_df(response: Mapping[str, Any] | Any, aggregations: list[str] | None = None) -> pd.DataFrame:
+    return pd.DataFrame(_search_response_to_documents(response=response, aggregations=aggregations))
 
 
 @_utils.check_optional_dependency(opensearchpy, "opensearchpy")
@@ -128,8 +140,16 @@ def search(
         documents = [_hit_to_row(doc) for doc in documents_generator]
         df = pd.DataFrame(documents)
     else:
+        aggregations = (
+            list(search_body.get("aggregations", {}).keys() or search_body.get("aggs", {}).keys())
+            if search_body
+            else None
+        )
         response = client.search(index=index, body=search_body, filter_path=filter_path, **kwargs)
-        df = _search_response_to_df(response)
+        df = _search_response_to_df(
+            response=response,
+            aggregations=aggregations,
+        )
     return df
 
 
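
Review note: the aggregation fallback added in this file can be tried without an OpenSearch cluster. The sketch below mirrors the new logic on a hand-written response dictionary. `hit_to_row` is a simplified, hypothetical stand-in for the private `_hit_to_row` helper, whose body is not part of this diff. The sample `search_body` and `response` are made up for illustration and assume a `top_hits`-style aggregation that nests its results under `hits.hits`.

```python
from typing import Any, Dict, List, Mapping, Optional


def hit_to_row(hit: Mapping[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for _hit_to_row: keep the metadata keys
    # and flatten the "_source" document into the same row.
    row = {key: value for key, value in hit.items() if key != "_source"}
    row.update(hit.get("_source", {}))
    return row


def response_to_documents(
    response: Mapping[str, Any], aggregations: Optional[List[str]] = None
) -> List[Dict[str, Any]]:
    # Regular hits take precedence, as in the diff.
    hits = response.get("hits", {}).get("hits", [])
    if not hits and aggregations:
        # Fall back to the hits nested under each named aggregation,
        # tagging every hit with the aggregation it came from.
        hits = [
            dict(aggregation_hit, _aggregation_name=name)
            for name in aggregations
            for aggregation_hit in response.get("aggregations", {})
            .get(name, {})
            .get("hits", {})
            .get("hits", [])
        ]
    return [hit_to_row(hit) for hit in hits]


# Made-up request body using the short "aggs" key.
search_body = {"size": 0, "aggs": {"latest": {"top_hits": {"size": 1}}}}

# Same key lookup as in search(): prefer "aggregations", else "aggs".
aggregation_names = list(
    search_body.get("aggregations", {}).keys() or search_body.get("aggs", {}).keys()
)

# Made-up response: no top-level hits, one hit inside the aggregation.
response = {
    "hits": {"hits": []},
    "aggregations": {
        "latest": {"hits": {"hits": [{"_id": "1", "_source": {"price": 10}}]}}
    },
}

documents = response_to_documents(response, aggregation_names)
print(documents)
```

With this input, a single row is produced carrying `_id`, `price`, and an `_aggregation_name` of `latest`; before the change, the same response would have yielded an empty result because only top-level hits were read.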
