Commit 24d03bd
committed: Improving Athena module documentation.
1 parent 4c6237c commit 24d03bd

1 file changed: +37 −17 lines changed

awswrangler/athena/_read.py

Lines changed: 37 additions & 17 deletions
@@ -522,9 +522,15 @@ def read_sql_query(
 ) -> Union[pd.DataFrame, Iterator[pd.DataFrame]]:
     """Execute any SQL query on AWS Athena and return the results as a Pandas DataFrame.
 
-    There are two approaches to be defined through ctas_approach parameter:
+    **Related tutorial:**
 
-    **1 - ctas_approach=True (Default):**
+    - `Amazon Athena <https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/006%20-%20Amazon%20Athena.ipynb>`_
+    - `Athena Cache <https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/019%20-%20Athena%20Cache.ipynb>`_
+    - `Global Configurations <https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/021%20-%20Global%20Configurations.ipynb>`_
+
+    **There are two approaches to be defined through ctas_approach parameter:**
+
+    **1** - ctas_approach=True (Default):
 
     Wrap the query with a CTAS and then reads the table data as parquet directly from s3.
 
@@ -541,7 +547,7 @@ def read_sql_query(
     - Does not support columns with undefined data types.
     - A temporary table will be created and then deleted immediately.
 
-    **2 - ctas_approach=False:**
+    **2** - ctas_approach=False:
 
     Does a regular query on Athena and parse the regular CSV result on s3.
 
@@ -560,9 +566,12 @@ def read_sql_query(
     Note
     ----
     The resulting DataFrame (or every DataFrame in the returned Iterator for chunked queries) have a
-    `query_metadata` attribute, which brings the query result metadata returned by Boto3/Athena.
-    The expected `query_metadata` format is the same as returned by:
-    https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena.html#Athena.Client.get_query_execution
+    `query_metadata` attribute, which brings the query result metadata returned by
+    `Boto3/Athena <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena.html#Athena.Client.get_query_execution>`_ .
+
+    For a practical example check out the
+    `related tutorial <https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/024%20-%20Athena%20Query%20Metadata.ipynb>`_!
 
     Note
     ----
@@ -578,9 +587,9 @@ def read_sql_query(
 
     Note
     ----
-    ``Batching`` (`chunksize` argument) (Memory Friendly):
+    `chunksize` argument (Memory Friendly) (i.e. batching):
 
-    Will anable the function to return a Iterable of DataFrames instead of a regular DataFrame.
+    Enables the function to return an Iterable of DataFrames instead of a regular DataFrame.
 
     There are two batching strategies on Wrangler:
 
@@ -601,7 +610,9 @@ def read_sql_query(
     sql : str
         SQL query.
     database : str
-        AWS Glue/Athena database name.
+        AWS Glue/Athena database name - it is only the origin database from which the query will be launched.
+        You can still use and mix several databases by writing the full table name within the sql
+        (e.g. `database.table`).
     ctas_approach: bool
         Wraps the query using a CTAS, and read the resulted parquet data on S3.
         If false, read the regular CSV on S3.
@@ -715,9 +726,15 @@ def read_sql_table(
 ) -> Union[pd.DataFrame, Iterator[pd.DataFrame]]:
     """Extract the full table AWS Athena and return the results as a Pandas DataFrame.
 
-    There are two approaches to be defined through ctas_approach parameter:
+    **Related tutorial:**
 
-    **1 - ctas_approach=True (Default):**
+    - `Amazon Athena <https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/006%20-%20Amazon%20Athena.ipynb>`_
+    - `Athena Cache <https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/019%20-%20Athena%20Cache.ipynb>`_
+    - `Global Configurations <https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/021%20-%20Global%20Configurations.ipynb>`_
+
+    **There are two approaches to be defined through ctas_approach parameter:**
+
+    **1** - ctas_approach=True (Default):
 
     Wrap the query with a CTAS and then reads the table data as parquet directly from s3.
 
@@ -734,7 +751,7 @@ def read_sql_table(
     - Does not support columns with undefined data types.
     - A temporary table will be created and then deleted immediately.
 
-    **2 - ctas_approach=False:**
+    **2** - ctas_approach=False:
 
     Does a regular query on Athena and parse the regular CSV result on s3.
 
@@ -752,9 +769,12 @@ def read_sql_table(
     Note
     ----
     The resulting DataFrame (or every DataFrame in the returned Iterator for chunked queries) have a
-    `query_metadata` attribute, which brings the query result metadata returned by Boto3/Athena.
-    The expected `query_metadata` format is the same as returned by:
-    https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena.html#Athena.Client.get_query_execution
+    `query_metadata` attribute, which brings the query result metadata returned by
+    `Boto3/Athena <https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/athena.html#Athena.Client.get_query_execution>`_ .
+
+    For a practical example check out the
+    `related tutorial <https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/024%20-%20Athena%20Query%20Metadata.ipynb>`_!
 
     Note
     ----
@@ -770,9 +790,9 @@ def read_sql_table(
 
     Note
     ----
-    ``Batching`` (`chunksize` argument) (Memory Friendly):
+    `chunksize` argument (Memory Friendly) (i.e. batching):
 
-    Will anable the function to return a Iterable of DataFrames instead of a regular DataFrame.
+    Enables the function to return an Iterable of DataFrames instead of a regular DataFrame.
 
     There are two batching strategies on Wrangler:
 
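The `chunksize` contract documented in both docstrings (an Iterable of DataFrames instead of one DataFrame) can be illustrated with a small pandas-only sketch of the integer-chunksize strategy. This is a toy illustration, not awswrangler's internal implementation; `chunked` is a hypothetical helper:

```python
import pandas as pd
from typing import Iterator


def chunked(df: pd.DataFrame, chunksize: int) -> Iterator[pd.DataFrame]:
    # Hypothetical helper: yield consecutive row slices of at most
    # `chunksize` rows, mimicking the Iterable-of-DataFrames behavior
    # the docstrings describe for an integer `chunksize` argument.
    for start in range(0, len(df), chunksize):
        yield df.iloc[start:start + chunksize]


df = pd.DataFrame({"id": range(5)})
sizes = [len(chunk) for chunk in chunked(df, 2)]
print(sizes)  # [2, 2, 1]
```

With awswrangler itself, the same iteration pattern applies to the documented API, e.g. `for chunk in wr.athena.read_sql_query(sql, database="my_db", chunksize=1000): ...`, where each yielded DataFrame also carries the `query_metadata` attribute.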

0 commit comments