Commit 1890293

Add Athena cache tutorial.
1 parent 462d689 commit 1890293

3 files changed: +1035 -34 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -97,6 +97,8 @@ wr.db.to_sql(df, engine, schema="test", name="my_table")
 - [016 - EMR & Docker](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/016%20-%20EMR%20%26%20Docker.ipynb)
 - [017 - Partition Projection](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/017%20-%20Partition%20Projection.ipynb)
 - [018 - QuickSight](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/018%20-%20QuickSight.ipynb)
+- [019 - Athena Cache](https://github.com/awslabs/aws-data-wrangler/blob/master/tutorials/019%20-%20athena%20cache.ipynb)
+
 - [**API Reference**](https://aws-data-wrangler.readthedocs.io/en/latest/api.html)
 - [Amazon S3](https://aws-data-wrangler.readthedocs.io/en/latest/api.html#amazon-s3)
 - [AWS Glue Catalog](https://aws-data-wrangler.readthedocs.io/en/latest/api.html#aws-glue-catalog)

tutorials/006 - Amazon Athena.ipynb

Lines changed: 10 additions & 34 deletions
@@ -66,27 +66,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Checking Glue Catalog Databases"
-]
-},
-{
-"cell_type": "code",
-"execution_count": 3,
-"metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-" Database Description\n",
-"0 aws_data_wrangler AWS Data Wrangler Test Arena - Glue Database\n",
-"1 default Default Hive database\n"
-]
-}
-],
-"source": [
-"databases = wr.catalog.databases()\n",
-"print(databases)"
+"## Checking/Creating Glue Catalog Databases"
 ]
 },
 {
@@ -106,11 +86,8 @@
 }
 ],
 "source": [
-"if \"awswrangler_test\" not in databases.values:\n",
-" wr.catalog.create_database(\"awswrangler_test\")\n",
-" print(wr.catalog.databases())\n",
-"else:\n",
-" print(\"Database awswrangler_test already exists\")"
+"if \"awswrangler_test\" not in wr.catalog.databases().values:\n",
+" wr.catalog.create_database(\"awswrangler_test\")"
 ]
 },
 {
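
The hunk above collapses the database check into a single create-if-missing step. A minimal sketch of that pattern follows. It uses a hypothetical in-memory `FakeCatalog` and a hypothetical helper `ensure_database`, neither of which is part of awswrangler, so that it runs without AWS credentials. It also assumes that creating an existing database raises an error, which is the reason the membership check comes first.

```python
from typing import Dict, List


class FakeCatalog:
    """Hypothetical in-memory stand-in for the Glue Catalog (not an awswrangler class)."""

    def __init__(self) -> None:
        # Start with one pre-existing database, as in the notebook output.
        self._databases: Dict[str, str] = {"default": "Default Hive database"}

    def databases(self) -> List[str]:
        """Return the names of all known databases."""
        return list(self._databases)

    def create_database(self, name: str) -> None:
        """Create a database; assumed to fail if the name is already taken."""
        if name in self._databases:
            raise ValueError(f"Database {name} already exists")
        self._databases[name] = ""


def ensure_database(catalog: FakeCatalog, name: str) -> bool:
    """Create `name` only if it is missing; return True when it was created."""
    if name not in catalog.databases():
        catalog.create_database(name)
        return True
    return False


catalog = FakeCatalog()
created_first = ensure_database(catalog, "awswrangler_test")   # database is new
created_second = ensure_database(catalog, "awswrangler_test")  # already present, no error
```

Because the check runs before the create call, the cell can be re-executed safely, which is presumably why the `else` branch and its print were dropped.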
@@ -324,14 +301,14 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"res = wr.s3.to_parquet(\n",
+"wr.s3.to_parquet(\n",
 " df=df,\n",
 " path=path,\n",
 " dataset=True,\n",
 " mode=\"overwrite\",\n",
 " database=\"awswrangler_test\",\n",
 " table=\"noaa\"\n",
-")"
+");"
 ]
 },
 {
@@ -1120,7 +1097,7 @@
 " \"SELECT * FROM noaa\",\n",
 " database=\"awswrangler_test\",\n",
 " ctas_approach=False,\n",
-" chunksize=10_000_000\n",
+" chunksize=500_000\n",
 ")\n",
 "\n",
 "for df in dfs: # Batching\n",
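
The hunk above lowers `chunksize` from 10,000,000 to 500,000, so the batching loop receives more, smaller batches. The sketch below illustrates the general effect of an integer chunk size on an iterated result. `read_in_chunks` is a hypothetical pure-Python stand-in rather than the awswrangler API, and the row count of 1,200,000 is an arbitrary value chosen for illustration.

```python
from typing import Iterator, List


def read_in_chunks(rows: List[int], chunksize: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for a chunked query result.

    Yields consecutive batches holding at most `chunksize` rows each.
    """
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    for start in range(0, len(rows), chunksize):
        yield rows[start:start + chunksize]


# Illustrative data only: 1,200,000 fake "rows".
rows = list(range(1_200_000))

# Batching: only one batch is processed at a time.
batch_sizes = [len(batch) for batch in read_in_chunks(rows, chunksize=500_000)]
```

With these assumed inputs the loop sees three batches, the last one partial. A smaller chunk size trades more iterations for a lower peak memory footprint per batch.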
@@ -1147,7 +1124,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Cleaning Up the Database"
+"## Delete table"
 ]
 },
 {
@@ -1156,15 +1133,14 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"for table in wr.catalog.get_tables(database=\"awswrangler_test\"):\n",
-" wr.catalog.delete_table_if_exists(database=\"awswrangler_test\", table=table[\"Name\"])"
+"wr.catalog.delete_table_if_exists(database=\"awswrangler_test\", table=\"noaa\")"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Delete Database"
+"## Delete Database"
 ]
 },
 {
@@ -1193,7 +1169,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.7.7"
+"version": "3.6.10"
 },
 "pycharm": {
 "stem_cell": {
