
Commit 67e5f50

cnfait and kukushking authored
correct a few typos in our ipynb tutorials (#1694)
Co-authored-by: kukushking <[email protected]>
1 parent 3e49124 commit 67e5f50

14 files changed: +27 -48 lines changed

tutorials/006 - Amazon Athena.ipynb

Lines changed: 1 addition & 1 deletion
@@ -143,7 +143,7 @@
 " mode=\"overwrite\",\n",
 " database=\"awswrangler_test\",\n",
 " table=\"noaa\"\n",
-");"
+")"
 ]
 },
 {

tutorials/007 - Redshift, MySQL, PostgreSQL, SQL Server, Oracle.ipynb

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 "\n",
 "# 7 - Redshift, MySQL, PostgreSQL, SQL Server and Oracle\n",
 "\n",
-"[awswrangler](https://github.com/aws/aws-sdk-pandas)'s Redshift, MySQL and PostgreSQL have two basic function in common that tries to follow the Pandas conventions, but add more data type consistency.\n",
+"[awswrangler](https://github.com/aws/aws-sdk-pandas)'s Redshift, MySQL and PostgreSQL have two basic functions in common that try to follow Pandas conventions, but add more data type consistency.\n",
 "\n",
 "- [wr.redshift.to_sql()](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/stubs/awswrangler.redshift.to_sql.html)\n",
 "- [wr.redshift.read_sql_query()](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/stubs/awswrangler.redshift.read_sql_query.html)\n",

tutorials/014 - Schema Evolution.ipynb

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 "\n",
 "# 14 - Schema Evolution\n",
 "\n",
-"awswrangler support new **columns** on Parquet and CSV datasets through:\n",
+"awswrangler supports new **columns** on Parquet and CSV datasets through:\n",
 "\n",
 "- [wr.s3.to_parquet()](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet)\n",
 "- [wr.s3.store_parquet_metadata()](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/stubs/awswrangler.s3.store_parquet_metadata.html#awswrangler.s3.store_parquet_metadata) i.e. \"Crawler\"\n",

tutorials/015 - EMR.ipynb

Lines changed: 0 additions & 7 deletions
@@ -160,13 +160,6 @@
 "source": [
 "wr.emr.terminate_cluster(cluster_id)"
 ]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
 }
 ],
 "metadata": {

tutorials/016 - EMR & Docker.ipynb

Lines changed: 1 addition & 8 deletions
@@ -201,7 +201,7 @@
 "print(f\"awswrangler version: {wr.__version__}\")\n",
 "\"\"\"\n",
 "\n",
-"boto3.client(\"s3\").put_object(Body=script, Bucket=bucket, Key=\"test_docker.py\");"
+"boto3.client(\"s3\").put_object(Body=script, Bucket=bucket, Key=\"test_docker.py\")"
 ]
 },
 {
@@ -329,13 +329,6 @@
 "\n",
 "wr.emr.terminate_cluster(cluster_id)"
 ]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
 }
 ],
 "metadata": {

tutorials/017 - Partition Projection.ipynb

Lines changed: 4 additions & 4 deletions
@@ -159,7 +159,7 @@
 " \"month\": \"1,12\",\n",
 " \"day\": \"1,31\"\n",
 " },\n",
-");"
+")"
 ]
 },
 {
@@ -334,7 +334,7 @@
 " projection_values={\n",
 " \"city\": \"São Paulo,Tokio,Seattle\"\n",
 " },\n",
-");"
+")"
 ]
 },
 {
@@ -511,7 +511,7 @@
 " \"dt\": \"2020-01-01,2020-01-03\",\n",
 " \"ts\": \"2020-01-01 00:00:00,2020-01-01 00:00:02\"\n",
 " },\n",
-");"
+")"
 ]
 },
 {
@@ -679,7 +679,7 @@
 " projection_types={\n",
 " \"uuid\": \"injected\",\n",
 " }\n",
-");"
+")"
 ]
 },
 {

tutorials/018 - QuickSight.ipynb

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
 "* [Exploring the public AWS COVID-19 data lake](https://aws.amazon.com/blogs/big-data/exploring-the-public-aws-covid-19-data-lake/)\n",
 "* [CloudFormation template](https://covid19-lake.s3.us-east-2.amazonaws.com/cfn/CovidLakeStack.template.json)\n",
 "\n",
-"*Please, install the Cloudformation template above to have access to the public data lake.*\n",
+"*Please, install the CloudFormation template above to have access to the public data lake.*\n",
 "\n",
 "*P.S. To be able to access the public data lake, you must allow explicitly QuickSight to access the related external bucket.*"
 ]

tutorials/019 - Athena Cache.ipynb

Lines changed: 3 additions & 3 deletions
@@ -8,13 +8,13 @@
 "\n",
 "# 19 - Amazon Athena Cache\n",
 "\n",
-"[awswrangler](https://github.com/aws/aws-sdk-pandas) has a cache strategy that is disabled by default and can be enabled passing `max_cache_seconds` biggier than 0. This cache strategy for Amazon Athena can help you to **decrease query times and costs**.\n",
+"[awswrangler](https://github.com/aws/aws-sdk-pandas) has a cache strategy that is disabled by default and can be enabled by passing `max_cache_seconds` bigger than 0. This cache strategy for Amazon Athena can help you to **decrease query times and costs**.\n",
 "\n",
 "When calling `read_sql_query`, instead of just running the query, we now can verify if the query has been run before. If so, and this last run was within `max_cache_seconds` (a new parameter to `read_sql_query`), we return the same results as last time if they are still available in S3. We have seen this increase performance more than 100x, but the potential is pretty much infinite.\n",
 "\n",
 "The detailed approach is:\n",
 "- When `read_sql_query` is called with `max_cache_seconds > 0` (it defaults to 0), we check for the last queries run by the same workgroup (the most we can get without pagination).\n",
-"- By default it will check the last 50 queries, but you can customize it throught the `max_cache_query_inspections` argument.\n",
+"- By default it will check the last 50 queries, but you can customize it through the `max_cache_query_inspections` argument.\n",
 "- We then sort those queries based on CompletionDateTime, descending\n",
 "- For each of those queries, we check if their CompletionDateTime is still within the `max_cache_seconds` window. If so, we check if the query string is the same as now (with some smart heuristics to guarantee coverage over both `ctas_approach`es). If they are the same, we check if the last one's results are still on S3, and then return them instead of re-running the query.\n",
 "- During the whole cache resolution phase, if there is anything wrong, the logic falls back to the usual `read_sql_query` path.\n",
@@ -292,7 +292,7 @@
 " mode=\"overwrite\",\n",
 " database=\"awswrangler_test\",\n",
 " table=\"noaa\"\n",
-");"
+")"
 ]
 },
 {
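Since the edited cell describes the cache only in prose, here is a minimal sketch of what enabling it looks like; the query, database, and 900-second window are placeholders, not part of this commit:

import awswrangler as wr

# First call runs the query on Athena as usual.
df = wr.athena.read_sql_query(
    "SELECT * FROM noaa LIMIT 100",
    database="awswrangler_test",
    max_cache_seconds=900,            # reuse results up to 15 minutes old
    max_cache_query_inspections=50,   # how many past queries to scan (default 50)
)

# A second identical call within the window is served from the cached
# results in S3 instead of re-running the query.
df_cached = wr.athena.read_sql_query(
    "SELECT * FROM noaa LIMIT 100",
    database="awswrangler_test",
    max_cache_seconds=900,
)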

tutorials/020 - Spark Table Interoperability.ipynb

Lines changed: 2 additions & 2 deletions
@@ -8,9 +8,9 @@
 "\n",
 "# 20 - Spark Table Interoperability\n",
 "\n",
-"[awswrangler](https://github.com/aws/aws-sdk-pandas) has no difficults to insert, overwrite or do any other kind of interaction with a Table created by Apache Spark.\n",
+"[awswrangler](https://github.com/aws/aws-sdk-pandas) has no difficulty to insert, overwrite or do any other kind of interaction with a Table created by Apache Spark.\n",
 "\n",
-"But if you want to do the oposite (Spark interacting with a table created by awswrangler) you should be aware that awswrangler follows the Hive's format and you must be explicit when using the Spark's `saveAsTable` method:"
+"But if you want to do the opposite (Spark interacting with a table created by awswrangler) you should be aware that awswrangler follows the Hive's format and you must be explicit when using the Spark's `saveAsTable` method:"
 ]
 },
 {
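The "must be explicit" remark refers to choosing the Hive format on the Spark side. A minimal PySpark sketch of that idea, with the database and table names as placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

df = spark.createDataFrame([(1, "foo"), (2, "boo")], ["id", "name"])

# Being explicit about the Hive format keeps the table readable by
# awswrangler / Athena, which expect Hive-style tables.
df.write.mode("overwrite").format("hive").saveAsTable("my_database.my_table")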

tutorials/022 - Writing Partitions Concurrently.ipynb

Lines changed: 3 additions & 10 deletions
@@ -11,7 +11,7 @@
 "* `concurrent_partitioning` argument:\n",
 "\n",
 " If True will increase the parallelism level during the partitions writing. It will decrease the\n",
-" writing time and increase the memory usage.\n",
+" writing time and increase memory usage.\n",
 "\n",
 "*P.S. Check the [function API doc](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html) to see it has some argument that can be configured through Global configurations.*"
 ]
@@ -121,7 +121,7 @@
 " dataset=True,\n",
 " mode=\"overwrite\",\n",
 " partition_cols=[\"year\"],\n",
-");"
+")"
 ]
 },
 {
@@ -157,15 +157,8 @@
 " mode=\"overwrite\",\n",
 " partition_cols=[\"year\"],\n",
 " concurrent_partitioning=True # <-----\n",
-");"
+")"
 ]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
 }
 ],
 "metadata": {
