correct a few typos in our ipynb tutorials (#1694)

cnfait · kukushking · web-flow · commit 67e5f5089cbc · 2022-10-19T08:02:37.000-07:00
Co-authored-by: kukushking <3997468+kukushking@users.noreply.github.com>
diff --git a/tutorials/006 - Amazon Athena.ipynb b/tutorials/006 - Amazon Athena.ipynb
@@ -143,7 +143,7 @@
     "    mode=\"overwrite\",\n",
     "    database=\"awswrangler_test\",\n",
     "    table=\"noaa\"\n",
-    ");"
+    ")"
    ]
   },
   {
diff --git a/tutorials/007 - Redshift, MySQL, PostgreSQL, SQL Server, Oracle.ipynb b/tutorials/007 - Redshift, MySQL, PostgreSQL, SQL Server, Oracle.ipynb
@@ -8,7 +8,7 @@
     "\n",
     "# 7 - Redshift, MySQL, PostgreSQL, SQL Server and Oracle\n",
     "\n",
-    "[awswrangler](https://github.com/aws/aws-sdk-pandas)'s Redshift, MySQL and PostgreSQL have two basic function in common that tries to follow the Pandas conventions, but add more data type consistency.\n",
+    "[awswrangler](https://github.com/aws/aws-sdk-pandas)'s Redshift, MySQL and PostgreSQL have two basic functions in common that try to follow Pandas conventions, but add more data type consistency.\n",
     "\n",
     "- [wr.redshift.to_sql()](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/stubs/awswrangler.redshift.to_sql.html)\n",
     "- [wr.redshift.read_sql_query()](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/stubs/awswrangler.redshift.read_sql_query.html)\n",
diff --git a/tutorials/014 - Schema Evolution.ipynb b/tutorials/014 - Schema Evolution.ipynb
@@ -8,7 +8,7 @@
     "\n",
     "# 14 - Schema Evolution\n",
     "\n",
-    "awswrangler support new **columns** on Parquet and CSV datasets through:\n",
+    "awswrangler supports new **columns** on Parquet and CSV datasets through:\n",
     "\n",
     "- [wr.s3.to_parquet()](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/stubs/awswrangler.s3.to_parquet.html#awswrangler.s3.to_parquet)\n",
     "- [wr.s3.store_parquet_metadata()](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/stubs/awswrangler.s3.store_parquet_metadata.html#awswrangler.s3.store_parquet_metadata) i.e. \"Crawler\"\n",
diff --git a/tutorials/015 - EMR.ipynb b/tutorials/015 - EMR.ipynb
@@ -160,13 +160,6 @@
    "source": [
     "wr.emr.terminate_cluster(cluster_id)"
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {
diff --git a/tutorials/016 - EMR & Docker.ipynb b/tutorials/016 - EMR & Docker.ipynb
@@ -201,7 +201,7 @@
     "print(f\"awswrangler version: {wr.__version__}\")\n",
     "\"\"\"\n",
     "\n",
-    "boto3.client(\"s3\").put_object(Body=script, Bucket=bucket, Key=\"test_docker.py\");"
+    "boto3.client(\"s3\").put_object(Body=script, Bucket=bucket, Key=\"test_docker.py\")"
    ]
   },
   {
@@ -329,13 +329,6 @@
     "\n",
     "wr.emr.terminate_cluster(cluster_id)"
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {
diff --git a/tutorials/017 - Partition Projection.ipynb b/tutorials/017 - Partition Projection.ipynb
@@ -159,7 +159,7 @@
     "        \"month\": \"1,12\",\n",
     "        \"day\": \"1,31\"\n",
     "    },\n",
-    ");"
+    ")"
    ]
   },
   {
@@ -334,7 +334,7 @@
     "    projection_values={\n",
     "        \"city\": \"São Paulo,Tokio,Seattle\"\n",
     "    },\n",
-    ");"
+    ")"
    ]
   },
   {
@@ -511,7 +511,7 @@
     "        \"dt\": \"2020-01-01,2020-01-03\",\n",
     "        \"ts\": \"2020-01-01 00:00:00,2020-01-01 00:00:02\"\n",
     "    },\n",
-    ");"
+    ")"
    ]
   },
   {
@@ -679,7 +679,7 @@
     "    projection_types={\n",
     "        \"uuid\": \"injected\",\n",
     "    }\n",
-    ");"
+    ")"
    ]
   },
   {
diff --git a/tutorials/018 - QuickSight.ipynb b/tutorials/018 - QuickSight.ipynb
@@ -16,7 +16,7 @@
     "* [Exploring the public AWS COVID-19 data lake](https://aws.amazon.com/blogs/big-data/exploring-the-public-aws-covid-19-data-lake/)\n",
     "* [CloudFormation template](https://covid19-lake.s3.us-east-2.amazonaws.com/cfn/CovidLakeStack.template.json)\n",
     "\n",
-    "*Please, install the Cloudformation template above to have access to the public data lake.*\n",
+    "*Please, install the CloudFormation template above to have access to the public data lake.*\n",
     "\n",
     "*P.S. To be able to access the public data lake, you must allow explicitly QuickSight to access the related external bucket.*"
    ]
diff --git a/tutorials/019 - Athena Cache.ipynb b/tutorials/019 - Athena Cache.ipynb
@@ -8,13 +8,13 @@
     "\n",
     "# 19 - Amazon Athena Cache\n",
     "\n",
-    "[awswrangler](https://github.com/aws/aws-sdk-pandas) has a cache strategy that is disabled by default and can be enabled passing `max_cache_seconds` biggier than 0. This cache strategy for Amazon Athena can help you to **decrease query times and costs**.\n",
+    "[awswrangler](https://github.com/aws/aws-sdk-pandas) has a cache strategy that is disabled by default and can be enabled by passing `max_cache_seconds` bigger than 0. This cache strategy for Amazon Athena can help you to **decrease query times and costs**.\n",
     "\n",
     "When calling `read_sql_query`, instead of just running the query, we now can verify if the query has been run before. If so, and this last run was within `max_cache_seconds` (a new parameter to `read_sql_query`), we return the same results as last time if they are still available in S3. We have seen this increase performance more than 100x, but the potential is pretty much infinite.\n",
     "\n",
     "The detailed approach is:\n",
     "- When `read_sql_query` is called with `max_cache_seconds > 0` (it defaults to 0), we check for the last queries run by the same workgroup (the most we can get without pagination).\n",
-    "- By default it will check the last 50 queries, but you can customize it throught the `max_cache_query_inspections` argument.\n",
+    "- By default it will check the last 50 queries, but you can customize it through the `max_cache_query_inspections` argument.\n",
     "- We then sort those queries based on CompletionDateTime, descending\n",
     "- For each of those queries, we check if their CompletionDateTime is still within the `max_cache_seconds` window. If so, we check if the query string is the same as now (with some smart heuristics to guarantee coverage over both `ctas_approach`es). If they are the same, we check if the last one's results are still on S3, and then return them instead of re-running the query.\n",
     "- During the whole cache resolution phase, if there is anything wrong, the logic falls back to the usual `read_sql_query` path.\n",
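Reviewer note: the cache resolution steps listed in the tutorial text above can be sketched roughly as follows. This is a minimal, self-contained illustration and not awswrangler's implementation. `PastQuery`, `normalize`, and `find_cached` are hypothetical names, the in-memory `history` list stands in for the workgroup's recent query executions, and the whitespace-only comparison is a stand-in for the "smart heuristics" the text mentions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


@dataclass
class PastQuery:
    """Hypothetical record of a previously executed query."""
    query_id: str
    sql: str
    completed_at: datetime
    results_in_s3: bool


def normalize(sql: str) -> str:
    # Assumption: collapse whitespace and drop a trailing semicolon.
    # The real comparison heuristics are not shown in the source.
    return " ".join(sql.split()).rstrip(";").strip()


def find_cached(
    sql: str,
    history: Sequence[PastQuery],
    max_cache_seconds: int,
    max_cache_query_inspections: int = 50,
    now: Optional[datetime] = None,
) -> Optional[PastQuery]:
    """Return a reusable past query, or None to fall back to a normal run."""
    if max_cache_seconds <= 0:  # cache is disabled by default
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    window_start = now - timedelta(seconds=max_cache_seconds)

    # Inspect only the most recent queries, newest first.
    recent = sorted(history, key=lambda q: q.completed_at, reverse=True)
    for past in recent[:max_cache_query_inspections]:
        if past.completed_at < window_start:
            break  # everything after this one is older still
        if normalize(past.sql) == normalize(sql) and past.results_in_s3:
            return past
    return None
```

A caller would reuse the stored results when `find_cached` returns a record and run the query as usual when it returns `None`, which mirrors the fallback behaviour described in the last bullet.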
@@ -292,7 +292,7 @@
     "    mode=\"overwrite\",\n",
     "    database=\"awswrangler_test\",\n",
     "    table=\"noaa\"\n",
-    ");"
+    ")"
    ]
   },
   {
diff --git a/tutorials/020 - Spark Table Interoperability.ipynb b/tutorials/020 - Spark Table Interoperability.ipynb
@@ -8,9 +8,9 @@
     "\n",
     "# 20 - Spark Table Interoperability\n",
     "\n",
-    "[awswrangler](https://github.com/aws/aws-sdk-pandas) has no difficults to insert, overwrite or do any other kind of interaction with a Table created by Apache Spark.\n",
+    "[awswrangler](https://github.com/aws/aws-sdk-pandas) has no difficulty to insert, overwrite or do any other kind of interaction with a Table created by Apache Spark.\n",
     "\n",
-    "But if you want to do the oposite (Spark interacting with a table created by awswrangler) you should be aware that awswrangler follows the Hive's format and you must be explicit when using the Spark's `saveAsTable` method:"
+    "But if you want to do the opposite (Spark interacting with a table created by awswrangler) you should be aware that awswrangler follows the Hive's format and you must be explicit when using the Spark's `saveAsTable` method:"
    ]
   },
   {
diff --git a/tutorials/022 - Writing Partitions Concurrently.ipynb b/tutorials/022 - Writing Partitions Concurrently.ipynb
@@ -11,7 +11,7 @@
     "* `concurrent_partitioning` argument:\n",
     "\n",
     "        If True will increase the parallelism level during the partitions writing. It will decrease the\n",
-    "        writing time and increase the memory usage.\n",
+    "        writing time and increase memory usage.\n",
     "\n",
     "*P.S. Check the [function API doc](https://aws-sdk-pandas.readthedocs.io/en/2.17.0/api.html) to see it has some argument that can be configured through Global configurations.*"
    ]
@@ -121,7 +121,7 @@
     "    dataset=True,\n",
     "    mode=\"overwrite\",\n",
     "    partition_cols=[\"year\"],\n",
-    ");"
+    ")"
    ]
   },
   {
@@ -157,15 +157,8 @@
     "    mode=\"overwrite\",\n",
     "    partition_cols=[\"year\"],\n",
     "    concurrent_partitioning=True  # <-----\n",
-    ");"
+    ")"
    ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {
diff --git a/tutorials/025 - Redshift - Loading Parquet files with Spectrum.ipynb b/tutorials/025 - Redshift - Loading Parquet files with Spectrum.ipynb
@@ -164,7 +164,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "wr.s3.to_parquet(df, PATH, max_rows_by_file=2, dataset=True, mode=\"overwrite\");"
+    "wr.s3.to_parquet(df, PATH, max_rows_by_file=2, dataset=True, mode=\"overwrite\")"
    ]
   },
   {
@@ -252,7 +252,7 @@
     "    \"col0\": [10, 11],\n",
     "    \"col1\": [\"k\", \"l\"],\n",
     "})\n",
-    "wr.s3.to_parquet(df, PATH, dataset=True, mode=\"overwrite\");"
+    "wr.s3.to_parquet(df, PATH, dataset=True, mode=\"overwrite\")"
    ]
   },
   {
diff --git a/tutorials/026 - Amazon Timestream.ipynb b/tutorials/026 - Amazon Timestream.ipynb
@@ -27,7 +27,7 @@
     "from datetime import datetime\n",
     "\n",
     "wr.timestream.create_database(\"sampleDB\")\n",
-    "wr.timestream.create_table(\"sampleDB\", \"sampleTable\", memory_retention_hours=1, magnetic_retention_days=1);"
+    "wr.timestream.create_table(\"sampleDB\", \"sampleTable\", memory_retention_hours=1, magnetic_retention_days=1)"
    ]
   },
   {
diff --git a/tutorials/027 - Amazon Timestream 2.ipynb b/tutorials/027 - Amazon Timestream 2.ipynb
@@ -101,7 +101,7 @@
    "outputs": [],
    "source": [
     "wr.timestream.create_database(\"sampleDB\")\n",
-    "wr.timestream.create_table(\"sampleDB\", \"sampleTable\", memory_retention_hours=1, magnetic_retention_days=1);"
+    "wr.timestream.create_table(\"sampleDB\", \"sampleTable\", memory_retention_hours=1, magnetic_retention_days=1)"
    ]
   },
   {
diff --git a/tutorials/033 - Amazon Neptune.ipynb b/tutorials/033 - Amazon Neptune.ipynb
@@ -48,7 +48,7 @@
     "\n",
     "The first step to using AWS SDK for pandas with Amazon Neptune is to import the library and create a client connection.\n",
     "\n",
-    "<div style=\"background-color:#eeeeee; padding:10px; text-align:left; border-radius:10px; margin-top:10px; margin-bottom:10px; \"><b>Note</b>: Connecting to Amazon Neptune requires that the application you are running has access to the Private VPC where Neptune is located.  Without this access you will not be able to connect using AWS SDK for pandas.</div>"
+    "<div style=\"background-color:#eeeeee; padding:10px; text-align:left; border-radius:10px; margin-top:10px; margin-bottom:10px; \"><b>Note</b>: Connecting to Amazon Neptune requires that the application you are running has access to the Private VPC where Neptune is located. Without this access you will not be able to connect using AWS SDK for pandas.</div>"
    ]
   },
   {
@@ -159,7 +159,7 @@
    "source": [
     "## Saving Data using AWS SDK for pandas\n",
     "\n",
-    "AWS SDK for pandas supports saving Pandas DataFrames into Amazon Neptune using either a property graph or RDF data model.  \n",
+    "AWS SDK for pandas supports saving Pandas DataFrames into Amazon Neptune using either a property graph or RDF data model.\n",
     "\n",
     "### Property Graph\n",
     "\n",
@@ -169,7 +169,7 @@
     "\n",
     "If no `~label` column exists then writing to the graph will be treated as an update of the element with the specified `~id` value.\n",
     "\n",
-    "DataFrames for edges must have a `~id`, `~label`, `~to`, and `~from` column.  If the `~id` column does not exist the specified id does not exists, or is empty then a new edge will be added. If no `~label`, `~to`, or `~from` column exists an exception will be thrown.\n",
+    "DataFrames for edges must have a `~id`, `~label`, `~to`, and `~from` column. If the `~id` column does not exist the specified id does not exists, or is empty then a new edge will be added. If no `~label`, `~to`, or `~from` column exists an exception will be thrown.\n",
     "\n",
     "#### Add Vertices/Nodes"
    ]
@@ -274,7 +274,7 @@
     "#### Setting cardinality based on the header\n",
     "\n",
     " If you would like to save data using `single` cardinality then you can postfix (single) to the column header and\n",
-    "    set `use_header_cardinality=True` (default).  e.g. A column named `name(single)` will save the `name` property as single cardinality.  You can disable this by setting by setting `use_header_cardinality=False`."
+    "    set `use_header_cardinality=True` (default). e.g. A column named `name(single)` will save the `name` property as single cardinality. You can disable this by setting `use_header_cardinality=False`."
    ]
   },
   {
@@ -303,7 +303,7 @@
    "source": [
     "### RDF\n",
     "\n",
-    "The DataFrame must consist of triples with column names for the subject, predicate, and object specified.  If none are provided than `s`, `p`, and `o` are the default.\n",
+    "The DataFrame must consist of triples with column names for the subject, predicate, and object specified. If none are provided then `s`, `p`, and `o` are the default.\n",
     "\n",
     "If you want to add data into a named graph then you will also need the graph column, default is `g`.\n",
     "\n",
@@ -371,7 +371,7 @@
    "source": [
     "## Flatten DataFrames\n",
     "\n",
-    "One of the complexities of working with a row/columns paradigm, such as Pandas, with graph results set is that it is very common for graph results to return complex and nested objects.  To help simplify using the results returned from a graph within a more tabular format we have added a method to flatten the returned Pandas DataFrame.\n",
+    "One of the complexities of working with a row/columns paradigm, such as Pandas, with graph results set is that it is very common for graph results to return complex and nested objects. To help simplify using the results returned from a graph within a more tabular format we have added a method to flatten the returned Pandas DataFrame.\n",
     "\n",
     "### Flattening the DataFrame"
    ]
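
Reviewer note: many hunks in this diff only drop a trailing `;` from the last line of a notebook code cell. In plain Python a trailing semicolon is an empty statement separator, so the code itself is unchanged. In IPython and Jupyter, however, a trailing semicolon on the last expression suppresses the display of its value, so these cells may now show a return value when re-run. The sketch below checks the first point with the standard-library `ast` module. `same_program` is a hypothetical helper, and the snippets are only parsed, never executed.

```python
import ast


def same_program(a: str, b: str) -> bool:
    """Compare two source snippets by their parsed syntax trees."""
    # ast.dump omits line/column attributes by default, so only the
    # structure of the code is compared.
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


before = 'wr.timestream.create_database("sampleDB");'
after = 'wr.timestream.create_database("sampleDB")'

# The trailing semicolon does not change what Python parses.
print(same_program(before, after))
```

Whether the extra cell output matters is a judgment call for each tutorial; the check only shows that the executed code is equivalent.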
