
Commit adb1319

Update tutorials.
1 parent 3fad112 commit adb1319

13 files changed: +244 / -448 lines changed

tutorials/001 - Introduction.ipynb

Lines changed: 6 additions & 5 deletions
@@ -17,7 +17,7 @@
"\n",
"An [open-source](https://github.com/awslabs/aws-data-wrangler>) Python package that extends the power of [Pandas](https://github.com/pandas-dev/pandas>) library to AWS connecting **DataFrames** and AWS data related services (**Amazon Redshift**, **AWS Glue**, **Amazon Athena**, **Amazon EMR**, etc).\n",
"\n",
-"Built on top of other open-source projects like [Pandas](https://github.com/pandas-dev/pandas), [Apache Arrow](https://github.com/apache/arrow), [Boto3](https://github.com/boto/boto3), [s3fs](https://github.com/dask/s3fs), [SQLAlchemy](https://github.com/sqlalchemy/sqlalchemy), [Psycopg2](https://github.com/psycopg/psycopg2) and [PyMySQL](https://github.com/PyMySQL/PyMySQL), it offers abstracted functions to execute usual ETL tasks like load/unload data from **Data Lakes**, **Data Warehouses** and **Databases**.\n",
+"Built on top of other open-source projects like [Pandas](https://github.com/pandas-dev/pandas), [Apache Arrow](https://github.com/apache/arrow), [Boto3](https://github.com/boto/boto3), [SQLAlchemy](https://github.com/sqlalchemy/sqlalchemy), [Psycopg2](https://github.com/psycopg/psycopg2) and [PyMySQL](https://github.com/PyMySQL/PyMySQL), it offers abstracted functions to execute usual ETL tasks like load/unload data from **Data Lakes**, **Data Warehouses** and **Databases**.\n",
"\n",
"Check our [list of functionalities](https://aws-data-wrangler.readthedocs.io/en/latest/api.html)."
]
@@ -33,7 +33,8 @@
" - [PyPi (pip)](https://aws-data-wrangler.readthedocs.io/en/latest/install.html#pypi-pip)\n",
" - [Conda](https://aws-data-wrangler.readthedocs.io/en/latest/install.html#conda)\n",
" - [AWS Lambda Layer](https://aws-data-wrangler.readthedocs.io/en/latest/install.html#aws-lambda-layer)\n",
-" - [AWS Glue Wheel](https://aws-data-wrangler.readthedocs.io/en/latest/install.html#aws-glue-wheel)\n",
+" - [AWS Glue Python Shell Jobs](https://aws-data-wrangler.readthedocs.io/en/latest/install.html#aws-glue-python-shell-jobs)\n",
+" - [AWS Glue PySpark Jobs](https://aws-data-wrangler.readthedocs.io/en/latest/install.html#aws-glue-pyspark-jobs)\n",
" - [Amazon SageMaker Notebook](https://aws-data-wrangler.readthedocs.io/en/latest/install.html#amazon-sagemaker-notebook)\n",
" - [Amazon SageMaker Notebook Lifecycle](https://aws-data-wrangler.readthedocs.io/en/latest/install.html#amazon-sagemaker-notebook-lifecycle)\n",
" - [EMR Cluster](https://aws-data-wrangler.readthedocs.io/en/latest/install.html#emr-cluster)\n",
@@ -69,16 +70,16 @@
},
{
"cell_type": "code",
-"execution_count": 1,
+"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
-"'1.7.0'"
+"'1.9.0'"
]
},
-"execution_count": 1,
+"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
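
The last hunk only bumps the recorded execution count and the printed version string from '1.7.0' to '1.9.0'. A minimal sketch of the cell that produces that output, assuming awswrangler is already installed:

import awswrangler as wr

# Prints the installed release; the updated notebook was executed against 1.9.0.
print(wr.__version__)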

tutorials/002 - Sessions.ipynb

Lines changed: 3 additions & 3 deletions
@@ -36,7 +36,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"## Using the default Session"
+"## Using the default Boto3 Session"
]
},
{
@@ -63,7 +63,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"## Customizing and using the default Session"
+"## Customizing and using the default Boto3 Session"
]
},
{
@@ -92,7 +92,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"## Using a new custom Session"
+"## Using a new custom Boto3 Session"
]
},
{
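
All three renamed headings cover how awswrangler resolves its Boto3 session. A short sketch of the three patterns, assuming a hypothetical bucket and key (the real notebook uses its own test objects):

import boto3
import awswrangler as wr

# 1. Default Boto3 session: awswrangler uses it whenever no session is passed.
wr.s3.does_object_exist("s3://my-example-bucket/some-key")

# 2. Customizing the default Boto3 session (e.g. pinning a region).
boto3.setup_default_session(region_name="us-east-2")
wr.s3.does_object_exist("s3://my-example-bucket/some-key")

# 3. Passing a new custom Boto3 session explicitly.
my_session = boto3.Session(region_name="us-east-2")
wr.s3.does_object_exist("s3://my-example-bucket/some-key", boto3_session=my_session)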

tutorials/003 - Amazon S3.ipynb

Lines changed: 12 additions & 178 deletions
@@ -1180,8 +1180,8 @@
"metadata": {},
"outputs": [],
"source": [
-"begin = datetime.strptime(\"05/06/20 16:30\", \"%d/%m/%y %H:%M\")\n",
-"end = datetime.strptime(\"15/06/21 16:30\", \"%d/%m/%y %H:%M\")\n",
+"begin = datetime.strptime(\"20-07-31 20:30\", \"%y-%m-%d %H:%M\")\n",
+"end = datetime.strptime(\"21-07-31 20:30\", \"%y-%m-%d %H:%M\")\n",
"\n",
"begin_utc = pytz.utc.localize(begin)\n",
"end_utc = pytz.utc.localize(end)"
@@ -1200,198 +1200,32 @@
"metadata": {},
"outputs": [],
"source": [
-"begin = datetime.strptime(\"05/06/20 16:30\", \"%d/%m/%y %H:%M\")\n",
-"end = datetime.strptime(\"10/06/21 16:30\", \"%d/%m/%y %H:%M\")\n",
+"begin = datetime.strptime(\"20-07-31 20:30\", \"%y-%m-%d %H:%M\")\n",
+"end = datetime.strptime(\"21-07-31 20:30\", \"%y-%m-%d %H:%M\")\n",
"\n",
"timezone = pytz.timezone(\"America/Los_Angeles\")\n",
"\n",
"begin_Los_Angeles = timezone.localize(begin)\n",
"end_Los_Angeles = timezone.localize(end)"
]
},
-{
-"cell_type": "code",
-"execution_count": 21,
-"metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"2020-06-05 16:30:00+00:00\n",
-"2021-06-15 16:30:00+00:00\n",
-"2020-06-05 16:30:00-07:00\n",
-"2021-06-10 16:30:00-07:00\n"
-]
-}
-],
-"source": [
-"print(begin_utc)\n",
-"print(end_utc)\n",
-"print(begin_Los_Angeles)\n",
-"print(end_Los_Angeles)"
-]
-},
{
"cell_type": "markdown",
"metadata": {},
"source": [
-"### 5.3 Read json with no LastModified filter "
+"### 5.3 Read json using the LastModified filters "
]
},
{
"cell_type": "code",
-"execution_count": 22,
-"metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"# read_fwf\n",
-" id name date\n",
-"0 1 Herfelingen 27-12-18\n",
-"1 2 Lambusart 14-06-18\n",
-"2 3 Spormaggiore 15-04-18\n",
-"3 4 Buizingen 05-09-19\n",
-"4 5 San Rafael 04-09-19\n",
-"\n",
-" read_json\n",
-" id name\n",
-"0 1 foo\n",
-"1 2 boo\n",
-"0 3 bar\n",
-"\n",
-" read_csv\n",
-" id name\n",
-"0 1 foo\n",
-"1 2 boo\n",
-"2 3 bar\n",
-"\n",
-" read_parquet\n"
-]
-},
-{
-"data": {
-"text/html": [
-"<div>\n",
-"<style scoped>\n",
-" .dataframe tbody tr th:only-of-type {\n",
-" vertical-align: middle;\n",
-" }\n",
-"\n",
-" .dataframe tbody tr th {\n",
-" vertical-align: top;\n",
-" }\n",
-"\n",
-" .dataframe thead th {\n",
-" text-align: right;\n",
-" }\n",
-"</style>\n",
-"<table border=\"1\" class=\"dataframe\">\n",
-" <thead>\n",
-" <tr style=\"text-align: right;\">\n",
-" <th></th>\n",
-" <th>id</th>\n",
-" <th>name</th>\n",
-" </tr>\n",
-" </thead>\n",
-" <tbody>\n",
-" <tr>\n",
-" <th>0</th>\n",
-" <td>1</td>\n",
-" <td>foo</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>1</th>\n",
-" <td>2</td>\n",
-" <td>boo</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>2</th>\n",
-" <td>3</td>\n",
-" <td>bar</td>\n",
-" </tr>\n",
-" </tbody>\n",
-"</table>\n",
-"</div>"
-],
-"text/plain": [
-" id name\n",
-"0 1 foo\n",
-"1 2 boo\n",
-"2 3 bar"
-]
-},
-"execution_count": 22,
-"metadata": {},
-"output_type": "execute_result"
-}
-],
-"source": [
-"print('# read_fwf')\n",
-"print(wr.s3.read_fwf(f\"s3://{bucket}/fwf/\", names=[\"id\", \"name\", \"date\"]))\n",
-"print('\\n read_json')\n",
-"print(wr.s3.read_json(f\"s3://{bucket}/json/\"))\n",
-"print('\\n read_csv')\n",
-"print(wr.s3.read_csv(f\"s3://{bucket}/csv/\"))\n",
-"print('\\n read_parquet')\n",
-"wr.s3.read_parquet(f\"s3://{bucket}/parquet/\")"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"### 5.4 Read json using the LastModified filter "
-]
-},
-{
-"cell_type": "code",
-"execution_count": 23,
+"execution_count": 21,
"metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"# read_fwf\n",
-" id name date\n",
-"0 1 Herfelingen 27-12-18\n",
-"1 2 Lambusart 14-06-18\n",
-"2 3 Spormaggiore 15-04-18\n",
-"3 4 Buizingen 05-09-19\n",
-"4 5 San Rafael 04-09-19\n",
-"\n",
-" read_json\n",
-" id name\n",
-"0 1 foo\n",
-"1 2 boo\n",
-"0 3 bar\n",
-"\n",
-" read_csv\n",
-" id name\n",
-"0 1 foo\n",
-"1 2 boo\n",
-"2 3 bar\n",
-"\n",
-" read_parquet\n",
-" id name\n",
-"0 1 foo\n",
-"1 2 boo\n",
-"2 3 bar\n"
-]
-}
-],
+"outputs": [],
"source": [
-"print('# read_fwf')\n",
-"print(wr.s3.read_fwf(f\"s3://{bucket}/fwf/\", names=[\"id\", \"name\", \"date\"], last_modified_begin=begin_utc, last_modified_end=end_utc))\n",
-"print('\\n read_json')\n",
-"print(wr.s3.read_json(f\"s3://{bucket}/json/\", last_modified_begin=begin_utc, last_modified_end=end_utc))\n",
-"print('\\n read_csv')\n",
-"print(wr.s3.read_csv(f\"s3://{bucket}/csv/\", last_modified_begin=begin_utc, last_modified_end=end_utc))\n",
-"print('\\n read_parquet')\n",
-"print(wr.s3.read_parquet(f\"s3://{bucket}/parquet/\", last_modified_begin=begin_utc, last_modified_end=end_utc))"
+"wr.s3.read_fwf(f\"s3://{bucket}/fwf/\", names=[\"id\", \"name\", \"date\"], last_modified_begin=begin_utc, last_modified_end=end_utc)\n",
+"wr.s3.read_json(f\"s3://{bucket}/json/\", last_modified_begin=begin_utc, last_modified_end=end_utc)\n",
+"wr.s3.read_csv(f\"s3://{bucket}/csv/\", last_modified_begin=begin_utc, last_modified_end=end_utc)\n",
+"wr.s3.read_parquet(f\"s3://{bucket}/parquet/\", last_modified_begin=begin_utc, last_modified_end=end_utc);"
]
},
{
@@ -1403,7 +1237,7 @@
},
{
"cell_type": "code",
-"execution_count": 24,
+"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [

tutorials/004 - Parquet Datasets.ipynb

Lines changed: 11 additions & 11 deletions
@@ -50,7 +50,7 @@
"name": "stdin",
"output_type": "stream",
"text": [
-" ···········································\n"
+" ············\n"
]
}
],
@@ -184,31 +184,31 @@
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
-" <td>3</td>\n",
-" <td>bar</td>\n",
-" <td>2020-01-03</td>\n",
-" </tr>\n",
-" <tr>\n",
-" <th>1</th>\n",
" <td>1</td>\n",
" <td>foo</td>\n",
" <td>2020-01-01</td>\n",
" </tr>\n",
" <tr>\n",
-" <th>2</th>\n",
+" <th>1</th>\n",
" <td>2</td>\n",
" <td>boo</td>\n",
" <td>2020-01-02</td>\n",
" </tr>\n",
+" <tr>\n",
+" <th>2</th>\n",
+" <td>3</td>\n",
+" <td>bar</td>\n",
+" <td>2020-01-03</td>\n",
+" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" id value date\n",
-"0 3 bar 2020-01-03\n",
-"1 1 foo 2020-01-01\n",
-"2 2 boo 2020-01-02"
+"0 1 foo 2020-01-01\n",
+"1 2 boo 2020-01-02\n",
+"2 3 bar 2020-01-03"
]
},
"execution_count": 4,

tutorials/005 - Glue Catalog.ipynb

Lines changed: 4 additions & 8 deletions
@@ -39,7 +39,7 @@
"name": "stdin",
"output_type": "stream",
"text": [
-" ···········································\n"
+" ············\n"
]
}
],
@@ -197,9 +197,7 @@
"text": [
" Database Description\n",
"0 aws_data_wrangler AWS Data Wrangler Test Arena - Glue Database\n",
-"1 aws_dataframes AWS DataFrames Test Arena - Glue Database\n",
-"2 covid-19 \n",
-"3 default Default Hive database\n"
+"1 default Default Hive database\n"
]
}
],
@@ -226,10 +224,8 @@
"text": [
" Database Description\n",
"0 aws_data_wrangler AWS Data Wrangler Test Arena - Glue Database\n",
-"1 aws_dataframes AWS DataFrames Test Arena - Glue Database\n",
-"2 awswrangler_test \n",
-"3 covid-19 \n",
-"4 default Default Hive database\n"
+"1 awswrangler_test \n",
+"2 default Default Hive database\n"
]
}
],
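
Both trimmed outputs are the DataFrame returned when listing Glue Catalog databases, before and after a test database is created. A small sketch of those calls, assuming the catalog helpers behave as documented for awswrangler 1.x:

import awswrangler as wr

# DataFrame with "Database" / "Description" columns, as shown in the output above.
print(wr.catalog.databases())

# Creating a database adds the "awswrangler_test" row seen in the second listing.
wr.catalog.create_database(name="awswrangler_test")
print(wr.catalog.databases())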

tutorials/007 - Redshift, MySQL, PostgreSQL.ipynb

Lines changed: 0 additions & 7 deletions
@@ -168,13 +168,6 @@
"wr.db.read_sql_query(\"SELECT * FROM test.tutorial\", con=eng_mysql) # MySQL\n",
"wr.db.read_sql_query(\"SELECT * FROM public.tutorial\", con=eng_redshift) # Redshift"
]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": []
}
],
"metadata": {
