
Commit f11ff15

Add the first 5 tutorials
1 parent e3a522a commit f11ff15

File tree

7 files changed: +1420 −710 lines changed


tutorials/1 - Introduction.ipynb

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[![AWS Data Wrangler](_static/logo.png \"AWS Data Wrangler\")](https://github.com/awslabs/aws-data-wrangler)\n",
    "\n",
    "# 1 - Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is AWS Data Wrangler?\n",
    "\n",
    "An [open-source](https://github.com/awslabs/aws-data-wrangler) Python package that extends the power of the [Pandas](https://github.com/pandas-dev/pandas) library to AWS, connecting **DataFrames** and AWS data-related services (**Amazon Redshift**, **AWS Glue**, **Amazon Athena**, **Amazon EMR**, etc.).\n",
    "\n",
    "Built on top of other open-source projects like [Pandas](https://github.com/pandas-dev/pandas), [Apache Arrow](https://github.com/apache/arrow), [Boto3](https://github.com/boto/boto3), [s3fs](https://github.com/dask/s3fs), [SQLAlchemy](https://github.com/sqlalchemy/sqlalchemy), [Psycopg2](https://github.com/psycopg/psycopg2) and [PyMySQL](https://github.com/PyMySQL/PyMySQL), it offers abstracted functions to execute common ETL tasks such as loading/unloading data from **Data Lakes**, **Data Warehouses** and **Databases**.\n",
    "\n",
    "Check our [list of functionalities](https://aws-data-wrangler.readthedocs.io/en/dev-1.0.0/api.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How to install?\n",
    "\n",
    "Wrangler runs almost anywhere on Python 3.6, 3.7 and 3.8, so there are several ways to install it in the desired environment.\n",
    "\n",
    " - [PyPi (pip)](https://aws-data-wrangler.readthedocs.io/en/dev-1.0.0/install.html#pypi-pip)\n",
    " - [Conda](https://aws-data-wrangler.readthedocs.io/en/dev-1.0.0/install.html#conda)\n",
    " - [AWS Lambda Layer](https://aws-data-wrangler.readthedocs.io/en/dev-1.0.0/install.html#aws-lambda-layer)\n",
    " - [AWS Glue Wheel](https://aws-data-wrangler.readthedocs.io/en/dev-1.0.0/install.html#aws-glue-wheel)\n",
    " - [Amazon SageMaker Notebook](https://aws-data-wrangler.readthedocs.io/en/dev-1.0.0/install.html#amazon-sagemaker-notebook)\n",
    " - [Amazon SageMaker Notebook Lifecycle](https://aws-data-wrangler.readthedocs.io/en/dev-1.0.0/install.html#amazon-sagemaker-notebook-lifecycle)\n",
    " - [EMR](https://aws-data-wrangler.readthedocs.io/en/dev-1.0.0/install.html#emr)\n",
    " - [From source](https://aws-data-wrangler.readthedocs.io/en/dev-1.0.0/install.html#from-source)\n",
    "\n",
    "Some good practices for most of the above methods are:\n",
    " - Use a new, individual Virtual Environment for each project ([venv](https://docs.python.org/3/library/venv.html))\n",
    " - On Notebooks, always restart your kernel after installations."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Let's Install it!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install awswrangler"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> Restart your kernel after the installation!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'1.0.0'"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import awswrangler as wr\n",
    "\n",
    "wr.__version__"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
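The tutorial's final cell imports the package and reads `wr.__version__` to confirm the install. A minimal sketch of that post-install check as a reusable function (illustrative only; `check_install` is a hypothetical helper, not an awswrangler API):

```python
import importlib

def check_install(package="awswrangler"):
    """Return the package's version string if importable, else None."""
    try:
        mod = importlib.import_module(package)
    except ImportError:
        # Package is not installed in the current environment
        return None
    # Fall back to "unknown" for packages that expose no __version__
    return getattr(mod, "__version__", "unknown")
```

Running `check_install()` right after `pip install` (and a kernel restart, per the note above) makes it easy to spot a stale kernel that is still missing the package.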

tutorials/2 - Sessions.ipynb

Lines changed: 158 additions & 0 deletions
@@ -0,0 +1,158 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[![AWS Data Wrangler](_static/logo.png \"AWS Data Wrangler\")](https://github.com/awslabs/aws-data-wrangler)\n",
    "\n",
    "# 2 - Sessions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How does Wrangler handle Sessions and AWS credentials?\n",
    "\n",
    "Since version 1.0.0, Wrangler relies entirely on [Boto3.Session()](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html) to manage AWS credentials and configurations.\n",
    "\n",
    "Wrangler does not store any kind of state internally; users are in charge of all Session management, if necessary.\n",
    "\n",
    "Most Wrangler functions receive the optional `boto3_session` argument. If `None` is received, a default Boto3 Session will be temporarily created to run the function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "import awswrangler as wr\n",
    "import boto3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using the default Sessions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wr.s3.does_object_exist(\"s3://noaa-ghcn-pds/fake\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wr.s3.does_object_exist(\"s3://noaa-ghcn-pds/fake\", boto3_session=None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using custom Sessions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wr.s3.does_object_exist(\"s3://noaa-ghcn-pds/fake\", boto3_session=boto3.Session())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "wr.s3.does_object_exist(\"s3://noaa-ghcn-pds/fake\", boto3_session=boto3.Session(region_name=\"us-east-2\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_python3",
   "language": "python",
   "name": "conda_python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
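The Sessions tutorial demonstrates the optional `boto3_session` argument: pass `None` (or nothing) to get a temporary default session, or pass your own `boto3.Session` to control region and credentials. The pattern can be sketched in plain Python (a minimal illustration, not the actual awswrangler source; `FakeSession`, `ensure_session` and this `does_object_exist` are hypothetical stand-ins so the sketch runs without AWS credentials):

```python
class FakeSession:
    """Stand-in for boto3.Session so the sketch needs no AWS access."""
    def __init__(self, region_name=None):
        # Pretend the environment's default region is us-east-1
        self.region_name = region_name or "us-east-1"

def ensure_session(session=None):
    # Reuse the caller's session if one was supplied; otherwise create a
    # temporary default one, mirroring the boto3_session=None behavior.
    return session if session is not None else FakeSession()

def does_object_exist(path, boto3_session=None):
    session = ensure_session(boto3_session)
    # A real implementation would issue an S3 HeadObject call here; the
    # sketch just reports which region the resolved session targets.
    return {"path": path, "region": session.region_name}
```

Calling `does_object_exist("s3://noaa-ghcn-pds/fake")` resolves a default session, while passing `FakeSession(region_name="us-east-2")` mirrors the custom-region call in the last code cell. Because the library holds no internal state, two calls with different sessions never interfere with each other.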

0 commit comments
