
Commit 79e33c8

(docs): Update the install section (#1242)
* (docs): Update the install section
* PR feedback
1 parent 304e734 commit 79e33c8

4 files changed: +125 -87 lines changed


docs/source/install.rst

Lines changed: 35 additions & 53 deletions
@@ -1,16 +1,16 @@
 Install
 =======

-**AWS Data Wrangler** runs with Python ``3.7``, ``3.8``, ``3.9`` and ``3.10``.
+**AWS Data Wrangler** runs on Python ``3.7``, ``3.8``, ``3.9`` and ``3.10``,
 and on several platforms (AWS Lambda, AWS Glue Python Shell, EMR, EC2,
 on-premises, Amazon SageMaker, local, etc).

-Some good practices for most of the methods below are:
+Some good practices to follow for options below are:

-- Use new and individual Virtual Environments for each project (`venv <https://docs.python.org/3/library/venv.html>`_).
-- On Notebooks, always restart your kernel after installations.
+- Use new and isolated Virtual Environments for each project (`venv <https://docs.python.org/3/library/venv.html>`_).
+- On Notebooks, always restart your kernel after installations.

-.. note:: If you want to use ``awswrangler`` for connecting to Microsoft SQL Server, some additional configuration is needed. Please have a look at the corresponding section below.
+.. note:: If you want to use ``awswrangler`` to connect to Microsoft SQL Server, some additional configuration is needed. Please have a look at the corresponding section below.

 PyPI (pip)
 ----------
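Following the good practices above, a minimal sketch of the PyPI option in a fresh virtual environment (POSIX shell assumed):

    python3 -m venv .venv
    source .venv/bin/activate
    pip install awswrangler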
@@ -28,60 +28,45 @@ AWS Lambda Layer
 Managed Layer
 ^^^^^^^^^^^^^^

-AWS Data Wrangler is available as an AWS Lambda Managed layer in the following regions:
+.. note:: There is a one week minimum delay between version release and layers being available in the AWS Lambda console.

-- ap-northeast-1
-- ap-southeast-2
-- eu-central-1
-- eu-west-1
-- us-east-1
-- us-east-2
-- us-west-2
+AWS Data Wrangler is available as an AWS Lambda Managed layer in all AWS commercial regions.

 It can be accessed in the AWS Lambda console directly:

 .. image:: _static/aws_lambda_managed_layer.png
   :width: 400
   :alt: AWS Managed Lambda Layer

-Or via its ARN:
-
-============================= ================ =======================================================================
-AWS Data Wrangler Version     Python Version   Layer ARN
-============================= ================ =======================================================================
-2.12.0                        3.7              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python37:1
-2.12.0                        3.8              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python38:1
-2.13.0                        3.7              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python37:2
-2.13.0                        3.8              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python38:2
-2.13.0                        3.9              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python39:1
-2.14.0                        3.7              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python37:3
-2.14.0                        3.8              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python38:3
-2.14.0                        3.9              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python39:2
-============================= ================ =======================================================================
+Or via its ARN: ``arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python<python-version>:<layer-version>``.
+
+For example: ``arn:aws:lambda:us-east-1:336392948345:layer:AWSDataWrangler-Python37:1``.
+
+The full list of ARNs is available `here <layers.rst>`__.
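A managed layer ARN can also be attached to an existing function from the AWS CLI; a minimal sketch (the function name is a placeholder, and the ARN should be picked for your own region and Python version from the list linked above):

    aws lambda update-function-configuration \
        --function-name my-function \
        --layers arn:aws:lambda:us-east-1:336392948345:layer:AWSDataWrangler-Python38:4

Note that ``--layers`` replaces the function's whole layer list, so repeat any other layers the function already uses.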

 Custom Layer
 ^^^^^^^^^^^^^^

-For AWS regions not in the above list, you can create your own Lambda layer following these instructions:
+You can also create your own Lambda layer with these instructions:

 1 - Go to `GitHub's release section <https://github.com/awslabs/aws-data-wrangler/releases>`_
-and download the layer zip related to the desired version. Alternatively, you can download the zip from the `public artifacts bucket <https://aws-data-wrangler.readthedocs.io/en/latest/install.html#public-artifacts>`_.
+and download the zipped layer for the desired version. Alternatively, you can download the zip from the `public artifacts bucket <https://aws-data-wrangler.readthedocs.io/en/latest/install.html#public-artifacts>`_.

-2 - Go to the AWS Lambda Panel, open the layer section (left side)
+2 - Go to the AWS Lambda console, open the layer section (left side)
 and click **create layer**.

-3 - Set name and python version, upload your fresh downloaded zip file
-and press **create** to create the layer.
+3 - Set name and python version, upload your downloaded zip file
+and press **create**.

-4 - Go to your Lambda and select your new layer!
+4 - Go to your Lambda function and select your new layer!
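Steps 2 and 3 can also be done with the CLI instead of the console; a sketch with a placeholder layer name and zip file name (use the actual file you downloaded):

    aws lambda publish-layer-version \
        --layer-name aws-data-wrangler \
        --zip-file fileb://awswrangler-layer.zip \
        --compatible-runtimes python3.8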

 Serverless Application Repository (SAR)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Starting version `2.12.0`, AWS Data Wrangler layers are also available in the `AWS Serverless Application Repository <https://serverlessrepo.aws.amazon.com/applications>`_ (SAR).

 The app deploys the Lambda layer version in your own AWS account and region via a CloudFormation stack.
-This option provides the ability to use semantic versions (i.e. library version) instead of Lambda layer versions.
+This option provides the ability to use semantic versions (i.e. library version) instead of Lambda layer versions.

 .. list-table:: AWS Data Wrangler Layer Apps
    :widths: 25 25 50
@@ -135,34 +120,33 @@ Here is an example of how to create and use the AWS Data Wrangler Lambda layer i
 AWS Glue Python Shell Jobs
 --------------------------

-.. note:: Glue Python Shell only supports Python3.6, for which support was dropped in version 2.15.0 of Wrangler. Please use version 2.14.0 or below.
+.. note:: Glue Python Shell runs on Python3.6, for which support was dropped in version 2.15.0 of Wrangler. Please use version 2.14.0 of the library or below.

 1 - Go to `GitHub's release page <https://github.com/awslabs/aws-data-wrangler/releases>`_ and download the wheel file
 (.whl) related to the desired version. Alternatively, you can download the wheel from the `public artifacts bucket <https://aws-data-wrangler.readthedocs.io/en/latest/install.html#public-artifacts>`_.

-2 - Upload the wheel file to any Amazon S3 location.
+2 - Upload the wheel file to the Amazon S3 location of your choice.

-3 - Go to your Glue Python Shell job and point to the wheel file on S3 in
+3 - Go to your Glue Python Shell job and point to the S3 wheel file in
 the *Python library path* field.

-
 `Official Glue Python Shell Reference <https://docs.aws.amazon.com/glue/latest/dg/add-job-python.html#create-python-extra-library>`_
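Steps 1 and 2 can also be scripted; a sketch assuming the release tag matches the library version and using placeholder names for the wheel file and bucket:

    curl -LO https://github.com/awslabs/aws-data-wrangler/releases/download/2.14.0/<wheel-file>.whl
    aws s3 cp <wheel-file>.whl s3://my-bucket/glue-libs/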

 AWS Glue PySpark Jobs
 ---------------------

-.. note:: AWS Data Wrangler has compiled dependencies (C/C++) so there is only support for ``Glue PySpark Jobs >= 2.0``.
+.. note:: AWS Data Wrangler has compiled dependencies (C/C++) so support is only available for ``Glue PySpark Jobs >= 2.0``.

 Go to your Glue PySpark job and create a new *Job parameters* key/value:

 * Key: ``--additional-python-modules``
 * Value: ``pyarrow==2,awswrangler``

-To install a specific version, set the value for above Job parameter as follows:
+To install a specific version, set the value for the above Job parameter as follows:

 * Value: ``cython==0.29.21,pg8000==1.21.0,pyarrow==2,pandas==1.3.0,awswrangler==2.15.0``

-.. note:: Pyarrow 3 is not currently supported in Glue PySpark Jobs, which is why a previous installation of pyarrow 2 is required.
+.. note:: Pyarrow 3 is not currently supported in Glue PySpark Jobs, which is why an installation of pyarrow 2 is required.

 `Official Glue PySpark Reference <https://docs.aws.amazon.com/glue/latest/dg/reduced-start-times-spark-etl-jobs.html#reduced-start-times-new-features>`_
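The same module list can also be supplied as a run argument rather than edited in the console; a minimal AWS CLI sketch (the job name is a placeholder, and passing ``--additional-python-modules`` at run time is an assumption about your workflow):

    aws glue start-job-run --job-name my-pyspark-job \
        --arguments '{"--additional-python-modules": "pyarrow==2,awswrangler"}'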

@@ -184,16 +168,16 @@ For example: ``s3://aws-data-wrangler-public-artifacts/releases/2.15.0/awswrangl
 Amazon SageMaker Notebook
 -------------------------

-Run this command in any Python 3 notebook paragraph and then make sure to
-**restart the kernel** before import the **awswrangler** package.
+Run this command in any Python 3 notebook cell and then make sure to
+**restart the kernel** before importing the **awswrangler** package.

 >>> !pip install awswrangler

 Amazon SageMaker Notebook Lifecycle
 -----------------------------------

-Open SageMaker console, go to the lifecycle section and
-use the follow snippet to configure AWS Data Wrangler for all compatible
+Open the AWS SageMaker console, go to the lifecycle section and
+use the below snippet to configure AWS Data Wrangler for all compatible
 SageMaker kernels (`Reference <https://github.com/aws-samples/amazon-sagemaker-notebook-instance-lifecycle-config-samples/blob/master/scripts/install-pip-package-all-environments/on-start.sh>`_).

 .. code-block:: sh
@@ -227,9 +211,7 @@ SageMaker kernels (`Reference <https://github.com/aws-samples/amazon-sagemaker-n
 EMR Cluster
 -----------

-Even not being a distributed library,
-AWS Data Wrangler could be a good helper to
-complement Big Data pipelines.
+Despite not being a distributed library, AWS Data Wrangler could be used to complement Big Data pipelines.

 - Configure Python 3 as the default interpreter for
   PySpark on your cluster configuration [ONLY REQUIRED FOR EMR < 6]
@@ -270,10 +252,10 @@ complement Big Data pipelines.

     sudo pip install pyarrow==2 awswrangler

-.. note:: Make sure to freeze the Wrangler version in the bootstrap for productive
+.. note:: Make sure to freeze the library version in the bootstrap for production
    environments (e.g. awswrangler==2.15.0)

-.. note:: Pyarrow 3 is not currently supported in the default EMR image, which is why a previous installation of pyarrow 2 is required.
+.. note:: Pyarrow 3 is not currently supported in the default EMR image, which is why an installation of pyarrow 2 is required.

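Combining the snippet above with the version-freezing note, a minimal bootstrap action script might look like this (the pinned version is only an example):

    #!/usr/bin/env bash
    set -ex
    sudo pip install pyarrow==2 awswrangler==2.15.0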
 From Source
 -----------
@@ -286,9 +268,9 @@ From Source
 Notes for Microsoft SQL Server
 ------------------------------

-``awswrangler`` is using the `pyodbc <https://github.com/mkleehammer/pyodbc>`_
-for interacting with Microsoft SQL Server. For installing this package you need the ODBC header files,
-which can be installed, for example, with the following commands:
+``awswrangler`` uses `pyodbc <https://github.com/mkleehammer/pyodbc>`_
+for interacting with Microsoft SQL Server. To install this package you need the ODBC header files,
+which can be installed with the following commands:

 >>> sudo apt install unixodbc-dev
 >>> yum install unixODBC-devel
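A sketch of the full sequence on a Debian/Ubuntu host (use the ``yum`` command instead on Amazon Linux/RHEL); installing ``pyodbc`` explicitly is an assumption here, as your awswrangler version may pull it in via an optional extra:

    sudo apt install unixodbc-dev
    pip install pyodbc awswrangler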

docs/source/layers.rst

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+AWS Lambda Managed Layers
+==========================
+
+2.15.0
+^^^^^^^
+
+All AWS commercial regions. Arm64 support is introduced for this version.
+
+================ =============================================================================
+Python Version   Layer ARN
+================ =============================================================================
+3.7              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python37:4
+3.8              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python38:4
+3.9              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python39:3
+3.8              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python38-Arm64:1
+3.9              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python39-Arm64:1
+================ =============================================================================
+
+2.14.0
+^^^^^^^
+
+AWS regions: ap-northeast-1, ap-southeast-2, eu-central-1, eu-west-1, us-east-1, us-east-2, us-west-2
+
+================ =======================================================================
+Python Version   Layer ARN
+================ =======================================================================
+3.7              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python37:3
+3.8              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python38:3
+3.9              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python39:2
+================ =======================================================================
+
+2.13.0
+^^^^^^^
+
+AWS regions: ap-northeast-1, ap-southeast-2, eu-central-1, eu-west-1, us-east-1, us-east-2, us-west-2
+
+================ =======================================================================
+Python Version   Layer ARN
+================ =======================================================================
+3.7              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python37:2
+3.8              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python38:2
+3.9              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python39:1
+================ =======================================================================
+
+2.12.0
+^^^^^^^
+
+AWS regions: us-east-1
+
+================ =======================================================================
+Python Version   Layer ARN
+================ =======================================================================
+3.7              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python37:1
+3.8              arn:aws:lambda:<region>:336392948345:layer:AWSDataWrangler-Python38:1
+================ =======================================================================
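To verify that one of these layers resolves in your account, a minimal AWS CLI sketch (region and ARN taken from the tables above):

    aws lambda get-layer-version-by-arn --region us-east-1 \
        --arn arn:aws:lambda:us-east-1:336392948345:layer:AWSDataWrangler-Python39:3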
