Commit ea92e5c

Add PyArrow 3 caveats on the docs. #546 #547
1 parent 4b9f270 commit ea92e5c

2 files changed, 27 additions, 7 deletions

README.md

Lines changed: 4 additions & 0 deletions
@@ -23,6 +23,8 @@ Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, Clo
 | **[PyPi](https://pypi.org/project/awswrangler/)** | [![PyPI Downloads](https://pepy.tech/badge/awswrangler)](https://pypi.org/project/awswrangler/) | `pip install awswrangler` |
 | **[Conda](https://anaconda.org/conda-forge/awswrangler)** | [![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/awswrangler.svg)](https://anaconda.org/conda-forge/awswrangler) | `conda install -c conda-forge awswrangler` |
 
+> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):** `pip install pyarrow==2 awswrangler`
+
 Powered By [<img src="https://arrow.apache.org/img/arrow.png" width="200">](https://arrow.apache.org/powered_by/)
 
 ## Table of contents
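The README warning pins `pyarrow==2` on platforms without PyArrow 3 support. A minimal sketch of a fail-fast check, assuming you pass it the installed version string (for example `pyarrow.__version__`); `pyarrow_major` and `check_pyarrow_pin` are hypothetical helpers, not part of awswrangler:

```python
# Hypothetical helpers: fail fast when PyArrow 3+ ends up on a platform
# that the docs say needs the pyarrow==2 pin (e.g. EMR, Glue PySpark jobs).


def pyarrow_major(version):
    """Return the major component of a version string such as "2.0.0"."""
    return int(version.split(".")[0])


def check_pyarrow_pin(version, required_major=2):
    """Raise RuntimeError unless the major version equals ``required_major``.

    ``version`` is assumed to be the installed version string, for example
    ``pyarrow.__version__``. Returns the parsed major version on success.
    """
    major = pyarrow_major(version)
    if major != required_major:
        raise RuntimeError(
            "expected pyarrow {}.x, found {}; reinstall with "
            "`pip install pyarrow=={} awswrangler`".format(
                required_major, version, required_major
            )
        )
    return major
```

In a job script you might call `check_pyarrow_pin(pyarrow.__version__)` right after `import pyarrow`, so an unexpected PyArrow 3 install is reported before any awswrangler call runs.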
@@ -38,6 +40,8 @@ Powered By [<img src="https://arrow.apache.org/img/arrow.png" width="200">](http
 
 Installation command: `pip install awswrangler`
 
+> ⚠️ **For platforms without PyArrow 3 support (e.g. [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):** `pip install pyarrow==2 awswrangler`
+
 ```py3
 import awswrangler as wr
 import pandas as pd

docs/source/install.rst

Lines changed: 23 additions & 7 deletions
@@ -1,7 +1,7 @@
 Install
 =======
 
-**AWS Data Wrangler** runs with Python ``3.6``, ``3.7`` and ``3.8``
+**AWS Data Wrangler** runs with Python ``3.6``, ``3.7``, ``3.8`` and ``3.9``
 and on several platforms (AWS Lambda, AWS Glue Python Shell, EMR, EC2,
 on-premises, Amazon SageMaker, local, etc).
 
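This hunk adds Python 3.9 to the supported interpreters. If a job should refuse to start on an interpreter outside that list, a minimal sketch could compare `sys.version_info` against the versions named in this change; `SUPPORTED_PYTHON` and `is_supported_python` are hypothetical names:

```python
import sys

# Python versions listed in install.rst after this commit (hypothetical constant).
SUPPORTED_PYTHON = {(3, 6), (3, 7), (3, 8), (3, 9)}


def is_supported_python(version_info=None):
    """Return True if (major, minor) of ``version_info`` is in SUPPORTED_PYTHON.

    Defaults to the running interpreter's ``sys.version_info``.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_PYTHON
```

A caller could, for instance, exit early with `sys.exit(...)` when `is_supported_python()` returns `False`; the list has to be kept in sync with the documentation by hand.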

@@ -57,10 +57,13 @@ AWS Glue PySpark Jobs
 Go to your Glue PySpark job and create a new *Job parameters* key/value:
 
 * Key: ``--additional-python-modules``
-* Value: ``awswrangler==2.3.0``
+* Value: ``pyarrow==2,awswrangler``
 
-P.S. By now AWS Glue PySpark Jobs does not support PyArrow +3.0.0.
-Please use awswrangler==2.3.0 that uses PyArrow 2.0.0 to overcome this limitation.
+To install a specific version, set the value for above Job parameter as follows:
+
+* Value: ``pyarrow==2,awswrangler==2.4.0``
+
+.. note:: Pyarrow 3 is not currently supported in Glue PySpark Jobs, which is why a previous installation of pyarrow 2 is required.
 
 `Official Glue PySpark Reference <https://docs.aws.amazon.com/glue/latest/dg/reduced-start-times-spark-etl-jobs.html#reduced-start-times-new-features>`_
 
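The new Glue job parameter value is a comma-separated list of pip requirement specifiers. A minimal sketch that builds the `--additional-python-modules` entry, keeping `pyarrow==2` first as in the documented value; `additional_python_modules` and `glue_default_arguments` are hypothetical helpers:

```python
def additional_python_modules(pins):
    """Join pip requirement specifiers with commas, the format used for the
    ``--additional-python-modules`` value (e.g. "pyarrow==2,awswrangler")."""
    return ",".join(pin.strip() for pin in pins)


def glue_default_arguments(wrangler_version=None):
    """Build a job-arguments mapping that pins pyarrow 2, as in the docs.

    ``wrangler_version`` is optional; pass e.g. "2.4.0" to freeze awswrangler.
    """
    wrangler = "awswrangler"
    if wrangler_version is not None:
        wrangler += "==" + wrangler_version
    # Keep pyarrow==2 first, matching the documented parameter value.
    return {
        "--additional-python-modules": additional_python_modules(
            ["pyarrow==2", wrangler]
        )
    }
```

If you create the job programmatically, the returned dictionary has the shape of the `DefaultArguments` parameter of boto3's Glue `create_job` call; that wiring is an assumption of this sketch and is not part of the diff.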

@@ -115,7 +118,7 @@ AWS Data Wrangler could be a good helper to
 complement Big Data pipelines.
 
 - Configure Python 3 as the default interpreter for
-PySpark under your cluster configuration
+PySpark on your cluster configuration [ONLY REQUIRED FOR EMR < 6]
 
 .. code-block:: json
 
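This hunk limits the PySpark interpreter configuration to EMR releases below 6, but the body of that `.. code-block:: json` lies outside the diff. A sketch, assuming the commonly used `spark-env`/`export` classification that sets `PYSPARK_PYTHON` to `/usr/bin/python3` (not shown in this commit) and release labels of the form `emr-5.32.0`; the helper names are hypothetical:

```python
import json


def emr_release_major(release_label):
    """Parse the major version from a label such as "emr-5.32.0" (assumed format)."""
    return int(release_label.split("-", 1)[1].split(".")[0])


def pyspark_python3_configurations(release_label):
    """Return EMR configurations that set Python 3 as the PySpark interpreter.

    Returns an empty list for EMR 6 and later, following the
    "[ONLY REQUIRED FOR EMR < 6]" note. The classification layout below is an
    assumption; the JSON from the docs is not part of this diff.
    """
    if emr_release_major(release_label) >= 6:
        return []
    return [
        {
            "Classification": "spark-env",
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {"PYSPARK_PYTHON": "/usr/bin/python3"},
                }
            ],
        }
    ]


if __name__ == "__main__":
    # Print the JSON for an EMR 5.x cluster's configuration settings.
    print(json.dumps(pyspark_python3_configurations("emr-5.32.0"), indent=2))
```

Returning an empty list for EMR 6+ lets the same cluster-creation code run against either release line without a separate branch.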
@@ -135,15 +138,28 @@ complement Big Data pipelines.
 
 - Keep the bootstrap script above on S3 and reference it on your cluster.
 
+- For EMR Release < 6
+
 .. code-block:: sh
 
 #!/usr/bin/env bash
 set -ex
 
-sudo pip-3.6 install awswrangler
+sudo pip-3.6 install pyarrow==2 awswrangler
+
+- For EMR Release >= 6
+
+.. code-block:: sh
+
+#!/usr/bin/env bash
+set -ex
+
+sudo pip install pyarrow==2 awswrangler
 
 .. note:: Make sure to freeze the Wrangler version in the bootstrap for productive
-environments (e.g. awswrangler==1.8.1)
+environments (e.g. awswrangler==2.4.0)
+
+.. note:: Pyarrow 3 is not currently supported in the default EMR image, which is why a previous installation of pyarrow 2 is required.
 
 From Source
 -----------
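The bootstrap hunk now splits by EMR release: `pip-3.6` below 6 and plain `pip` from 6 onward, with `pyarrow==2` installed alongside awswrangler in both cases. A minimal sketch that renders the matching script for a release label, assuming labels of the form `emr-6.2.0`; `bootstrap_script` is a hypothetical helper, and its optional version argument follows the note about freezing the Wrangler version:

```python
def emr_release_major(release_label):
    """Parse the major version from a label such as "emr-6.2.0" (assumed format)."""
    return int(release_label.split("-", 1)[1].split(".")[0])


def bootstrap_script(release_label, wrangler_version=None):
    """Render the bootstrap script shown in the docs for this EMR release.

    EMR < 6 uses ``pip-3.6``; EMR >= 6 uses plain ``pip``. Both pin pyarrow 2.
    Pass ``wrangler_version`` (e.g. "2.4.0") to freeze awswrangler.
    """
    pip = "pip-3.6" if emr_release_major(release_label) < 6 else "pip"
    wrangler = "awswrangler"
    if wrangler_version is not None:
        wrangler += "==" + wrangler_version
    lines = [
        "#!/usr/bin/env bash",
        "set -ex",
        "",
        "sudo {} install pyarrow==2 {}".format(pip, wrangler),
    ]
    return "\n".join(lines) + "\n"
```

You would still write the returned text to a file, keep it on S3, and reference it as a bootstrap action, as the documentation describes.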
