Commit fbf7608

Address PR comments; more details on pyspark shell v submitting apps
Signed-off-by: Jason T. Brown <[email protected]>
1 parent 3a51158 commit fbf7608

1 file changed: +50 -10 lines changed

pyrasterframes/src/main/python/docs/getting-started.pymd

Lines changed: 50 additions & 10 deletions
@@ -3,7 +3,7 @@
 There are @ref:[several ways](getting-started.md#other-options) to use RasterFrames, and @ref:[several languages](languages.md) with which you can use it. Let's start with the simplest: the Python shell. To get started you will need:
 
 1. [Python](https://www.python.org/) installed. Version 3.6 or greater is recommended.
-1. `pip` or `pip3` (recommended) installed. If you are using Python 3, `pip3` may already be installed.
+1. [`pip`](https://pip.pypa.io/en/stable/installing/) installed. If you are using Python 3, `pip` may already be installed.
 1. Java [JDK 8](https://openjdk.java.net/install/index.html) installed on your system and `java` on your system `PATH` or `JAVA_HOME` pointing to a Java installation.
 
 ## pip install pyrasterframes
@@ -30,11 +30,12 @@ df = spark.read.raster('https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/201
 
 # Add 3 element-wise, show some rows of the dataframe
 df.select(rf_local_add(df.proj_raster, lit(3))).show(5, False)
-
 ```
 
 This example is extended in the [getting started Jupyter notebook](https://nbviewer.jupyter.org/github/locationtech/rasterframes/blob/develop/rf-notebook/src/main/notebooks/Getting%20Started.ipynb).
 
+## Next Steps
+
 To understand more about how and why RasterFrames represents Earth observation in DataFrames, read about the @ref:[core concepts](concepts.md) and the project @ref:[description](description.md). For more hands-on examples, see the chapters about @ref:[reading](raster-io.md) and @ref:[processing](raster-processing.md) with RasterFrames.
 
 ## Other Options
@@ -60,35 +61,74 @@ See [RasterFrames Notebook README](https://github.com/locationtech/rasterframes/
 
 ### `pyspark` shell or app
 
-To initialize RasterFrames in a `pyspark` shell, prepare to call pyspark with the appropriate `--master` and other `--conf` arguments for your cluster manager and environment. To these you will add the PyRasterFrames assembly JAR and the python source zip.
+You can use RasterFrames in a `pyspark` shell or when submitting a `pyspark` app via a Python script. To set up the `pyspark` environment, prepare your call with the appropriate `--master` and other `--conf` arguments for your cluster manager and environment. To these you will add the PyRasterFrames assembly JAR and the python source zip.
 
 You can either [build](https://github.com/locationtech/rasterframes/blob/develop/README.md) the artifacts or download them:
 
-* Assembly JAR: https://repo1.maven.org/maven2/org/locationtech/rasterframes/pyrasterframes_2.11/${VERSION}/pyrasterframes-assembly-${VERSION}.jar
 * Python zip: https://repo1.maven.org/maven2/org/locationtech/rasterframes/pyrasterframes_2.11/${VERSION}/pyrasterframes_2.11-${VERSION}-python.zip
+* Assembly JAR:
+    * The assembly JAR is embedded in the wheel file published on PyPI. Download the wheel from https://pypi.org/project/pyrasterframes/#files
+    * The wheel file is just a [zip file with .whl extension](https://www.python.org/dev/peps/pep-0427/); you can extract the assembly JAR with a command like this: `unzip -j $PYRF_WHEEL $(zipinfo -1 $PYRF_WHEEL | grep jar)`
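Editorial note on the added bullets: where `unzip` and `zipinfo` are not available, the same extraction can be done with Python's standard `zipfile` module. This is only a sketch. The function name `extract_assembly_jar` is hypothetical, and it matches entries ending in `.jar`, which is slightly stricter than the `grep jar` in the command above.

```python
import zipfile
from pathlib import Path


def extract_assembly_jar(wheel_path, dest_dir="."):
    """Copy every .jar entry of a wheel into dest_dir, dropping archive paths.

    Hypothetical helper: mirrors `unzip -j` by flattening directory structure.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    # A wheel is a plain zip archive (PEP 427), so ZipFile can open it directly.
    with zipfile.ZipFile(wheel_path) as wheel:
        for name in wheel.namelist():
            if not name.endswith(".jar"):
                continue
            # Keep only the file name, like the -j ("junk paths") flag of unzip.
            target = dest / Path(name).name
            target.write_bytes(wheel.read(name))
            extracted.append(target)
    return extracted
```

The returned list holds the paths of the extracted JARs, one of which can then be passed to `--jars`.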
+
+
+#### Shell
 
+The `pyspark` shell command will look something like this, replacing the `--jars` argument with the assembly jar and the `--py-files` with the source zip (not the wheel). To submit a script, add a .py file as the final argument.
 
 ```bash
 pyspark \
+    --master local[*] \
     --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
     --conf spark.kryo.registrator=org.locationtech.rasterframes.util.RFKryoRegistrator \
     --conf spark.kryoserializer.buffer.max=500m \
     --jars pyrasterframes/target/scala-2.11/pyrasterframes-assembly-${VERSION}.jar \
     --py-files pyrasterframes/target/scala-2.11/pyrasterframes-python-${VERSION}.zip
 ```
 
-Then in the pyspark shell, import the module and call `withRasterFrames` on the SparkSession.
+Then in the `pyspark` shell, import the module and call `withRasterFrames` on the SparkSession.
 
 ```python, evaluate=False
-import pyrasterframes
-spark = spark.withRasterFrames()
-df = spark.read.raster('https://landsat-pds.s3.amazonaws.com/c1/L8/158/072/LC08_L1TP_158072_20180515_20180604_01_T1/LC08_L1TP_158072_20180515_20180604_01_T1_B5.TIF')
+Welcome to
+      ____              __
+     / __/__  ___ _____/ /__
+    _\ \/ _ \/ _ `/ __/  '_/
+   /__ / .__/\_,_/_/ /_/\_\   version 2.3.2
+      /_/
+
+Using Python version 3.7.3 (default, Mar 27 2019 15:43:19)
+SparkSession available as 'spark'.
+>>> import pyrasterframes
+>>> spark = spark.withRasterFrames()
+>>> df = spark.read.raster('https://landsat-pds.s3.amazonaws.com/c1/L8/158/072/LC08_L1TP_158072_20180515_20180604_01_T1/LC08_L1TP_158072_20180515_20180604_01_T1_B5.TIF')
 ```
 
 Now you have the configured SparkSession with RasterFrames enabled.
 
-```python, echo=False
-spark.stop()
+#### Submitting Apps
+
+Prepare the call to `spark-submit` in much the same way as using the `pyspark` shell. In the python script you submit, you will use the SparkSession builder pattern and add some RasterFrames extras to it. You have more flexibility in setting up configurations in either your script or in the `spark-submit` call.
+
+```python, evaluate=False
+# contents of app.py
+
+from pyspark.sql import SparkSession
+import pyrasterframes
+spark = (SparkSession.builder
+    .appName("My RasterFrames app")
+    .config('spark.some_config', some_val) # app configurations
+    .withKryoSerialization() # sets spark.serializer and spark.kryo configs
+    .getOrCreate()).withRasterFrames()
+df = spark.read.raster('...')
+```
+
+To submit, use a call like:
+
+```bash
+$ spark-submit \
+    --master spark://sparkmaster:7077 \
+    --jars pyrasterframes/target/scala-2.11/pyrasterframes-assembly-${VERSION}.jar \
+    --py-files pyrasterframes/target/scala-2.11/pyrasterframes-python-${VERSION}.zip \
+    app.py
 ```
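Editorial note on the `spark-submit` call: because the same `--master`, `--conf`, `--jars`, and `--py-files` arguments recur for both the shell and submitted apps, it can help to assemble them in one place. The sketch below is an illustration only. The helper `build_spark_submit_args` and the `X.Y.Z` version string are hypothetical. The flags and Spark configuration keys are the ones shown in this diff.

```python
import shlex


def build_spark_submit_args(app, jar, py_zip, master="local[*]", conf=None):
    """Return a spark-submit argument list (hypothetical helper, not part of RasterFrames)."""
    args = ["spark-submit", "--master", master]
    # Each configuration entry becomes its own `--conf key=value` pair.
    for key, value in (conf or {}).items():
        args += ["--conf", f"{key}={value}"]
    # The assembly JAR and the Python source zip come next; the script goes last.
    args += ["--jars", jar, "--py-files", py_zip, app]
    return args


version = "X.Y.Z"  # placeholder; substitute the release you downloaded or built
example_args = build_spark_submit_args(
    app="app.py",
    jar=f"pyrasterframes/target/scala-2.11/pyrasterframes-assembly-{version}.jar",
    py_zip=f"pyrasterframes/target/scala-2.11/pyrasterframes-python-{version}.zip",
    master="spark://sparkmaster:7077",
    conf={"spark.kryoserializer.buffer.max": "500m"},
)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in example_args))
```

The list form can also be handed to `subprocess.run` without going through a shell, which avoids quoting issues with values such as `local[*]`.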
 
 ## Installing GDAL
