Commit b975828

Update getting started page, simplify pyspark shell; update notebooks
Signed-off-by: Jason T. Brown <[email protected]>
Parent commit: fbf7608

File tree: 4 files changed (+181, -903 lines)


pyrasterframes/README.md

Lines changed: 7 additions & 1 deletion

@@ -153,7 +153,13 @@ sbt 'pySetup test --addopts "-k test_tile_creation"'
 Or to build a specific document:
 
 ```bash
-sbt 'pySetup pweave -f docs/raster-io.pymd'
+sbt 'pySetup pweave -s docs/raster-io.pymd'
+```
+
+Or to build a specific document with desired output format:
+
+```bash
+sbt 'pySetup pweave -f notebook -s docs/numpy-pandas.pymd'
 ```
 
 *Note: You may need to run `sbt pyrasterframes/package` at least once for certain `pySetup` commands to work.*
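For orientation, the `pySetup pweave` task drives Pweave over the `.pymd` sources, with `-s` selecting the source file and `-f` the output format. A roughly equivalent direct invocation is sketched below; the `weave()` entry point and its `doctype` keyword are assumptions about how the task forwards those flags, not something this commit defines.

```python
# Hypothetical direct Pweave call, approximating
# `sbt 'pySetup pweave -f notebook -s docs/numpy-pandas.pymd'`.
import pweave

# doctype selects the output format ("notebook" here); the path is the same
# source file passed via -s above.
pweave.weave("docs/numpy-pandas.pymd", doctype="notebook")
```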

pyrasterframes/src/main/python/docs/getting-started.pymd

Lines changed: 7 additions & 44 deletions
@@ -59,30 +59,20 @@ You can also use RasterFrames in the following environments:
 
 See [RasterFrames Notebook README](https://github.com/locationtech/rasterframes/blob/develop/rf-notebook/README.md) for instructions on building the Docker image for this Jupyter notebook server.
 
-### `pyspark` shell or app
+### `pyspark` shell
 
-You can use RasterFrames in a `pyspark` shell or when submitting a `pyspark` app via a Python script. To set up the `pyspark` environment, prepare your call with the appropriate `--master` and other `--conf` arguments for your cluster manager and environment. To these you will add the PyRasterFrames assembly JAR and the python source zip.
+You can use RasterFrames in a `pyspark` shell. To set up the `pyspark` environment, prepare your call with the appropriate `--master` and other `--conf` arguments for your cluster manager and environment. For RasterFrames support you need to pass arguments pointing to the various Java dependencies. You will also need the Python source zip, even if you have pip-installed the package. You can download the source zip from https://repo1.maven.org/maven2/org/locationtech/rasterframes/pyrasterframes_2.11/${VERSION}/pyrasterframes_2.11-${VERSION}-python.zip
 
-You can either [build](https://github.com/locationtech/rasterframes/blob/develop/README.md) the artifacts or download them:
-
-* Python zip: https://repo1.maven.org/maven2/org/locationtech/rasterframes/pyrasterframes_2.11/${VERSION}/pyrasterframes_2.11-${VERSION}-python.zip
-* Assembly JAR:
-    * The assembly JAR is embedded in the wheel file published on pypi. Download the wheel from https://pypi.org/project/pyrasterframes/#files
-    * The wheel file is just a [zip file with .whl extension](https://www.python.org/dev/peps/pep-0427/); you can extract the assembly JAR with a command like this: `unzip -j $PYRF_WHEEL $(zipinfo -1 $PYRF_WHEEL | grep jar)`
-
-#### Shell
-
-The `pyspark` shell command will look something like this, replacing the `--jars` argument with the assembly jar and the `--py-files` with the source zip (not the wheel). To submit a script, add a .py file as the final argument.
+The `pyspark` shell command will look something like this.
 
 ```bash
 pyspark \
     --master local[*] \
-    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
+    --py-files pyrasterframes_2.11-${VERSION}-python.zip \
+    --packages org.locationtech.rasterframes:rasterframes_2.11:${VERSION},org.locationtech.rasterframes:pyrasterframes_2.11:${VERSION},org.locationtech.rasterframes:rasterframes-datasource_2.11:${VERSION} \
+    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
     --conf spark.kryo.registrator=org.locationtech.rasterframes.util.RFKryoRegistrator \
-    --conf spark.kryoserializer.buffer.max=500m \
-    --jars pyrasterframes/target/scala-2.11/pyrasterframes-assembly-${VERSION}.jar \
-    --py-files pyrasterframes/target/scala-2.11/pyrasterframes-python-${VERSION}.zip
+    --conf spark.kryoserializer.buffer.max=500m  # these configs improve serialization performance
 ```
 
 Then in the `pyspark` shell, import the module and call `withRasterFrames` on the SparkSession.
@@ -104,33 +94,6 @@ SparkSession available as 'spark'.
 
 Now you have the configured SparkSession with RasterFrames enabled.
 
-#### Submitting Apps
-
-Prepare the call to `spark-submit` in much the same way as using the `pyspark` shell. In the python script you submit, you will use the SparkSession builder pattern and add some RasterFrames extras to it. You have more flexibility in setting up configurations in either your script or in the `spark-submit` call.
-
-```python, evaluate=False
-# contents of app.py
-
-from pyspark.sql import SparkSession
-import pyrasterframes
-spark = (SparkSession.builder
-    .appName("My RasterFrames app")
-    .config('spark.some_config', some_val)  # app configurations
-    .withKryoSerialization()  # sets spark.serializer and spark.kryo configs
-    .getOrCreate()).withRasterFrames()
-df = spark.read.raster('...')
-```
-
-To submit, use a call like:
-
-```bash
-$ spark-submit \
-    --master spark://sparkmaster:7077 \
-    --jars pyrasterframes/target/scala-2.11/pyrasterframes-assembly-${VERSION}.jar \
-    --py-files pyrasterframes/target/scala-2.11/pyrasterframes-python-${VERSION}.zip \
-    app.py
-```
-
 ## Installing GDAL
 
 GDAL provides a wide variety of drivers to read data from many different raster formats. If GDAL is installed in the environment, RasterFrames will be able to @ref:[read](raster-read.md) those formats. If you are using the @ref:[Jupyter Notebook image](getting-started.md#jupyter-notebook), GDAL is already installed for you. Otherwise follow the instructions below.
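The GDAL paragraph above makes format support conditional on GDAL being present in the environment. As a rough sanity check of a local environment (an assumption about typical setups, not a RasterFrames API; RasterFrames itself probes GDAL from the JVM), you can look for the GDAL command-line tools:

```python
# Quick, informal check that GDAL tooling is on PATH; uses only the standard library.
import shutil
import subprocess

gdalinfo = shutil.which("gdalinfo")
if gdalinfo is None:
    print("gdalinfo not found on PATH; GDAL-backed formats may not be readable.")
else:
    print(subprocess.check_output([gdalinfo, "--version"]).decode().strip())
```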

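Tying the simplified shell command in the first hunk above to the sentence that follows it ("import the module and call `withRasterFrames` on the SparkSession"), a minimal sketch of the interactive session is shown below; the raster path is a placeholder, not a file referenced by the commit.

```python
# Inside a pyspark shell started with the command above; `spark` is the
# SparkSession the shell creates for you.
import pyrasterframes

spark = spark.withRasterFrames()              # enable RasterFrames types and functions
df = spark.read.raster("path/to/scene.tif")   # placeholder path to any supported raster
df.printSchema()
```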
rf-notebook/src/main/notebooks/Getting Started.ipynb

Lines changed: 66 additions & 68 deletions (large diff not rendered)

rf-notebook/src/main/notebooks/pretty_rendering_rf_types.tile.ipynb

Lines changed: 101 additions & 790 deletions (large diff not rendered)
