Commit 06889a0

Merge branch 'feature/courtney-edits' of github.com:s22s/rasterframes into feature/courtney-edits
2 parents 59d10b8 + 0e06125

2 files changed: +15 -15 lines changed

2 files changed

+15
-15
lines changed

pyrasterframes/src/main/python/docs/raster-read.pymd

Lines changed: 2 additions & 2 deletions

@@ -146,7 +146,7 @@ In the initial examples on this page, we used @ref:[`rf_tile`](reference.md#rf-t

 ## Multiband Rasters

-A multiband raster represents a three dimensional numeric array. The first two dimensions are spatial, and the third dimension is typically designated as different bands. The bands may represent intensity of different wavelengths of light (or other electromagnetic radiation). The different bands may represent other phenomena such as measurement time, quality indications, or additional measurements.
+A multiband raster represents a three dimensional numeric array. The first two dimensions are spatial, and the third dimension is typically designated as different bands. The bands could represent intensity of different wavelengths of light (or other electromagnetic radiation), or they could represent other phenomena such as measurement time, quality indications, or additional measurements.

 When reading a multiband raster or a _Catalog_ describing multiband rasters, you will need to know ahead of time which bands you want to read. You will specify the bands to read, indexed from zero, passing a list of integers into the `band_indexes` parameter of the `raster` reader.

@@ -182,4 +182,4 @@ mb2.printSchema()

 [CRS]: concepts.md#coordinate-reference-system--crs
 [Extent]: concepts.md#extent
-[Tile]: concepts.md#tile
+[Tile]: concepts.md#tile
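The `band_indexes` parameter discussed in the hunk above is easy to get wrong, because product spec sheets usually number bands from one while the reader indexes from zero. A minimal sketch of the conversion (the `to_band_indexes` helper and the `multiband.tif` path are hypothetical, and the commented-out read assumes a SparkSession with RasterFrames enabled):

```python
# Hypothetical helper: convert one-based band numbers (as printed in a
# product's documentation) to the zero-based indexes `band_indexes` expects.
def to_band_indexes(*one_based_bands):
    return [b - 1 for b in one_based_bands]

idx = to_band_indexes(1, 2, 3)
print(idx)  # [0, 1, 2]

# With a live SparkSession (not run here), the read would look like:
# mb = spark.read.raster('multiband.tif', band_indexes=idx)
# mb.printSchema()
```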

pyrasterframes/src/main/python/docs/raster-write.pymd

Lines changed: 13 additions & 13 deletions
@@ -1,8 +1,8 @@
 # Writing Raster Data

-RasterFrames is oriented toward large scale analyses of spatial data. The primary output for most use cases may be a @ref:[statistical summary](aggregation.md), a @ref:[machine learning model](machine-learning.md), or some other result that is generally much smaller than the input data set.
+RasterFrames is oriented toward large scale analyses of spatial data. The primary output of these analyses could be a @ref:[statistical summary](aggregation.md), a @ref:[machine learning model](machine-learning.md), or some other result that is generally much smaller than the input data set.

-However there are times in any analysis where writing a representative sample of the work in progress provides invaluable feedback on the process and results.
+However, there are times in any analysis where writing a representative sample of the work in progress provides invaluable feedback on the current state of the process and results.

 ```python imports, echo=False
 import pyrasterframes
@@ -13,12 +13,12 @@ spark = pyrasterframes.get_spark_session()

 ## Tile Samples

-When collecting a _tile_ (see discussion of the RasterFrame @ref:[schema](raster-read.md#single-raster) for orientation to the concept) to the Python Spark driver, we have some convenience methods to quickly visualize the _tile_.
+When collecting a _tile_ (see discussion of the RasterFrame @ref:[schema](raster-read.md#single-raster) for orientation to the concept) to the Python Spark driver, we have some convenience methods to quickly visualize the _tile_.

-In an IPython or Jupyter interpreter a `Tile` object will be displayed as an image with limited metadata.
+In an IPython or Jupyter interpreter, a `Tile` object will be displayed as an image with limited metadata.

 ```python tile_sample
-def scene(band):
+def scene(band):
     return 'https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/' \
         'MCD43A4.A2019059.h11v08.006.2019072203257_B{}.TIF'.format(band)
 raster_url = scene('02')
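As a quick sanity check of the `scene` helper in the hunk above, note that the band argument is interpolated as zero-padded text rather than an integer. The helper is reproduced verbatim here so its output can be inspected outside of Spark:

```python
# The `scene` helper from the diff above: formats the S3 URL of one band
# of a MODIS MCD43A4 scene.
def scene(band):
    return 'https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/' \
        'MCD43A4.A2019059.h11v08.006.2019072203257_B{}.TIF'.format(band)

# The band is zero-padded text ('02'), not an integer:
raster_url = scene('02')
print(raster_url.rsplit('_', 1)[-1])  # B02.TIF
```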
@@ -33,7 +33,7 @@ display(tile) # IPython.display function

 ## DataFrame Samples

-Within an IPython or Jupyter interpreter a Pandas DataFrame containing a column of _tiles_ will be rendered as the samples discussed above. Simply import the `rf_ipython` submodule to enable enhanced HTML rendering of a Pandas DataFrame.
+Within an IPython or Jupyter interpreter, a Pandas DataFrame containing a column of _tiles_ will be rendered as the samples discussed above. Simply import the `rf_ipython` submodule to enable enhanced HTML rendering of a Pandas DataFrame.

 In the example below, notice the result is limited to a small subset. For more discussion about why this is important, see the @ref:[Pandas and NumPy discussion](numpy-pandas.md).

@@ -48,7 +48,7 @@ pandas_df = spark.read.raster(raster_url, tile_dimensions=(64, 64)) \
 pandas_df.dtypes
 ```

-Viewing the DataFrame in Jupyter looks like this.
+Viewing the DataFrame in Jupyter looks like this.

 ```python show_pandas, evaluate=False
 pandas_df
@@ -59,16 +59,16 @@ pandas_df

 ## GeoTIFFs

-GeoTIFF is one of the most common file formats for spatial data, providing flexibility in data encoding, representation, and storage. RasterFrames provides a specialized Spark DataFrame writer for rendering a RasterFrame to a GeoTiff.
+GeoTIFF is one of the most common file formats for spatial data, providing flexibility in data encoding, representation, and storage. RasterFrames provides a specialized Spark DataFrame writer for rendering a RasterFrame to a GeoTIFF.

 One downside to GeoTIFF is that it is not a big data native format. To create a GeoTIFF, all the data to be encoded has to be in the memory of one compute node (in Spark parlance, this is a "collect"), limiting its maximum size substantially compared to that of a full cluster environment. When rendering GeoTIFFs in RasterFrames, you either need to specify the dimensions of the output raster, or be aware of how big the collected data will end up being.

-Fortunately, we can use the cluster computing capability to downlample the data (using nearest-neighbor) into a more manageble size. For sake of example, let's render a simple RGB overview image of our scene as a small raster, reprojecting it to latitude and longitude coordinates on the [WGS84](https://en.wikipedia.org/wiki/World_Geodetic_System) reference ellipsoid (aka [EPSG:4326](https://spatialreference.org/ref/epsg/4326/)):
+Fortunately, we can use the cluster computing capability to downsample the data (using nearest-neighbor) into a more manageable size. For sake of example, let's render a simple RGB overview image of our scene as a small raster, reprojecting it to latitude and longitude coordinates on the [WGS84](https://en.wikipedia.org/wiki/World_Geodetic_System) reference ellipsoid (aka [EPSG:4326](https://spatialreference.org/ref/epsg/4326/)):

 ```python write_geotiff
 import os.path
 from docs import docs_dir
-cat = """
+cat = """
 red,green,blue
 {},{},{}
 """.format(scene('01'), scene('04'), scene('03'))
@@ -83,7 +83,7 @@ View it with `rasterio` to check the results:
 import rasterio
 import numpy as np
 from rasterio.plot import show, show_hist
-
+
 with rasterio.open(outfile) as src:
     cells = np.clip(src.read(), 0, 1800).astype('float32')
     show(cells)
@@ -100,12 +100,12 @@ with rasterio.open(outfile) as src:

 ## Parquet

-You can write the Spark DataFrame to an [Apache Parquet][Parquet] "file". This format is designed to work across different projects in the Hadoop ecosystem. It also provides a variety of optimizations for query against data written in the format.
+You can write the Spark DataFrame to an [Apache Parquet][Parquet] "file". This format is designed to work across different projects in the Hadoop ecosystem. It also provides a variety of optimizations for query against data written in the format.

 ```python write_parquet, evaluate=False
 spark_df.withColumn('exp', rf_expm1('proj_raster')) \
     .write.mode('append').parquet('hdfs:///rf-user/sample.pq')
 ```

 [GeoTrellis]: https://geotrellis.readthedocs.io/en/latest/
-[Parquet]: https://spark.apache.org/docs/latest/sql-data-sources-parquet.html
+[Parquet]: https://spark.apache.org/docs/latest/sql-data-sources-parquet.html
