Commit 3bbdae4

Write singleband for example, fix path to uri in writing
Signed-off-by: Jason T. Brown <[email protected]>
1 parent 3a9e3e7 commit 3bbdae4

2 files changed, +22 -32 lines changed

pyrasterframes/src/main/python/docs/raster-write.pymd

Lines changed: 12 additions & 20 deletions
@@ -8,6 +8,8 @@ However, there are times in any analysis where writing a representative sample o
 import pyrasterframes
 from pyrasterframes.rasterfunctions import *
 from IPython.display import display
+import os.path
+
 spark = pyrasterframes.get_spark_session()
 ```

@@ -19,10 +21,10 @@ In an IPython or Jupyter interpreter, a `Tile` object will be displayed as an im

 ```python tile_sample
 def scene(band):
+    b = str(band).zfill(2) # converts int 2 to '02'
     return 'https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/' \
-             'MCD43A4.A2019059.h11v08.006.2019072203257_B{}.TIF'.format(band)
-raster_url = scene('02')
-spark_df = spark.read.raster(raster_url)
+             'MCD43A4.A2019059.h11v08.006.2019072203257_B{}.TIF'.format(b)
+spark_df = spark.read.raster(scene(2), tile_dimensions=(128, 128))
 tile = spark_df.select(rf_tile('proj_raster').alias('tile')).first()['tile']
 tile
 ```
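
Note on the `scene` helper changed in the hunk above: callers can now pass an integer band number, which `str.zfill` pads to the two-digit form used in the file names. The following is a minimal standalone sketch of that behavior. It needs no Spark session; the function body mirrors the added lines, and the docstring and `print` calls are illustrative additions rather than part of the commit.

```python
def scene(band):
    """Build the URL for one band of the MODIS scene used in the docs."""
    # zfill left-pads with zeros to width 2: 2 -> '02', '02' -> '02', 11 -> '11'
    b = str(band).zfill(2)
    return 'https://modis-pds.s3.amazonaws.com/MCD43A4.006/11/08/2019059/' \
           'MCD43A4.A2019059.h11v08.006.2019072203257_B{}.TIF'.format(b)


# Integer and pre-padded string inputs resolve to the same URL.
print(scene(2))
print(scene('02') == scene(2))
```

Because the old string form `scene('02')` still works, existing callers should not need to change.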
@@ -40,8 +42,7 @@ In the example below, notice the result is limited to a small subset. For more d
 ```python to_pandas, evaluate=True
 import pyrasterframes.rf_ipython

-pandas_df = spark.read.raster(raster_url, tile_dimensions=(64, 64)) \
-    .select(
+pandas_df = spark_df.select(
     rf_extent('proj_raster').alias('extent'),
     rf_tile('proj_raster').alias('tile'),
 ).limit(5).toPandas()
@@ -55,41 +56,32 @@ pandas_df

 @@include[df-samples-output.md](static/df-samples-output.md)

-
 ## GeoTIFFs

 GeoTIFF is one of the most common file formats for spatial data, providing flexibility in data encoding, representation, and storage. RasterFrames provides a specialized Spark DataFrame writer for rendering a RasterFrame to a GeoTIFF.

 One downside to GeoTIFF is that it is not a big data native format. To create a GeoTIFF, all the data to be encoded has to be in the memory of one compute node (in Spark parlance, this is a "collect"), limiting its maximum size substantially compared to that of a full cluster environment. When rendering GeoTIFFs in RasterFrames, you either need to specify the dimensions of the output raster, or be aware of how big the collected data will end up being.

-Fortunately, we can use the cluster computing capability to downsample the data (using nearest-neighbor) into a more manageable size. For sake of example, let's render a simple RGB overview image of our scene as a small raster, reprojecting it to latitude and longitude coordinates on the [WGS84](https://en.wikipedia.org/wiki/World_Geodetic_System) reference ellipsoid (aka [EPSG:4326](https://spatialreference.org/ref/epsg/4326/)):
+Fortunately, we can use the cluster computing capability to downsample the data into a more manageable size. For the sake of example, let's render an overview of our scene's near-infrared band (band 2, the one read above) as a small raster, reprojecting it to latitude and longitude coordinates on the [WGS84](https://en.wikipedia.org/wiki/World_Geodetic_System) reference ellipsoid (aka [EPSG:4326](https://spatialreference.org/ref/epsg/4326/)).

 ```python write_geotiff
-import os.path
-from docs import docs_dir
-cat = """
-red,green,blue
-{},{},{}
-""".format(scene('01'), scene('04'), scene('03'))
-outfile = os.path.join(docs_dir(), 'geotiff-overview.tif')
-rf = spark.read.raster(catalog=cat, catalog_col_names=['red', 'green', 'blue'])
-rf.write.geotiff(outfile, crs='EPSG:4326', raster_dimensions=(256, 256))
+outfile = os.path.join('/tmp', 'geotiff-overview.tif')
+spark_df.write.geotiff('file://' + outfile, crs='EPSG:4326', raster_dimensions=(256, 256))
 ```

 View it with `rasterio` to check the results:

 ```python view_geotiff
 import rasterio
-import numpy as np
 from rasterio.plot import show, show_hist

 with rasterio.open(outfile) as src:
-    cells = np.clip(src.read(), 0, 1800).astype('float32')
-    show(cells)
-    show_hist(src, bins=50, lw=0.0, stacked=False, alpha=0.3,
+    show(src, adjust='linear')
+    show_hist(src, bins=50, lw=0.0, stacked=False, alpha=0.6,
               histtype='stepfilled', title="Overview Histogram")
 ```

+If the DataFrame has multiple tile or projected raster columns, the GeoTIFF writer writes each one as a separate band in the file. Each band in the output is tagged with its input column name for reference.

 ## GeoTrellis Layers

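
Note on the "fix path to uri" part of this commit: the writer call now prefixes the local path with `file://`. Plain concatenation works for the `/tmp` path used here, but it does not escape characters such as spaces. The sketch below compares it with percent-encoding via the standard library's `urllib.parse.quote`. The helper name `to_file_uri` and the second path are hypothetical, and `posixpath` stands in for `os.path` only so the result does not depend on the host OS. Whether the RasterFrames writer requires encoded URIs is not stated in this commit, so treat this as an illustration and not as the library's required form.

```python
import posixpath
from urllib.parse import quote


def to_file_uri(path):
    """Turn an absolute POSIX path into a file: URI (hypothetical helper)."""
    if not path.startswith('/'):
        raise ValueError('expected an absolute path, got {!r}'.format(path))
    # quote() keeps '/' as-is by default and percent-encodes spaces and similar
    return 'file://' + quote(path)


outfile = posixpath.join('/tmp', 'geotiff-overview.tif')

# For a path without special characters, both approaches agree.
print('file://' + outfile)
print(to_file_uri(outfile))

# With a space in the name, only the quoted form is a well-formed URI.
print(to_file_uri('/tmp/my overview.tif'))
```

For the path in this commit the two forms are identical, so the committed concatenation is fine as written; the encoding step only matters once output names can contain spaces or other reserved characters.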