Commit 66acf9e

Update docs raster-read to discuss geotrellis
Signed-off-by: Jason T. Brown <[email protected]>
1 parent 76e494d commit 66acf9e

2 files changed (+41, -16 lines)

pyrasterframes/src/main/python/docs/raster-read.pymd

Lines changed: 40 additions & 15 deletions
@@ -8,11 +8,13 @@ from pyrasterframes.rasterfunctions import *
 spark = create_rf_spark_session()
 ```

-RasterFrames registers a DataSource named `raster` that enables reading of GeoTIFFs (and other formats when @ref:[GDAL is installed](getting-started.md#installing-gdal)) from arbitrary URIs. In the examples that follow we'll be reading from a Sentinel-2 scene stored in an AWS S3 bucket.
+RasterFrames registers a DataSource named `raster` that enables reading of GeoTIFFs (and other formats when @ref:[GDAL is installed](getting-started.md#installing-gdal)) from arbitrary URIs. The `raster` DataSource operates on either a single raster file location or another DataFrame, called a _catalog_, containing pointers to many raster file locations.
+
+RasterFrames can also read from @ref:[GeoTrellis catalogs and layers](raster-read.md#geotrellis).

 ## Single Raster

-The simplest form is reading a single raster from a single URI.
+The simplest way to use the `raster` reader is with a single raster from a single URI or file. In the examples that follow we'll be reading from a Sentinel-2 scene stored in an AWS S3 bucket.

 ```python read_one_uri
 rf = spark.read.raster('https://s22s-test-geotiffs.s3.amazonaws.com/luray_snp/B02.tif')
@@ -56,19 +58,19 @@ RasterFrames relies on three different IO drivers, selected based on a combinati

 | Prefix | GDAL | Java I/O | Hadoop |
 | ------------------- | ----------- | -------- | -------- |
-| `gdal://<vsidrv>//` | + | - | - |
-| `file://` | + | + | - |
-| `http://` | + | + | - |
-| `https://` | + | + | - |
-| `ftp://` | `/vsicurl/` | + | - |
-| `hdfs://` | `/vsihdfs/` |- | + |
-| `s3://` | `/vsis3/` | + | - |
-| `s3n://` | - | - | + |
-| `s3a://` | - | - | + |
-| `wasb://` | `/vsiaz/` | - | + |
-| `wasbs://` | - | - | + |
+| `gdal://<vsidrv>//` | + | - | - |
+| `file://` | + | + | - |
+| `http://` | + | + | - |
+| `https://` | + | + | - |
+| `ftp://` | `/vsicurl/` | + | - |
+| `hdfs://` | `/vsihdfs/` | - | + |
+| `s3://` | `/vsis3/` | + | - |
+| `s3n://` | - | - | + |
+| `s3a://` | - | - | + |
+| `wasb://` | `/vsiaz/` | - | + |
+| `wasbs://` | - | - | + |

-Specific [GDAL Virtual File System drivers](https://gdal.org/user/virtual_file_systems.html) can be selected using the `gdal://<vsidrv>//` syntax. For example If you have a `archive.zip` file containing a GeoTiff named `my-file-inside.tif`, you can address it with `gdal://vsizip//path/to/archive.zip/my-file-inside.tif`. See the GDAL documentation for the format of the URIs after the `gdal:/` prefix (which is stripped off before passing the rest of the path to GDAL).
+Specific [GDAL Virtual File System drivers](https://gdal.org/user/virtual_file_systems.html) can be selected using the `gdal://<vsidrv>//` syntax. For example, if you have an `archive.zip` file containing a GeoTIFF named `my-file-inside.tif`, you can address it with `gdal://vsizip//path/to/archive.zip/my-file-inside.tif`. Another example is an MRF file in an S3 bucket on AWS: `gdal://vsis3/my-bucket/prefix/to/raster.mrf`. See the GDAL documentation for the format of the URIs after the `gdal:/` scheme. The `gdal:/` scheme is stripped off before the rest of the path is passed to GDAL.


 ## Raster Catalogs
@@ -127,7 +129,7 @@ Observe that the schema of the resulting DataFrame has a projected raster struct
 rf.select('gid', rf_extent('red'), rf_extent('nir'), rf_tile('red'), rf_tile('nir')).show(3, False)
 ```

-### Lazy Raster Reads
+## Lazy Raster Reads

 By default the raster reads are delayed as long as possible. The DataFrame will contain metadata and pointers to the appropriate portion of the data until

@@ -180,6 +182,29 @@ mb2 = spark.read.raster(catalog=spark.createDataFrame(mb_cat),
 mb2.printSchema()
 ```

+## GeoTrellis
+
+### GeoTrellis Catalogs
+
+[GeoTrellis][GeoTrellis] is one of the key libraries that RasterFrames builds upon. It provides a Scala API for working with large raster data in Apache Spark. RasterFrames provides a DataSource that supports both reading and @ref:[writing](raster-write.md#geotrellis-layers) GeoTrellis layers.
+
+A GeoTrellis catalog is a set of GeoTrellis layers. The following reads a DataFrame describing the contents of a catalog. The scheme is typically `hdfs` or a cloud storage provider such as `s3` or `wasb`.
+
+```python, evaluate=False
+gt_cat = spark.read.format('geotrellis-catalog').load('scheme://path-to-gt-catalog')
+```
+
+### GeoTrellis Layers
+
+The catalog lists the layers available for query. To read a layer, pass the same catalog URI, the layer name, and the desired zoom level.
+
+```python, evaluate=False
+gt_layer = spark.read.geotrellis(path='scheme://path-to-gt-catalog', layer=layer_name, zoom=zoom_level)
+```
+
+This returns a RasterFrame with additional metadata inherited from the GeoTrellis `TileLayerMetadata`, such as the `SpatialKey`. The `TileLayerMetadata` is also stored as JSON in the metadata of the tile column.
+
 [CRS]: concepts.md#coordinate-reference-system--crs
 [Extent]: concepts.md#extent
 [Tile]: concepts.md#tile
+[GeoTrellis]: https://geotrellis.readthedocs.io/en/latest/
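The prefix/driver table in the raster-read.pymd changes can be read as a lookup from URI scheme to the I/O drivers listed for it. The sketch below transcribes that table into a Python dictionary so the lookup can be tried on concrete URIs. `PREFIX_TABLE` and `drivers_for` are hypothetical names for illustration only; they are not part of the RasterFrames API, and the helper does not model how RasterFrames actually chooses a driver at read time.

```python
from urllib.parse import urlparse

# Hypothetical transcription of the prefix/driver table in raster-read.pymd.
# Column order follows the table header; cell values are copied verbatim:
# "+" = listed as supported, "-" = not supported,
# "/vsi.../" = the GDAL virtual file system the table names for that prefix.
DRIVERS = ("GDAL", "Java I/O", "Hadoop")
PREFIX_TABLE = {
    "gdal": ("+", "-", "-"),  # the gdal://<vsidrv>// form
    "file": ("+", "+", "-"),
    "http": ("+", "+", "-"),
    "https": ("+", "+", "-"),
    "ftp": ("/vsicurl/", "+", "-"),
    "hdfs": ("/vsihdfs/", "-", "+"),
    "s3": ("/vsis3/", "+", "-"),
    "s3n": ("-", "-", "+"),
    "s3a": ("-", "-", "+"),
    "wasb": ("/vsiaz/", "-", "+"),
    "wasbs": ("-", "-", "+"),
}


def drivers_for(uri):
    """Return the drivers the table lists for the scheme of `uri` (illustrative only)."""
    scheme = urlparse(uri).scheme.lower()
    if scheme not in PREFIX_TABLE:
        raise ValueError(f"scheme {scheme!r} of {uri!r} is not in the table")
    cells = PREFIX_TABLE[scheme]
    # Keep every driver whose cell is anything other than "-".
    return [name for name, cell in zip(DRIVERS, cells) if cell != "-"]
```

For example, `drivers_for('hdfs://namenode/data/scene.tif')` returns `['GDAL', 'Hadoop']`. A URI without a scheme, such as a bare local path, raises `ValueError` in this sketch because the table only lists prefixed forms.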

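The updated raster-read.pymd paragraph says the `gdal:/` scheme is stripped off before the rest of the path is passed to GDAL. The minimal sketch below shows only that string transformation, applied to the two example URIs from the docs. `to_gdal_path` is a hypothetical helper written for illustration, not the code RasterFrames uses internally.

```python
# Hypothetical helper mirroring the documented behavior:
# drop the leading "gdal:/" and hand the remainder to GDAL unchanged.
GDAL_SCHEME = "gdal:/"


def to_gdal_path(uri):
    """Strip the `gdal:/` scheme, leaving a GDAL virtual file system path."""
    if not uri.startswith(GDAL_SCHEME):
        raise ValueError(f"expected a URI starting with {GDAL_SCHEME!r}: {uri!r}")
    # Only the scheme is removed; the rest (e.g. "/vsizip//...") is kept as-is.
    return uri[len(GDAL_SCHEME):]
```

Applied to `gdal://vsis3/my-bucket/prefix/to/raster.mrf`, this yields `/vsis3/my-bucket/prefix/to/raster.mrf`. The sketch keeps the doubled slash in `/vsizip//path/...` because only the scheme is removed.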
pyrasterframes/src/main/python/docs/raster-write.pymd

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ If there are many tile or projected raster columns in the DataFrame, the GeoTIFF

 ## GeoTrellis Layers

-[GeoTrellis][GeoTrellis] is one of the key libraries that RasterFrames builds upon. It provides a Scala language API to working with large raster data with Apache Spark. Ingesting raster data into a Layer is one of the key concepts for creating a dataset for processing on Spark. RasterFrames write data from an appropriate DataFrame into a [GeoTrellis Layer](https://geotrellis.readthedocs.io/en/latest/guide/tile-backends.html). RasterFrames provides a `geotrellis` DataSource that supports both reading and writing of GeoTrellis layers.
+[GeoTrellis][GeoTrellis] is one of the key libraries that RasterFrames builds upon. It provides a Scala API for working with large raster data in Apache Spark. Ingesting raster data into a Layer is one of the key concepts for creating a dataset for processing on Spark. RasterFrames writes data from an appropriate DataFrame into a [GeoTrellis Layer](https://geotrellis.readthedocs.io/en/latest/guide/tile-backends.html). RasterFrames provides a `geotrellis` DataSource that supports both @ref:[reading](raster-read.md#geotrellis) and writing of GeoTrellis layers.

 > An example is forthcoming.
