
Commit 530ddcd

Update raster read docs page to fix #250 and #89
Signed-off-by: Jason T. Brown <[email protected]>
1 parent 66acf9e commit 530ddcd

File tree

2 files changed

+20
-18
lines changed

pyrasterframes/src/main/python/docs/raster-read.pymd

Lines changed: 19 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -58,17 +58,17 @@ RasterFrames relies on three different IO drivers, selected based on a combinati
5858

5959
| Prefix | GDAL | Java I/O | Hadoop |
6060
| ------------------- | ----------- | -------- | -------- |
61-
| `gdal://<vsidrv>//` | + | - | - |
62-
| `file://` | + | + | - |
63-
| `http://` | + | + | - |
64-
| `https://` | + | + | - |
65-
| `ftp://` | `/vsicurl/` | + | - |
66-
| `hdfs://` | `/vsihdfs/` | - | + |
67-
| `s3://` | `/vsis3/` | + | - |
68-
| `s3n://` | - | - | + |
69-
| `s3a://` | - | - | + |
70-
| `wasb://` | `/vsiaz/` | - | + |
71-
| `wasbs://` | - | - | + |
61+
| `gdal://<vsidrv>//` | yes | no | no |
62+
| `file://` | yes | yes | no |
63+
| `http://` | yes | yes | no |
64+
| `https://` | yes | yes | no |
65+
| `ftp://` | `/vsicurl/` | yes | no |
66+
| `hdfs://` | `/vsihdfs/` | no | yes |
67+
| `s3://` | `/vsis3/` | yes | no |
68+
| `s3n://` | no | no | yes |
69+
| `s3a://` | no | no | yes |
70+
| `wasb://` | `/vsiaz/` | no | yes |
71+
| `wasbs://` | no | no | yes |
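The table can also be read as a lookup from URI scheme to the drivers that may handle it. The sketch below is only a transcription of the table into Python, not RasterFrames code: the names `DRIVER_SUPPORT` and `candidate_drivers` are hypothetical, and it lists candidates without modeling how RasterFrames chooses among them.

```python
# Hypothetical transcription of the table above. True/False mirror "yes"/"no";
# a string in the "gdal" slot is the GDAL virtual file system prefix shown there.
DRIVER_SUPPORT = {
    "gdal":  {"gdal": True,        "java_io": False, "hadoop": False},
    "file":  {"gdal": True,        "java_io": True,  "hadoop": False},
    "http":  {"gdal": True,        "java_io": True,  "hadoop": False},
    "https": {"gdal": True,        "java_io": True,  "hadoop": False},
    "ftp":   {"gdal": "/vsicurl/", "java_io": True,  "hadoop": False},
    "hdfs":  {"gdal": "/vsihdfs/", "java_io": False, "hadoop": True},
    "s3":    {"gdal": "/vsis3/",   "java_io": True,  "hadoop": False},
    "s3n":   {"gdal": False,       "java_io": False, "hadoop": True},
    "s3a":   {"gdal": False,       "java_io": False, "hadoop": True},
    "wasb":  {"gdal": "/vsiaz/",   "java_io": False, "hadoop": True},
    "wasbs": {"gdal": False,       "java_io": False, "hadoop": True},
}


def candidate_drivers(uri):
    """Return the driver names the table marks as usable for the URI's scheme."""
    scheme = uri.split("://", 1)[0]
    support = DRIVER_SUPPORT.get(scheme)
    if support is None:
        raise ValueError("scheme not listed in the table: " + scheme)
    # Any truthy entry (True or a /vsi.../ prefix) counts as supported.
    return [name for name, value in support.items() if value]


s3_drivers = candidate_drivers("s3://my-bucket/prefix/to/raster.tif")
s3a_drivers = candidate_drivers("s3a://my-bucket/prefix/to/raster.tif")
```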
7272

7373
Specific [GDAL Virtual File System drivers](https://gdal.org/user/virtual_file_systems.html) can be selected using the `gdal://<vsidrv>//` syntax. For example, if you have an `archive.zip` file containing a GeoTiff named `my-file-inside.tif`, you can address it with `gdal://vsizip//path/to/archive.zip/my-file-inside.tif`. Another example would be a MRF file in an S3 bucket on AWS: `gdal://vsis3/my-bucket/prefix/to/raster.mrf`. See the GDAL documentation for the format of the URIs after the `gdal:/` scheme. The `gdal:/` scheme is stripped off before passing the rest of the path to GDAL.
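To make the scheme stripping concrete, here is a minimal sketch that assumes the stripping is a plain prefix removal, as the paragraph describes. The function name `to_gdal_path` is hypothetical and is not part of RasterFrames or GDAL.

```python
GDAL_SCHEME = "gdal:/"


def to_gdal_path(uri):
    """Drop the leading `gdal:/` so the remainder starts with a /vsi... path."""
    if not uri.startswith(GDAL_SCHEME):
        raise ValueError("not a gdal:/ URI: " + uri)
    return uri[len(GDAL_SCHEME):]


zip_path = to_gdal_path("gdal://vsizip//path/to/archive.zip/my-file-inside.tif")
s3_path = to_gdal_path("gdal://vsis3/my-bucket/prefix/to/raster.mrf")
```

Under that assumption, the two example URIs become `/vsizip//path/to/archive.zip/my-file-inside.tif` and `/vsis3/my-bucket/prefix/to/raster.mrf`, which are the forms GDAL would receive.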
7474

@@ -131,9 +131,9 @@ rf.select('gid', rf_extent('red'), rf_extent('nir'), rf_tile('red'), rf_tile('ni
131131

132132
## Lazy Raster Reads
133133

134-
By default the raster reads are delayed as long as possible. The DataFrame will contain metadata and pointers to the appropriate portion of the data until
134+
By default, raster reads are delayed as long as possible. The DataFrame will contain metadata and pointers to the appropriate portion of the data until reading of the source raster data is absolutely necessary. This can save a lot of computation and I/O time for two reasons. First, a _catalog_ may contain millions of rows. Second, the `raster` DataSource attempts to ensure filters are processed before reading raster data.
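The benefit of filtering before reading can be illustrated with a toy model in plain Python. This is a conceptual sketch only and does not reflect how RasterFrames or Spark implement lazy tiles: `LazyTileRef`, its fields, and the placeholder cell values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LazyTileRef:
    """Hypothetical pointer to raster data: where to read, but no cells yet."""
    uri: str
    band: int
    reads: int = 0  # counts how often the (simulated) expensive read ran

    def read(self):
        # Stand-in for the costly I/O; returns placeholder cell values.
        self.reads += 1
        return [[0, 0], [0, 0]]


# A toy catalog of 1000 rows, each holding only a pointer.
catalog = [
    {"gid": gid, "tile": LazyTileRef(uri="scene-%d.tif" % gid, band=0)}
    for gid in range(1000)
]

# Filter on metadata first; no raster data has been touched so far.
selected = [row for row in catalog if row["gid"] % 250 == 0]

# Only the surviving rows pay the cost of a read.
realized = [row["tile"].read() for row in selected]
total_reads = sum(row["tile"].reads for row in catalog)
```

In this toy model only 4 of the 1000 rows trigger a read, because the filter runs against metadata before any cell values are requested.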
135135

136-
Consider the following two reads of the same data source. In the first, the lazy case, there is a pointer to the URI, extent and band to read. This will not be evaluated until the cell values are absolutely required. The second case shows the realized tile is queried right away.
136+
Consider the following two reads of the same data source. In the first, the lazy case, there is a pointer to the URI, extent and band to read. This will not be evaluated until the cell values are absolutely required. The second case shows the option to force the raster to be fully read right away.
137137

138138
```python lazy_demo
139139
uri = 'https://s22s-test-geotiffs.s3.amazonaws.com/luray_snp/B02.tif'
@@ -144,15 +144,17 @@ spark.read.raster(uri, lazy_tiles=False) \
144144
.select('proj_raster.tile').show(1, False)
145145
```
146146

147-
In the initial examples on this page, we used @ref:[`rf_tile`](reference.md#rf-tile) to explicitly request the realized tile from the lazy representation.
147+
In the initial examples on this page, you may have noticed that the realized (non-lazy) tiles are shown, but we did not change `lazy_tiles`. Instead, we used @ref:[`rf_tile`](reference.md#rf-tile) to explicitly request the realized tile from the lazy representation.
148148

149149
## Multiband Rasters
150150

151-
A multiband raster represents a three dimensional numeric array. The first two dimensions are spatial, and the third dimension is typically designated as different bands. The bands could represent intensity of different wavelengths of light (or other electromagnetic radiation), or they could represent other phenomena such as measurement time, quality indications, or additional measurements.
151+
A multiband raster represents a three-dimensional numeric array. The first two dimensions are spatial, and the third dimension is typically designated as different bands. The bands could represent intensity of different wavelengths of light (or other electromagnetic radiation), or they could represent other phenomena such as measurement time, quality indications, or additional measurements.
152152

153-
When reading a multiband raster or a _Catalog_ describing multiband rasters, you will need to know ahead of time which bands you want to read. You will specify the bands to read, indexed from zero, passing a list of integers into the `band_indexes` parameter of the `raster` reader.
153+
Multiband raster files have a strictly ordered set of bands, which are typically indexed from 1. Some files have metadata tags associated with each band. Some files have a color interpretation metadata tag indicating how to interpret the bands.
154154

155-
For example we can read a four-band (red, green, blue, and near-infrared) image as follows. The individual rows of the resulting DataFrame still represent distinct spatial extents, with a projected raster column for each band specified by `band_indexes`.
155+
When reading a multiband raster or a _Catalog_ describing multiband rasters, you will need to know ahead of time which bands you want to read. You will specify the bands to read, **indexed from zero**, passing a list of integers into the `band_indexes` parameter of the `raster` reader.
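Because band numbers in file metadata typically start at 1 while `band_indexes` starts at 0, an off-by-one error is easy to make. The helper below is a small sketch of the conversion; `to_band_indexes` is a hypothetical name, not a RasterFrames function.

```python
def to_band_indexes(band_numbers):
    """Convert one-based band numbers into zero-based indexes."""
    indexes = []
    for number in band_numbers:
        if number < 1:
            raise ValueError("band numbers start at 1, got %d" % number)
        indexes.append(number - 1)
    return indexes


# Bands 1 through 4 of a four-band image map to indexes 0 through 3.
four_band = to_band_indexes([1, 2, 3, 4])
```

The resulting list is the kind of value that would be passed as `band_indexes` to the `raster` reader.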
156+
157+
For example, we can read a four-band (red, green, blue, and near-infrared) image as follows. The individual rows of the resulting DataFrame still represent distinct spatial extents, with a projected raster column for each band specified by `band_indexes`.
156158

157159
```python Multiband
158160
mb = spark.read.raster('s3://s22s-test-geotiffs/naip/m_3807863_nw_17_1_20160620.tif',

pyrasterframes/src/main/python/docs/reference.pymd

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -76,7 +76,7 @@ Get the cell type of the `tile`. The cell type can be changed with @ref:[rf_conv
7676

7777
Tile rf_tile(ProjectedRasterTile proj_raster)
7878

79-
Get the `tile` from the `ProjectedRasterTile` or `RasterSource` type tile column.
79+
Get the fully realized (non-lazy) `tile` from the `ProjectedRasterTile` or `RasterSource` type tile column.
8080

8181
### rf_extent
8282
