pyrasterframes/src/main/python/docs/raster-read.pymd
Lines changed: 47 additions & 20 deletions
@@ -8,11 +8,13 @@ from pyrasterframes.rasterfunctions import *
spark = create_rf_spark_session()
```
-
RasterFrames registers a DataSource named `raster` that enables reading of GeoTIFFs (and other formats when @ref:[GDAL is installed](getting-started.md#installing-gdal)) from arbitrary URIs. In the examples that follow we'll be reading from a Sentinel-2 scene stored in an AWS S3 bucket.
+
RasterFrames registers a DataSource named `raster` that enables reading of GeoTIFFs (and other formats when @ref:[GDAL is installed](getting-started.md#installing-gdal)) from arbitrary URIs. The `raster` DataSource operates on either a single raster file location or another DataFrame, called a _catalog_, containing pointers to many raster file locations.
+
+
RasterFrames can also read from @ref:[GeoTrellis catalogs and layers](raster-read.md#geotrellis).
## Single Raster
-
The simplest form is reading a single raster from a single URI.
+
The simplest way to use the `raster` reader is with a single raster from a single URI or file. In the examples that follow we'll be reading from a Sentinel-2 scene stored in an AWS S3 bucket.
Specific [GDAL Virtual File System drivers](https://gdal.org/user/virtual_file_systems.html) can be selected using the `gdal://<vsidrv>//` syntax. For example If you have a `archive.zip` file containing a GeoTiff named `my-file-inside.tif`, you can address it with `gdal://vsizip//path/to/archive.zip/my-file-inside.tif`. See the GDAL documentation for the format of the URIs after the `gdal:/` prefix (which is stripped off before passing the rest of the path to GDAL).
+
Specific [GDAL Virtual File System drivers](https://gdal.org/user/virtual_file_systems.html) can be selected using the `gdal://<vsidrv>//` syntax. For example, if you have an `archive.zip` file containing a GeoTIFF named `my-file-inside.tif`, you can address it with `gdal://vsizip//path/to/archive.zip/my-file-inside.tif`. Another example would be an MRF file in an S3 bucket on AWS: `gdal://vsis3/my-bucket/prefix/to/raster.mrf`. See the GDAL documentation for the format of the URIs after the `gdal:/` scheme. The `gdal:/` scheme is stripped off before passing the rest of the path to GDAL.
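To make the scheme handling concrete, here is a minimal sketch of the prefix stripping described above. The helper `to_gdal_path` is hypothetical and written only for illustration; it is not part of the RasterFrames API, and the actual implementation may differ.

```python
GDAL_SCHEME = "gdal:/"  # the scheme prefix described in the text above


def to_gdal_path(uri: str) -> str:
    """Hypothetical helper: drop the `gdal:/` scheme, leaving the path handed to GDAL."""
    if not uri.startswith(GDAL_SCHEME):
        raise ValueError(f"not a gdal:/ URI: {uri!r}")
    return uri[len(GDAL_SCHEME):]


# The two example URIs from the paragraph above.
zip_path = to_gdal_path("gdal://vsizip//path/to/archive.zip/my-file-inside.tif")
s3_path = to_gdal_path("gdal://vsis3/my-bucket/prefix/to/raster.mrf")

print(zip_path)  # /vsizip//path/to/archive.zip/my-file-inside.tif
print(s3_path)   # /vsis3/my-bucket/prefix/to/raster.mrf
```

Under this reading, what remains after stripping is an ordinary GDAL virtual file system path such as `/vsizip/...` or `/vsis3/...`.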
## Raster Catalogs
@@ -127,11 +129,11 @@ Observe that the schema of the resulting DataFrame has a projected raster struct
By default the raster reads are delayed as long as possible. The DataFrame will contain metadata and pointers to the appropriate portion of the data until
+
By default the raster reads are delayed as long as possible. The DataFrame will contain metadata and pointers to the appropriate portion of the data until reading of the source raster data is absolutely necessary. This can save a lot of computation and I/O time for two reasons. First, a _catalog_ may contain millions of rows. Second, the `raster` DataSource attempts to ensure filters are processed before reading raster data.
-
Consider the following two reads of the same data source. In the first, the lazy case, there is a pointer to the URI, extent and band to read. This will not be evaluated until the cell values are absolutely required. The second case shows the realized tile is queried right away.
+
Consider the following two reads of the same data source. In the first, the lazy case, there is a pointer to the URI, extent and band to read. This will not be evaluated until the cell values are absolutely required. The second case shows the option to force the raster to be fully read right away.
```python lazy_demo
uri = 'https://s22s-test-geotiffs.s3.amazonaws.com/luray_snp/B02.tif'
In the initial examples on this page, we used @ref:[`rf_tile`](reference.md#rf-tile) to explicitly request the realized tile from the lazy representation.
+
In the initial examples on this page, you may have noticed that the realized (non-lazy) tiles are shown, but we did not change `lazy_tiles`. Instead, we used @ref:[`rf_tile`](reference.md#rf-tile) to explicitly request the realized tile from the lazy representation.
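The benefit of delaying reads can be sketched without Spark at all. The following is a conceptual illustration only: `LazyTileRef`, `realize`, and the bucket name are hypothetical stand-ins, not RasterFrames classes, and the "read" is simulated. It shows a filter running on metadata first, so that only the surviving rows trigger a read.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

read_log: List[str] = []  # records which URIs were actually "read"


@dataclass
class LazyTileRef:
    """Hypothetical stand-in for a lazy tile: a pointer plus metadata, no cells yet."""
    uri: str
    extent: Tuple[int, int, int, int]
    band: int
    _cells: Optional[List[int]] = None

    def realize(self) -> List[int]:
        # Simulated read; a real reader would fetch the cells for `extent` here.
        if self._cells is None:
            read_log.append(self.uri)
            self._cells = [0, 0, 0, 0]
        return self._cells


# A "catalog" of many pointers; building it reads no raster data.
catalog = [
    LazyTileRef(f"s3://hypothetical-bucket/scene-{i}.tif", (i, 0, i + 1, 1), 0)
    for i in range(1000)
]

# Filter on metadata first, then realize only what survives the filter.
selected = [ref for ref in catalog if ref.extent[0] < 3]
tiles = [ref.realize() for ref in selected]

print(len(catalog), len(read_log))  # 1000 3
```

Of the 1000 pointers, only the three that pass the filter are ever read, which is the saving the lazy default aims for.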
## Multiband Rasters
A multiband raster represents a three dimensional numeric array. The first two dimensions are spatial, and the third dimension is typically designated as different @ref:[bands](concepts.md#band). The bands could represent intensity of different wavelengths of light (or other electromagnetic radiation), or they could represent other phenomena such as measurement time, quality indications, or additional measurements.
-
When reading a multiband raster or a _Catalog_ describing multiband rasters, you will need to know ahead of time which bands you want to read. You will specify the bands to read, indexed from zero, passing a list of integers into the `band_indexes` parameter of the `raster` reader.
+
Multiband raster files have a strictly ordered set of bands, which are typically indexed from 1. Some files have metadata tags associated with each band. Some files have a color interpretation metadata tag indicating how to interpret the bands.
+
+
When reading a multiband raster or a _catalog_ describing multiband rasters, you will need to know ahead of time which bands you want to read. You will specify the bands to read, **indexed from zero**, by passing a list of integers to the `band_indexes` parameter of the `raster` reader.
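Because files typically number bands from 1 while `band_indexes` counts from zero, an off-by-one mistake is easy to make. The sketch below shows the conversion with a hypothetical helper, `to_band_indexes`, which is not part of RasterFrames; the resulting list is the kind of value you would pass as `band_indexes`.

```python
from typing import Iterable, List


def to_band_indexes(band_numbers: Iterable[int]) -> List[int]:
    """Hypothetical helper: convert 1-based band numbers to zero-based indexes."""
    indexes = []
    for number in band_numbers:
        if number < 1:
            raise ValueError(f"band numbers start at 1, got {number}")
        indexes.append(number - 1)
    return indexes


# Bands 1 through 4 of a four-band file (for example red, green, blue, near-infrared).
band_indexes = to_band_indexes([1, 2, 3, 4])
print(band_indexes)  # [0, 1, 2, 3]
```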
-
For example we can read a four-band (red, green, blue, and near-infrared) image as follows. The individual rows of the resulting DataFrame still represent distinct spatial extents, with a projected raster column for each band specified by `band_indexes`.
+
For example, we can read a four-band (red, green, blue, and near-infrared) image as follows. The individual rows of the resulting DataFrame still represent distinct spatial extents, with a projected raster column for each band specified by `band_indexes`.
[GeoTrellis][GeoTrellis] is one of the key libraries that RasterFrames builds upon. It provides a Scala language API for working with large raster data with Apache Spark. RasterFrames provides a DataSource that supports both reading and @ref:[writing](raster-write.md#geotrellis-layers) with GeoTrellis.
+
+
A GeoTrellis catalog is a set of GeoTrellis layers. We can read a DataFrame giving details of the content of a catalog using the following. The scheme is typically `hdfs` or a cloud storage provider like `s3` or `wasb`.
The catalog will give details on the particular layers available for query. We can read a layer by supplying the catalog's URI, the layer name, and the desired zoom level.
This will return a RasterFrame with additional metadata inherited from the GeoTrellis TileLayerMetadata, such as the SpatialKey. The TileLayerMetadata is also stored as JSON in the metadata of the tile column.
pyrasterframes/src/main/python/docs/raster-write.pymd
Lines changed: 1 addition & 1 deletion
@@ -89,7 +89,7 @@ os.remove(outfile)
## GeoTrellis Layers
-
[GeoTrellis][GeoTrellis] is one of the key libraries that RasterFrames builds upon. It provides a Scala language API to working with large raster data with Apache Spark. Ingesting raster data into a Layer is one of the key concepts for creating a dataset for processing on Spark. RasterFrames write data from an appropriate DataFrame into a [GeoTrellis Layer](https://geotrellis.readthedocs.io/en/latest/guide/tile-backends.html). RasterFrames provides a `geotrellis` DataSource that supports both reading and writing of GeoTrellis layers.
+
[GeoTrellis][GeoTrellis] is one of the key libraries that RasterFrames builds upon. It provides a Scala language API for working with large raster data with Apache Spark. Ingesting raster data into a Layer is one of the key concepts for creating a dataset for processing on Spark. RasterFrames writes data from an appropriate DataFrame into a [GeoTrellis Layer](https://geotrellis.readthedocs.io/en/latest/guide/tile-backends.html). RasterFrames provides a `geotrellis` DataSource that supports both @ref:[reading](raster-read.md#geotrellis) and writing of GeoTrellis layers.