pyrasterframes/src/main/python/docs/raster-catalogs.pymd
Lines changed: 20 additions & 15 deletions
@@ -1,15 +1,15 @@
# Raster Catalogs
-While much interesting processing can be done on a @ref:[single raster file](raster-read.md#single-raster), RasterFrames shines when _catalogs_ of raster data are to be processed. In its simplest form, a _catalog_ is a list of @ref:[URLs referencing raster files](raster-read.md#uri-formats). This list can be a Spark DataFrame, Pandas DataFrame, CSV file or CSV string. The _catalog_ is input into the `raster` DataSource, described in the @ref:[next page](raster-read.md), which creates _tiles_ from the rasters at the referenced URLs.
+While interesting processing can be done on a @ref:[single raster file](raster-read.md#single-raster), RasterFrames shines when _catalogs_ of raster data are to be processed. In its simplest form, a _catalog_ is a list of @ref:[URLs referencing raster files](raster-read.md#uri-formats). This list can be a Spark DataFrame, Pandas DataFrame, CSV file or CSV string. The _catalog_ is input into the `raster` DataSource described in the @ref:[next page](raster-read.md), which creates _tiles_ from the rasters at the referenced URLs.
A _catalog_ can have one or two dimensions:
* One-D: A single column contains raster URLs across the rows. All referenced rasters represent the same @ref:[band](concepts.md#band). For example, a column of URLs to Landsat 8 near-infrared rasters covering Europe. Each row represents different places and times.
-* Two-D: Many columns containing raster URLs. Each column references the same band, and each row represents the same place and time. For example, red-, green-, and blue-band columns for scenes covering Europe. Each row represents a single @ref:[scene](concepts.md#scene) with the same resolution, extent, [_CRS_][CRS], etc across the row.
+* Two-D: Many columns contain raster URLs. Each column references the same band, and each row represents the same place and time. For example, red-, green-, and blue-band columns for scenes covering Europe. Each row represents a single @ref:[scene](concepts.md#scene) with the same resolution, extent, [_CRS_][CRS], etc across the row.
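
As an aside to the diff, the One-D and Two-D layouts described in the bullets above can be sketched as CSV strings, one of the catalog forms the text lists. This is a minimal illustration using only the Python standard library, not code from the changed file: the `example.com` URLs, the scene names, and the `to_csv_string` helper are all hypothetical.

```python
import csv
import io

# Hypothetical base URL for illustration only; it does not point at real rasters.
BASE = "https://example.com/rasters"


def to_csv_string(rows, columns):
    """Serialize a list of row dicts to a CSV string with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# One-D catalog: a single column of URLs, all for the same band
# (near-infrared here); each row is a different place and/or time.
one_d_rows = [
    {"nir": f"{BASE}/{scene}_nir.tif"}
    for scene in ("scene_a", "scene_b", "scene_c")
]
one_d_csv = to_csv_string(one_d_rows, ["nir"])

# Two-D catalog: one column per band; each row is a single scene,
# so every URL in a row covers the same place and time.
bands = ["red", "green", "blue"]
two_d_rows = [
    {band: f"{BASE}/{scene}_{band}.tif" for band in bands}
    for scene in ("scene_a", "scene_b")
]
two_d_csv = to_csv_string(two_d_rows, bands)

print(one_d_csv)
print(two_d_csv)
```

The same rows could equally be held in a Pandas or Spark DataFrame with identical column names; the CSV form is used here only because it needs no extra dependencies.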
## Creating a Catalog
-This section will provide some examples of creating your own _catalogs_, as well as introduce some experimental _catalogs_ built into RasterFrames. Reading raster data represented by a _catalog_ is covered in more detail in the @ref:[next page](raster-read.md).
+This section will provide some examples of _catalogs_ creation, as well as introduce some experimental _catalogs_ built into RasterFrames. Reading raster data represented by a _catalog_ is covered in more detail in the @ref:[next page](raster-read.md).
```python, setup, echo=False
from pyrasterframes.utils import create_rf_spark_session
-Example of a multiple columns representing multiple content types (bands) across multiple scenes. In each row, the scene is the same: granule id `h04v09` on July 4 or July 7, 2018. The first column is band 1, red, and the second is band 2, near infrared.
+In this example, multiple columns representing multiple content types (bands) across multiple scenes. In each row, the scene is the same: granule id `h04v09` on July 4 or July 7, 2018. The first column is band 1, red, and the second is band 2, near infrared.
-The concept of a _catalog_ is much more powerful when we consider examples beyond constructing the DataFrame, and instead read the data from an external source. Here's an extended example of reading an cloud-hosted CSV file containing MODIS scene metadata and transforming it into a _catalog_. The metadata describing the content of each URL is an important aspect of processing raster data.
+The concept of a _catalog_ is much more powerful when we consider examples beyond constructing the DataFrame, and instead read the data from an external source. Here's an extended example of reading a cloud-hosted CSV file containing MODIS scene metadata and transforming it into a _catalog_. The metadata describing the content of each URL is an important aspect of processing raster data.
RasterFrames comes with two experimental catalogs over the AWS PDS [Landsat 8][Landsat] and [MODIS][MODIS] repositories. They are created by downloading the latest scene lists and building up the appropriate band URI columns as in the prior example.
-> Note: The first time you run these may take some time, as the catalogs are large. However, they are cached and subsequent invocations should be faster.
+> Note: The first time you run these may take some time, as the catalogs are large and have to be downloaded. However, they are cached and subsequent invocations should be faster.
pyrasterframes/src/main/python/docs/raster-io.md
Lines changed: 3 additions & 1 deletion
@@ -11,11 +11,13 @@ The standard mechanism by which any data is brought in and out of a Spark Datafr
 -`geotiff`: a simplified reader for reading a single GeoTIFF file
 -`geotrellis`: for reading a [GeoTrellis layer][GTLayer]
 *@ref:[Raster Writers](raster-write.md)
-- You can write @ref:[Tile](raster-write.md#tile-samples) and @ref:[DataFrame](raster-write.md#dataframe-samples) samples
 -@ref:[`geotiff`](raster-write.md#geotiffs): beta writer to GeoTiff file format
 -@ref:[`geotrellis`](raster-write.md#geotrellis-layers): creating a [GeoTrellis layer][GTLayer]
 -@ref:[`parquet`](raster-write.md#parquet): general purpose writer for [Parquet][Parquet]
 
+
+Furthermore, when in a Jupyter Notebook environment, you can view @ref:[Tile](raster-write.md#tile-samples) and @ref:[DataFrame](raster-write.md#dataframe-samples) samples.
+
 There is also support for @ref:[vector data](vector-data.md) for masking and data labeling.