You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: pyrasterframes/src/main/python/docs/concepts.md
+7-3Lines changed: 7 additions & 3 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -45,6 +45,10 @@ A "NoData" (or N/A) value is a specifically identified value for a cell type use
45
45
46
46
A scene (or granule) is a discrete instance of EO @ref:[raster data](concepts.md#raster) with a specific extent (region), date-time, and map projection (or CRS).
47
47
48
+
## Band
49
+
50
+
A @ref:[scene](concepts.md#scene) frequently defines many different measurements captured a the same date-time, over the same extent, and meant to be processed together. These different measurements are referred to as bands. The name comes from the varying bandwidths of light and electromagnetic radiation measured in many EO datasets.
51
+
48
52
## Coordinate Reference System (CRS)
49
53
50
54
A [coordinate reference system (or spatial reference system)][CRS] is a set of mathematical constructs used to translate locations on the three-dimensional surface of the earth to the two dimensional raster grid. A CRS typically accompanies any EO data so it can be precisely located.
@@ -57,10 +61,10 @@ An extent (or bounding box) is a rectangular region specifying the geospatial co
57
61
58
62
A tile (sometimes called a "chip") is a rectangular subset of a @ref:[scene](concepts.md#scene). As a scene is a raster, a tile is also a raster. A tile can conceptually be thought of as a two-dimensional array.
59
63
60
-
Some EO data has many bands or channels. Tiles in this context are conceptually a three-dimensional array, with the extra dimension representing the bands.
64
+
Some EO data has many @ref:[bands](concepts.md#band) or channels. Within RasterFrames, this third dimension is handled across columns of the DataFrame, such that the tiles within DataFrames are all two-dimensional arrays.
61
65
62
66
Tiles are often square and the dimensions are some power of two, for example 256 by 256.
63
67
64
-
The tile is the primary discretization unit used in RasterFrames. Each band of a scene is in a separate column. The scene's overall @ref:[extent](concepts.md#extent) is carved up into smaller extents for each tile. Each row of the DataFrame contains a two-dimensional tile per band column.
68
+
The tile is the primary discretization unit used in RasterFrames. The scene's overall @ref:[extent](concepts.md#extent) is carved up into smaller extents and spread across rows.
Copy file name to clipboardExpand all lines: pyrasterframes/src/main/python/docs/raster-catalogs.pymd
+37-29Lines changed: 37 additions & 29 deletions
Original file line number
Diff line number
Diff line change
@@ -1,15 +1,15 @@
1
1
# Raster Catalogs
2
2
3
-
While much interesting processing can be done on a @ref:[single raster file](raster-read.md#single-raster), RasterFrames shines when _Catalogs_ of raster data are to be processed. In its simplest form, a _Catalog_ is a listing of @ref:[URLs referencing raster files](raster-read.md#uri-formats). This listing can be manifested as a DataFrame, CSV file or CSV string. The _Catalog_ is input into the `raster` DataSource, described in the next page.
3
+
While much interesting processing can be done on a @ref:[single raster file](raster-read.md#single-raster), RasterFrames shines when _catalogs_ of raster data are to be processed. In its simplest form, a _catalog_ is a list of @ref:[URLs referencing raster files](raster-read.md#uri-formats). This list can be a Spark DataFrame, Pandas DataFrame, CSV file or CSV string. The _catalog_ is input into the `raster` DataSource, described in the @ref:[next page](raster-read.pymd), which creates tiles from the rasters at the referenced URLs.
4
4
5
-
A _Catalog_ can have one or two dimensions:
5
+
A _catalog_ can have one or two dimensions:
6
6
7
-
* One-D: A single column containing one or many URLs across the rows. All referenced rasters represent the same content type. For example, a column of URLs to Landsat 8 NIR rasters covering Europe. Each row represents different places and times.
8
-
* Two-D: Many columns containing raster URLs. Each column references the same content type, and each row represents the same place and time. For example, red-, green-, and blue-band columns for scenes covering Europe. Each row represents a single spatiotemporal location (or scene) with the same dimensions, extent, [_CRS_][CRS], etc across the row.
7
+
* One-D: A single column contains raster URLs across the rows. All referenced rasters represent the same @ref:[band](concepts.md#band). For example, a column of URLs to Landsat 8 near-infrared rasters covering Europe. Each row represents different places and times.
8
+
* Two-D: Many columns containing raster URLs. Each column references the same band, and each row represents the same place and time. For example, red-, green-, and blue-band columns for scenes covering Europe. Each row represents a single @ref:[scene](concepts.md#scene) with the same resolution, extent, [_CRS_][CRS], etc across the row.
9
9
10
10
## Creating a Catalog
11
11
12
-
This section will provide some examples of creating your own _Catalogs_, as well as introduce some experimental _Catalogs_ built into RasterFrames. Reading raster data represented by a _Catalog_ is covered in more detail in the @ref:[next page](raster-read.md).
12
+
This section will provide some examples of creating your own _catalogs_, as well as introduce some experimental _catalogs_ built into RasterFrames. Reading raster data represented by a _catalog_ is covered in more detail in the @ref:[next page](raster-read.md).
13
13
14
14
```python, echo=False
15
15
from IPython.display import display
@@ -25,30 +25,37 @@ A single URL is the simplest form of a catalog.
A single column represents the same content type with different observations along the rows. In this example it is band 1 of MODIS, which is visible red. In the example the location of the images is the same (h04v09) but the dates differ.
40
+
A single column represents the same content type with different observations along the rows. In this example it is band 1 of MODIS surface reflectance, which is visible red. In the example the location of the images is the same, indicated by the granule identifier `h04v09`, but the dates differ: 2018185 (July 4, 2018) and 2018188 (July 7, 2018).
Example of a multiple columns representing multiple content types (bands) across multiple scenes. In each row, the scene is the same. The first column is band 1 and the second is band 2, near infrared.
58
+
Example of a multiple columns representing multiple content types (bands) across multiple scenes. In each row, the scene is the same: granule id `h04v09` on July 4 or July 7, 2018. The first column is band 1, red, and the second is band 2, near infrared.
The simplest example of an external _Catalog_ is a DataFrame (or a transformation of) in one of the formats above. Here's an extended example of reading an external CSV file of MODIS scenes and transforming it into a _Catalog_. The metadata describing the content of each URL is an important aspect of processing raster data. This example includes some minimal metadata.
78
+
The concept of a _catalog_ is much more powerful when we consider examples beyond constructing the DataFrame, and instead read the data from an external source. Here's an extended example of reading an cloud-hosted CSV file containing MODIS scene metadata and transforming it into a _catalog_. The metadata describing the content of each URL is an important aspect of processing raster data.
Observe the scenes list file has URIs to `index.html` files in the download_url column. The image URI's are in the same directory. The filenames are of the form `${gid}_B${band}.TIF`. The next code chunk builds these URIs, which completes our catalog.
RasterFrames comes with two experimental catalogs over the AWS PDS [Landsat 8][Landsat] and [MODIS][MODIS] repositories. They are created by downloading the latest scene lists and transforming as in the prior example.
108
+
RasterFrames comes with two experimental catalogs over the AWS PDS [Landsat 8][Landsat] and [MODIS][MODIS] repositories. They are created by downloading the latest scene lists and building up the appropriate band URI columns as in the prior example.
101
109
102
110
> Note: The first time you run these may take some time, as the catalogs are large. However, they are cached and subsequent invocations should be faster.
Copy file name to clipboardExpand all lines: pyrasterframes/src/main/python/docs/raster-read.pymd
+1-1Lines changed: 1 addition & 1 deletion
Original file line number
Diff line number
Diff line change
@@ -146,7 +146,7 @@ In the initial examples on this page, we used @ref:[`rf_tile`](reference.md#rf-t
146
146
147
147
## Multiband Rasters
148
148
149
-
A multiband raster represents a three dimensional numeric array. The first two dimensions are spatial, and the third dimension is typically designated as different bands. The bands could represent intensity of different wavelengths of light (or other electromagnetic radiation), or they could represent other phenomena such as measurement time, quality indications, or additional measurements.
149
+
A multiband raster represents a three dimensional numeric array. The first two dimensions are spatial, and the third dimension is typically designated as different @ref:[bands](concepts.md#band). The bands could represent intensity of different wavelengths of light (or other electromagnetic radiation), or they could represent other phenomena such as measurement time, quality indications, or additional measurements.
150
150
151
151
When reading a multiband raster or a _Catalog_ describing multiband rasters, you will need to know ahead of time which bands you want to read. You will specify the bands to read, indexed from zero, passing a list of integers into the `band_indexes` parameter of the `raster` reader.
0 commit comments