There are a number of Earth-observation (EO) concepts that crop up in the discussion of RasterFrames features. We'll cover these briefly in the sections below. However, here are a few links providing a more extensive introduction to working with Earth observation data.
* [_Fundamentals of Remote Sensing_](https://www.nrcan.gc.ca/maps-tools-and-publications/satellite-imagery-and-air-photos/tutorial-fundamentals-remote-sensing/9309)
* [_Earth Observation Markets and Applications_](https://www.ofcom.org.uk/__data/assets/pdf_file/0021/82047/introduction_eo_for_ofcom_june_2015_no_video.pdf)
+
## Raster
+
A raster is a regular grid of numeric values. A raster can be thought of as an image, as is the case if the values in the grid represent brightness along a greyscale. More generally a raster can measure many different phenomena or encode a variety of different discrete classifications.
## Cell
-
A cell is a single sample from a sensor encoded as a scalar value asssociated with a specific spatiotemporal location and time. It can be thought of as an image pixel associated with a place and time.
+
A cell is a single row and column intersection in the raster grid. It is a single pixel in an image. A cell's value often represents one sample from a sensor encoded as a scalar value associated with a specific location and time.
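To make the raster and cell definitions concrete, here is a minimal sketch in plain Python. It is not RasterFrames code; the nested-list raster and the `cell_value` helper are hypothetical names used only for illustration.

```python
# A tiny raster: a regular grid of numeric values.
# Here the values stand for greyscale brightness (an assumption for this example).
raster = [
    [0, 64, 128, 255],
    [10, 20, 30, 40],
    [5, 5, 5, 5],
]


def cell_value(grid, row, col):
    """Return the value of the cell at one row and column intersection."""
    return grid[row][col]


rows, cols = len(raster), len(raster[0])  # grid dimensions: 3 rows, 4 columns
value = cell_value(raster, 0, 2)          # the cell in row 0, column 2
```

Each cell holds one scalar, which is why the choice of cell type discussed in the next section matters.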
## Cell Type
@@ -18,7 +22,7 @@ A numeric cell value may be encoded in a number of different computer numeric fo
* integral vs floating-point
-
The cell types most frequent in RasterFrames are as follows:
+
The most frequently encountered cell types in RasterFrames are below.
| Name | Abbreviation | Description | Range |
| --- | --- | --- | --- |
@@ -31,38 +35,32 @@ The cell types most frequent in RasterFrames are as follows:
| Float |`float32`| 32-bit floating-point | -3.4028235E38 to 3.4028235E38 |
| Double |`float64`| 64-bit floating-point | -1.7976931348623157E308 to 1.7976931348623157E308 |
-
See the section on [“NoData” Handling](nodata-handling.md) for additional discussion on cell types.
+
See the section on [“NoData” Handling](nodata-handling.md) for additional discussion on cell types and more exhaustive coverage of available cell types.
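The ranges in the table follow directly from the bit width and encoding of each cell type. The sketch below derives them with the Python standard library only. The `int_range` helper is a hypothetical illustration for integral types in general, not a RasterFrames function, and it makes no claim about which integral cell types the table lists.

```python
import struct
import sys


def int_range(bits, signed):
    """Return (min, max) for a two's-complement or unsigned integer of the given width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# Largest finite IEEE 754 single-precision value (bit pattern 0x7F7FFFFF).
# Its shortest single-precision decimal form is the 3.4028235E38 shown in the table.
float32_max = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]

# Largest finite double-precision value, as reported by the interpreter.
float64_max = sys.float_info.max

print(int_range(8, signed=False))
print(int_range(16, signed=True))
print(float32_max, float64_max)
```

Both floating-point types are symmetric about zero, so the minimum is simply the negated maximum.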
## NoData
-
A "NoData" (or N/A) value is a specifically identified value for a cell type used to indicate the absence of data. See the section on @ref:[“NoData” Handling](nodata-handling.md) for additional discussion on NoData
+
A "NoData" (or N/A) value is a specifically identified value for a cell type used to indicate the absence of data. See the section on @ref:[“NoData” Handling](nodata-handling.md) for additional discussion on "NoData".
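As a rough illustration of why a "NoData" value matters, the following sketch computes a mean while skipping cells that carry the sentinel. The sentinel `-9999` and the helper name `mean_ignoring_nodata` are assumptions made for this example; the actual "NoData" value depends on the cell type, and this is not the RasterFrames API.

```python
NODATA = -9999  # hypothetical sentinel; real values depend on the cell type


def mean_ignoring_nodata(cells, nodata=NODATA):
    """Average only the cells that hold data; return None if none do."""
    valid = [c for c in cells if c != nodata]
    if not valid:
        return None
    return sum(valid) / len(valid)


cells = [12, NODATA, 18, 15, NODATA]
result = mean_ignoring_nodata(cells)  # averages 12, 18 and 15 only
```

Treating the sentinel as an ordinary number would instead drag the mean far below any real measurement.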
## Scene
-
A scene (or granule) is a discrete instance of EO data with a specific extent (region), date-time, and projection/CRS.
+
A scene (or granule) is a discrete instance of EO @ref:[raster data](concepts.md#raster) with a specific extent (region), date-time, and map projection (or CRS).
## Coordinate Reference System (CRS)
-
A coordinate reference system (or spatial reference system) is a set of mathematical constructs used to map cells to specific locations on the Earth (or other surface). A CRS typcially accompanies any EO data so it can be precicely located.
+
A [coordinate reference system (or spatial reference system)][CRS] is a set of mathematical constructs used to translate locations on the three-dimensional surface of the earth to the two dimensional raster grid. A CRS typically accompanies any EO data so it can be precisely located.
## Extent
-
An extent (or bounding box) is a rectangular region specifying the geospatial coverage of a two-dimensional array of cells in a singular CRS.
+
An extent (or bounding box) is a rectangular region specifying the geospatial coverage of a @ref:[raster](concepts.md#raster) or @ref:[tile](concepts.md#tile), a two-dimensional array of @ref:[cells](concepts.md#cell) within a single CRS.
## Tile
-
A tile (sometimes called a "chip") is a rectangular subset of a @ref:[scene](concepts.md#scene). A tile can conceptually be though of as a two-dimensional array.
+
A tile (sometimes called a "chip") is a rectangular subset of a @ref:[scene](concepts.md#scene). As a scene is a raster, a tile is also a raster. A tile can conceptually be thought of as a two-dimensional array.
Some EO data has many bands or channels. Tiles in this context are conceptually a three-dimensional array, with the extra dimension representing the bands.
Tiles are often square and the dimensions are some power of two, for example 256 by 256.
The tile is the primary discretization unit used in RasterFrames. Each band of a scene is in a separate column. The scene's overall @ref:[extent](concepts.md#extent) is carved up into smaller extents for each tile. Each row of the DataFrame contains a two-dimensional tile per band column.
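The way a scene's extent is carved into per-tile extents can be sketched in plain Python. This is an illustration under stated assumptions, not the RasterFrames implementation: the `Extent` tuple and `tile_extents` function are hypothetical names, the raster is assumed to be north-up (row 0 at the top, so y decreases as the row index grows), and edge tiles are allowed to be smaller than the nominal tile size.

```python
from typing import Iterator, NamedTuple, Tuple


class Extent(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def tile_extents(
    scene: Extent, cols: int, rows: int, tile_size: int = 256
) -> Iterator[Tuple[int, int, Extent]]:
    """Yield (col_offset, row_offset, extent) for each tile of a scene."""
    cell_w = (scene.xmax - scene.xmin) / cols  # width of one cell in CRS units
    cell_h = (scene.ymax - scene.ymin) / rows  # height of one cell in CRS units
    for row_off in range(0, rows, tile_size):
        for col_off in range(0, cols, tile_size):
            # Clamp so tiles on the right and bottom edges stay inside the scene.
            col_end = min(col_off + tile_size, cols)
            row_end = min(row_off + tile_size, rows)
            yield col_off, row_off, Extent(
                scene.xmin + col_off * cell_w,
                scene.ymax - row_end * cell_h,
                scene.xmin + col_end * cell_w,
                scene.ymax - row_off * cell_h,
            )


# A 512 x 512 cell scene with one CRS unit per cell, split into 256 x 256 tiles.
scene = Extent(0.0, 0.0, 512.0, 512.0)
tiles = list(tile_extents(scene, cols=512, rows=512))
```

Each yielded extent would accompany one row of the DataFrame, alongside the tile for each band column.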
RasterFrames provides a DataFrame-centric view over arbitrary EO data, enabling spatiotemporal queries, map algebra raster operations, and compatibility with the ecosystem of Spark ML algorithms. It provides APIs in @ref:[Python, SQL, and Scala](languages.md), and can horizontally scale from a laptop to a supercomputer, enabling _global_ analysis with satellite imagery in a wholly new, flexible and convenient way.
+
RasterFrames® provides a DataFrame-centric view over arbitrary Earth-observation (EO) data, enabling spatiotemporal queries, map algebra raster operations, and compatibility with the ecosystem of [Apache Spark](https://spark.apache.org/docs/latest/) [ML](https://spark.apache.org/docs/latest/ml-guide.html) algorithms. It provides APIs in @ref:[Python, SQL, and Scala](languages.md), and can scale from a laptop to a large distributed cluster, enabling _global_ analysis with satellite imagery in a wholly new, flexible and convenient way.
## Context
-
We have a millennia-long history of organizing information in tabular form. Typically, rows represent independent events or observations, and columns represent measurements from the observations. The forms have evolved, from hand-written agricultural records and transaction ledgers, to the advent of spreadsheets on the personal computer, and on to the creation of the _DataFrame_ data structure as found in [R Data Frames][R] and [Python Pandas][Pandas]. The table-oriented data structure remains a common and critical component of organizing data across industries, and is the mental model employed by many data scientists across diverse forms of modeling and analysis.
+
We have a millennia-long history of organizing information in tabular form. Typically, rows represent independent events or observations, and columns represent attributes and measurements from the observations. The forms have evolved, from hand-written agricultural records and transaction ledgers, to the advent of spreadsheets on the personal computer, and on to the creation of the _DataFrame_ data structure as found in [R Data Frames][R] and [Python Pandas][Pandas]. The table-oriented data structure remains a common and critical component of organizing data across industries, and is the mental model employed by many data scientists across diverse forms of modeling and analysis.
-
Today, DataFrames are the _lingua franca_ of data science. The evolution of the tabular form has continued with Apache Spark SQL, which brings DataFrames to the big data distributed compute space. Through several novel innovations, Spark SQL enables interactive and batch-oriented cluster computing without having to be versed in the highly specialized skills typically required for high-performance computing. As suggested by the name, these DataFrames are manipulatable via standard SQL, as well as the more general-purpose programming languages Python, R, Java, and Scala.
+
The evolution of the DataFrame form has continued with [Spark SQL](https://spark.apache.org/docs/latest/sql-programming-guide.html), which brings DataFrames to the big data distributed compute space. Through several novel innovations, Spark SQL enables data scientists to work with DataFrames too large for the memory of a single computer. As suggested by the name, these DataFrames are manipulatable via standard SQL, as well as the more general-purpose programming languages Python, R, Java, and Scala.
-
RasterFrames®, an incubating Eclipse Foundation LocationTech project, brings together Earth-observing (EO) data analysis, big data computing, and DataFrame-based data science. The recent explosion of EO data from public and private satellite operators presents both a huge opportunity as well as a challenge to the data analysis community. It is _Big Data_ in the truest sense, and its footprint is rapidly getting bigger. According to a World Bank document on assets for post-disaster situation awareness[^1]:
+
RasterFrames, an incubating Eclipse Foundation LocationTech project, brings together EO data access, cloud computing, and DataFrame-based data science. The recent explosion of EO data from public and private satellite operators presents both a huge opportunity as well as a challenge to the data analysis community. It is _Big Data_ in the truest sense, and its footprint is rapidly getting bigger. According to a World Bank document on assets for post-disaster situation awareness[^1]:
> Of the 1,738 operational satellites currently orbiting the earth (as of 9/[20]17), 596 are earth observation satellites and 477 of these are non-military assets (ie available to civil society including commercial entities and governments for earth observation, according to the Union of Concerned Scientists). This number is expected to increase significantly over the next ten years. The 200 or so planned remote sensing satellites have a value of over 27 billion USD (Forecast International). This estimate does not include the burgeoning fleets of smallsats as well as micro, nano and even smaller satellites... All this enthusiasm has, not unexpectedly, led to a veritable fire-hose of remotely sensed data which is becoming difficult to navigate even for seasoned experts.
## Benefit
-
By using DataFrames as the core cognitive and compute data model for processing EO data, RasterFrames is able to deliver sophisticated computational and algorithmic capabilities in a tabular form that is familiar and accessible to the general computing public. Because it is built on Apache Spark, solutions prototyped on a laptop can be scaled to run on cluster and cloud compute resources in a way not easily achieved with other toolchains.
+
By using DataFrames as the core cognitive and compute data model for processing EO data, RasterFrames is able to deliver sophisticated computational and algorithmic capabilities in a tabular form that is familiar and accessible to the general computing public. Because it is built on Apache Spark, solutions prototyped on a laptop can be easily scaled to run on cluster and cloud compute resources. Apache Spark also provides integration between its DataFrame libraries and machine learning, with which RasterFrames is fully compatible.
## Architecture
-
RasterFrames takes the Spark SQL DataFrame and extends it to support standard EO operations. It does this with the help of several other LocationTech projects:
+
RasterFrames builds upon several other LocationTech projects:
RasterFrames introduces georectified raster imagery to Spark SQL. It quantizes scenes into chunks called "tiles". Each tile contains a 2-D matrix of "cell" (pixel) values along with information on how to numerically interpret those cells. As shown in the figure below, a "RasterFrame" is a Spark DataFrame with one or more columns of type `tile`. A `tile` column typically represents a single frequency band of sensor data, such as "blue" or "near infrared", but can also be quality assurance information, land classification assignments, or any other rasterized spatiotemporal data. Along with `tile` columns there is typically an `extent` specifying the geographic location of the data, the map projection of that geometry (`crs`), and a `timestamp` column representing the acquisition time. These columns can all be used in the `WHERE` clause when filtering
+
RasterFrames introduces georectified raster imagery to Spark SQL. It quantizes scenes into chunks called @ref:[_tiles_](concepts.md#tile). Each tile contains a 2-D matrix of @ref:[_cell_](concepts.md#cell) or pixel values along with information on how to numerically interpret those cells.
-
RasterFrames also includes support for working with vector data, such as [GeoJSON][GeoJSON]. You can use vector data to filter DataFrame rows, using geospatial predicates (e.g. contains, intersects, overlaps, etc.), to mask cells, and to be rasterzied into training data appropriate for machine learning.
+
As shown in the figure below, a "RasterFrame" is a Spark DataFrame with one or more columns of type @ref:[`tile`](concepts.md#tile). A `tile` column typically represents a single frequency band of sensor data, such as "blue" or "near infrared", but can also be quality assurance information, land classification assignments, or any other raster spatial data. Along with `tile` columns there is typically an @ref:[`extent`](concepts.md#extent) specifying the geographic location of the data, the map projection of that geometry (@ref:[`crs`](concepts.md#coordinate-reference-system--crs-)), and a `timestamp` column representing the acquisition time. These columns can all be used in the `WHERE` clause when filtering.
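The idea of filtering on extent and timestamp columns in a `WHERE` clause can be sketched with any SQL engine. The example below uses Python's built-in `sqlite3` module as a stand-in for Spark SQL, so it runs without a cluster. Everything in it is an assumption made for illustration: the table name `raster_rows`, the `band` label standing in for a `tile` column, the extent flattened into four numeric columns, and the sample rows. It is not the RasterFrames schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE raster_rows (
        band TEXT,                                   -- stand-in for a tile column
        xmin REAL, ymin REAL, xmax REAL, ymax REAL,  -- the extent, flattened
        crs TEXT,
        timestamp TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO raster_rows VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("red", 0.0, 0.0, 10.0, 10.0, "EPSG:4326", "2018-06-15T10:00:00"),
        ("red", 5.0, 5.0, 15.0, 15.0, "EPSG:4326", "2018-07-20T10:00:00"),
        ("red", 50.0, 50.0, 60.0, 60.0, "EPSG:4326", "2018-08-01T10:00:00"),
    ],
)

# Keep rows acquired on or after 1 July 2018 whose extent overlaps the query box.
query = """
    SELECT band, timestamp
    FROM raster_rows
    WHERE timestamp >= '2018-07-01'
      AND xmin < :qxmax AND xmax > :qxmin
      AND ymin < :qymax AND ymax > :qymin
    ORDER BY timestamp
"""
box = {"qxmin": 8.0, "qymin": 8.0, "qxmax": 12.0, "qymax": 12.0}
matches = conn.execute(query, box).fetchall()
```

Only the second row survives: the first overlaps the box but predates the cutoff, and the third lies outside the box.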
RasterFrames also includes support for working with vector data, such as [GeoJSON][GeoJSON]. RasterFrames vector data operations let you filter with geospatial relationships like contains or intersects, mask cells, convert vectors to rasters, and more.
-
Raster data can be read from a number of sources. Through the flexible Spark SQL DataSource API, RasterFrames can be constructed from collections of georectified imagery (including Cloud Optimized GeoTIFFs or [COGS][COGS]), [GeoTrellis Layers][GTLayer], and from catalog of Landsat 8 and MODIS data sets on the [Amazon Web Services (AWS) Public Data Set (PDS)][PDS]. See @ref:[Raster Data I/O](raster-io.md) for details.
+
Raster data can be read from a @ref:[number of sources](raster-io.md). Through the flexible Spark SQL DataSource API, RasterFrames can be constructed from collections of imagery (including Cloud Optimized GeoTIFFs or [COGS][COGS]), [GeoTrellis Layers][GTLayer], and from catalogs of large datasets like Landsat 8 and MODIS data sets on the @ref:[AWS Public Data Set (PDS)](raster-catalogs.md#using-external-catalogs).
@@ -42,4 +43,4 @@ Raster data can be read from a number of sources. Through the flexible Spark SQL
[COGS]:https://www.cogeo.org/
[^1]: [_Demystifying Satellite Assets for Post-Disaster Situation Awareness_](https://docs.google.com/document/d/11bIw5HcEiZy8SKli6ZFQC2chVEiiIJ-f0o6btA4LU48).
World Bank via [OpenDRI.org](https://opendri.org/resource/demystifying-satellite-assets-for-post-disaster-situation-awareness/). Accessed November 28, 2018.
pyrasterframes/src/main/python/docs/index.md
@@ -1,8 +1,8 @@
# RasterFrames
-
RasterFrames® brings together Earth-observing (EO) data analysis, big data computing, and DataFrame-based data science. The recent explosion of EO data from public and private satellite operators presents both a huge opportunity as well as a challenge to the data analysis community. It is _Big Data_ in the truest sense, and its footprint is rapidly getting bigger.
+
RasterFrames® brings together Earth-observation (EO) data access, cloud computing, and DataFrame-based data science. The recent explosion of EO data from public and private satellite operators presents both a huge opportunity as well as a challenge to the data analysis community. It is _Big Data_ in the truest sense, and its footprint is rapidly getting bigger.
-
RasterFrames provides a DataFrame-centric view over arbitrary EO data, enabling spatiotemporal queries, map algebra raster operations, and compatibility with the ecosystem of Spark ML algorithms. By using DataFrames as the core cognitive and compute data model, it is able to deliver these features in a form that is accessible to general analysts while handling the rapidly growing data footprint.
+
RasterFrames provides a DataFrame-centric view over arbitrary raster data, enabling spatiotemporal queries, map algebra raster operations, and compatibility with the ecosystem of Spark ML algorithms. By using DataFrames as the core cognitive and compute data model, it is able to deliver these features in a form that is both accessible to general analysts and scalable along with the rapidly growing data footprint.
To learn more, please see the @ref:[Getting Started](getting-started.md) section of this manual.