Commit 427b246

Rick edits (#27)
* code updated
* updates to nb3 to add orbital pass dir plots
* updates to nbs per Ricks feedback and reference fixes
1 parent da277cb commit 427b246

8 files changed: +1330 -5836 lines changed

book/_config.yml

Lines changed: 7 additions & 5 deletions
@@ -49,7 +49,7 @@ parse:
 - substitution
 sphinx:
 config:
-bibtex_reference_style: author_year
+bibtex_reference_style: label
 myst_heading_anchors: 3
 myst_enable_extensions:
 - amsmath
@@ -207,10 +207,12 @@ sphinx:
 d2_s1_nb3: "2) Visualize duplicates"
 d3_s1_nb3: "3) Drop duplicates"
 
-e_s1_nb3: "E. Data visualization"
-e1_s1_nb3: "1) Mean backscatter over time"
-e2_s1_nb3: "2) Seasonal backscatter variability"
-e3_s1_nb3: "3) Backscatter time series"
+e_s1_nb3: "E. Examine coverage over time series"
+
+f_s1_nb3: "F. Data visualization"
+f1_s1_nb3: "1) Mean backscatter over time"
+f2_s1_nb3: "2) Seasonal backscatter variability"
+f3_s1_nb3: "3) Backscatter time series"
 
 

book/background/context_motivation.md

Lines changed: 5 additions & 4 deletions
@@ -7,13 +7,14 @@ This book demonstrates scientific workflows using publicly-available, cloud-opti
 Technological developments in recent decades have engendered fundamental shifts in the nature of scientific data and how it is used for analysis.
 
 ```{epigraph}
-"Traditionally, scientific data have been distributed via a “download model,” wherein scientists download individual data files to local computers for analysis.After downloading many files, scientists typically have to do extensive processing and organizing to make them useful for the data analysis; this creates a barrier to reproducibility, since a scientist’s analysis code must account for this unique “local” organization. Furthermore, the sheer size of the datasets (many terabytes to petabytes) can make downloading effectively impossible. Analysis of such data volumes also can benefit from parallel / distributed computing, which is not always readily available on local computers. Finally, this model reinforces inequality between privileged institutions that have the resources to host local copies of the data and those that don’t. This restricts who can participate in science."
--- {cite}`abernathey_2021_cloud`
+"Traditionally, scientific data have been distributed via a “download model,” wherein scientists download individual data files to local computers for analysis.After downloading many files, scientists typically have to do extensive processing and organizing to make them useful for the data analysis; this creates a barrier to reproducibility, since a scientist’s analysis code must account for this unique “local” organization. Furthermore, the sheer size of the datasets (many terabytes to petabytes) can make downloading effectively impossible. Analysis of such data volumes also can benefit from parallel / distributed computing, which is not always readily available on local computers. Finally, this model reinforces inequality between privileged institutions that have the resources to host local copies of the data and those that don’t. This restricts who can participate in science."
+
+-- {cite:t}`abernathey_2021_cloud`
 ```
 
 ### *II. Increasingly large, cloud-optimized data means new tools and approaches for data management*
 
-The increase in publicly available earth observation data has transformed scientific workflows across a range of fields, prompting analysts to gain new skills in order to work with larger volumes of data in new formats and locations, and to use distributed cloud-computational resources in their analysis {cite}`abernathey_2021_cloud,gentemann_2021_science,mathieu_2017_esas,ramachandran_2021_open,Sudmanns_2020_big,wagemann_2021_user`.
+The increase in publicly available earth observation data has transformed scientific workflows across a range of fields, prompting analysts to gain new skills in order to work with larger volumes of data in new formats and locations, and to use distributed cloud-computational resources in their analysis ({cite:t}`abernathey_2021_cloud,gentemann_2021_science,mathieu_2017_esas,ramachandran_2021_open,Sudmanns_2020_big,wagemann_2021_user`).
 
 ```{figure} imgs/fy24-projection-chart.png
 ---
@@ -23,6 +24,6 @@ Volume of NASA Earth Science Data archives, including growth of existing-mission
 
 ### *III. Asking questions of complex datasets*
 
-Scientific workflows involve asking complex questions of diverse types of data. Earth observation and related datasets often contain two types of information: measurements of a physical observable (eg. temperature) and metadata that provides auxiliary information that required in order to interpret the physical observable (time and location of measurement, information about the sensor, etc.). With increasingly complex and large volumes of earth observation data that is currently available, storing, managing and organizing these types of data can very quickly become a complex and challenging task, especially for students and early-career analysts {cite}`mathieu_esas_2017,palumbo_2017_building,Sudmanns_2020_big,wagemann_2021_user`.
+Scientific workflows involve asking complex questions of diverse types of data. Earth observation and related datasets often contain two types of information: measurements of a physical observable (eg. temperature) and metadata that provides auxiliary information that required in order to interpret the physical observable (time and location of measurement, information about the sensor, etc.). With increasingly complex and large volumes of earth observation data that is currently available, storing, managing and organizing these types of data can very quickly become a complex and challenging task, especially for students and early-career analysts ({cite:t}`mathieu_esas_2017,palumbo_2017_building,Sudmanns_2020_big,wagemann_2021_user`).
 
 This book provides detailed examples of scientific workflow steps that ingest complex, multi-dimensional datastets, introduce users to the landscape of popular, actively-maintained opens-source software packages for working with geospatial data in Python, and include strategies for working with larger-than memory data stored in publicly available cloud-hosted repositories. These demonstrations are accompanied by detailed discussion of concepts involved in analyzing earth observation data such as dataset inspection, manipulation, and exploratory analysis and visualization. Overall, we emphasize the importance of understanding the structure of multi-dimensional earth observation datasets within the context of a given data model and demonstrate how such an understanding can enable more efficient and intuitive scientific workflows.

book/background/data_cubes.md

Lines changed: 9 additions & 6 deletions
@@ -6,20 +6,20 @@ The term **data cube** is used frequently throughout this book. This page contai
 
 The key object of analysis in this book is a [raster data cube](https://openeo.org/documentation/1.0/datacubes.html). Raster data cubes are n-dimensional objects that store continuous measurements or estimates of physical quantities that exist along given dimension(s). Many scientific workflows involve examining how a variable (such as temperature, windspeed, relative humidity, etc.) varies over time and/or space. Data cubes are a way of organizing geospatial data that let us ask these questions.
 
-A very common data cube structure is a 3-dimensional object with (`x`,`y`,`time`) dimensions. While this is a relatively intuitive concept,in practice, the amount and types of information contained within a single dataset and the operations involved in managing them, can become complicated and unwieldy. As analysts, we accesss data (usually from providers such as Distributed Active Archive Centers ([DAACs](https://nssdc.gsfc.nasa.gov/earth/daacs.html))), and then we are responsible for organizing the data in a way that let's us ask questions of it. While some of these decisions are straightforward (eg. *It makes sense to stack observations from different points in time along a time dimension*), some can be more open-ended (*Where and how should important metadata be stored so that it will propagate across appropriate operations and be accessible when it is needed?*).
+A very common data cube structure is a 3-dimensional object with (`x`,`y`,`time`) dimensions ({cite:t}`baumann_2017_datacube,mahecha_2020_EarthSystemData,giuliani_2019_EarthObservationOpen,montero_2024_EarthSystemData`). While this is a relatively intuitive concept,in practice, the amount and types of information contained within a single dataset and the operations involved in managing them, can become complicated and unwieldy. As analysts, we accesss data (usually from providers such as Distributed Active Archive Centers ([DAACs](https://nssdc.gsfc.nasa.gov/earth/daacs.html))), and then we are responsible for organizing the data in a way that let's us ask questions of it. While some of these decisions are straightforward (eg. *It makes sense to stack observations from different points in time along a time dimension*), some can be more open-ended (*Where and how should important metadata be stored so that it will propagate across appropriate operations and be accessible when it is needed?*).
 
 ### *Two types of information*
 Fundamentally, many of these complexities can be reduced to one distinction: is a particular piece of information a physical observable (the main focus, or target, of the dataset), or is it metadata that provides necessary information in order to properly interpret and handle the physical observable? Answering this question will help you understand how to situate a piece of information within the broader data object.
 
-[^mynote1]: An image collection is a set of n images, where images contain m variables or spectral bands. Band data from one image share a common spatial footprint, acquisition date/time, and spatial reference system but may have different pixel sizes. Technically, the data of bands may come from one or more files, depending on the organization of a particular data product." {cite}`appel_2019_ondemand`
+[^mynote1]: An image collection is a set of n images, where images contain m variables or spectral bands. Band data from one image share a common spatial footprint, acquisition date/time, and spatial reference system but may have different pixel sizes. Technically, the data of bands may come from one or more files, depending on the organization of a particular data product." {cite:p}`appel_2019_ondemand`
 
 
 
 ### *Consider an example*
 
-We have a time series of [NDVI](https://www.usgs.gov/landsat-missions/landsat-normalized-difference-vegetation-index) imagery generated from a stack of Landsat scenes. Before a user accesses a satellite imagery dataset, it has likely already undergone many levels of processing, transformation and re-organization. For more background on these steps, see {cite}`montero_2024_EarthSystemData`, *Section 3: 'The Earth System Data Cube Life cycle'*.
+We have a time series of [NDVI](https://www.usgs.gov/landsat-missions/landsat-normalized-difference-vegetation-index) imagery generated from a stack of Landsat scenes. Before a user accesses a satellite imagery dataset, it has likely already undergone many levels of processing, transformation and re-organization. For more background on these steps, see Montero et al. {cite:p}`montero_2024_EarthSystemData`, *Section 3: 'The Earth System Data Cube Life cycle'*.
 
-In this example, we're accessing the dataset at a common dissemination point, an 'image collection'[^mynote1] {cite}`appel_2019_ondemand` In the image collection, each satellite image contains information such as the following:
+In this example, we're accessing the dataset at a common dissemination point, an 'image collection'[^mynote1]. In the image collection, each satellite image contains information such as the following:
 - Acquisition date,
 - X-coordinate values,
 - Y-coordinate values,
@@ -60,7 +60,7 @@ The process described above is an example of preparing data for analysis. Thanks
 
 ```{epigraph}
 CEOS Analysis Ready Data (CEOS-ARD) are satellite data that have been processed to a minimum set of requirements and organized into a form that allows immediate analysis with a minimum of additional user effort and interoperability both through time and with other datasets.
-- Committee on Earth Observation Satellites ([CEOS](https://ceos.org/ard/index.html)) Analysis-Ready Data
+- Committee on Earth Observation Satellites ([CEOS](https://ceos.org/ard/index.html)) Analysis-Ready Data {cite}`lewis_2018_CEOSAnalysisReady`
 ```
 
 The development and increasing adoption of analysis-ready specifications for satellite imagery datasets is an exciting and transformative opportunity to increase the utilization of earth observation data.
@@ -71,7 +71,10 @@ However, many legacy datasets still require significant effort in order to be co
 The tutorials in this book contain examples of data at various degrees of 'analysis-ready'. [Tutorial 1](../tutorial1/itslive_intro.md) uses a dataset of multi-sensor observations that is already organized as a `(x,y,time)` cube with a common grid. In [tutorial 2](../tutorial2/s1_intro.md), we will see an example of a dataset that has undergone intensive processing to make it 'analysis-ready' but requires further manipulation to arrive at the `(x,y,time)` cube format that will be easist to work with.
 
 ### References
-- {cite}`montero_2024_EarthSystemData`, {cite}`appel_2019_ondemand`, {cite}`giuliani_2019_EarthObservationOpen`, {cite}`truckenbrodt_2019_Sentinel1ARD`
+- {cite:t}`montero_2024_EarthSystemData`
+- {cite:t}`appel_2019_ondemand`
+- {cite:t}`giuliani_2019_EarthObservationOpen`
+- {cite:t}`truckenbrodt_2019_Sentinel1ARD`
 ## Additional data cube resources
 - [OpenEO - Data Cubes](https://openeo.org/documentation/1.0/datacubes.html)
 - [Open Data Cube initiative](https://www.opendatacube.org/about-draft)
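
Illustrative note on the `data_cubes.md` text above (not part of the commit): the page distinguishes physical observables from metadata and suggests stacking observations along a time dimension. The sketch below shows one possible way to express that idea in plain Python, without any geospatial library. The image collection, its field names (`acquired`, `sensor`, `ndvi`), the helper functions, and all values are hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical image collection: each image carries a tiny 2x2 NDVI grid
# (the physical observable) plus metadata. All values are made up.
images = [
    {"acquired": date(2021, 6, 17), "sensor": "L8", "ndvi": [[0.31, 0.35], [0.40, 0.42]]},
    {"acquired": date(2021, 6, 1), "sensor": "L8", "ndvi": [[0.21, 0.25], [0.30, 0.32]]},
    {"acquired": date(2021, 7, 3), "sensor": "L9", "ndvi": [[0.41, 0.45], [0.50, 0.52]]},
]


def build_cube(collection):
    """Stack images along a time dimension, ordered by acquisition date."""
    ordered = sorted(collection, key=lambda img: img["acquired"])
    return {
        "dims": ("time", "y", "x"),
        # Physical observable: one 2-D grid per time step.
        "data": [img["ndvi"] for img in ordered],
        # Metadata that varies per image is kept as coordinates aligned
        # with the time dimension, so it stays attached to each time step.
        "coords": {
            "time": [img["acquired"] for img in ordered],
            "sensor": [img["sensor"] for img in ordered],
        },
    }


def pixel_series(cube, y, x):
    """Return (time, value) pairs for a single pixel across the cube."""
    return [(t, grid[y][x]) for t, grid in zip(cube["coords"]["time"], cube["data"])]


cube = build_cube(images)
series = pixel_series(cube, 0, 1)
```

Keeping `sensor` aligned with `time` rather than in a separate lookup is one design choice among several; it means any selection along time can carry the matching metadata with it.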
