book/background/context_motivation.md (4 additions, 5 deletions)
@@ -7,9 +7,8 @@ This book demonstrates scientific workflows using publicly-available, cloud-opti
Technological developments in recent decades have engendered fundamental shifts in the nature of scientific data and how it is used for analysis.
```{epigraph}
- "Traditionally, scientific data have been distributed via a “download model,” wherein scientists download individual data files to local computers for analysis.After downloading many files, scientists typically have to do extensive processing and organizing to make them useful for the data analysis; this creates a barrier to reproducibility, since a scientist’s analysis code must account for this unique “local” organization. Furthermore, the sheer size of the datasets (many terabytes to petabytes) can make downloading effectively impossible. Analysis of such data volumes also can benefit from parallel / distributed computing, which is not always readily available on local computers. Finally, this model reinforces inequality between privileged institutions that have the resources to host local copies of the data and those that don’t. This restricts who can participate in science."
-
- -- {cite:t}`abernathey_2021_cloud`
+ "Traditionally, scientific data have been distributed via a “download model,” wherein scientists download individual data files to local computers for analysis. After downloading many files, scientists typically have to do extensive processing and organizing to make them useful for the data analysis; this creates a barrier to reproducibility, since a scientist’s analysis code must account for this unique “local” organization. Furthermore, the sheer size of the datasets (many terabytes to petabytes) can make downloading effectively impossible. Analysis of such data volumes also can benefit from parallel / distributed computing, which is not always readily available on local computers. Finally, this model reinforces inequality between privileged institutions that have the resources to host local copies of the data and those that don’t. This restricts who can participate in science."
+ -- {cite}`abernathey_2021_cloud`
```
### *II. Increasingly large, cloud-optimized data means new tools and approaches for data management*
@@ -24,6 +23,6 @@ Volume of NASA Earth Science Data archives, including growth of existing-mission
### *III. Asking questions of complex datasets*
- Scientific workflows involve asking complex questions of diverse types of data. Earth observation and related datasets often contain two types of information: measurements of a physical observable (eg. temperature) and metadata that provides auxiliary information that required in order to interpret the physical observable (time and location of measurement, information about the sensor, etc.). With increasingly complex and large volumes of earth observation data that is currently available, storing, managing and organizing these types of data can very quickly become a complex and challenging task, especially for students and early-career analysts ({cite:t}`mathieu_esas_2017,palumbo_2017_building,Sudmanns_2020_big,wagemann_2021_user`).
+ Scientific workflows involve asking complex questions of diverse types of data. Earth observation and related datasets often contain two types of information: measurements of a physical observable (e.g. temperature) and metadata that provides auxiliary information required to interpret the physical observable (time and location of measurement, information about the sensor, etc.). With the increasingly complex and large volume of earth observation data that is currently available, storing, managing and organizing these types of data can very quickly become a complex and challenging task, especially for students and early-career analysts {cite}`mathieu_esas_2017,palumbo_2017_building,Sudmanns_2020_big,wagemann_2021_user`.
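To make the measurement-plus-metadata distinction above concrete, a minimal sketch built with synthetic values (the variable, coordinate, and attribute names are illustrative, not drawn from any dataset in the book):

```python
import numpy as np
import xarray as xr

# The physical observable: a small grid of temperature measurements
temperature = xr.DataArray(
    np.random.rand(2, 3, 4),
    dims=("time", "y", "x"),
    coords={
        # Metadata needed to interpret the observable: when and where it was measured
        "time": np.array(["2020-01-01", "2020-01-02"], dtype="datetime64[ns]"),
        "y": [10.0, 20.0, 30.0],
        "x": [0.0, 1.0, 2.0, 3.0],
    },
    # More metadata: units and sensor information, stored as attributes
    attrs={"units": "K", "sensor": "hypothetical_radiometer"},
)
print(temperature)
```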
- This book provides detailed examples of scientific workflow steps that ingest complex, multi-dimensional datastets, introduce users to the landscape of popular, actively-maintained opens-source software packages for working with geospatial data in Python, and include strategies for working with larger-than memory data stored in publicly available cloud-hosted repositories. These demonstrations are accompanied by detailed discussion of concepts involved in analyzing earth observation data such as dataset inspection, manipulation, and exploratory analysis and visualization. Overall, we emphasize the importance of understanding the structure of multi-dimensional earth observation datasets within the context of a given data model and demonstrate how such an understanding can enable more efficient and intuitive scientific workflows.
+ This book provides detailed examples of scientific workflow steps that ingest complex, multi-dimensional datasets, introduce users to the landscape of popular, actively-maintained open-source software packages for working with geospatial data in Python, and include strategies for working with larger-than-memory data stored in publicly available, cloud-hosted repositories. These demonstrations are accompanied by detailed discussion of concepts involved in analyzing earth observation data such as dataset inspection, manipulation, and exploratory analysis and visualization. Overall, we emphasize the importance of understanding the structure of multi-dimensional earth observation datasets within the context of a given data model and demonstrate how such an understanding can enable more efficient and intuitive scientific workflows.
book/background/data_cubes.md (1 addition, 1 deletion)
@@ -68,7 +68,7 @@ The development and increasing adoption of analysis-ready specifications for sat
However, many legacy datasets still require significant effort in order to be considered 'analysis-ready'. Furthermore, for analysts, 'analysis-ready' can be a subjective and evolving label. Semantically, from a user perspective, analysis-ready data can be thought of as data whose structure is conducive to scientific analysis.
## *III. Analysis-ready data cubes & this book*
- The tutorials in this book contain examples of data at various degrees of 'analysis-ready'. [Tutorial 1](../tutorial1/itslive_intro.md) uses a dataset of multi-sensor observations that is already organized as a `(x,y,time)` cube with a common grid. In [tutorial 2](../tutorial2/s1_intro.md), we will see an example of a dataset that has undergone intensive processing to make it 'analysis-ready' but requires further manipulation to arrive at the `(x,y,time)` cube format that will be easist to work with.
+ The tutorials in this book contain examples of data at various degrees of 'analysis-ready'. [Tutorial 1: ITS_LIVE](../itslive/itslive_intro.md) uses a dataset of multi-sensor observations that is already organized as a `(x,y,time)` cube with a common grid. In [Tutorial 2: Sentinel-1](../sentinel1/s1_intro.md), we will see an example of a dataset that has undergone intensive processing to make it 'analysis-ready' but requires further manipulation to arrive at the `(x,y,time)` cube format that will be easiest to work with.
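For readers unfamiliar with the `(x,y,time)` layout referenced above, a minimal sketch of such a cube using synthetic values and arbitrary dimension sizes (the variable name `v` is illustrative):

```python
import numpy as np
import pandas as pd
import xarray as xr

# A toy 'analysis-ready' cube: one variable on a common (x, y) grid
# sharing a single time dimension
cube = xr.Dataset(
    {"v": (("time", "y", "x"), np.zeros((5, 4, 3)))},
    coords={
        "time": pd.date_range("2020-01-01", periods=5),
        "y": np.linspace(0.0, 300.0, 4),
        "x": np.linspace(0.0, 200.0, 3),
    },
)

# With a common grid and time axis, temporal selection is a single call
subset = cube["v"].sel(time="2020-01-03")
```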
book/endmatter/appendix.md (2 additions, 2 deletions)
@@ -8,7 +8,7 @@ In the first tutorial, while making an [interactive visualization of vector data
## [2. Reading a stack of files with `xr.open_mfdataset()` (Sentinel-1 tutorial)](nbs/2_read_w_xropen_mfdataset.ipynb)
- Xarray's `xr.open_mfdataset()`[function](https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html) allows the user to read in and combine multiple files at once to produce a single `xr.DataArray` object. This approach was explore when developing the [Read ASF-processed Sentinel-1 RTC data notebook](../tutorial2/nbs/1_read_asf_data.ipynb). However, `xr.open_mfdataset() didn't work well for this purpose because, while the stack of raster files used in this example covers a common area of interest, it includes several different spatial footprints. This creates problems when specifying a chunking strategy.
+ Xarray's `xr.open_mfdataset()` [function](https://docs.xarray.dev/en/stable/generated/xarray.open_mfdataset.html) allows the user to read in and combine multiple files at once to produce a single `xr.DataArray` object. This approach was explored when developing the [Read ASF-processed Sentinel-1 RTC data notebook](../sentinel1/nbs/1_read_asf_data.ipynb). However, `xr.open_mfdataset()` didn't work well for this purpose because, while the stack of raster files used in this example covers a common area of interest, it includes several different spatial footprints. This creates problems when specifying a chunking strategy.
`xr.open_mfdataset()` takes a 'preprocess' argument that allows the user to write a function to specify how each raster file should be read so that the structure and metadata of the returned object matches the desired format. However, because it applies the same preprocessing steps to each file, the chunking strategy is defined from the first file in the stack. With files that cover different spatial footprints, different chunking strategies will be required. The processing works fine for lazy steps, but a memory 'blow-up' occurs at computation time.
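A schematic sketch of that pattern, under stated assumptions: the glob path, the `band_data` variable, and the `acquisition_date` attribute are placeholders rather than the tutorial's actual files or metadata.

```python
import xarray as xr

def preprocess(ds):
    # Hypothetical per-file cleanup: keep one variable and add a time
    # dimension parsed from file-level metadata
    ds = ds[["band_data"]]
    return ds.expand_dims(time=[ds.attrs["acquisition_date"]])

ds = xr.open_mfdataset(
    "path/to/rtc_scenes/*.tif",     # placeholder glob over the raster stack
    engine="rasterio",              # provided by rioxarray
    preprocess=preprocess,          # applied identically to every file
    combine="nested",
    concat_dim="time",
    chunks={"x": 1024, "y": 1024},  # chunking is based on the first file read
)
```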
@@ -28,4 +28,4 @@ If you wanted to select scenes from a single viewing geometry at the expense of
```
## [3. Another regridding approach using `xESMF` (Sentinel-1 tutorial)](nbs/3_regridding_w_xesmf.ipynb)
- This notebook demonstrates an alternative approach to the regridding shown in [noteboook 5](../tutorial2/nbs/5_comparing_s1_rtc_datasets.ipynb) of Tutorial 2, but this time using a different regridding package.
+ This notebook demonstrates an alternative approach to the regridding shown in [notebook 5](../sentinel1/nbs/5_comparing_s1_rtc_datasets.ipynb) of Tutorial 2, but this time using a different regridding package.
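For orientation, a minimal sketch of the `xESMF` workflow, assuming two synthetic rectilinear grids standing in for the two datasets; the `gamma0` variable name and the grid sizes are illustrative:

```python
import numpy as np
import xarray as xr
import xesmf as xe

# Coarse source grid with an illustrative backscatter variable
ds_source = xr.Dataset(
    {"gamma0": (("lat", "lon"), np.random.rand(18, 36))},
    coords={"lat": np.linspace(-85, 85, 18), "lon": np.linspace(-175, 175, 36)},
)
# Finer target grid that the data will be regridded onto
ds_target = xr.Dataset(
    coords={"lat": np.linspace(-85, 85, 36), "lon": np.linspace(-175, 175, 72)},
)

# Build reusable regridding weights from the source grid to the target grid
regridder = xe.Regridder(ds_source, ds_target, method="bilinear")

# Apply the weights to a variable on the source grid
gamma0_regridded = regridder(ds_source["gamma0"])
```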
book/intro/getting_started.md (3 additions, 3 deletions)
@@ -14,7 +14,7 @@ Head to the [software](software.md) page for detailed instructions on how to get
Most of the examples in this book use data accessed programmatically from cloud-object storage. We make subsets of the data available in this book's Github repository to remove the need for computationally-intensive operations in the tutorials.
- Several notebooks in the second tutorial use Sentinel-1 data processed by the Alaska Satellite Facility and downloaded locally. Users who would like to follow these steps on their own may do so (and access the data [here]()), but a smaller subset of the dataset is also made available [here](). For more detail on different ways to work through the Sentinel-1 tutorial, head [here](../tutorial2/s1_intro.md), and for more background on all of the datasets used in this book see [Tutorial Data](../background/tutorial_data.md).
+ Several notebooks in the second tutorial use Sentinel-1 data processed by the Alaska Satellite Facility and downloaded locally. Users who would like to follow these steps on their own may do so (and access the data [here]()), but a smaller subset of the dataset is also made available [here](). For more detail on different ways to work through the Sentinel-1 tutorial, head [here](../sentinel1/s1_intro.md), and for more background on all of the datasets used in this book see [Tutorial Data](../background/tutorial_data.md).
:::{important}
The datasets used in these tutorials can be complicated to work with and require significant background knowledge in order to understand their limitations and how best to interpret them. **It is the responsibility of the user** to understand the physical principles that underpin remote sensing datasets and how they should be used and interpreted. See the [Tutorial Data](../background/tutorial_data.md) section for detailed discussion of these datasets and links to important background information.
@@ -46,10 +46,10 @@ Background on data cubes and an introduction to array-based geoscience data and
Each tutorial focuses on a different type of remote sensing dataset and demonstrates how to assess and work through the nuances, details and challenges that can arise from each. A common characteristic of each dataset that is emphasized throughout the notebooks is working with larger-than-memory datasets on the computational resources of a standard laptop.
#### Part 1: {{part2_title}}
- A [tutorial](../tutorial1/itslive_intro.md) focusing on [ITS_LIVE](https://its-live.jpl.nasa.gov/), a NASA MEASURES project and publicly accessible dataset stored in an AWS S3 repo as Zarr data cubes.
+ A [tutorial](../itslive/itslive_intro.md) focusing on [ITS_LIVE](https://its-live.jpl.nasa.gov/), a NASA MEaSUREs project and publicly accessible dataset stored in an AWS S3 repo as Zarr data cubes.
#### Part 2: {{part3_title}}
- This [tutorial](../tutorial2/s1_intro.md) focuses on another satellite dataset: [Sentinel-1](https://www.esa.int/Applications/Observing_the_Earth/Copernicus/Sentinel-1) Radiometric Terrain Corrected imagery. Sentinel-1 is a satellite-based imaging radar. More specifically, it is a synthetic aperture radar (SAR). SAR sensor look to the side rather than straight-down like conentional optical and infrared satellite sensors. This side-looking geometry causes geometric distortions that need to be addressed prior to analysis. SAR data undergoes different types of processing for different scientific applications. Part 2 demonstrates how to access this data from two publicly available, online respositories: Alaska Satellite Facility and Microsoft Planetary Computer. These notebooks demonstrate the different ways to read this data and prepare it for analysis, as well as an initial comparison of the two datasets.
+ This [tutorial](../sentinel1/s1_intro.md) focuses on another satellite dataset: [Sentinel-1](https://www.esa.int/Applications/Observing_the_Earth/Copernicus/Sentinel-1) Radiometric Terrain Corrected imagery. Sentinel-1 is a satellite-based imaging radar. More specifically, it is a synthetic aperture radar (SAR). SAR sensors look to the side rather than straight down like conventional optical and infrared satellite sensors. This side-looking geometry causes geometric distortions that need to be addressed prior to analysis. SAR data undergoes different types of processing for different scientific applications. Part 2 demonstrates how to access this data from two publicly available, online repositories: Alaska Satellite Facility and Microsoft Planetary Computer. These notebooks demonstrate the different ways to read this data and prepare it for analysis, as well as an initial comparison of the two datasets.
- 4. Start Jupyterlab and navigate to the directories containing the jupyter notebooks (`itslive_nbs` and `s1_nbs`):
+ 4. Start Jupyterlab and navigate to the directories containing the Jupyter notebooks (`itslive/nbs` and `s1/nbs`):
```jupyterlab```
- Both tutorials also uses functions that are stored in scripts associated with each dataset. You can find these scripts here: [`itslive_tools.py`](../itslive_nbs/itslive_tools.py) and [`s1_tools.py`](../s1_nbs/s1_tools.py).
+ Both tutorials use functions that are stored in scripts associated with each dataset. You can find these scripts here: [`itslive_tools.py`](../itslive/nbs/itslive_tools.py) and [`s1_tools.py`](../s1/nbs/s1_tools.py).