
Commit 26f0b41

Update documentation
1 parent 2a6b702 commit 26f0b41

25 files changed: +2123 −2083 lines
[5 binary files changed: 45.3 KB, 40.8 KB, 275 KB, 248 KB, 755 KB]

_sources/background/anatomy_of_a_data_cube.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@

(*in progress*)

-The key object of our analyses will be a raster data cube, an n-dimensional object storing continuous measurements about some sort of physical quantity of different dimension(s). Many scientific workflows involve examining how a variable (such as temperature, windspeed, relative humidity, etc.) varies over time and/or space. Data cubes are a way of organizing geospatial data that let's us ask those types of questions.
+The key object of our analyses will be a [raster data cube](https://openeo.org/documentation/1.0/datacubes.html), an n-dimensional object storing continuous measurements of a physical quantity along one or more dimensions. Many scientific workflows involve examining how a variable (such as temperature, wind speed, relative humidity, etc.) varies over time and/or space. Data cubes are a way of organizing geospatial data that lets us ask those types of questions.

A very common data cube structure is a 3-dimensional object with (`x`,`y`,`time`) dimensions. This may sound quite simple, and it can be, but in practice, the amount and types of information contained within a single dataset can become complicated and unwieldy. As analysts, we access data (usually from providers such as Distributed Active Archive Centers ([DAACs](https://nssdc.gsfc.nasa.gov/earth/daacs.html))), and then we are responsible for organizing the data in a way that lets us ask the questions we'd like to ask of it. While some of these decisions can be very intuitive (e.g., it makes sense to stack observations from different points in time along a time dimension), some can be less straightforward (where and how should important metadata be stored so that it will propagate across appropriate operations and be accessible when it is needed?).
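To make the (`x`, `y`, `time`) cube structure concrete, here is a minimal sketch that builds a small synthetic data cube with Xarray; the variable name, coordinate values, and attributes are illustrative placeholders rather than tutorial data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Build a small synthetic (time, y, x) cube of a single physical variable.
# All values, coordinate ranges, and attribute values below are placeholders.
time = pd.date_range("2020-01-01", periods=4, freq="MS")
y = np.linspace(100.0, 0.0, 5)
x = np.linspace(0.0, 100.0, 5)

temperature = xr.DataArray(
    np.random.default_rng(0).random((len(time), len(y), len(x))),
    coords={"time": time, "y": y, "x": x},
    dims=("time", "y", "x"),
    attrs={"units": "degC", "long_name": "air temperature"},  # metadata lives in attrs
)

ds = xr.Dataset({"temperature": temperature})
print(ds)        # dimensions, coordinates, data variables, and attributes
print(ds.sizes)  # e.g. {'time': 4, 'y': 5, 'x': 5}
```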

_sources/background/background.md

Lines changed: 8 additions & 5 deletions
@@ -1,11 +1,14 @@
# Context and motivation
(*in progress*)
-In recent years, the volume of available earth observation data has balooned, transforming both the types of scientific questions that can be asked and the fundamental ways in which analysts approach investigating these questions. These developments mean that analysts must gain new skills across a range of domains in order to work with these types of scientif data.
+## *Increasingly large, cloud-optimized data means new tools and approaches for data management*
+In recent years, the volume of available earth observation data has ballooned, transforming both the types of scientific questions that can be asked and the fundamental ways in which analysts approach investigating these questions. These developments mean that analysts must gain new skills across a range of domains in order to work with these types of scientific data.

-In recognition of the challenges that these opportunities can pose, we present tutorials that demonstrate scientific workflows using publicly accessible, cloud-native geospatial datasets and open-source scientific software tools.
+In recognition of the challenges that these opportunities can pose, we developed tutorials that demonstrate scientific workflows using publicly accessible, cloud-native geospatial datasets and open-source scientific software tools.

-This tutorial focuses on the complexities inherent to working with n-dimensional, gridded ({term}`raster`) datasets and uses the core stack of software packages built on and around the Xarray data model to demonstrate these workflows.
+These tutorials focus on the complexities inherent to working with n-dimensional, gridded ({term}`raster`) datasets and use the core stack of software packages built on and around the [Xarray](https://xarray.dev/) data model to demonstrate these steps.

-Scientific workflows involve asking complex questions of diverse types of data. Earth observation and related datasets often contain two types of information: measurements of a physical observable (eg. temperature) and metadata that provides auxiliary information that required in order to interpret the physical observable (time and location of measurement, information about the sensor, etc.). With increasingly complex and large volumes of earth observation data that is currently available, storing, managing and organizing these types of data can very quickly become a complex and challenging task, especially for students and early-career analysts.
+## *Asking questions of complex datasets*

-We aim to provide detailed examples of scientific workflows that ingest complex, multi-dimensional datastets, introduce users to the landscape of popular, actively-maintained opens-source software packages for working with geospatial data in Python, and include strategies for working with larger-than memory data stored in publicly available cloud-hosted repositories. Importantly, these demonstrations are accompanied by detailed discussion of concepts involved with analyzing earth observation data such as dataset inspection, ... . Overall, we emphasize the importance of understanding the structure of multi-dimensional earth observation datasets within the context of a given data model and demonstrate how such an understanding can enable more efficient and intuitive scientific workflows.
+Scientific workflows involve asking complex questions of diverse types of data. Earth observation and related datasets often contain two types of information: measurements of a physical observable (e.g., temperature) and metadata that provides the auxiliary information required to interpret the physical observable (time and location of measurement, information about the sensor, etc.). With the increasingly large and complex volumes of earth observation data currently available, storing, managing, and organizing these types of data can very quickly become a challenging task, especially for students and early-career analysts (for more on these concepts, see [anatomy of a datacube](anatomy_of_a_data_cube.md)).
+
+This book provides detailed examples of scientific workflow steps that ingest complex, multi-dimensional datasets, introduces users to the landscape of popular, actively maintained open-source software packages for working with geospatial data in Python, and includes strategies for working with larger-than-memory data stored in publicly available cloud-hosted repositories. Importantly, these demonstrations are accompanied by detailed discussion of concepts involved in analyzing earth observation data such as dataset inspection, manipulation, and exploratory analysis and visualization. Overall, we emphasize the importance of understanding the structure of multi-dimensional earth observation datasets within the context of a given data model and demonstrate how such an understanding can enable more efficient and intuitive scientific workflows.

_sources/intro/software.md

Lines changed: 16 additions & 15 deletions
@@ -4,19 +4,26 @@ On this page you'll find information about the computing environment and dataset

## Running tutorial materials locally

-To run the notebooks contained in this tutorial on your local machine, follow these steps:
+There are two options for creating a software environment; we recommend using [pixi](https://pixi.sh/latest/) to create a consistent environment on different operating systems. If you have pixi installed, simply run the following commands to open either the itslive or sentinel-1 Jupyter notebooks:

-1. Clone this book's GitHub repository:
+```
+cd cloud-open-source-geospatial-datacube-workflows
+pixi run itslive
+pixi run sentinel1
+```
+
+**To use conda/mamba follow these steps:**
+
+1. Clone this book's GitHub repository:
```git clone https://github.com/e-marshall/cloud-open-source-geospatial-datacube-workflows.git```

-2. Navigate into the `book` sub-directory:
-```cd cloud-oopen-source-geospatial-datacube-workflows/book```
+2. Navigate into the `book` sub-directory:
+```cd cloud-open-source-geospatial-datacube-workflows/book```

-3. Create and activate a conda environment from the `geospatial_datacube_tutorial_env.yml` file located in the repo:
-```conda env create -f geospatial_datacube_tutorial_env.yml```
-```conda activate geospatial_datacube_tutorial_env```
+3. Create and activate a conda environment from the `environment.yaml` file located in the repo's `.binder` directory:
+```conda env create -f .binder/environment.yaml```

-4. Start Jupyterlab and navigate to the directories containing the jupyter notebooks (`itslive_nbs` and `s1_nbs`):
+4. Start JupyterLab and navigate to the directories containing the jupyter notebooks (`itslive_nbs` and `s1_nbs`):
```jupyter lab```

## todo
@@ -25,15 +32,9 @@
- update required packages below, some not necessary.


-
-(from old version):
-create the `itslivetools_env` conda environment (`conda env create -f environment-unpinned.yml`) based on the `environment.yml` file [here](https://github.com/e-marshall/mynewbook/blob/master/environment.yml). This should work on any platform (linux, osx, windows) and will install the latest versions of all dependencies.
-
-Alternatively, the code repository for this tutorial (https://github.com/e-marshall/itslive) also contains "lock" files for Linux (conda-linux-64.lock.yml) and MacOS (conda-osx-64.lock.yml) that pin exact versions of all required python packages for a [reproducible computing environment](https://mybinder.readthedocs.io/en/latest/tutorials/reproducibility.html).
-
## Required software packages

-The `geospatial_datacube_tutorial_env.yml` file referenced above will install all of the required packages in a conda virtual environment. Below is a list of all packages imported throughout the notebooks:
+Below is a list of all packages imported throughout the notebooks:

```python
import adlfs #check this may not be needed

_sources/itslive_nbs/1_accessing_itslive_s3_data.ipynb

Lines changed: 7 additions & 4 deletions
@@ -10,7 +10,7 @@
"This notebook demonstrates how to query and access cloud-hosted Inter-mission Time Series of Land Ice Velocity and Elevation ([ITS_LIVE](https://its-live.jpl.nasa.gov/#access)) data from Amazon Web Services (AWS) S3 buckets. These data are stored as [Zarr](https://zarr.readthedocs.io/en/stable/) data cubes, a cloud-optimized format for array data. They are read into memory as [Xarray](https://docs.xarray.dev/en/stable/) Datasets.\n",
"\n",
"```{note}\n",
-"This tutorial was updated Jan 2025 to reflect changes to ITS_LIVE data urls and various software libraries\n",
+"This tutorial was updated Jan 2025 to reflect changes to the ITS_LIVE dataset and various software libraries.\n",
"```"
]
},
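As a rough sketch of the access pattern this notebook describes (reading a cloud-hosted Zarr data cube into an Xarray Dataset), the snippet below opens a store lazily from S3; the URL is a hypothetical placeholder (the real granule URLs come from querying the ITS_LIVE catalog in the notebook), and anonymous access to a public bucket is assumed.

```python
import xarray as xr

# Hypothetical Zarr data cube path -- substitute a real ITS_LIVE URL from the catalog.
# Reading s3:// paths requires the s3fs package; anon=True requests anonymous access.
url = "s3://its-live-data/datacubes/example_cube.zarr"

# Lazily open the store: only metadata is read up front, not the full arrays.
dc = xr.open_zarr(url, storage_options={"anon": True})

print(dc.sizes)  # dimension lengths, e.g. mid_date, x, y
print(dc.attrs)  # dataset-level metadata
```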
@@ -23,6 +23,8 @@
"\n",
"(content:section_A)= \n",
"**[A. Overview of ITS_LIVE data]()**\n",
+"- {{a1_its_nb1}}\n",
+"- {{a2_its_nb1}}\n",
"\n",
"(content:Section_B)=\n",
"**[B. Read ITS_LIVE data from AWS S3 using Xarray](#b-read-its_live-data-from-aws-s3-using-xarray)**\n",
@@ -5017,7 +5019,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
-"### Data structure overview\n",
+"### {{a1_its_nb1}}\n",
"\n",
"#### Dimensions\n",
"- This object has 3 *dimensions*, `mid_date`, `x`, and `y`.\n",
@@ -5542,12 +5544,13 @@
"1) Metadata should be added to the `attrs` of an Xarray object so that the dataset is **self-describing** (You or a future user don't need external information to be able to interpret the data).\n",
"2) Wherever possible, metadata should follow Climate Forecast (CF) naming conventions.\n",
"\n",
-" ## Climate Forecast (CF) Metadata Conventions\n",
+" ### {{a2_its_nb1}}\n",
"\n",
"CF conventions address many of the challenges of inconsistent and non-descriptive metadata found in climate and earth observation datasets. By establishing common naming schemes for physical quantities and other attributes, these conventions facilitate collaboration, data fusion, and the development of tools for working with a range of data types. \n",
"\n",
"From the [CF documentation](https://cfconventions.org/): \n",
-" The CF metadata conventions are designed to promote the processing and sharing of files created with the NetCDF API. The conventions define metadata that provide a definitive description of what the data in each variable represents, and the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities. The CF convention includes a standard name table, which defines strings that identify physical quantities.\n",
+"\n",
+">The CF metadata conventions are designed to promote the processing and sharing of files created with the NetCDF API. The conventions define metadata that provide a definitive description of what the data in each variable represents, and the spatial and temporal properties of the data. This enables users of data from different sources to decide which quantities are comparable, and facilitates building applications with powerful extraction, regridding, and display capabilities. The CF convention includes a standard name table, which defines strings that identify physical quantities.\n",
"\n",
"CF metadata conventions set common expectations for metadata names and locations across datasets. In this tutorial, we will use tools such as [cf_xarray]() that leverage CF conventions to add programmatic handling of CF metadata to Xarray objects, meaning that users can spend less time wrangling metadata. 🤩\n"
]
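To illustrate how cf_xarray exposes CF metadata, here is a minimal, self-contained sketch; the toy dataset, its coordinate values, and its attributes are assumptions for demonstration only, not ITS_LIVE data.

```python
import cf_xarray  # noqa: F401 -- importing registers the .cf accessor on Xarray objects
import numpy as np
import xarray as xr

# Toy dataset whose coordinates carry CF attributes (all values are placeholders).
ds = xr.Dataset(
    {"v": (("y", "x"), np.zeros((2, 3)))},
    coords={
        "x": ("x", np.arange(3.0), {"standard_name": "projection_x_coordinate", "units": "m"}),
        "y": ("y", np.arange(2.0), {"standard_name": "projection_y_coordinate", "units": "m"}),
    },
)

print(ds.cf)       # summary of the CF axes/coordinates that cf_xarray can identify
print(ds.cf["X"])  # look up the x coordinate by its CF axis name rather than its dataset name
```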

_sources/itslive_nbs/itslive_intro.md

Lines changed: 0 additions & 1 deletion
@@ -83,4 +83,3 @@ For instructions on setting up a computing environment needed for this tutorial,

For more background on the data used in this tutorial, head to [Tutorial Data](../background/tutorial_data.md).

-Head to [1. Accessing cloud-hosted ITS_LIVE data](1_accessing_itslive_s3_data.ipynb) to get started!
