Commit b3f0a2d

Documentation improvements (#69)

* Add cloud cover landsat8 example.
* Extensive documentation updates.
* Move resume options to train section.
* Remove model intermediate field from network.
* Delete max_tile_offset (didn't work well).
* Hold back usgs package version.

1 parent 42869d7 commit b3f0a2d

44 files changed: +2041 −653 lines

README.md

Lines changed: 58 additions & 65 deletions
````diff
@@ -1,83 +1,88 @@
 **DELTA** (Deep Earth Learning, Tools, and Analysis) is a framework for deep learning on satellite imagery,
-based on Tensorflow. Use DELTA to train and run neural networks to classify large satellite images. DELTA
-provides pre-trained autoencoders for a variety of satellites to reduce required training data
-and time.
+based on Tensorflow. DELTA classifies large satellite images with neural networks, automatically handling
+the tiling of large imagery.
 
 DELTA is currently under active development by the
-[NASA Ames Intelligent Robotics Group](https://ti.arc.nasa.gov/tech/asr/groups/intelligent-robotics/). Expect
-frequent changes. It is initially being used to map floods for disaster response, in collaboration with the
+[NASA Ames Intelligent Robotics Group](https://ti.arc.nasa.gov/tech/asr/groups/intelligent-robotics/).
+Initially, it is mapping floods for disaster response, in collaboration with the
 [U.S. Geological Survey](http://www.usgs.gov), [National Geospatial Intelligence Agency](https://www.nga.mil/),
 [National Center for Supercomputing Applications](http://www.ncsa.illinois.edu/), and
-[University of Alabama](https://www.ua.edu/). DELTA is a component of the
-[Crisis Mapping Toolkit](https://github.com/nasa/CrisisMappingToolkit), in addition
-to our previous software for mapping floods with Google Earth Engine.
+[University of Alabama](https://www.ua.edu/).
 
 Installation
 ============
 
-1. Install [python3](https://www.python.org/downloads/), [GDAL](https://gdal.org/download.html), and the [GDAL python bindings](https://pypi.org/project/GDAL/).
-For Ubuntu Linux, you can run `scripts/setup.sh` from the DELTA repository to install these dependencies.
+1. Install [python3](https://www.python.org/downloads/), [GDAL](https://gdal.org/download.html),
+and the [GDAL python bindings](https://pypi.org/project/GDAL/). For Ubuntu Linux, you can run
+`scripts/setup.sh` from the DELTA repository to install these dependencies.
 
-2. Install Tensorflow with pip following the [instructions](https://www.tensorflow.org/install). For
+2. Install Tensorflow following the [instructions](https://www.tensorflow.org/install). For
 GPU support in DELTA (highly recommended) follow the directions in the
 [GPU guide](https://www.tensorflow.org/install/gpu).
 
 3. Checkout the delta repository and install with pip:
 
-```
-git clone http://github.com/nasa/delta
-python3 -m pip install delta
-```
+```bash
+git clone http://github.com/nasa/delta
+python3 -m pip install delta
+```
+
+DELTA is now installed and ready to use!
+
+Documentation
+=============
+DELTA can be used either as a command line tool or as a python library.
+See the python documentation for the master branch [here](https://nasa.github.io/delta/),
+or generate the documentation with `scripts/docs.sh`.
+
+Example
+=======
+
+As a simple example, consider training a neural network to map clouds with Landsat-8 images.
+The script `scripts/example/l8_cloud.sh` trains such a network using DELTA on the
+[USGS SPARCS dataset](https://www.usgs.gov/core-science-systems/nli/landsat/spatial-procedures-automated-removal-cloud-and-shadow-sparcs)
+and shows how DELTA can be used. The steps involved in this, and other, classification processes are:
+
+1. **Collect** training data. The SPARCS dataset contains Landsat-8 imagery with and without clouds.
 
-This installs DELTA and all dependencies (except for GDAL which must be installed manually in step 1).
+2. **Label** training data. The SPARCS labels classify each pixel as cloud, land, water, or another class.
 
-Usage
-=====
+3. **Train** the neural network. The script `scripts/example/l8_cloud.sh` invokes the command
 
-As a simple example, consider training a neural network to map water in Worldview imagery.
-You would:
+```
+delta train --config l8_cloud.yaml l8_clouds.h5
+```
 
-1. **Collect** training data. Find and save Worldview images with and without water. For a robust
-classifier, the training data should be as representative as possible of the evaluation data.
+where `scripts/example/l8_cloud.yaml` is a configuration file specifying the labeled training data and
+training parameters (learn more about configuration files below). A neural network file
+`l8_clouds.h5` is output.
 
-2. **Label** training data. Create images matching the training images pixel for pixel, where each pixel
-in the label is 0 if it is not water and 1 if it is.
+4. **Classify** with the trained network. The script runs
 
-3. **Train** the neural network. Run
-```
-delta train --config wv_water.yaml wv_water.h5
-```
-where `wv_water.yaml` is a configuration file specifying the labeled training data and any
-training parameters (learn more about configuration files below). The command will output a
-neural network file `wv_water.h5` which can be
-used for classification. The neural network operates on the level of *chunks*, inputting
-and outputting smaller blocks of the image at a time.
+```
+delta classify --config l8_cloud.yaml --image-dir ./validate --overlap 32 l8_clouds.h5
+```
 
-4. **Classify** with the trained network. Run
-```
-delta classify --image image.tiff wv_water.h5
-```
-to classify `image.tiff` using the network `wv_water.h5` learned previously.
-The file `image_predicted.tiff` will be written to the current directory showing the resulting labels.
+to classify the images in the `validate` folder using the network `l8_clouds.h5` learned previously.
+The tiles overlap so that border regions can be discarded where possible, producing a more aesthetically
+pleasing classified image. The command outputs a predicted image and a confusion matrix.
 
-Configuration Files
--------------------
+The results could be improved with more training, more data, or an improved network, but this
+example shows the basic usage of DELTA.
 
-DELTA is configured with YAML files. Some options can be overwritten with command line options (use
-`delta --help` to see which). [Learn more about DELTA configuration files](./delta/config/README.md).
+Configuration and Extensions
+============================
 
-All available configuration options and their default values are shown [here](./delta/config/delta.yaml).
-We suggest that users create one reusable configuration file to describe the parameters specific
-to each dataset, and separate configuration files to train on or classify that dataset.
+DELTA provides many options for customizing data inputs and training. All options are configured via
+YAML files. Some options can be overwritten with command line options (use
+`delta --help` to see which). See the `delta.config` README to learn about available configuration
+options.
 
-Supported Image Formats
------------------------
-DELTA supports tiff files and a few other formats.
-Users can extend DELTA with their own custom formats. We are looking to expand DELTA to support other
-useful file formats.
+DELTA can be extended to support custom neural network layers, image types, preprocessing operations,
+metrics, losses, and training callbacks. Learn about DELTA extensions in the
+`delta.config.extensions` documentation.
 
-MLFlow
-------
+Data Management
+===============
 
 DELTA integrates with [MLFlow](http://mlflow.org) to track training. MLFlow options can
 be specified in the corresponding area of the configuration file. By default, training and
````
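Following the configuration conventions documented in `delta/config/README.md`, the configuration file driving the example's train and classify steps would look roughly like the sketch below. The field values here are hypothetical illustrations, not the contents of the actual `scripts/example/l8_cloud.yaml` in the repository:

```yaml
dataset:
  images:
    type: tiff
    directory: train/          # hypothetical location of the SPARCS scenes
    preprocess:
      - scale:
          factor: 65535.0      # hypothetical: scale 16-bit pixels into 0-1
  labels:
    type: tiff
    directory: labels/
  classes: 2
train:
  network:
    yaml_file: networks/convpool.yaml   # hypothetical network choice
  epochs: 10
```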
````diff
@@ -93,18 +98,6 @@ View all the logged training information through mlflow by running:
 and navigating to the printed URL in a browser. This makes it easier to keep track when running
 experiments and adjusting parameters.
 
-Using DELTA from Code
-=====================
-You can also call DELTA as a python library and customize it with your own extensions, for example,
-custom image types. The python API documentation can be generated as HTML. To do so:
-
-```
-pip install pdoc3
-./scripts/docs.sh
-```
-
-Then open `html/delta/index.html` in a web browser.
-
 Contributors
 ============
 We welcome pull requests to contribute to DELTA. However, due to NASA legal restrictions, we must require
````
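The `--overlap` option in the README's classify command can be pictured with simple tiling arithmetic: adjacent tiles share a border of `overlap` pixels, so each tile's border predictions can be discarded in favor of a neighbor's interior. This is a simplified sketch of the idea along one axis, not DELTA's actual implementation:

```python
def tile_origins(length, tile_size, overlap):
    """Return start offsets of tiles along one axis so that consecutive
    tiles share `overlap` pixels (a simplified model of overlapping tiling)."""
    stride = tile_size - overlap
    origins = list(range(0, max(length - tile_size, 0) + 1, stride))
    # Add a final tile flush with the image edge if the grid fell short.
    if origins[-1] + tile_size < length:
        origins.append(length - tile_size)
    return origins

# A 100-pixel axis with 40-pixel tiles and an overlap of 8 pixels:
print(tile_origins(100, 40, 8))  # → [0, 32, 60]
```

Each tile then covers at least `overlap` pixels of its neighbor, so border regions can be ignored everywhere except at the image edges.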

delta/config/README.md

Lines changed: 136 additions & 46 deletions
````diff
@@ -5,9 +5,33 @@ all options, showing all parameters DELTA supports and their default values, see [delta.y
 
 `delta` accepts multiple config files on the command line. For example, run
 
-    delta train --config dataset.yaml --config train.yaml
+```bash
+delta train --config dataset.yaml --config train.yaml
+```
+
+to train on a dataset specified by `dataset.yaml`:
+
+```yaml
+dataset:
+  images:
+    type: tiff
+    directory: train/
+  labels:
+    type: tiff
+    directory: labels/
+  classes: 2
+```
+
+with training parameters given in `train.yaml`:
+
+```yaml
+train:
+  network:
+    yaml_file: networks/convpool.yaml
+  epochs: 10
+```
 
-to train on a dataset specified by `dataset.yaml` with training parameters given in `train.yaml`.
 Parameters can be overridden globally for all runs of `delta` as well, by placing options in
 `$HOME/.config/delta/delta.yaml` on Linux. This is only recommended for global parameters
 such as the cache directory.
````
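Layering several config files, with later files overriding earlier ones, can be modeled as a recursive dictionary merge. A minimal sketch of that layering idea over plain dictionaries (not DELTA's actual config code):

```python
def merge_configs(base, override):
    """Recursively merge `override` into `base`, with `override` winning
    on conflicts (a simplified model of layering several config files)."""
    result = dict(base)
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = merge_configs(result[key], value)
        else:
            result[key] = value
    return result

dataset_cfg = {"dataset": {"images": {"type": "tiff", "directory": "train/"}}}
train_cfg = {"train": {"epochs": 10}, "dataset": {"images": {"directory": "other/"}}}
merged = merge_configs(dataset_cfg, train_cfg)
print(merged["dataset"]["images"])  # → {'type': 'tiff', 'directory': 'other/'}
```

Unrelated sections coexist, while overlapping leaf values come from the last file given on the command line.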
````diff
@@ -17,42 +41,67 @@ only setting the necessary options.
 Note that some configuration options can be overwritten on the command line: run
 `delta --help` to see which.
 
-The remainder of this document details the available configuration parameters. Note that
-DELTA is still under active development and parts are likely to change in the future.
+The remainder of this document details the available configuration parameters.
 
 Dataset
 -----------------
 Images and labels are specified with the `images` and `labels` fields respectively,
 within `dataset`. Both share the
 same underlying options.
 
-* `type`: Indicates which loader to use, e.g., `tiff` for geotiff.
+* `type`: Indicates which `delta.imagery.delta_image.DeltaImage` image reader to use, e.g., `tiff` for geotiff.
+  The reader should previously have been registered with `delta.config.extensions.register_image_reader`.
 * Files to load must be specified in one of three ways:
-  * `directory` and `extension`: Use all images in the directory ending with the given extension.
-  * `file_list`: Provide a text file with one image file name per line.
-  * `files`: Provide a list of file names in yaml.
-* `preprocess`: Supports limited image preprocessing. We recommend
+  * `directory` and `extension`: Use all images in the directory ending with the given extension.
+  * `file_list`: Provide a text file with one image file name per line.
+  * `files`: Provide a list of file names in yaml.
+* `preprocess`: Specify a preprocessing chain. We recommend
 scaling input imagery in the range 0.0 to 1.0 for best results with most of our networks.
 DELTA also supports custom preprocessing commands. Default actions include:
-  * `scale` with `factor` argument: Divide all values by amount.
-  * `offset` with `factor` argument: Add `factor` to pixel values.
-  * `clip` with `bounds` argument: clip all pixels to bounds.
+  * `scale` with `factor` argument: Divide all values by `factor`.
+  * `offset` with `factor` argument: Add `factor` to pixel values.
+  * `clip` with `bounds` argument: Clip all pixels to `bounds`.
+  Preprocessing commands are registered with `delta.config.extensions.register_preprocess`.
+  A full list of defaults (and examples of how to create new ones) can be found in `delta.extensions.preprocess`.
 * `nodata_value`: A pixel value to ignore in the images.
+* `classes`: Either an integer number of classes or a list of individual classes. If individual classes are
+  specified, each list item should be the pixel value of the class in the label images, and a dictionary with
+  the following potential attributes (see example below):
+  * `name`: Name of the class.
+  * `color`: Integer to use as the RGB representation for some classification options.
+  * `weight`: How much to weight the class during training (useful for underrepresented classes).
 
 As an example:
 
-```
-dataset:
-  images:
-    type: worldview
-    directory: images/
-  labels:
-    type: tiff
-    directory: labels/
-    extension: _label.tiff
-```
-
-This configuration will load worldview files ending in `.zip` from the `images/` directory.
+```yaml
+dataset:
+  images:
+    type: tiff
+    directory: images/
+    preprocess:
+      - scale:
+          factor: 256.0
+    nodata_value: 0
+  labels:
+    type: tiff
+    directory: labels/
+    extension: _label.tiff
+    nodata_value: 0
+  classes:
+    - 1:
+        name: Cloud
+        color: 0x0000FF
+        weight: 2.0
+    - 2:
+        name: Not Cloud
+        color: 0xFFFFFF
+        weight: 1.0
+```
+
+This configuration will load tiff files ending in `.tiff` from the `images/` directory.
 It will then find matching tiff files ending in `_label.tiff` from the `labels` directory
-to use as labels.
+to use as labels. The image values will be divided by a factor of 256 before they are used.
+(It is often helpful to scale images to a range of 0-1 before training.) The labels represent two classes:
+clouds and non-clouds. Since there are fewer clouds, these are weighted more heavily. The label
+images should contain 0 for nodata, 1 for cloud pixels, and 2 for non-cloud pixels.
 
````
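The `classes` list pairs label pixel values with names and training weights. A toy sketch of how such a table might be applied to a row of label pixels (pure Python, mirroring the Cloud / Not Cloud example above; this is not DELTA's actual loader):

```python
# Hypothetical class table mirroring the yaml example:
# label pixel value -> (class name, training weight); 0 is nodata.
CLASSES = {1: ("Cloud", 2.0), 2: ("Not Cloud", 1.0)}
NODATA = 0

def pixel_weights(label_row):
    """Map a row of label pixels to per-pixel training weights,
    giving nodata pixels zero weight so they are ignored."""
    return [CLASSES[p][1] if p != NODATA else 0.0 for p in label_row]

print(pixel_weights([0, 1, 2, 1]))  # → [0.0, 2.0, 1.0, 2.0]
```

Weighting the rarer cloud class more heavily keeps it from being drowned out by the majority class during training.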
````diff
@@ -59,37 +108,70 @@
 Train
 -----
 These options are used in the `delta train` command.
 
-* `network`: The nueral network to train. See the next section for details.
+* `network`: The neural network to train. One of `yaml_file` or `layers` must be specified.
+  * `yaml_file`: A path to a yaml file with only the params and layers fields. See `delta/config/networks`
+    for examples.
+  * `params`: A dictionary of parameters to substitute in the `layers` field.
+  * `layers`: A list of layers which compose the network. See the following section for details.
 * `stride`: When collecting training samples, skip every `n` pixels between adjacent blocks. Keep the
-default of 1 to use all available training data.
-* `batch_size`: The number of chunks to train on in a group. May affect convergence speed. Larger
-batches allow higher training data throughput, but may encounter memory limitations.
+default of ~ or 1 to use all available training data. Not used for fully convolutional networks.
+* `batch_size`: The number of patches to train on at a time. If running out of memory, reducing
+the batch size may help.
 * `steps`: If specified, stop training for each epoch after the given number of batches.
 * `epochs`: The number of times to iterate through all training data during training.
 * `loss`: [Keras loss function](https://keras.io/losses/). For integer classes, use
-`sparse_categorical_cross_entropy`.
-* `metrics`: A list of [Keras metrics](https://keras.io/metrics/) to evaluate.
-* `optimizer`: The [Keras optimizer](https://keras.io/optimizers/) to use.
+`sparse_categorical_cross_entropy`. May be specified either as a string, or as a dictionary
+with arguments to pass to the loss function constructor. Custom losses registered with
+`delta.config.extensions.register_loss` may be used.
+* `metrics`: A list of [Keras metrics](https://keras.io/metrics/) to evaluate. Either the string
+name or a dictionary with the constructor arguments may be used. Custom metrics registered with
+`delta.config.extensions.register_metric`, as well as loss functions, may also be used.
+* `optimizer`: The [Keras optimizer](https://keras.io/optimizers/) to use. May be specified as a string or
+as a dictionary with constructor parameters.
+* `callbacks`: A list of [Keras callbacks](https://keras.io/api/callbacks/) to use during training, specified as
+either a string or as a dictionary with constructor parameters. Custom callbacks registered with
+`delta.config.extensions.register_callback` may also be used.
 * `validation`: Specify validation data. The validation data is tested after each epoch to evaluate the
 classifier performance. Always use separate training and validation data!
   * `from_training` and `steps`: If `from_training` is true, take the `steps` training batches
   and do not use them for training but for validation instead.
   * `images` and `labels`: Specified using the same format as the input data. Use this imagery as validation
   data if `from_training` is false.
+* `log_folder` and `resume_cutoff`: If `log_folder` is specified, store records in this folder of how often
+each part of each image has been read during training. If the number of reads exceeds `resume_cutoff`, skip
+the tile when resuming training. This allows resuming training partway through an epoch. You should generally
+not bother using this except on very large training sets (thousands of large images).
 
 ### Network
 
-These options configure the neural network to train with the `delta train` command.
+For the `layers` attribute, any [Keras Layer](https://keras.io/api/layers/) can
+be used, including custom layers registered with `delta.config.extensions.register_layer`.
 
-* `classes`: The number of classes in the input data. The classes must currently have values
-0 - n in the label images.
-* `model`: The network structure specification.
-folder. You can either point to another `yaml_file`, such as the ones in the delta/config/networks
-directory, or specify one under the `model` field in the same format as these files. The network
-layers are specified using the [Keras functional layers API](https://keras.io/layers/core/)
-converted to YAML files.
+Sub-fields of the layer are argument names and values which are passed to the layer's constructor.
 
+A special sub-field, `inputs`, is a list of the names of layers to pass as inputs to this layer.
+If `inputs` is not specified, the previous layer is used by default. Layer names can be specified with `name`.
+
+```yaml
+layers:
+  Input:
+    shape: [~, ~, num_bands]
+    name: input
+  Add:
+    inputs: [input, input]
+```
+
+This simple example takes an input and adds it to itself.
+
+Since this network takes inputs of variable size ((~, ~, `num_bands`) is the input shape) it is a **fully
+convolutional network**. This means that during training and classification, it will be evaluated on entire
+tiles rather than smaller chunks.
+
+A few special parameters are available by default:
+
+* `num_bands`: The number of bands / channels in an image.
+* `num_classes`: The number of classes provided in `dataset.classes`.
 
 MLFlow
 ------
````
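The `params` field substitutes named values into the `layers` specification. A rough sketch of that substitution idea over plain dictionaries (a toy model with hypothetical parameter names, not DELTA's actual mechanism):

```python
def substitute_params(spec, params):
    """Recursively replace parameter names appearing in a layer spec
    with their values (a toy model of the `params` substitution)."""
    if isinstance(spec, dict):
        return {k: substitute_params(v, params) for k, v in spec.items()}
    if isinstance(spec, list):
        return [substitute_params(v, params) for v in spec]
    if isinstance(spec, str) and spec in params:
        return params[spec]
    return spec

layers = {"Input": {"shape": [None, None, "num_bands"]},
          "Conv2D": {"filters": "num_filters", "kernel_size": 3}}
params = {"num_bands": 8, "num_filters": 16}  # hypothetical values

resolved = substitute_params(layers, params)
print(resolved["Conv2D"]["filters"])  # → 16
```

Built-in names like `num_bands` and `num_classes` behave the same way: they are filled in from the dataset before the network is constructed.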
````diff
@@ -119,10 +201,18 @@ General
 -------
 
 * `gpus`: The number of GPUs to use, or `-1` for all.
+* `verbose`: Trigger verbose printing.
+* `extensions`: List of extensions to load. Add custom modules here and they will be loaded when
+delta starts.
+
+I/O
+-------
 * `threads`: The number of threads to use for loading images into tensorflow.
-* `tile_size`: The size of a tile to load from an image at a time. For convolutional networks (input size is [~, ~, X]),
-an entire tile is one training sample. For fixed size networks the tile is split into chunks. This parameter affects
-performance: larger tiles will be faster but take more memory (quadratic with chunk size for fixed size networks).
-* `cache`: Configure caching options. The subfield `dir` specifies a directory on disk to store cached files,
-and `limit` is the number of files to retain in the cache. Used mainly for image types
-which must be extracted from archive files.
+* `tile_size`: The size of a tile to load into memory at a time. For fully convolutional networks, the
+entire tile will be processed at a time; for others it will be chunked.
+* `interleave_images`: The number of images to interleave between. If this value is three, three images will
+be opened at a time. Chunks / tiles will be interleaved from the first three images until one is completed, then
+a new image will be opened. Larger interleaves can aid training, but come at a cost in memory.
+* `cache`: Options for a cache, which is used by a few image types (currently worldview and landsat).
+  * `dir`: Directory to store the cache. `default` gives a reasonable OS-specific default.
+  * `limit`: Maximum number of items to store in the cache before deleting old entries.
````
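The `interleave_images` behavior can be pictured with a small generator: tiles are drawn round-robin from a window of open images, and when one image is exhausted the next one is opened. A simplified sketch of the idea (not DELTA's loader):

```python
from collections import deque

def interleave(images, window):
    """Yield tiles round-robin from up to `window` open images at a time,
    opening a new image whenever one is exhausted."""
    pending = deque(iter(img) for img in images)
    open_imgs = deque()
    while pending or open_imgs:
        # Keep up to `window` images open at once.
        while len(open_imgs) < window and pending:
            open_imgs.append(pending.popleft())
        img = open_imgs.popleft()
        try:
            tile = next(img)
        except StopIteration:
            continue  # image finished; a replacement is opened next pass
        yield tile
        open_imgs.append(img)

tiles = list(interleave([["a1", "a2"], ["b1"], ["c1", "c2"]], window=2))
print(tiles)  # → ['a1', 'b1', 'a2', 'c1', 'c2']
```

With a window of two, tiles alternate between the first two images until one runs out, at which point the third image joins the rotation, matching the description of `interleave_images` above.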
