This utility within the Climate Action Framework provides the full workflow of:
- training a deep learning model for land-use and land-cover classes,
- registering and tracking the model in an online model store, and
- serving the model's predictions via a REST API, which enables users to request LULC classifications for arbitrary areas and timestamps.
The main reasons for creating this utility are to:
- Gain access to a mechanism capable of preparing semantic segmentation models tailored for specific research scenarios,
- Enable rapid model training for various sets of labels (encoded as OSM query filters) and circumstances (e.g. unpredictable events),
- Use the model as a high-quality imputation mechanism (filling missing data) for real-time OSM data,
- Create a lightweight solution that can be adjusted for various types of hardware,
- Prepare a project baseline that can be configured to utilize different data sources and model implementations.
This package uses uv for environment and package management.
Environments are automatically used and packages updated when you execute `uv run` commands, but you can also manually trigger package installation with `uv sync --group [group]`.
The `uv run` commands automatically invoke the environment, but you can also activate it directly with `source .venv/bin/activate` (and close it as usual with `deactivate`).
We highly recommend using a CUDA-compatible device to train the model.
To check whether such a device is available on your machine, run `nvidia-smi` in the console.
When you start the training script, the logs will include a line like `GPU available: True (cuda), used: True` if CUDA is available.
Note that the repository supports pre-commit hooks defined in the `.pre-commit-config.yaml` file.
For a description of each hook, visit the respective tool's documentation.
Run `uv run pre-commit install` to activate them.
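Once installed, the hooks run on every commit; they can also be executed on demand across the whole repository with the standard pre-commit invocation:

```shell
# Run all configured hooks against every file in the repository
uv run pre-commit run --all-files
```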
Run `uv run --all-groups pytest` to run the tests.
To use the utility, you need access to Neptune.ai and, depending on the model you run, other external services.
Secret keys and common configuration variables can be set in the `.env` file and are described in `.env.template`.
Any environment variables that include special characters must be wrapped in single quotation marks (`''`).
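For illustration, an entry with special characters would be quoted as shown below; the variable names and values are purely illustrative, so refer to `.env.template` for the actual fields.

```shell
# Illustrative .env entries only; see .env.template for the real variable names
NEPTUNE_API_TOKEN='abc/123=$ecret'
LOG_LEVEL=INFO
```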
There are many more configuration files available in `/conf/`.
Visit `/conf/config.yaml` to review and modify the configuration as required.
Note that the config under `train` relates to training-specific configuration, while the config under `serve` is used by the inference API and should only be modified with caution.
OpenStreetMap LULC polygons are used as training labels via an OSM2LULC mapping. The model is trained locally and stored at neptune.ai. The feature space is constructed from Sentinel-1 and Sentinel-2 imagery and a DEM.
The underlying semantic segmentation model is SegFormer. It is able to delineate homogeneous regions precisely. Due to its configurable and lightweight architecture, SegFormer can be quickly trained to match a specific use case. Preparing a country-specific model should take around two days (GPU: GeForce 3090).
To start training a new model, first create a config folder under `/conf/train/` and update the default config path in `/conf/config.yaml`.
Then copy the relevant config files from `/conf/examples/*.yaml` and follow the comments in the examples to create your config files.
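One possible way to bootstrap such a folder from the shipped examples is sketched below; the folder name `my_area` is purely illustrative.

```shell
# Create a new training config folder and seed it with the example files
mkdir -p conf/train/my_area
cp conf/examples/*.yaml conf/train/my_area/
```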
After initial configuration of your training parameters, the following data preparation steps have been automated in scripts, as described below.
To select the area on which the model will be trained, an area descriptor has to be prepared or computed.
The area descriptor will generate a set of tiles to use during training.
To automatically prepare the descriptor, set the relevant area parameters in `conf/train/<YOUR_FOLDER>/area_descriptor.yaml` and run the following command:

`uv run lulc/compute_area_descriptor.py`

The area descriptor (as a `csv`) and a visual representation of it (as a `png`) will be saved to `data/area/`.
Optionally, to sanity check the imagery that would be sourced for each of the tiles, run:

`uv run --env-file .env lulc/save_imagery.py`

To save just a small sample area, you can also create a smaller area descriptor file and provide the optional command line input `'train.area.aoi_file=data/area/test.csv'` (note the string wrapping).
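For example, assuming the override string is passed straight through to the script as a command line argument, a sampled run could look like this:

```shell
# Source imagery only for the tiles listed in a smaller, hand-made area descriptor
uv run --env-file .env lulc/save_imagery.py 'train.area.aoi_file=data/area/test.csv'
```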
Note that the `sentinel_hub` operator already caches the requested imagery, so if using Sentinel Hub, one can also find the relevant tiff files in `/cache/imagery/sentinel_hub/...`.
OpenStreetMap LULC polygons are used as ground truth training labels via the OSM2LULC mapping defined in `/conf/train/<YOUR_FOLDER>/label.yaml`.
To define the ground truth labels, create a new `label.yaml` file in your training config.
Optionally, to export a single raster of your ground truth labels for your whole training area (for visual inspection and approval), run:

`uv run --env-file .env lulc/export_osm_labels.py`

The raster files of ground truth labels will be saved to the `cache_dir`.
Images need to be normalised to a similar range across sensor channels before they are used during model training.
LULC data also always contains class imbalance. In OSM, this imbalance can be aggravated by the data collection process. The model can use class weights to account for this problem by adjusting the loss function. The class weights are based on the spatial resolution of the training data and the resulting number of pixels attributed to each class.
To calculate a reasonable set of normalisation values and class weights for new datasets, run the following script (note that this will load all of the images in your area descriptor, so it can take some time):

`uv run --env-file .env lulc/calculate_dataset_statistics.py`

The resulting image normalisation parameters and class weights must be copied to `data.yaml` before training.
Finally, after setting up the configuration, training can be run with the following command:

`uv run --env-file .env lulc/train.py`
To serve an inference session for a model trained with this utility, we spawn a REST API locally.
Before starting the API, the configuration in `/conf/serve/` must be updated for the model being hosted.
The relevant config files can simply be copied from the matching training configuration, changing the `# @package train.XYZ` header to `# @package serve.XYZ`.
Also choose the desired model version from the Model Registry, e.g. `LULC-SEG-2`, and modify the `conf/serve/app.yaml` file accordingly.
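A possible shortcut for the copy-and-rename step is sketched below; the config file name `model.yaml` is only an assumption, so adapt it to the files your training configuration actually uses (this uses GNU `sed`).

```shell
# Copy a training config file into the serve config (file name is illustrative)
cp conf/train/<YOUR_FOLDER>/model.yaml conf/serve/model.yaml

# Switch the Hydra package header from train to serve
sed -i 's/@package train\./@package serve./' conf/serve/model.yaml
```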
Copy the `.env_template` file to `.env` and populate it with the necessary fields.
Then start the application:

`uv run --group deploy --no-dev --env-file .env app/api.py`

Go to `localhost:8000` to see the API in action.
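As a quick smoke test, you can check that the locally served API responds; the exact routes and payloads depend on the app, so this only probes the root URL.

```shell
# Probe the root of the locally running API (response depends on the app's routes)
curl -i http://localhost:8000/
```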
The tool is also Dockerised. Images are automatically built and deployed in the CI pipeline.
In case you want to manually build and run locally (e.g. to test a new feature in development), execute:

`docker build . --tag repo.heigit.org/climate-action/lulc-utility:devel`

`docker run --rm --publish 8000:8000 --env-file .env repo.heigit.org/climate-action/lulc-utility:devel`

Note that this will overwrite any existing image with the same tag (i.e. the one you previously pulled from the Climate Action docker registry).
To run behind a proxy, you can configure the root path using the environment variable `ROOT_PATH`.
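For example, assuming the proxy exposes the service under a `/lulc` prefix (the prefix value is illustrative), the variable can be passed to the container like this:

```shell
# Run the container with a root path matching the proxy prefix
docker run --rm --publish 8000:8000 --env-file .env \
  --env ROOT_PATH=/lulc \
  repo.heigit.org/climate-action/lulc-utility:devel
```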
To push a new version to our docker registry (i.e. to overwrite the one on the Climate Action docker registry), run:

`docker build . --tag repo.heigit.org/climate-action/lulc-utility:devel`

`docker image push repo.heigit.org/climate-action/lulc-utility:devel`
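If the push is rejected with an authentication error, you will likely need to log in to the registry first (assuming it uses standard Docker registry authentication):

```shell
docker login repo.heigit.org
```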
To release a new version:
- Update the `CHANGELOG.md`. It should already be up to date, but give it one last read and update the heading above this upcoming release.
- Decide on the new version number. Please adhere to the Semantic Versioning scheme, based on the changes since the last release.
- Update the version attribute in the `pyproject.toml`.
- Create a release on GitLab, including a changelog.