🐸 🌿 EcoWikiRS: Learning Ecological Representations of Satellite Images from Weak Supervision with Species Observations and Wikipedia
Valerie Zermatten, Javiera Castillo-Navarro, Pallavi Jain, Devis Tuia, Diego Marcos
April 2025: 🎉 🎉 EcoWikiRS was accepted at the EARTHVISION 2025 Workshop in conjunction with the Computer Vision and Pattern Recognition (CVPR) 2025 Conference.
How to cite this work:
@InProceedings{Zermatten_2025_WikiRS,
author = { Zermatten, Valerie and Castillo-Navarro, Javiera and Jain, Pallavi and Tuia, Devis and Marcos, Diego},
title = {EcoWikiRS: Learning Ecological Representations of Satellite Images from Weak Supervision with Species Observations and Wikipedia},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2025},
pages = {00-00}
}
We use Python 3.10 with PyTorch 2.2.0 and CUDA 12.1.
Required Python packages are listed in environment.yml, which can be used to build a conda environment with the commands below:
conda env create --file environment.yml
conda activate wikirs
All the arguments are described in more detail in the argument-parser function in the utils/argparser.py file. The following command is an example of how to launch an experiment with our best model:
# train model with WINCEL loss, SkyCLIP pretrained model :
python train_multi_text.py --criterion WINCEL --model SkyCLIP
More training options are provided in the run.sh file.
We propose a method to learn ecological properties of aerial images by learning an alignment with species habitat descriptions.
- We release the EcoWikiRS dataset, composed of triplets:
  - high-resolution aerial images (50 cm, RGB bands)
  - a list of species observations collected from GBIF, geolocated within the footprint of the aerial image
  - sentences describing the habitat of the observed species, extracted from the corresponding Wikipedia article
- We propose WINCEL, a weighted version of the InfoNCE loss. WINCEL aims to identify text passages that are relevant to the image from the descriptions. WINCEL filters out text that describes properties that are specific only to part of the species’ niche or are irrelevant to a specific image.
Formally, WINCEL is computed as follows: [equation not recovered here; see the paper for the formal definition and its terms]
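As a rough illustration of what weighting an InfoNCE loss can look like, here is a minimal sketch in plain Python. It is not the authors' WINCEL formula, which is given in the paper. It assumes that each image is paired with several candidate sentences, and that each sentence's InfoNCE term is weighted by a softmax over that image's own sentence similarities, so that sentences closer to the image count more. The function names and the temperatures `tau` and `tau_w` are hypothetical.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_info_nce(sim, positives, tau=0.07, tau_w=0.07):
    """Generic weighted InfoNCE (illustrative assumption, not the paper's formula).

    sim[i][j]    : cosine similarity between image i and sentence j in the batch
    positives[i] : indices of the sentences paired with image i
    tau          : temperature of the contrastive softmax
    tau_w        : temperature of the softmax that produces the sentence weights
    """
    loss = 0.0
    for i, row in enumerate(sim):
        logits = [s / tau for s in row]
        m = max(logits)
        # log of the softmax denominator over all sentences in the batch
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        pos = positives[i]
        # sentences more similar to the image receive a larger weight
        weights = softmax([row[j] for j in pos], temperature=tau_w)
        loss -= sum(w * (logits[j] - log_z) for w, j in zip(weights, pos))
    return loss / len(sim)


# Toy batch: 2 images, 4 sentences (made-up similarities).
sim = [
    [0.9, 0.1, 0.2, 0.0],
    [0.1, 0.3, 0.8, 0.7],
]
positives = [[0, 1], [2, 3]]
print(weighted_info_nce(sim, positives))
```

With a very large `tau_w` the weights become uniform and the sketch reduces to the average InfoNCE over all paired sentences; with a single paired sentence per image it is the standard InfoNCE loss.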
We evaluate our approach in the task of ecosystem zero-shot classification by following the habitat definitions from the European Nature Information System (EUNIS). Our results show that our approach helps in understanding RS images in a more ecologically meaningful manner.
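Zero-shot classification with a CLIP-style model generally assigns an image to the class whose text embedding is most similar to the image embedding. The sketch below shows that step only, in plain Python. The three class names and the 3-dimensional vectors are made-up placeholders standing in for encoder outputs; they are not the EUNIS habitat classes or real SkyCLIP features, and `zero_shot_classify` is a hypothetical helper name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_classify(image_embedding, class_embeddings):
    """Return the best-matching class name and all similarity scores."""
    scores = {
        name: cosine(image_embedding, text_embedding)
        for name, text_embedding in class_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Placeholder text embeddings, one per class description (assumed values).
class_embeddings = {
    "forest": [0.9, 0.1, 0.0],
    "grassland": [0.1, 0.9, 0.1],
    "wetland": [0.0, 0.2, 0.9],
}
# Placeholder embedding of one aerial image (assumed values).
image_embedding = [0.8, 0.3, 0.1]

label, scores = zero_shot_classify(image_embedding, class_embeddings)
print(label, scores)
```

In practice the embeddings would come from the image and text encoders of the pretrained or fine-tuned model, with one text embedding per habitat definition.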
We generate visual features with both the pretrained and the fine-tuned SkyCLIP model and plot the cross-modal similarity across Switzerland (one 100 m by 100 m image per km²).
For plots (b), (c) and (d), we observe that the maps generated by the fine-tuned models correctly highlight the warmest region, plateau and coldest regions of Switzerland, which we assess using the temperature map (a) as a proxy.
We also have strong quantitative results: 🎉
- The proposed WINCEL approach is better than InfoNCE for fine-tuning GeoRSCLIP, SkyCLIP and CLIP, illustrating its capacity to focus on more useful sentences during training.
- We trained using different sets of passages from Wikipedia articles: sentences from the habitat section, sentences selected with a set of keywords, and random sentences. Passages from the “habitat” section consistently outperform the other approaches, highlighting the importance of quality over quantity for improving model performance.
Check out the paper to learn more!
The EcoWikiRS dataset can be retrieved from Zenodo:
- The EUNIS ecosystem type map for Switzerland, with a spatial resolution of 100 m, comprises a final set of 25 habitats.
- Distribution of samples across EUNIS ecosystem types on a log scale.
- Number of observations per species in our dataset after filtering. Most species were observed very few times, whereas a few species were observed over 1000 times.
- Distribution of our samples across Switzerland. The samples are split into training (60%), testing (30%) and validation (10%) sets following a block split approach with a block size of 20 km.
- More information on the EUNIS ecosystem type map is available on the European Environment Agency website: Ecosystem type map (all classes).
- The raw aerial images with 10 cm resolution from the swissIMAGE product can be openly downloaded from the swisstopo website.
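The block split mentioned in the list above keeps nearby samples in the same set, which limits spatial leakage between training and evaluation. A minimal sketch of the idea follows. It assumes sample locations are given as projected coordinates in metres and assigns each 20 km block to a set by hashing its index; the hashing scheme and the function names are illustrative assumptions, not the procedure used to build the released splits.

```python
import hashlib

BLOCK_SIZE_M = 20_000  # 20 km blocks, as stated in the text


def block_id(x_m, y_m, block_size=BLOCK_SIZE_M):
    """Index of the square block containing a point (coordinates in metres)."""
    return (int(x_m // block_size), int(y_m // block_size))


def assign_split(block, fractions=(("train", 0.6), ("test", 0.3), ("val", 0.1))):
    """Deterministically map a block to a split with the given target fractions."""
    digest = hashlib.sha256(f"{block[0]}_{block[1]}".encode()).digest()
    # Pseudo-random number in [0, 1) derived from the block index
    u = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for name, fraction in fractions:
        cumulative += fraction
        if u < cumulative:
            return name
    return fractions[-1][0]


def split_samples(coords):
    """Assign every (x, y) sample to the split of its block."""
    return [assign_split(block_id(x, y)) for x, y in coords]


# Example coordinates (made-up values in metres).
coords = [(2_600_500, 1_200_300), (2_619_000, 1_219_000), (2_640_100, 1_180_000)]
print(split_samples(coords))
```

Because whole blocks are assigned at once, the realised proportions only approximate 60/30/10; an exact split would need to balance the number of samples per block.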
If you are interested in contributing to one of the aforementioned points or working on a similar project and wish to collaborate, please reach out to ECEO.
For code-related contributions, suggestions or inquiries, please open a GitHub issue.
We acknowledge the following code repositories and resources that were useful throughout the EcoWikiRS project:
- The open_clip repository.
- The GeoRSCLIP repository.
- The SkyCLIP repository.
- The Medium blog post 'Wikipedia Data Science: Working with the World’s Largest Encyclopedia', which was very useful for extracting and parsing Wikipedia articles.
- Check out 🌮 TACOSS, my previous work on text-based semantic segmentation of aerial images.
Other smaller resources are mentioned in the relevant code sections.






