A Multimodal Vision–Language Framework for UAV-to-Satellite Geo-localization


UAV-GEOLENS: UAV Geo-localization with Language-ENriched Semantics

This repository contains the official implementation of UAV-GEOLENS, a multimodal UAV geo-localization framework that integrates visual and semantic information, going beyond conventional image-only approaches.
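The paper describes the actual fusion architecture; as a rough, hypothetical sketch of the general idea (none of these function names or parameters come from this repository), a late-fusion baseline for cross-view retrieval might combine visual and caption embeddings like this:

```python
import numpy as np

def l2norm(x):
    """Normalize vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def fuse(img_emb, txt_emb, alpha=0.5):
    """Hypothetical late fusion (NOT the paper's method): a weighted
    sum of L2-normalized visual and semantic embeddings."""
    return l2norm(alpha * l2norm(img_emb) + (1.0 - alpha) * l2norm(txt_emb))

# Toy cross-view retrieval: match a fused UAV query against a
# gallery of satellite embeddings by cosine similarity.
rng = np.random.default_rng(0)
query = fuse(rng.standard_normal(256), rng.standard_normal(256))
gallery = l2norm(rng.standard_normal((10, 256)))
best_match = int(np.argmax(gallery @ query))  # index of the nearest satellite tile
```

The sketch only illustrates why adding a semantic (text) embedding can shift which satellite tile ranks first compared to an image-only query.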

Figure: Conceptual overview of UAV-GEOLENS.


Code:

  • Source code and training scripts will be released soon
  • Pretrained checkpoints are currently being prepared
  • Full documentation and tutorials are under development

Stay tuned for updates; all code, checkpoints, and documentation will be released in this repository.


Dataset:

We trained and evaluated our model using the UL14, DenseUAV, and VPAir datasets.

You can download them from Here

Please follow each dataset's license and citation terms before use.


Installation:

Clone the repository:

```shell
git clone https://github.com/fahad-lateef/UAV-GEOLENS.git
cd UAV-GEOLENS
```

(Optional) Create and activate a virtual environment:

```shell
python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
```

Install the dependencies:

```shell
pip install -r requirements.txt
```

Usage:

Once the code is released, you'll be able to:

Train a model:

```shell
python train.py --config configs/geolens_config.yaml
```

Evaluate a checkpoint:

```shell
python eval.py --checkpoint path/to/checkpoint.pth --dataset path/to/data
```
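The contents of configs/geolens_config.yaml have not been published yet; a plausible layout (every key and value below is a guess, for illustration only) might look like:

```yaml
# Hypothetical structure for configs/geolens_config.yaml -- the real keys
# will ship with the code release; these values are placeholders.
dataset:
  name: UL14
  root: path/to/data
model:
  image_encoder: placeholder-backbone
  text_encoder: placeholder-language-model
  embed_dim: 256
train:
  batch_size: 64
  lr: 1.0e-4
  epochs: 100
```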

Experimental Results:

Qualitative UAV-to-satellite retrieval results


Quantitative results of UAV-GEOLENS on UL14, DenseUAV, and VPAir datasets using Recall@1, 5, 10 metrics.
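Recall@K measures the fraction of queries whose ground-truth satellite image appears among the top-K retrieved results. A generic sketch of the metric (not the repository's evaluation code) from a query-gallery similarity matrix:

```python
import numpy as np

def recall_at_k(sim, gt, ks=(1, 5, 10)):
    """Recall@K for retrieval: the fraction of queries whose ground-truth
    gallery index appears in the top-K ranked results.

    sim : (num_queries, num_gallery) similarity matrix
    gt  : (num_queries,) ground-truth gallery index per query
    """
    # Rank gallery items for each query by descending similarity.
    order = np.argsort(-sim, axis=1)
    # Rank (0-based) of the ground-truth item for each query.
    ranks = np.array([np.where(order[i] == gt[i])[0][0] for i in range(len(gt))])
    return {k: float(np.mean(ranks < k)) for k in ks}

# Toy example: 3 UAV queries against 6 satellite gallery images.
rng = np.random.default_rng(0)
sim = rng.standard_normal((3, 6))
gt = np.array([0, 1, 2])            # correct gallery index per query
sim[np.arange(3), gt] += 10.0       # force the ground truth to rank first
print(recall_at_k(sim, gt))         # all recalls are 1.0 in this toy case
```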


Two qualitative examples illustrating the impact of semantic descriptions on cross-view matching. For each UAV query, the retrieved satellite image from the database shares consistent semantic elements such as buildings, roof structure, vegetation, and road layout. The captions independently highlight these common features, reinforcing the visual correspondence and improving retrieval accuracy.



Cite this work:

If you use or refer to this work, please cite our paper:


Contributing:


Contributions are welcome! To report a bug, request a feature, or contribute code, please open an issue or pull request.


Contact:

For questions or collaborations, reach out at:

fahad.lateef@utbm.fr mohamed.kas@utbm.fr


Acknowledgements

This study was supported by ANR/Institut Carnot ARTS under the TECTONIC project.

We would like to thank CIAD-UTBM for their support and resources.



License

This project is released under the ______ License.
