Phillipp Fanta-Jende · Francesco Vultaggio · Alexander Kern · Yasmin Loeper · Markus Gerke
Paper | Project Page | Dataset
egenioussBench couples a city-scale aerial 3D mesh, a CityGML LoD2 model, and centimetre-accurate smartphone ground-truth poses to benchmark mesh- and object-based localisation under realistic, city-scale conditions.
egenioussBench is a benchmark designed to evaluate visual localisation algorithms that rely on geospatial reference data. The dataset provides:
- A high-resolution aerial 3D mesh reconstructed from oblique imagery
- A CityGML LoD2 building model
- Smartphone query images captured with a Pixel 8 and a tightly coupled INS, providing centimetre-accurate, map-independent ground-truth poses
- Smartphone pose priors coming from the internal GNSS receiver
The goal is to support research on scalable localisation pipelines that operate at city scale and across different reference representations.
```
egenioussBench/
├── lod2/                       # CityGML LoD2 model of Braunschweig
├── mesh/                       # Airborne 3D mesh (7.5 cm GSD)
├── metadata/
│   └── camera_parameters.txt
└── query_images/
    ├── test/                   # Test images
    ├── test_android_poses.csv
    ├── val/                    # Validation images
    ├── val_android_poses.txt
    └── val_gt_poses.txt
```
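After extracting the archive, the layout above can be sanity-checked with a short script. This is a minimal sketch; the `EXPECTED` list mirrors the tree shown here, and `check_layout` is our helper, not part of the dataset tooling:

```python
from pathlib import Path

# Expected layout, mirroring the directory tree documented above.
EXPECTED = [
    "lod2",
    "mesh",
    "metadata/camera_parameters.txt",
    "query_images/test",
    "query_images/test_android_poses.csv",
    "query_images/val",
    "query_images/val_android_poses.txt",
    "query_images/val_gt_poses.txt",
]

def check_layout(root: Path) -> list[str]:
    """Return the relative paths missing from an extracted copy of the dataset."""
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

For example, `check_layout(Path("egenioussBench"))` returns an empty list when every documented entry is present.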
| Split | Purpose | Size | GT Available? |
|---|---|---|---|
| Validation | method development | 412 seq. images | ✓ |
| Test | leaderboard evaluation | 42 non-co-visible images | ✗ |
The test split is explicitly non-co-visible to enforce cold-start localisation. The dataset is available on Zenodo.
- Derived from oblique imagery (UltraCam Osprey 4.1)
- ≈1550 m AGL
- 7.5 cm GSD (nadir)
- Georeferencing accuracy ≈1 GSD (XY) / 1.5 GSD (Z)
Provides a realistic, deployable reference model for cross-view localisation.
- Official city model of Braunschweig
- Footprints from cadastral data
- Generalised roof shapes
- Typical corner accuracy ≈10 cm relative to mesh
Represents textureless, low-detail geometry for object-based localisation.
- 2709 RGB images collected in January 2024
- Resampled to 960×1280 px (≈4 cm GSD)
- PPK + GCP/CP-aided bundle adjustment
- Final pose accuracy:
  - 4 cm (XY) / 7 cm (Z) mean
  - 0.04° mean orientation error
The benchmark evaluates 6-DoF camera poses predicted for each query image.
Participants submit a CSV file with one pose per line, in the same format as the validation ground truth:

```
# Ground truth poses for validation images
# IMAGE_NAME qw qx qy qz UTMx UTMy alt
```
We use the same camera coordinate convention as Pix4D; please refer to the Pix4D documentation for the exact definition.
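A minimal sketch for reading and writing pose files in this format. Rows are whitespace-separated and lines starting with `#` are comments; the function names are ours, not part of the benchmark tooling:

```python
def write_poses(path, poses):
    """Write poses as documented: IMAGE_NAME qw qx qy qz UTMx UTMy alt.

    `poses` maps image name -> (qw, qx, qy, qz, utm_x, utm_y, alt).
    """
    with open(path, "w") as f:
        f.write("# IMAGE_NAME qw qx qy qz UTMx UTMy alt\n")
        for name, (qw, qx, qy, qz, x, y, alt) in sorted(poses.items()):
            f.write(f"{name} {qw:.8f} {qx:.8f} {qy:.8f} {qz:.8f} "
                    f"{x:.3f} {y:.3f} {alt:.3f}\n")

def read_poses(path):
    """Parse a pose file, skipping blank lines and '#' comment lines."""
    poses = {}
    with open(path) as f:
        for line in f:
            if not line.strip() or line.lstrip().startswith("#"):
                continue
            name, *vals = line.split()
            poses[name] = tuple(float(v) for v in vals)
    return poses
```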
We report:
- Binned recall at:
  - 0.5 m / 2°
  - 2 m / 5°
  - 5 m / 10°
- Outliers
- Median translation error
- Median rotation error
- RMSE translation error
- RMSE rotation error
Mesh-based and LoD2-based methods are evaluated separately.
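To illustrate how these metrics are defined, here is a sketch of per-query pose errors and binned recall. This is our own illustration, not the official `eval.py`; quaternions are assumed to be unit-norm in (qw, qx, qy, qz) order as in the submission format:

```python
import math

def rotation_error_deg(q_pred, q_gt):
    """Angular distance between two unit quaternions, in degrees."""
    dot = abs(sum(a * b for a, b in zip(q_pred, q_gt)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))

def translation_error_m(t_pred, t_gt):
    """Euclidean distance between predicted and ground-truth positions."""
    return math.dist(t_pred, t_gt)

def binned_recall(errors, bins=((0.5, 2.0), (2.0, 5.0), (5.0, 10.0))):
    """Fraction of queries within each (metres, degrees) threshold pair.

    `errors` is a list of (translation_m, rotation_deg) tuples; the default
    bins match the thresholds listed above.
    """
    n = len(errors)
    return [sum(t <= tm and r <= rd for t, r in errors) / n for tm, rd in bins]
```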
We provide a lightweight Python evaluation script for self-validation on the validation split; the same code will be used to evaluate the test split:
```bash
python eval.py \
    --pred poses.csv \
    --gt val/poses_gt.csv \
    --config eval_config.yaml \
    --visualize true \
    --experiment debug
```

Submissions should be sent by email to egeniouss@ait.ac.at.
Evaluation results will be returned via the same address. Multiple submissions are allowed, but only one submission per day will be evaluated per team.
By submitting, participants grant the organizers permission to publish the resulting scores on the public leaderboard.
We include simple reference baselines demonstrating usage of the dataset. These currently include:
- A mesh-based baseline built on MeshLoc, showing how to localise query images against the aerial mesh
If you use egenioussBench in research, please consider citing:
```bibtex
@article{fanta-jende2025egenioussBench,
  title={egenioussBench: A New Dataset for Geospatial Visual Localisation},
  author={Fanta-Jende, Phillipp and Vultaggio, Francesco and Kern, Alexander and Loeper, Yasmin and Gerke, Markus},
  year={2025}
}
```
This work is part of the EU-Horizon egeniouss project.