Akashah Shabbir*, Muhammad Umer Sheikh*, Muhammad Akhtar Munir, Hiyam Debary, Mustansar Fiaz, Muhammad Zaigham Zaheer, Paolo Fraccaro, Fahad Shahbaz Khan, Muhammad Haris Khan, Xiao Xiang Zhu, Salman Khan
Mohamed bin Zayed University of Artificial Intelligence, IBM Research, Linköping University, Australian National University
*Equal Contribution
OpenEarthAgent is a unified framework for building tool-augmented geospatial agents capable of structured, multi-step reasoning over satellite imagery and GIS data. Designed for remote sensing applications, it integrates multispectral analysis, geospatial operations, and natural-language understanding to enable interpretable, tool-driven decision making. The accompanying dataset contains 14,538 training and 1,169 evaluation instances, with more than 100K reasoning steps in the training split and over 7K reasoning steps in the evaluation split. It covers diverse domains including urban analysis, environmental monitoring, disaster response, and infrastructure assessment, and integrates GIS-based operations with index computations such as NDVI, NBR, and NDBI.
- Feb-20-2025: OpenEarthAgent demo coming soon!
- Feb-20-2025: OpenEarthAgent codebase is released along with evaluation and training scripts.
- Feb-20-2025: OpenEarthAgent model is released on HuggingFace: MBZUAI/OpenEarthAgent
- Feb-20-2025: Technical report of OpenEarthAgent is released on arXiv: https://arxiv.org/abs/2602.17665
OpenEarthAgent is a tool-augmented geospatial reasoning framework built on a large language model backbone. The agent decomposes tasks into multi-step trajectories that interleave reasoning and executable tool calls. A unified tool registry standardizes perceptual (e.g., detection, segmentation), GIS (e.g., distance, area, zonal statistics), spectral (e.g., NDVI, NBR, NDBI), and GeoTIFF-based operations under a structured JSON schema. A central orchestrator validates arguments, executes tools, caches intermediate outputs, and appends observations to the working memory, enabling spatially grounded, interpretable reasoning across multimodal EO inputs (RGB, SAR, GIS layers, indices).
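For intuition, the snippet below sketches how a registry-driven call loop of this kind can be wired up: the agent emits a JSON tool call, the orchestrator validates its arguments against the registry, runs the tool, and appends the observation to working memory. All names and the registry layout here are illustrative, not the framework's actual API.

```python
import json

# Hypothetical registry entry: tool name -> callable plus required arguments.
# The real framework registers perceptual, GIS, spectral, and GeoTIFF tools
# under a structured JSON schema; this sketch only shows the control flow.
def compute_ndvi(red, nir):
    return (nir - red) / (nir + red + 1e-9)

TOOL_REGISTRY = {
    "compute_ndvi": {"fn": compute_ndvi, "required_args": ["red", "nir"]},
}

def execute_tool_call(tool_call_json, working_memory):
    """Validate a model-emitted tool call, execute it, and record the observation."""
    call = json.loads(tool_call_json)
    spec = TOOL_REGISTRY[call["name"]]
    missing = [a for a in spec["required_args"] if a not in call["arguments"]]
    if missing:
        raise ValueError(f"Missing arguments for {call['name']}: {missing}")
    observation = spec["fn"](**call["arguments"])
    working_memory.append({"tool": call["name"], "observation": observation})
    return observation

memory = []
execute_tool_call('{"name": "compute_ndvi", "arguments": {"red": 0.21, "nir": 0.63}}', memory)
print(memory)
```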
The pipeline consists of (1) automated dataset curation, (2) supervised reasoning alignment, and (3) structured evaluation. The dataset integrates optical, SAR, GIS, and multispectral sources into a unified JSON schema containing queries, multimodal inputs, and validated reasoning traces (14,538 train / 1,169 test). Each trajectory is replay-verified to ensure geometric validity and tool correctness. The model is trained via supervised fine-tuning on multi-step tool trajectories, optimizing only tool-action prediction while masking environment outputs. Evaluation is performed in both step-by-step (tool-agnostic reasoning validation) and end-to-end (live tool execution) modes to assess tool selection, argument correctness, trajectory fidelity, and final task accuracy.
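Masking environment outputs amounts to assigning the ignore label to tokens the model should not be trained to reproduce, so the loss is computed only over the agent's own tool-action tokens. The generic sketch below illustrates the idea and is not taken from the repository's training code.

```python
# Sketch: supervise only the agent's tool-action tokens.
# Environment/observation tokens get label -100, the ignore index of
# torch.nn.CrossEntropyLoss, so they contribute no gradient.
def build_labels(token_ids, is_action_token):
    """token_ids: list[int]; is_action_token: list[bool] of equal length."""
    return [tid if is_action else -100 for tid, is_action in zip(token_ids, is_action_token)]

token_ids = [101, 7592, 2088, 102, 2023, 2003]           # toy token ids
is_action_token = [False, True, True, False, False, False]
print(build_labels(token_ids, is_action_token))           # [-100, 7592, 2088, -100, -100, -100]
```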
The framework is built around two primary modules: the tool server, which provides essential tool-based services, and TF-EVAL, the module responsible for inference and evaluation. Each module has distinct environment dependencies. The tool server must be successfully launched before executing any inference or training processes.
You can launch the tool_server locally.
It requires separate environments for different tool groups to avoid dependency conflicts.
# Create a clean Conda environment
conda create -n tool-server-e1 python=3.10
conda activate tool-server-e1
# Install PyTorch and dependencies (make sure CUDA version matches)
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
# Install this project
git clone https://github.com/mbzuai-oryx/OpenEarthAgent.git
cd OpenEarthAgent
# Install tool dependencies (SAM2 and GroundingDINO) and download checkpoints
mkdir models
cd models
pip install -e git+https://github.com/facebookresearch/sam2.git#egg=sam-2
cd src/sam-2/checkpoints
sh download_ckpts.sh
cd ../../..
# Please make sure the environment variable CUDA_HOME is set (e.g., export CUDA_HOME=/usr/local/cuda-12.1)
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
pip install -e . --no-build-isolation
cd ..
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
# Install project requirements
cd ..
conda install -c conda-forge qgis -y
pip install -r ./requirements/tool_server_e1_requirements.txt
pip install -e .
This environment isolates change detection dependencies.
# Create a separate environment for the ChangeDetection tool
conda create -n tool-server-e2 python=3.10
conda activate tool-server-e2
# Install PyTorch and dependencies (make sure CUDA version matches)
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install -r ./requirements/tool_server_e2_requirements.txt
pip install -e .
This environment supports LAE-DINO-based object detection tools.
# Create a separate environment for the ObjectDetection tools
conda create -n tool-server-e3 python=3.10
conda activate tool-server-e3
# Install PyTorch and dependencies (make sure CUDA version matches)
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
pip install --extra-index-url https://miropsota.github.io/torch_packages_builder mmcv==2.2.0+pt2.5.1cu121
# Install tool dependencies (LAE-DINO)
cd models
git clone https://github.com/jaychempan/LAE-DINO
cd LAE-DINO/mmdetection_lae
pip install -e . --no-build-isolation
cd ../..
gdown https://drive.google.com/uc?id=1EiR8KtNRYIeOfvtIe9C82cQk_uOMIQ8U
cd ..
pip install -r ./requirements/tool_server_e3_requirements.txt
pip install -e .
This project intentionally uses a lightweight dependency setup to minimize potential version conflicts. Depending on your system configuration, you may need to install additional packages manually if any required components are missing.
Before launching the server, update the configuration file to match your local environment (base paths, model paths, CUDA devices, conda environments, etc.):
tool_server/tool_workers/scripts/launch_scripts/config/all_service_example_local.yaml
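A quick way to sanity-check your edits is to load the file and inspect its top-level entries (minimal sketch; assumes PyYAML is installed and that the config is a YAML mapping):

```python
import yaml  # requires PyYAML

CONFIG = "tool_server/tool_workers/scripts/launch_scripts/config/all_service_example_local.yaml"

with open(CONFIG) as f:
    cfg = yaml.safe_load(f)

# Print the top-level entries so you can confirm that paths, CUDA devices,
# and conda environment names point at your local setup before launching.
for key, value in cfg.items():
    print(f"{key}: {value}")
```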
For detailed instructions on configuring tools, refer to:
## Start all services
conda activate tool-server-e1
cd tool_server/tool_workers/scripts/launch_scripts
python start_server_local.py --config ./config/all_service_example_local.yaml
Press Ctrl + C to shut down all services automatically.
Examine the log files to ensure the tools are properly configured and to diagnose any potential issues. For tool checking, debugging, and some common issues, refer to:
Additionally, you can execute tools_test to verify that all tools are functioning properly.
conda activate tool-server-e1
python scripts/tools_test/tools_test.py

This section guides you through setting up the environment and running a quick inference/demo using OpenEarthAgent.
We recommend using a clean Conda environment to avoid dependency conflicts.
# Create a clean Conda environment
conda create -n OEA-env python=3.10
conda activate OEA-env
# Install PyTorch and dependencies (make sure CUDA version matches)
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
# Install this project
pip install -e .
pip install -r requirements/inference_requirements.txt

Option 1: Quick Chat Inference (CLI)
Run the chat-based inference script:
python scripts/chat/chat.py
This launches a lightweight command-line interface for interacting with the agent.

Option 2: Interactive Web Demo (Gradio)
Start the Gradio application:
python app/app.py
This will launch a local web interface for interactive experimentation.
The dataset is organized under the following directory structure:
OpenEarthAgent/
└── data/
    ├── train.json
    ├── test.json
    ├── train_image/
    ├── test_image/
    └── gpkgs/
- train.json – Training split containing conversational samples with tool-planning annotations.
- test.json – Evaluation split used for step-by-step and end-to-end assessment.
- train_image/ – Images associated with training samples.
- test_image/ – Images used during evaluation.
- gpkgs/ – Cached GeoPackages used during evaluation.
Each JSON file stores structured conversation data that is converted into a chat format during training. Image folders contain the corresponding visual inputs referenced by the dataset entries.
Ensure the data/ directory is correctly configured before launching evaluation or training scripts.
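A minimal check that the splits are in place and a quick peek at the conversation schema (this assumes each split is stored as a JSON list of samples, which may differ from the actual layout):

```python
import json
from pathlib import Path

data_dir = Path("data")
for split in ("train.json", "test.json"):
    with open(data_dir / split) as f:
        samples = json.load(f)          # assumes the split is a JSON list of samples
    print(f"{split}: {len(samples)} entries")
    print("  first-entry keys:", list(samples[0].keys()))
```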
You can evaluate the model using either end-to-end or step-by-step evaluation mode.

End-to-End Evaluation
End-to-end evaluation tests full autonomous execution with live tool use: the model issues tool calls, forms arguments, and reasons iteratively over tool outputs. This setting measures robustness, argument correctness, and perception-action integration. Run with:
sh scripts/eval/eval_e2e.sh
Step-by-Step Evaluation
Step-by-step evaluation measures procedural reasoning without executing tools: the model generates valid actions over n steps using the full interaction history, with the first step exempt to allow high-level planning. This setting isolates reasoning quality, plan coherence, and geospatial understanding. Run with:
sh scripts/eval/eval_step.sh
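To make the step-wise metrics concrete, here is a toy sketch of how tool-selection and argument accuracy could be computed from paired predicted and gold actions; the record structure is hypothetical and does not reflect the evaluator's real interface:

```python
# Illustrative per-step scoring: tool-selection and argument accuracy from
# paired predicted/gold actions. Field names are made up for this sketch.
def step_accuracy(pred_steps, gold_steps):
    tool_hits = arg_hits = 0
    for pred, gold in zip(pred_steps, gold_steps):
        if pred["tool"] == gold["tool"]:
            tool_hits += 1
            if pred["arguments"] == gold["arguments"]:
                arg_hits += 1
    n = max(len(gold_steps), 1)
    return tool_hits / n, arg_hits / n

pred = [{"tool": "compute_ndvi", "arguments": {"red_band": "B4", "nir_band": "B8"}}]
gold = [{"tool": "compute_ndvi", "arguments": {"red_band": "B4", "nir_band": "B8"}}]
print(step_accuracy(pred, gold))  # (1.0, 1.0)
```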
We provide a Supervised Fine-Tuning (SFT) pipeline to train the model for structured planning, reasoning, and tool invocation. Training is performed with Unsloth using full fine-tuning on a chat-formatted dataset.
# Create a clean Conda environment
conda create -n OEA-train-env python=3.10
conda activate OEA-train-env
# Install PyTorch and dependencies (make sure CUDA version matches)
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
# Install this project
pip install -e .
pip install -r requirements/train_requirements.txt
# Launch supervised fine-tuning
sh scripts/train/train.sh
The training script supports distributed execution and saves checkpoints to the specified output directory.
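For reference, chat-formatted samples like the ones in train.json are typically rendered into training text via the tokenizer's chat template; the sketch below shows the general pattern with a placeholder backbone and a made-up tool call, not the released training configuration:

```python
from transformers import AutoTokenizer

# Placeholder backbone; substitute whatever model train.sh configures.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

messages = [
    {"role": "user", "content": "Estimate the vegetated area in the provided Sentinel-2 tile."},
    {"role": "assistant", "content": '{"name": "compute_ndvi", "arguments": {"red_band": "B4", "nir_band": "B8"}}'},
]

# Render the conversation into one training string using the model's chat template.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```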
Please cite the following if you find OpenEarthAgent helpful:
@misc{shabbir2026openearthagent,
title={OpenEarthAgent: A Unified Framework for Tool-Augmented Geospatial Agents},
author={Akashah Shabbir and Muhammad Umer Sheikh and Muhammad Akhtar Munir and Hiyam Debary and Mustansar Fiaz and Muhammad Zaigham Zaheer and Paolo Fraccaro and Fahad Shahbaz Khan and Muhammad Haris Khan and Xiao Xiang Zhu and Salman Khan},
year={2026},
eprint={2602.17665},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.17665},
}






