A production-ready pipeline for remote sensing image analysis using Vision-Language Models (VLMs) and SAM3:
- Image Captioning: Generate descriptions of aerial/satellite imagery
- Object Grounding: Detect and localize objects with oriented bounding boxes
- Visual Question Answering: Answer numeric, binary, and semantic questions
```
ISRO-GeoNLI/
├── app_prod.py              # Production FastAPI server (preloads models)
├── app_dev.py               # Development server (lazy loading)
├── rs_pipeline.py           # Main RSPipeline class
├── api_helpers.py           # API utilities
├── api_models.py            # Pydantic schemas
│
├── model/                   # Model interfaces
│   ├── model_builder.py     # VLM & SAM3 initialization
│   ├── vlm_interface.py     # Qwen3-VL interface
│   └── sam3_interface.py    # SAM3 interface
│
├── tasks/                   # Task handlers
│   ├── captioning.py        # Image captioning
│   ├── grounding.py         # Object detection
│   └── vqa.py               # Visual QA
│
├── utils/                   # Utilities
│   ├── geo_calc.py          # Geometric calculations
│   ├── visualization.py     # Annotation tools
│   └── vqa_output_normalizer.py
│
├── Evaluation/              # Evaluation scripts
├── Finetuning_runs/         # Training scripts
├── website-backend/         # Web API backend
└── website-frontend/        # React frontend
```
- Python 3.10+
- CUDA-capable GPU (36GB+ VRAM recommended)
- HuggingFace account with SAM 3 access
```bash
# Clone repository
cd ISRO-GeoNLI

# Create environment
conda create -n isro_geonli python=3.10
conda activate isro_geonli

# Install PyTorch (CUDA 12.4)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124

# Install dependencies
pip install -r requirements.txt

# HuggingFace authentication
huggingface-cli login
```

```bash
# Linux/Mac
./run_prod.sh

# Windows
uvicorn app_prod:app --host 0.0.0.0 --port 8080
```

The server preloads models on startup (takes 2-3 minutes) and runs at http://localhost:8080.
```bash
# Health check
curl http://localhost:8080/health

# Simple query
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{"query": "What is the area of the building?", "image_url": "https://example.com/image.jpg"}'
```

A structured request matching the query.json schema. Please use this endpoint for evaluation.
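The simple query can also be sent from Python with no extra dependencies. A minimal sketch (assumes the production server above is running at localhost:8080; `build_simple_query` and `post_query` are illustrative helpers, not part of the repo):

```python
import json
from urllib import request as urlrequest

API_URL = "http://localhost:8080/query"  # server started via run_prod.sh

def build_simple_query(query: str, image_url: str) -> dict:
    """Build the JSON body expected by the simple /query endpoint."""
    return {"query": query, "image_url": image_url}

def post_query(payload: dict, url: str = API_URL) -> dict:
    """POST the payload and return the decoded JSON response."""
    req = urlrequest.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlrequest.urlopen(req) as resp:  # raises URLError if the server is down
        return json.loads(resp.read())

if __name__ == "__main__":
    body = build_simple_query(
        "What is the area of the building?",
        "https://example.com/image.jpg",
    )
    print(json.dumps(body, indent=2))
```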
Request:

```json
{
  "input_image": {
    "image_id": "sample_001",
    "image_url": "https://example.com/image.jpg",
    "metadata": {"spatial_resolution_m": 1.57}
  },
  "queries": {
    "caption_query": {"instruction": "Describe the image."},
    "grounding_query": {"instruction": "Locate all buildings."},
    "attribute_query": {
      "binary": {"instruction": "Is there any aeroplane?"},
      "numeric": {"instruction": "What is the area?"},
      "semantic": {"instruction": "What color is the building?"}
    }
  }
}
```

Auto-classifies the query type using an LLM. This endpoint is used internally by the chat app.
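For batch evaluation, the structured request body shown above can be assembled programmatically. A sketch (the helper name `build_structured_request` is hypothetical, not part of the repo; field names follow the query.json schema):

```python
import json

def build_structured_request(image_id: str, image_url: str,
                             resolution_m: float, instructions: dict) -> dict:
    """Assemble a request body matching the query.json schema.
    `instructions` maps caption/grounding/binary/numeric/semantic to strings."""
    return {
        "input_image": {
            "image_id": image_id,
            "image_url": image_url,
            "metadata": {"spatial_resolution_m": resolution_m},
        },
        "queries": {
            "caption_query": {"instruction": instructions["caption"]},
            "grounding_query": {"instruction": instructions["grounding"]},
            "attribute_query": {
                "binary": {"instruction": instructions["binary"]},
                "numeric": {"instruction": instructions["numeric"]},
                "semantic": {"instruction": instructions["semantic"]},
            },
        },
    }

if __name__ == "__main__":
    req = build_structured_request(
        "sample_001", "https://example.com/image.jpg", 1.57,
        {"caption": "Describe the image.",
         "grounding": "Locate all buildings.",
         "binary": "Is there any aeroplane?",
         "numeric": "What is the area?",
         "semantic": "What color is the building?"},
    )
    print(json.dumps(req, indent=2))
```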
Request:

```json
{
  "query": "Count the cars in the parking lot",
  "image_url": "https://example.com/parking.jpg"
}
```

Please refer to the report (Team_46.pdf) for the architecture details.
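To give an intuition for what the auto-classification step decides, here is a toy keyword-based stand-in covering the same query types (purely illustrative: the real pipeline classifies with the LLM, not keyword rules):

```python
def naive_classify(query: str) -> str:
    """Toy router mimicking the LLM query classifier:
    maps a free-form query to caption/grounding/binary/numeric/semantic."""
    q = query.lower().strip()
    if any(w in q for w in ("how many", "count", "area", "length", "width")):
        return "numeric"
    if q.startswith(("is ", "are ", "does ", "do ", "can ")):
        return "binary"
    if any(w in q for w in ("locate", "find", "detect", "where")):
        return "grounding"
    if any(w in q for w in ("describe", "caption", "summarize")):
        return "caption"
    return "semantic"  # fallback: open-ended attribute questions

if __name__ == "__main__":
    for q in ("Count the cars in the parking lot",
              "Is there any aeroplane?",
              "Locate all buildings.",
              "What color is the building?"):
        print(q, "->", naive_classify(q))
```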
- Pipeline Code: MIT License
- Qwen3-VL: Apache 2.0
- SAM3: SAM License
```bibtex
@article{qwen3vl2024,
  title={Qwen3-VL: Towards Versatile Vision-Language Understanding},
  author={Qwen Team},
  year={2024}
}

@article{sam3,
  title={Segment Anything Model 3},
  author={Meta AI Research},
  year={2024}
}
```