GroundedSurg: A Multi-Procedure Benchmark for Language-Conditioned Surgical Tool Segmentation

Official repository for GroundedSurg, a grounding-based surgical vision benchmark introduced at MICCAI.


🧠 Overview

GroundedSurg reformulates surgical instrument perception as a language-conditioned, instance-level segmentation task.

Unlike conventional category-level segmentation benchmarks, GroundedSurg requires models to:

  • Resolve natural-language references
  • Disambiguate between multiple similar instruments
  • Perform structured spatial grounding
  • Produce precise pixel-level segmentation masks

πŸ”₯ Key Contributions

  • First language-conditioned surgical grounding benchmark
  • Instance-level disambiguation across multi-instrument scenes
  • Structured spatial grounding:
    • Bounding box
    • Center point
    • Pixel-level mask
  • Multi-procedure diversity
  • Unified evaluation protocol for Vision-Language Models

πŸ“Š Dataset Statistics

| Statistic | Value |
|---|---|
| Number of images | ~612 |
| Tool annotations | ~1,071 |
| Average tools per image | ~1.6 |
| Surgical procedures | 4 |
| Annotation type | Pixel-level segmentation |
| Spatial grounding | Bounding box + Center point |
| Language descriptions | Instance-level |

πŸ₯ Covered Procedures

  • Ophthalmic Surgery
  • Laparoscopic Cholecystectomy
  • Robotic Nephrectomy
  • Gastrectomy

🎯 Task Definition

Each benchmark instance consists of:

  • Surgical image I
  • Natural-language query T
  • Bounding box B
  • Center point C
  • Ground-truth segmentation mask M

Objective:

$$f(I, T) \rightarrow (\hat{B}, \hat{C}, \hat{M})$$

Models must localize and segment the instrument described by the query.
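
Concretely, a single benchmark instance can be pictured as the following record. This is a sketch only; the field names, paths, and values are illustrative, not the released annotation format:

```python
# Illustrative schema for one GroundedSurg instance. All field names,
# file paths, and coordinates below are hypothetical examples.
instance = {
    "image": "images/cholec_0042.png",                   # surgical image I
    "query": "the grasper retracting the gallbladder",   # language query T
    "bbox": [112, 85, 340, 260],                         # bounding box B as [x1, y1, x2, y2]
    "center": [226, 172],                                # center point C as [x, y]
    "mask": "masks/cholec_0042_tool1.png",               # ground-truth mask M
}
```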


πŸ“ˆ Evaluation Metrics

Region-Based Metrics

  • IoU
  • IoU@0.5 / IoU@0.9
  • mIoU
  • Dice

Localization Metrics

  • Bounding Box IoU
  • Normalized Distance Error (NDE)

All metrics are computed per image-query pair.
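
The region and localization metrics can be sketched in a few lines of NumPy. This is a minimal reference implementation under common conventions; in particular, NDE is assumed here to be the center-point distance normalized by the image diagonal, which may differ from the exact definition used in the paper:

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Region IoU between two boolean masks of the same shape."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter / union) if union > 0 else 0.0

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient between two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return float(2 * inter / total) if total > 0 else 0.0

def nde(pred_center, gt_center, img_w: int, img_h: int) -> float:
    """Center-point distance normalized by the image diagonal
    (one common convention; the paper's normalization may differ)."""
    d = np.hypot(pred_center[0] - gt_center[0], pred_center[1] - gt_center[1])
    return float(d / np.hypot(img_w, img_h))
```

Thresholded variants such as IoU@0.5 and IoU@0.9 then follow by counting the fraction of image-query pairs whose IoU exceeds the threshold, and mIoU by averaging IoU over all pairs.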


πŸ— Evaluation Pipeline

GroundedSurg follows a unified language-conditioned segmentation protocol:

  1. A Vision-Language Model predicts:
    • Bounding box
    • Center point
  2. The predicted box and point are passed as prompts to the segmentation backend (SAM2 / SAM3)
  3. The resulting mask is evaluated against the ground truth
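
The three stages can be sketched as one function. Here `vlm` and `backend` are hypothetical stand-ins for a concrete VLM client and the SAM2/SAM3 wrappers under Model_Evaluation_Scripts/Segmentation/, not the official interfaces:

```python
def evaluate_instance(image, query, gt_mask, vlm, backend, metric):
    """One pass of the GroundedSurg protocol (a sketch, not the official code)."""
    # 1. The Vision-Language Model grounds the query as a box and a center point.
    bbox, center = vlm.predict(image, query)           # e.g. [x1, y1, x2, y2], [x, y]
    # 2. Both predictions are passed as prompts to the segmentation backend.
    pred_mask = backend.segment(image, box=bbox, point=center)
    # 3. The resulting mask is scored against the ground truth.
    return metric(pred_mask, gt_mask)
```

Any per-pair region metric (IoU, Dice) can be plugged in as `metric`.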

πŸ€– Evaluated Models

The following model families were benchmarked on GroundedSurg:

Open-Source Models

  • Qwen2.5-VL
  • Qwen3-VL
  • Gemma 3 (12B / 27B)
  • LLaMA 3 Vision
  • DeepSeek-VL2
  • Mistral 3

Reasoning-Oriented Models

  • VisionReasoner
  • Migician
  • InternVL

Medical-Domain Models

  • MedMO
  • MedGemma
  • MedVLM-R1
  • BiMediX2

Closed-Source Models

  • GPT-4o-mini
  • GPT-5.2

πŸ“ Repository Structure

GroundedSurg/
β”œβ”€β”€ Model_Evaluation_Scripts/
β”‚   β”œβ”€β”€ gemma3.py
β”‚   β”œβ”€β”€ llama.py
β”‚   β”œβ”€β”€ qwen_2_5.py
β”‚   β”œβ”€β”€ qwen_3.py
β”‚   β”œβ”€β”€ mistral_3.py
β”‚   β”œβ”€β”€ intern_eval.py
β”‚   β”œβ”€β”€ med_mo.py
β”‚   β”œβ”€β”€ migician.py
β”‚   β”œβ”€β”€ MedGemma/
β”‚   β”œβ”€β”€ MedVLM-R1/
β”‚   └── Segmentation/
β”‚       β”œβ”€β”€ sam2.py
β”‚       β”œβ”€β”€ sam3.py
β”‚       └── mask_eval.sh
└── Prompt/
    β”œβ”€β”€ prompt1.txt
    └── prompt2.txt

Benchmark overview (figure)

πŸ“Έ Qualitative Results

(qualitative comparison figure)


πŸš€ Running Evaluation

Example:

python Model_Evaluation_Scripts/qwen_2_5.py

For segmentation backend:

python Model_Evaluation_Scripts/Segmentation/sam3.py

πŸ“¦ Installation

⚠️ Note: Different Vision-Language Models may require separate environments depending on their official repositories.
We recommend creating dedicated environments per model when necessary.


1️⃣ Create Conda Environment

conda create -n groundedsurg python=3.10
conda activate groundedsurg

2️⃣ Install Core Dependencies

Install PyTorch (adjust CUDA version if needed):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Install additional dependencies:

pip install transformers accelerate
pip install opencv-python
pip install numpy scipy tqdm
pip install pillow matplotlib
pip install scikit-image

3️⃣ Install SAM3 Backend

GroundedSurg uses a frozen SAM-based segmentation backend.

Clone the official SAM3 repository:

git clone https://github.com/facebookresearch/sam3

Follow the installation instructions from the official repository.


4️⃣ Download SAM3 Checkpoint

Download the pretrained SAM3 checkpoint from the official repository.

After downloading, update the checkpoint path inside:

Model_Evaluation_Scripts/Segmentation/sam3.py

πŸ”Ž Tested Environment

GroundedSurg has been tested with:

  • Python 3.10
  • PyTorch 2.x
  • CUDA 12.x
  • Ubuntu 22.04

πŸ“Œ Citation

If you use GroundedSurg, please cite:

@inproceedings{groundedsurg2026,
  title={GroundedSurg: A Multi-Procedure Benchmark for Language-Conditioned Surgical Tool Segmentation},
  author={Tajamul Ashraf and Abrar Ul Riyaz and Wasif Tak and Tavaheed Tariq and Sonia Yadav and Moloud Abdar and Janibul Bashir},
  booktitle={MICCAI},
  year={2026}
}

πŸ“œ License

(To be added)


βœ‰ Contact

Abrar Ul Riyaz
https://abrarulriyaz.vercel.app
Gaash Research Lab
NIT Srinagar

🌐 Website: (to be added)

⭐ If you find this work useful, please consider starring the repository.
