Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

haozheng-yu/Grounded-Segment-Anything

 
 


Grounded-Segment-Anything


We plan to build a demo that combines Grounding DINO and Segment Anything to detect and segment anything from text inputs. We will keep improving it and building further demos on this foundation. We have also released an overall technical report about the project on arXiv; see Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks for details.

We are glad to help share and promote new projects based on Segment-Anything. For more demos and works from the community, see Highlight Extension Projects. To add a link to a new project, submit a new issue (with the project tag) or a new pull request.

🍄 Why Build This Project?

The core idea behind this project is to combine the strengths of different models into a powerful pipeline for solving complex problems. It is a workflow for combining strong expert models: each part can be used separately or in combination, and each can be replaced with a similar model (for example, replacing Grounding DINO with GLIP or another detector, replacing Stable-Diffusion with ControlNet or GLIGEN, or combining with ChatGPT).
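The modular design described here can be pictured as a chain of interchangeable stages. The following is only an illustrative sketch: `Detector`, `Segmenter`, `run_pipeline`, and the dummy stand-ins are hypothetical names, not part of this repository's API, and the real models exchange tensors rather than plain tuples and strings.

```python
from typing import Callable, List, Tuple

# Hypothetical types for illustration only.
Box = Tuple[float, float, float, float]             # (x0, y0, x1, y1)
Detector = Callable[[str, str], List[Box]]          # (image_path, text_prompt) -> boxes
Segmenter = Callable[[str, List[Box]], List[str]]   # (image_path, boxes) -> mask ids


def run_pipeline(image_path: str, prompt: str,
                 detect: Detector, segment: Segmenter) -> List[str]:
    """Chain a text-prompted detector with a box-prompted segmenter."""
    boxes = detect(image_path, prompt)
    return segment(image_path, boxes)


# Dummy stand-ins so the sketch runs without any model weights.
def dummy_detector(image_path: str, prompt: str) -> List[Box]:
    return [(0.0, 0.0, 10.0, 10.0)] if prompt else []


def dummy_segmenter(image_path: str, boxes: List[Box]) -> List[str]:
    return [f"mask_{i}" for i, _ in enumerate(boxes)]


if __name__ == "__main__":
    print(run_pipeline("assets/basket2.jpg", "basket",
                       dummy_detector, dummy_segmenter))
```

Because each stage is just a callable with a fixed signature, swapping one detector for another only means passing a different function to `run_pipeline`.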

Installation

The code requires python>=3.8, as well as pytorch>=1.7 and torchvision>=0.8. Please follow the instructions here to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.
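A quick way to confirm these minimum versions is a small check script. This is a minimal sketch, not part of the repository: the helper names `version_tuple` and `meets_minimum` are made up for illustration, and the comparison only looks at the first three numeric components of each version string.

```python
import re
import sys
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '2.5.1+cu121' into (2, 5, 1); local build tags are ignored."""
    numbers = re.findall(r"\d+", version.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def meets_minimum(version: str, minimum: str) -> bool:
    """Compare two dotted version strings numerically."""
    return version_tuple(version) >= version_tuple(minimum)


if __name__ == "__main__":
    py = ".".join(str(n) for n in sys.version_info[:3])
    print(f"python {py}: ok={meets_minimum(py, '3.8')}")
    # Minimum versions are the ones stated in the requirement above.
    for name, minimum in (("torch", "1.7"), ("torchvision", "0.8")):
        try:
            module = __import__(name)
        except ImportError:
            print(f"{name}: not installed")
            continue
        ok = meets_minimum(module.__version__, minimum)
        print(f"{name} {module.__version__}: ok={ok}")
```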

Install with Docker

Open one terminal:

make build-image
make run

That's it.

To allow visualization from inside the Docker container, open another terminal and run:

xhost +

Install without Docker (Haozheng's setup on G2)

To build a local GPU environment for Grounded-SAM, set the following environment variables manually:

export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/path/to/cuda-12.1/
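Before building, it can help to confirm that these variables are actually visible to the current process. The sketch below is an assumption-laden helper, not part of the repository: `check_build_env` is a hypothetical name, and it assumes the CUDA compiler sits at `$CUDA_HOME/bin/nvcc`, which is the usual toolkit layout but may differ on your system.

```python
import os
from pathlib import Path
from typing import List, Mapping


def check_build_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = []
    # The three variables exported in the snippet above.
    for name in ("AM_I_DOCKER", "BUILD_WITH_CUDA", "CUDA_HOME"):
        if not env.get(name):
            problems.append(f"{name} is not set")
    cuda_home = env.get("CUDA_HOME")
    # Assumption: nvcc lives under $CUDA_HOME/bin.
    if cuda_home and not (Path(cuda_home) / "bin" / "nvcc").exists():
        problems.append(f"no nvcc found under {cuda_home}/bin")
    return problems


if __name__ == "__main__":
    for line in check_build_env(os.environ) or ["environment looks consistent"]:
        print(line)
```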

Create and activate the conda environment on G2:

conda create -n groundedsam python=3.10
conda activate groundedsam

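Install CUDA and PyTorch. The paths below point to the author's environment under /home/hy648; replace them with the path to your own conda environment. Variables set with conda env config vars take effect the next time the environment is activated.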

conda install -y nvidia/label/cuda-12.1.0::cuda
conda install -y nvidia/label/cuda-12.1.0::cuda-cudart
conda env config vars set PATH=/home/hy648/.conda/envs/groundedsam/bin:$PATH -n groundedsam
conda env config vars set LD_LIBRARY_PATH=/home/hy648/.conda/envs/groundedsam/lib64:$LD_LIBRARY_PATH -n groundedsam
conda env config vars set CUDA_HOME=/home/hy648/.conda/envs/groundedsam -n groundedsam
conda env config vars set CPATH=/home/hy648/.conda/envs/groundedsam/targets/x86_64-linux/include:$CPATH -n groundedsam
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
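After the install finishes, a short check can show whether the PyTorch build was compiled for CUDA and whether a GPU is visible. This is a minimal sketch; `cuda_report` is a hypothetical helper name, and it only reads attributes that PyTorch exposes (`torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()`).

```python
from typing import Dict, Optional


def cuda_report() -> Dict[str, Optional[object]]:
    """Summarize what the installed PyTorch build reports about CUDA."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the active environment.
        return {"torch": None, "built_for_cuda": None, "gpu_available": False}
    return {
        "torch": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` is `None` or `gpu_available` is `False`, the CUDA extension build for Grounding DINO is likely to fall back to CPU or fail, so it is worth resolving before the next steps.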

Install Segment Anything:

python -m pip install -e segment_anything

Install Grounding DINO:

pip install --no-build-isolation -e GroundingDINO

The following optional dependencies are necessary for mask post-processing, saving masks in COCO format, the example notebooks, and exporting the model in ONNX format. jupyter is also required to run the example notebooks.

pip install opencv-python pycocotools matplotlib onnxruntime onnx ipykernel

More details can be found in the installation instructions for Segment Anything, GroundingDINO, and OSX.

Grounded-SAM: Detect and Segment Everything with Text Prompt

Here is a step-by-step tutorial for running the Grounded-SAM demo:

Step 1: Download the pretrained weights

cd Grounded-Segment-Anything

wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
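Interrupted downloads can leave missing or empty checkpoint files, which only surface later as load errors. The sketch below checks for both files by name; `missing_checkpoints` is a hypothetical helper, and it only tests for presence and non-zero size, not file integrity.

```python
from pathlib import Path
from typing import Iterable, List

# File names taken from the two download URLs above.
CHECKPOINTS = ("sam_vit_h_4b8939.pth", "groundingdino_swint_ogc.pth")


def missing_checkpoints(directory: str,
                        names: Iterable[str] = CHECKPOINTS) -> List[str]:
    """Return checkpoint names that are absent or empty in `directory`."""
    root = Path(directory)
    return [
        name for name in names
        if not (root / name).is_file() or (root / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_checkpoints(".")
    if missing:
        print(f"missing or empty: {missing}")
    else:
        print("all checkpoints present")
```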

Step 2: Run the original Grounded-SAM demo

# depends on your device 
export CUDA_VISIBLE_DEVICES=0
python grounded_sam_depth.py \
  --config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
  --grounded_checkpoint groundingdino_swint_ogc.pth \
  --sam_checkpoint sam_vit_h_4b8939.pth \
  --input_image assets/basket2.jpg \
  --input_depth assets/depth.jpg \
  --output_dir "outputs" \
  --box_threshold 0.3 \
  --text_threshold 0.25 \
  --text_prompt "basket" \
  --device "cuda"
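When running the demo over several prompts or images, it can be convenient to assemble the same command from Python. The following is a sketch under assumptions: `build_demo_command` is a hypothetical wrapper, its defaults simply mirror the flag values shown above, and it must be run from the repository root with both checkpoints downloaded.

```python
import shlex
import sys
from typing import List


def build_demo_command(text_prompt: str,
                       input_image: str = "assets/basket2.jpg",
                       input_depth: str = "assets/depth.jpg",
                       output_dir: str = "outputs",
                       box_threshold: float = 0.3,
                       text_threshold: float = 0.25,
                       device: str = "cuda") -> List[str]:
    """Assemble the argument list for grounded_sam_depth.py."""
    return [
        sys.executable, "grounded_sam_depth.py",
        "--config", "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
        "--grounded_checkpoint", "groundingdino_swint_ogc.pth",
        "--sam_checkpoint", "sam_vit_h_4b8939.pth",
        "--input_image", input_image,
        "--input_depth", input_depth,
        "--output_dir", output_dir,
        "--box_threshold", str(box_threshold),
        "--text_threshold", str(text_threshold),
        "--text_prompt", text_prompt,
        "--device", device,
    ]


if __name__ == "__main__":
    command = build_demo_command("basket")
    print(shlex.join(command))
    # To launch the demo from the repository root, uncomment:
    # import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when a text prompt contains spaces.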

The annotated results will be saved in ./outputs, as follows:

Input Image | Annotated Image | Generated Mask
