This project combines Grounding DINO and Segment Anything into a demo that can detect and segment anything with text inputs! We will continue to improve it and build more interesting demos on this foundation. We have also released a technical report on arXiv; please see Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks for more details.
- 🔥 Grounded SAM 2 is released now, which combines Grounding DINO with SAM 2 for any object tracking in open-world scenarios.
- 🔥 Grounding DINO 1.5 is released now, which is IDEA Research's Most Capable Open-World Object Detection Model!
- 🔥 Grounding DINO and Grounded SAM are now supported in Hugging Face. For more convenient use, you can refer to this documentation; a minimal usage sketch follows below.
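As a quick reference, here is a minimal sketch of calling Grounding DINO through the Hugging Face Transformers zero-shot object detection API. The `IDEA-Research/grounding-dino-tiny` model id and the local image path are illustrative choices, and argument names may differ slightly across transformers versions; the linked documentation is the authoritative source.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection

model_id = "IDEA-Research/grounding-dino-tiny"  # illustrative; larger checkpoints exist
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id).to(device)

image = Image.open("assets/basket2.jpg")  # same image as the demo below
text = "a basket."  # queries should be lowercase and end with a dot

inputs = processor(images=image, text=text, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)

results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.35,
    text_threshold=0.25,
    target_sizes=[image.size[::-1]],
)
print(results)
```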
We are happy to help everyone share and promote new projects based on Segment Anything. Please check out here for more amazing demos and works from the community: Highlight Extension Projects. You can submit a new issue (with the project tag) or a pull request to add links to new projects.
🍄 Why Build this Project?
The core idea behind this project is to combine the strengths of different models to build a powerful pipeline for solving complex problems. It is worth mentioning that this is a workflow for combining strong expert models: every part can be used separately or in combination, and each can be replaced with a similar but different model (e.g., replacing Grounding DINO with GLIP or another detector, replacing Stable Diffusion with ControlNet or GLIGEN, or combining the pipeline with ChatGPT).
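To make the modularity concrete, here is a small conceptual sketch using hypothetical interfaces (not the repo's actual classes) of the detect-then-segment composition, where any text-prompted detector and any promptable segmenter can be plugged in:

```python
from typing import Protocol, Sequence, Tuple

# (x0, y0, x1, y1) in absolute pixel coordinates
Box = Tuple[float, float, float, float]

class TextPromptedDetector(Protocol):
    """Any detector that turns a text prompt into boxes (e.g. Grounding DINO, GLIP)."""
    def detect(self, image, prompt: str) -> Sequence[Box]: ...

class PromptableSegmenter(Protocol):
    """Any segmenter that turns boxes into masks (e.g. SAM)."""
    def segment(self, image, boxes: Sequence[Box]): ...

def grounded_segmentation(image, prompt: str,
                          detector: TextPromptedDetector,
                          segmenter: PromptableSegmenter):
    """Detect with text, then segment the detected boxes; both stages are swappable."""
    boxes = detector.detect(image, prompt)
    return segmenter.segment(image, boxes)
```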
The code requires python>=3.8, as well as pytorch>=1.7 and torchvision>=0.8. Please follow the instructions here to install both PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.
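A quick way to confirm your environment meets these minimums (a simple check, not part of the repo):

```python
# Print the installed versions against the stated minimums (python>=3.8, pytorch>=1.7, torchvision>=0.8)
import sys
import torch
import torchvision

print("python     :", sys.version.split()[0])
print("torch      :", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA build :", torch.version.cuda)       # None for CPU-only builds
print("CUDA usable:", torch.cuda.is_available())
```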
Open one terminal:
make build-image
make run
That's it.
If you would like to allow visualization across the Docker container, open another terminal and type:
xhost +
You should set the environment variables manually as follows if you want to build a local GPU environment for Grounded-SAM:
export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
export CUDA_HOME=/path/to/cuda-12.1/
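A small optional sanity check for these variables (standard library only; run after exporting them):

```python
# Verify that the build-related environment variables are visible to Python and that nvcc is on PATH
import os
import shutil

print("AM_I_DOCKER    :", os.environ.get("AM_I_DOCKER"))
print("BUILD_WITH_CUDA:", os.environ.get("BUILD_WITH_CUDA"))
print("CUDA_HOME      :", os.environ.get("CUDA_HOME"))
print("nvcc on PATH   :", shutil.which("nvcc"))
```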
Create the conda environment on G2:
conda create -n groundedsam python=3.10
Install CUDA and PyTorch. Remember to change the CUDA and environment paths below to your own.
conda install -y nvidia/label/cuda-12.1.0::cuda
conda install -y nvidia/label/cuda-12.1.0::cuda-cudart
conda env config vars set PATH=/home/hy648/.conda/envs/groundedsam/bin:$PATH -n groundedsam
conda env config vars set LD_LIBRARY_PATH=/home/hy648/.conda/envs/groundedsam/lib64:$LD_LIBRARY_PATH -n groundedsam
conda env config vars set CUDA_HOME=/home/hy648/.conda/envs/groundedsam -n groundedsam
conda env config vars set CPATH=/home/hy648/.conda/envs/groundedsam/targets/x86_64-linux/include:$CPATH -n groundedsam
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
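Optionally, confirm that the cu121 wheel can actually see a GPU before building the CUDA extensions below (a simple check, not part of the repo):

```python
# Fail early if the installed torch build does not match CUDA 12.1 or cannot see a GPU
import torch

assert torch.version.cuda is not None and torch.version.cuda.startswith("12.1"), torch.version.cuda
assert torch.cuda.is_available(), "No GPU visible; GroundingDINO would fall back to a CPU-only build"
print(torch.cuda.get_device_name(0))
```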
Install Segment Anything:
python -m pip install -e segment_anything
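A minimal smoke test for the install, assuming the `sam_vit_h_4b8939.pth` checkpoint from Step 1 below is already in the working directory:

```python
# Build the ViT-H SAM model and wrap it in a predictor to confirm the install works
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
print(type(predictor).__name__, "ready")
```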
Install Grounding DINO:
pip install --no-build-isolation -e GroundingDINO
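A minimal smoke test for the install, using the config and checkpoint paths that appear in the demo command below:

```python
# Load the Swin-T Grounding DINO model to confirm the CUDA extension built correctly
from groundingdino.util.inference import load_model

model = load_model(
    "GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "groundingdino_swint_ogc.pth",
)
print("GroundingDINO loaded:", sum(p.numel() for p in model.parameters()), "parameters")
```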
The following optional dependencies are necessary for mask post-processing, saving masks in COCO format, the example notebooks, and exporting the model in ONNX format. jupyter is also required to run the example notebooks.
pip install opencv-python pycocotools matplotlib onnxruntime onnx ipykernel
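As an illustration of the "saving masks in COCO format" use case, here is a small pycocotools sketch that encodes a stand-in binary mask as COCO RLE (the mask itself is synthetic, not a real SAM output):

```python
# Encode a binary mask as COCO run-length encoding (RLE) so it can be stored in a COCO-style JSON
import numpy as np
from pycocotools import mask as mask_utils

binary_mask = np.zeros((480, 640), dtype=np.uint8)   # stand-in for a SAM mask
binary_mask[100:200, 150:300] = 1

rle = mask_utils.encode(np.asfortranarray(binary_mask))
area = int(mask_utils.area(rle))
rle["counts"] = rle["counts"].decode("utf-8")        # make the RLE JSON-serializable
print({"segmentation": rle, "area": area})
```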
More details can be found in install segment anything, install GroundingDINO, and install OSX.
Here is a step-by-step tutorial for running the Grounded-SAM demo:
Step 1: Download the pretrained weights
cd Grounded-Segment-Anything
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
Step 2: Run the original Grounded-SAM demo
# depends on your device
export CUDA_VISIBLE_DEVICES=0
python grounded_sam_depth.py \
--config GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
--grounded_checkpoint groundingdino_swint_ogc.pth \
--sam_checkpoint sam_vit_h_4b8939.pth \
--input_image assets/basket2.jpg \
--input_depth assets/depth.jpg \
--output_dir "outputs" \
--box_threshold 0.3 \
--text_threshold 0.25 \
--text_prompt "basket" \
--device "cuda"
The annotated results will be saved in ./outputs as follows: