This project provides tools to fine-tune Meta's Segment Anything Model (SAM2.1) on custom segmentation datasets. The implementation includes dataset preparation, model training, inference scripts, a comparison between the base and fine-tuned models, and testing on new images.
Note: If everything is already set up, jump straight to the [#DatasetPreparation] (Option B) section, since the first step is already done. Just don't forget to activate the conda environment before running the scripts: conda activate sam2_env
- Download the Microsoft Visual Studio Installer from Microsoft
- Open and run the .exe file to install Microsoft Visual Studio
- After installation, modify the installation and include the "Desktop development with C++" workload
- Download and install the Build Tools for Visual Studio from Microsoft
- Once they appear in the Visual Studio Installer, modify the installation and include the "Desktop development with C++" workload
- Download and install Anaconda from Anaconda; don't forget to tick the checkbox that adds Anaconda to the Windows PATH environment variables
- Download wget.exe from eternallybored
- After the download, copy the file and paste it into the Windows/System32 folder on your drive
- Download and install cuda-toolkit from Nvidia
- For RTX 5060 Ti and newer GPUs, install CUDA 12.8 or newer
- For older GPUs, CUDA 11.8 from the archive may be sufficient
(Note that you cannot skip [#FirstStep], since SAM2.1 will not run without those programs.)
- Create and activate a conda environment:
# -n {name_you_want_for_your_env} - you can name the environment whatever you want
# -y automatically accepts the confirmation prompt
conda create -n sam2_env python=3.10 -y
conda activate sam2_env
- Install required packages:
# For CUDA 12.x (recommended for RTX 5060 Ti and newer GPUs)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# For CUDA 11.8 (for older GPUs or existing installations)
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install opencv-python matplotlib pandas scipy pillow tqdm transformers accelerate pycocotools
pip install label-studio
pip install scikit-learn
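To confirm the install picked up your GPU, you can run a quick check (it prints the Torch version, whether CUDA is visible, and the CUDA build version):
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.version.cuda)"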
- If you downloaded through the GitHub GUI, just unzip the folder to the location you want
- If you are using git commands:
git clone https://github.com/SaraSSC/CorkDefectAnalizer.git
- Install sam2
git clone https://github.com/facebookresearch/sam2.git
- Move inside the sam2 folder:
cd sam2
pip install -e .
- Move inside checkpoints and download the checkpoints:
cd checkpoints
./download_ckpts.sh # or download_ckpts.bat on Windows
- Copy the files that are missing from my repository into the sam2 folder; don't forget to keep the same filenames.
Create a directory structure for your dataset:
mkdir -p ./dataset/images
mkdir -p ./dataset/masks
If you already have the images, masks and train.csv file, just place them in ./dataset in their respective folders:
- ./dataset/images for the images
- ./dataset/masks for their masks
- ./dataset for the train.csv file
If you're using Label Studio for annotations:
- Export your project in "COCO with images" format, which includes both the annotations and the images in their original format
- Extract the ZIP file from Label Studio to ./sam2/label_studio_exports/
- This will create a JSON file (typically result.json) and an images folder
- Important: Keep the original folder structure - the JSON file and the images folder must be in the same directory
- Use the create_dataset.py script to convert the exports to our format. This Python script handles both the COCO and Pascal VOC formats via its format flag:
--format {option: coco or voc}
# For Label Studio's COCO export format
# Point to the JSON file; the script will automatically find the images folder next to it
python create_dataset.py --input /path/to/label_studio_exports/result.json --output_dir ./dataset --format coco
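For reference, the core of this conversion (rasterising COCO polygon annotations into binary PNG masks) can be reproduced with pycocotools. This is an illustrative sketch, not the actual script, and the paths are placeholders:
# Illustrative sketch of COCO polygons -> binary masks; paths are placeholders.
import os
import cv2
import numpy as np
from pycocotools.coco import COCO

coco = COCO("./label_studio_exports/result.json")
os.makedirs("./dataset/masks", exist_ok=True)
for img_id in coco.getImgIds():
    info = coco.loadImgs(img_id)[0]
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
    # Merge every polygon annotation of this image into one binary mask
    mask = np.zeros((info["height"], info["width"]), dtype=np.uint8)
    for ann in anns:
        mask = np.maximum(mask, coco.annToMask(ann))
    name = os.path.splitext(os.path.basename(info["file_name"]))[0] + ".png"
    cv2.imwrite(os.path.join("./dataset/masks", name), mask * 255)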
After running the conversion script, you'll have:
dataset/
├── images/ # Images copied from the Label Studio export and converted to .png extension
│ ├── image1.png
│ ├── image2.png
│ └── ...
└── masks/ # Generated binary masks from the polygon annotations
├── image1.png
├── image2.png
└── ...
- Use the create_train_csv_dataset.py script to create the .csv file that contains the mapping. The CSV file maps each image to its corresponding segmentation mask, ensuring proper indexing for SAM2 training (a sketch of the mapping appears after the directory tree below).
python create_train_csv_dataset.py
Final structure after running:
dataset/
├── images/
│ ├── image1.png
│ ├── image2.png
│ └── ...
├── masks/
│   ├── image1.png
│   ├── image2.png
│   └── ...
└── train.csv
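For reference, the mapping file can be produced with a few lines of Python. This is a hedged sketch: the column names used by the actual create_train_csv_dataset.py script are assumptions here.
# Sketch of building train.csv; the column names "image" and "mask" are assumed.
import csv
import os

with open("./dataset/train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image", "mask"])
    for name in sorted(os.listdir("./dataset/images")):
        # Masks share the image's filename (see the mask requirements below)
        writer.writerow([os.path.join("images", name), os.path.join("masks", name)])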
- To check that the training data is okay for fine-tuning, run:
python analyze_training_data.py
If something isn't right, fix it by deleting the converted files and train.csv, then redo the three commands above in the same order (conversion → creating the train.csv file → analysis) before starting the fine-tuning.
Run the data_preparation.py script to prepare the dataset for training:
python data_preparation.py
This script will read the images and masks, resize them to 1024x1024, and generate random points on the regions of interest (ROIs) in the masks. The output will be a batch of images, binary masks, and points ready for training.
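Roughly, the preparation step works like the following sketch (an illustration of the approach, not the actual script; the point-sampling details are assumptions):
# Sketch of the preparation step: resize to 1024x1024 and sample random prompt
# points inside the mask foreground (ROI). Illustrative only.
import cv2
import numpy as np

def prepare_sample(image_path, mask_path, num_points=1, size=1024):
    image = cv2.resize(cv2.imread(image_path), (size, size))
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    # Nearest-neighbour keeps the mask binary after resizing
    mask = cv2.resize(mask, (size, size), interpolation=cv2.INTER_NEAREST)
    binary = (mask > 0).astype(np.uint8)
    ys, xs = np.nonzero(binary)            # coordinates of all ROI pixels
    idx = np.random.choice(len(xs), num_points)
    points = np.stack([xs[idx], ys[idx]], axis=1)  # (num_points, 2) as (x, y)
    return image, binary, points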
After preparing the dataset, you can fine-tune SAM2.1 using the fine_tune_model_***.py script:
python fine_tune_model_CAWR.py
# or fine_tune_model_StepLR.py -> gives worse results
This script will load the SAM2.1 model, prepare the dataset, and start training. It will save checkpoints and log training progress.
It will also open an image visualisation window showing a sample image from the dataset, just to check that the data is being loaded correctly. You need to close it before the model starts training; press q or Esc, or click the X button.
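The CAWR suffix stands for the CosineAnnealingWarmRestarts learning-rate schedule. Below is a minimal sketch of how such a schedule is wired up in PyTorch; the model and hyperparameter values are illustrative stand-ins, not the script's actual settings:
# Illustrative CosineAnnealingWarmRestarts setup; values are stand-ins.
import torch

model = torch.nn.Linear(4, 1)  # placeholder for the SAM2 model being fine-tuned
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=2, eta_min=1e-7
)
for epoch in range(30):
    # ... one training epoch over the dataset would run here ...
    scheduler.step()  # cosine decay with a warm restart at the end of each cycle
    print(epoch, scheduler.get_last_lr())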
To run inference on new images using the fine-tuned model, use the inference_fine_tuned.py script:
python inference_fine_tuned.py
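Under the hood, inference with a fine-tuned checkpoint follows the standard sam2 predictor API, roughly as in this sketch. The config name, checkpoint path, test image, and prompt point are all placeholders:
# Sketch of point-prompted inference; paths and the prompt point are placeholders.
import cv2
import numpy as np
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

model = build_sam2("configs/sam2.1/sam2.1_hiera_l.yaml", "fine_tuned_sam2.pt", device="cuda")
predictor = SAM2ImagePredictor(model)
image = cv2.cvtColor(cv2.imread("test_images/sample.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[512, 512]]),  # one foreground click
    point_labels=np.array([1]),           # 1 = foreground, 0 = background
)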
To compare the fine-tuned model with any of the SAM2 base models, modify the code on these lines:
- 34→35 to specify the test image
- 51→52 to set the SAM2 base model checkpoint and model configuration
- 79 to set the fine-tuned model
python test_base_vs_finetuned.py
To test the model on new images, you can use the test_finetune.py script. This script allows you to specify a directory of images and will run inference using the fine-tuned model.
The images need to be placed in a folder called test_images inside the sam2 folder; the script will save the results in a folder called defect_analysis_results in the same location.
python test_finetune.py
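The folder handling amounts to a loop like the following sketch; it reuses the predictor from the inference sketch above, and the overlay logic is illustrative rather than the script's actual output format:
# Sketch of batch inference over test_images/; reuses `predictor` from above.
import os
import cv2
import numpy as np

os.makedirs("defect_analysis_results", exist_ok=True)
for name in os.listdir("test_images"):
    bgr = cv2.imread(os.path.join("test_images", name))
    predictor.set_image(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
    masks, _, _ = predictor.predict(
        point_coords=np.array([[bgr.shape[1] // 2, bgr.shape[0] // 2]]),
        point_labels=np.array([1]),
    )
    overlay = bgr.copy()
    overlay[masks[0] > 0] = (0, 0, 255)  # paint the predicted region red (BGR)
    cv2.imwrite(os.path.join("defect_analysis_results", name), overlay)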
To use Label Studio, type the following in the cmd and create an account using the GUI:
label-studio
It will open a browser window where you can create a project and start annotating images.
- Data Preparation:
- Ensure your masks are binary (foreground=1, background=0)
- Make sure image and mask sizes match
- Provide diverse examples for better generalisation
- Training Parameters:
- Start with a small learning rate (1e-5 to 1e-6)
- Use a relatively small batch size due to model size
- Train for at least 10 epochs to see meaningful improvement
- Hardware Requirements:
- SAM2.1-huge requires at least 24GB of GPU memory
- For systems with less memory, consider using a smaller variant
- Performance Optimization:
- If training is slow, consider resizing your images to a consistent resolution
- Use mixed precision training by enabling torch.cuda.amp
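For instance, a mixed-precision training step with torch.cuda.amp looks like this (a generic sketch with a stand-in model and data, not the project's actual training loop):
# Minimal mixed-precision training step with torch.cuda.amp; stand-in model/data.
import torch

model = torch.nn.Linear(16, 1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler()
x = torch.randn(8, 16, device="cuda")
y = torch.randn(8, 1, device="cuda")
optimizer.zero_grad()
with torch.cuda.amp.autocast():          # run the forward pass in mixed precision
    loss = torch.nn.functional.mse_loss(model(x), y)
scaler.scale(loss).backward()            # scale the loss to avoid fp16 underflow
scaler.step(optimizer)
scaler.update()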
SAM training requires binary segmentation masks with these specifications:
- File Format:
- PNG files (recommended for lossless compression)
- Each mask must have the same filename as its corresponding image (with .png extension)
- Example: images/image1.jpg → masks/image1.png
images/image1.png → masks/image1.png
- Mask Properties:
- Single-channel (grayscale) images
- Binary values: 0 for background, 255 (or any non-zero value) for foreground
- Same dimensions as the input images
- Multiple Objects:
- For multiple objects in one image, you can use separate mask files for each object
- Alternatively, use instance segmentation with different pixel values for each object
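A quick way to verify that a mask meets these specifications (the paths are placeholders):
# Sanity check: single-channel, binary, same size as the image; paths are placeholders.
import cv2
import numpy as np

image = cv2.imread("./dataset/images/image1.png")
mask = cv2.imread("./dataset/masks/image1.png", cv2.IMREAD_UNCHANGED)
assert mask.ndim == 2, "mask must be single-channel (grayscale)"
assert set(np.unique(mask)) <= {0, 255}, "mask must be binary (0 and 255)"
assert mask.shape == image.shape[:2], "mask and image dimensions must match"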
Here are some recommended tools for creating segmentation masks:
- CVAT:
- Free, open-source web-based annotation tool
- Supports polygon, brush, and semi-automatic segmentation
- Can export directly as binary masks
- LabelMe:
- Simple Python tool for polygon annotations
- Lightweight and easy to use locally
- Exports as JSON that can be converted to masks
- Labelbox:
- Comprehensive platform with a free tier
- Advanced annotation features including AI assistance
- Supports various export formats, including masks
- Roboflow:
- Good for managing datasets with the free tier
- Pre-processing and augmentation tools
- Can export in various formats
- Label Studio:
- Open-source data labelling tool with both cloud and self-hosted options
- Project Setup for SAM Training:
1. When creating a project, choose "Image Segmentation with Polygons" (preferred over brush-based segmentation)
2. Configure your labels for the objects you want to segment
3. Use the polygon tool to create precise boundaries around your objects
- Export Instructions for SAM Training:
1. Go to your project and click "Export"
2. Select "COCO with images" format (recommended), which includes both annotations and image files
3. After export, extract the ZIP file, which will contain a JSON file and an 'images' folder
4. Convert the export to our required format using the converter script.