This project provides tools to fine-tune Meta's Segment Anything Model (SAM2.1) on custom segmentation datasets. The implementation includes dataset preparation, model training, inference scripts, a comparison between the base and fine-tuned models, and testing on new images.
Note: If everything is already set up, jump straight to the [#DatasetPreparation] (Option B) section, since the first step is already done. Just don't forget to activate the conda environment before running the scripts: conda activate sam2_env
- Download the Microsoft Visual Studio Installer from Microsoft
- Open and run the .exe file to install Microsoft Visual Studio
- After installation, modify the installation and include the "Desktop development with C++" workload
- Download and install the Build Tools for Visual Studio from Microsoft
- Once they appear in the Visual Studio Installer, modify the installation and include the "Desktop development with C++" workload
- Download and install Anaconda from Anaconda; don't forget to tick the checkbox that adds Anaconda to the Windows PATH environment variables
- Download wget.exe from eternallybored
- After the download, copy the file and paste it into the Windows/System32 folder on your drive
- Download and install cuda-toolkit from Nvidia
- For RTX 5060 Ti and newer GPUs, install CUDA 12.8 or newer
- For older GPUs, CUDA 11.8 from the archive may be sufficient
(Note that you cannot skip [#FirstStep], since SAM2.1 will not run without those programs.)
- Create and activate a conda environment:
# -n {name_you_want_for_your_env} - you can name the environment whatever you want
# -y automatically accepts the confirmation prompt
conda create -n sam2_env python=3.10 -y
conda activate sam2_env
- Install required packages:
# For CUDA 12.x (recommended for RTX 5060 Ti and newer GPUs)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# For CUDA 11.8 (for older GPUs or existing installations)
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install opencv-python matplotlib pandas scipy pillow tqdm transformers accelerate pycocotools
pip install label-studio
pip install scikit-learn
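To confirm the install picked up your GPU, you can run a quick check (it prints the Torch version, whether CUDA is visible, and the CUDA build version):
python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.version.cuda)"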
- If you downloaded through the GitHub GUI, just unzip the folder to the location you want
- If you are using git commands:
git clone https://github.com/SaraSSC/CorkDefectAnalizer.git
- Install sam2
git clone https://github.com/facebookresearch/sam2.git
- Move inside the sam2 folder:
cd sam2
pip install -e .
- Move inside checkpoints and download the checkpoints:
cd checkpoints
./download_ckpts.sh # or download_ckpts.bat on Windows
- Copy the files that are missing from my repository into the sam2 folder; don't forget to keep the same filenames.
Create a directory structure for your dataset:
mkdir -p ./dataset/images
mkdir -p ./dataset/masks
If you already have the images, masks and train.csv file, just place them in ./dataset in their respective folders:
- ./dataset/images for the images
- ./dataset/masks for their masks
- ./dataset for the train.csv file
If you're using Label Studio for annotations:
- Export your project in "COCO with images" format, which includes both the annotations and the images in their original format
- Extract the ZIP file from Label Studio to ./sam2/label_studio_exports/
- This will create a JSON file (typically result.json) and an images folder
- Important: Keep the original folder structure - the JSON file and the images folder must be in the same directory
- Use the create_dataset.py script to convert the exports to our format. This Python script handles both the COCO and Pascal VOC formats via its format flag:
--format {option: coco or voc}
# For Label Studio's COCO export format
# Point to the JSON file; the script will automatically find the images folder next to it
python create_dataset.py --input /path/to/label_studio_exports/result.json --output_dir ./dataset --format coco
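For reference, the core of this conversion (rasterising COCO polygon annotations into binary PNG masks) can be reproduced with pycocotools. This is an illustrative sketch, not the actual script, and the paths are placeholders:
# Illustrative sketch of COCO polygons -> binary masks; paths are placeholders.
import os
import cv2
import numpy as np
from pycocotools.coco import COCO

coco = COCO("./label_studio_exports/result.json")
os.makedirs("./dataset/masks", exist_ok=True)
for img_id in coco.getImgIds():
    info = coco.loadImgs(img_id)[0]
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
    # Merge every polygon annotation of this image into one binary mask
    mask = np.zeros((info["height"], info["width"]), dtype=np.uint8)
    for ann in anns:
        mask = np.maximum(mask, coco.annToMask(ann))
    name = os.path.splitext(os.path.basename(info["file_name"]))[0] + ".png"
    cv2.imwrite(os.path.join("./dataset/masks", name), mask * 255)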
After running the conversion script, you'll have:
dataset/
├── images/ # Images copied from the Label Studio export and converted to .png extension
│ ├── image1.png
│ ├── image2.png
│ └── ...
└── masks/ # Generated binary masks from the polygon annotations
├── image1.png
├── image2.png
└── ...
- Use the create_train_csv_dataset.py script to create the .csv file that contains the mapping. The CSV file maps each image to its corresponding segmentation mask, ensuring proper indexing for SAM2 training (a sketch of the mapping appears after the directory tree below).
python create_train_csv_dataset.py
Final structure after running:
dataset/
├── images/
│ ├── image1.png
│ ├── image2.png
│ └── ...
├── masks/
│   ├── image1.png
│   ├── image2.png
│   └── ...
└── train.csv
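For reference, the mapping file can be produced with a few lines of Python. This is a hedged sketch: the column names used by the actual create_train_csv_dataset.py script are assumptions here.
# Sketch of building train.csv; the column names "image" and "mask" are assumed.
import csv
import os

with open("./dataset/train.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image", "mask"])
    for name in sorted(os.listdir("./dataset/images")):
        # Masks share the image's filename (see the mask requirements below)
        writer.writerow([os.path.join("images", name), os.path.join("masks", name)])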
- To check that the training data is okay for fine-tuning, run:
python analyze_training_data.py
If something isn't right, fix it by deleting the converted files and train.csv, then redo the three commands above in the same order (conversion → creating the train.csv file → analysis) before starting the fine-tuning.
Run the data_preparation.py script to prepare the dataset for training:
python data_preparation.py
This script will read the images and masks, resize them to 1024x1024, and generate random points on the regions of interest (ROIs) in the masks. The output will be a batch of images, binary masks, and points ready for training.
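Roughly, the preparation step works like the following sketch (an illustration of the approach, not the actual script; the point-sampling details are assumptions):
# Sketch of the preparation step: resize to 1024x1024 and sample random prompt
# points inside the mask foreground (ROI). Illustrative only.
import cv2
import numpy as np

def prepare_sample(image_path, mask_path, num_points=1, size=1024):
    image = cv2.resize(cv2.imread(image_path), (size, size))
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE)
    # Nearest-neighbour keeps the mask binary after resizing
    mask = cv2.resize(mask, (size, size), interpolation=cv2.INTER_NEAREST)
    binary = (mask > 0).astype(np.uint8)
    ys, xs = np.nonzero(binary)            # coordinates of all ROI pixels
    idx = np.random.choice(len(xs), num_points)
    points = np.stack([xs[idx], ys[idx]], axis=1)  # (num_points, 2) as (x, y)
    return image, binary, points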
After preparing the dataset, you can fine-tune SAM2.1 using the fine_tune_model_***.py script:
python fine_tune_model_CAWR.py
# or fine_tune_model_StepLR.py -> gives worse results
This script will load the SAM2.1 model, prepare the dataset, and start training. It will save checkpoints and log training progress.
It will also open an image visualisation window showing a sample image from the dataset, just to check that the data is being loaded correctly. You need to close it before the model starts training; press q or Esc, or click the X button.
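The CAWR suffix stands for the CosineAnnealingWarmRestarts learning-rate schedule. Below is a minimal sketch of how such a schedule is wired up in PyTorch; the model and hyperparameter values are illustrative stand-ins, not the script's actual settings:
# Illustrative CosineAnnealingWarmRestarts setup; values are stand-ins.
import torch

model = torch.nn.Linear(4, 1)  # placeholder for the SAM2 model being fine-tuned
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=10, T_mult=2, eta_min=1e-7
)
for epoch in range(30):
    # ... one training epoch over the dataset would run here ...
    scheduler.step()  # cosine decay with a warm restart at the end of each cycle
    print(epoch, scheduler.get_last_lr())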
To run inference on new images using the fine-tuned model, use the inference_fine_tuned.py script:
python inference_fine_tuned.py
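Under the hood, inference with a fine-tuned checkpoint follows the standard sam2 predictor API, roughly as in this sketch. The config name, checkpoint path, test image, and prompt point are all placeholders:
# Sketch of point-prompted inference; paths and the prompt point are placeholders.
import cv2
import numpy as np
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

model = build_sam2("configs/sam2.1/sam2.1_hiera_l.yaml", "fine_tuned_sam2.pt", device="cuda")
predictor = SAM2ImagePredictor(model)
image = cv2.cvtColor(cv2.imread("test_images/sample.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[512, 512]]),  # one foreground click
    point_labels=np.array([1]),           # 1 = foreground, 0 = background
)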
To compare the fine-tuned model with any of the SAM2 base models, modify the code on these lines:
- 34→35 to specify the test image
- 51→52 to set the SAM2 base model checkpoint and model configuration
- 79 to set the fine-tuned model
python test_base_vs_finetuned.py
To test the model on new images, you can use the test_finetune.py script. This script allows you to specify a directory of images and will run inference using the fine-tuned model.
The images need to be placed in a folder called test_images inside the sam2 folder; the script will save the results in a folder called defect_analysis_results in the same location.
python test_finetune.py
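The folder handling amounts to a loop like the following sketch; it reuses the predictor from the inference sketch above, and the overlay logic is illustrative rather than the script's actual output format:
# Sketch of batch inference over test_images/; reuses `predictor` from above.
import os
import cv2
import numpy as np

os.makedirs("defect_analysis_results", exist_ok=True)
for name in os.listdir("test_images"):
    bgr = cv2.imread(os.path.join("test_images", name))
    predictor.set_image(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
    masks, _, _ = predictor.predict(
        point_coords=np.array([[bgr.shape[1] // 2, bgr.shape[0] // 2]]),
        point_labels=np.array([1]),
    )
    overlay = bgr.copy()
    overlay[masks[0] > 0] = (0, 0, 255)  # paint the predicted region red (BGR)
    cv2.imwrite(os.path.join("defect_analysis_results", name), overlay)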
To use Label Studio, type the following in the cmd and create an account using the GUI:
label-studio
It will open a browser window where you can create a project and start annotating images.
- Data Preparation:
- Ensure your masks are binary (foreground=1, background=0)
- Make sure image and mask sizes match
- Provide diverse examples for better generalisation
- Training Parameters:
- Start with a small learning rate (1e-5 to 1e-6)
- Use a relatively small batch size due to model size
- Train for at least 10 epochs to see meaningful improvement
- Hardware Requirements:
- SAM2.1-huge requires at least 24GB of GPU memory
- For systems with less memory, consider using a smaller variant
- Performance Optimization:
- If training is slow, consider resizing your images to a consistent resolution
- Use mixed precision training by enabling torch.cuda.amp
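For instance, a mixed-precision training step with torch.cuda.amp looks like this (a generic sketch with a stand-in model and data, not the project's actual training loop):
# Minimal mixed-precision training step with torch.cuda.amp; stand-in model/data.
import torch

model = torch.nn.Linear(16, 1).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
scaler = torch.cuda.amp.GradScaler()
x = torch.randn(8, 16, device="cuda")
y = torch.randn(8, 1, device="cuda")
optimizer.zero_grad()
with torch.cuda.amp.autocast():          # run the forward pass in mixed precision
    loss = torch.nn.functional.mse_loss(model(x), y)
scaler.scale(loss).backward()            # scale the loss to avoid fp16 underflow
scaler.step(optimizer)
scaler.update()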
SAM training requires binary segmentation masks with these specifications:
- File Format:
- PNG files (recommended for lossless compression)
- Each mask must have the same filename as its corresponding image (with .png extension)
- Example: images/image1.jpg → masks/image1.png
images/image1.png → masks/image1.png
- Mask Properties:
- Single-channel (grayscale) images
- Binary values: 0 for background, 255 (or any non-zero value) for foreground
- Same dimensions as the input images
- Multiple Objects:
- For multiple objects in one image, you can use separate mask files for each object
- Alternatively, use instance segmentation with different pixel values for each object
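A quick way to verify that a mask meets these specifications (the paths are placeholders):
# Sanity check: single-channel, binary, same size as the image; paths are placeholders.
import cv2
import numpy as np

image = cv2.imread("./dataset/images/image1.png")
mask = cv2.imread("./dataset/masks/image1.png", cv2.IMREAD_UNCHANGED)
assert mask.ndim == 2, "mask must be single-channel (grayscale)"
assert set(np.unique(mask)) <= {0, 255}, "mask must be binary (0 and 255)"
assert mask.shape == image.shape[:2], "mask and image dimensions must match"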
Here are some recommended tools for creating segmentation masks:
- CVAT:
- Free, open-source web-based annotation tool
- Supports polygon, brush, and semi-automatic segmentation
- Can export directly as binary masks
- LabelMe:
- Simple Python tool for polygon annotations
- Lightweight and easy to use locally
- Exports as JSON that can be converted to masks
- Labelbox:
- Comprehensive platform with a free tier
- Advanced annotation features including AI assistance
- Supports various export formats, including masks
- Roboflow:
- Good for managing datasets with the free tier
- Pre-processing and augmentation tools
- Can export in various formats
- Label Studio:
- Open-source data labelling tool with both cloud and self-hosted options
- Project Setup for SAM Training:
1. When creating a project, choose "Image Segmentation with Polygons" (preferred over brush-based segmentation)
2. Configure your labels for the objects you want to segment
3. Use the polygon tool to create precise boundaries around your objects
- Export Instructions for SAM Training:
1. Go to your project and click "Export"
2. Select "COCO with images" format (recommended), which includes both annotations and image files
3. After export, extract the ZIP file, which will contain a JSON file and an 'images' folder
4. Convert the export to our required format using the converter script.