The paper "CLIP-Vision Guided Few-Shot Metal Surface Defect Recognition" has been published in IEEE Transactions on Industrial Informatics.
Deep-learning-based metal surface defect recognition (MSDR) is limited by the scarcity of expert-labeled data. In this study, we propose a CLIP-Vision Guided Self Supervised Learning (CVGSSL) framework that learns representations from unlabeled data and then completes MSDR with only few-shot labeled data. The framework first generates rich and diverse representations through multiple CLIP-Vs to ensure effective SSL pre-training, and then uses an MLP-Adapter to distill this knowledge and adapt the representations to the recognition task. In addition, we construct a self-constrained loss to address the inherent ambiguity between intra-class and inter-class distances, which otherwise causes representations to fall into an equivocal decision margin. After label-free pre-training with CVGSSL, the downstream model is fine-tuned for 1-shot to 4-shot defect recognition tasks.
⚠️ Note: Please manually download the CLIP checkpoint (e.g., RN50.pt) and place it in the ckp/ folder. Also, create the necessary folders such as weight/ and results/ beforehand, as they are not auto-generated.
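The manual setup in the note above can be scripted. The following is a minimal sketch, assuming it is run from the repository root and that the checkpoint file is named RN50.pt; the helper name prepare_workspace is hypothetical and is not part of this repository.

```python
from pathlib import Path


def prepare_workspace(root=".", checkpoint="RN50.pt"):
    """Create the folders that are not auto-generated and report whether
    the CLIP checkpoint is already present in ckp/.

    Hypothetical helper: the folder names come from the note above,
    the function itself is an illustration only.
    """
    root = Path(root)
    # ckp/ holds the manually downloaded CLIP checkpoint;
    # weight/ and results/ are the output folders named in the note.
    for name in ("ckp", "weight", "results"):
        (root / name).mkdir(parents=True, exist_ok=True)
    ckpt_path = root / "ckp" / checkpoint
    # The checkpoint is never downloaded here; we only check for it.
    return ckpt_path, ckpt_path.is_file()
```

Calling prepare_workspace() before python train.py returns the expected checkpoint path and a flag; if the flag is False, the checkpoint still has to be downloaded and placed at that path by hand.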
pip install torch torchvision timm numpy
pip install git+https://github.com/openai/CLIP.git
python train.py
python finetune_lincls.py
Training logs are automatically saved under logs/. Each run logs per-epoch loss and accuracy for the training and testing phases:
[Run 1] Epoch [1/100] Train Loss: 1.2593, Acc: 63.42% | Val Loss: 0.9341, Acc: 78.01%
...
[Run 1] Best Val Acc: 81.53%
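To compare runs, the per-epoch lines can be parsed back into numbers. The sketch below assumes that every epoch line matches the sample format shown above exactly; the names parse_epoch_line and best_val_acc are hypothetical, and the log file names inside logs/ are not specified here, so reading the file is left to the caller.

```python
import re

# Pattern built from the sample log line above; an assumption, not a
# format guaranteed by the training script.
EPOCH_RE = re.compile(
    r"\[Run (?P<run>\d+)\] Epoch \[(?P<epoch>\d+)/(?P<total>\d+)\] "
    r"Train Loss: (?P<train_loss>[\d.]+), Acc: (?P<train_acc>[\d.]+)% \| "
    r"Val Loss: (?P<val_loss>[\d.]+), Acc: (?P<val_acc>[\d.]+)%"
)


def parse_epoch_line(line):
    """Return a dict of metrics for one epoch line, or None if no match."""
    m = EPOCH_RE.search(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "run": int(d["run"]),
        "epoch": int(d["epoch"]),
        "total_epochs": int(d["total"]),
        "train_loss": float(d["train_loss"]),
        "train_acc": float(d["train_acc"]),
        "val_loss": float(d["val_loss"]),
        "val_acc": float(d["val_acc"]),
    }


def best_val_acc(lines):
    """Map each run index to its highest validation accuracy (percent)."""
    best = {}
    for line in lines:
        rec = parse_epoch_line(line)
        if rec is None:
            continue  # skip summary lines and anything else
        best[rec["run"]] = max(best.get(rec["run"], 0.0), rec["val_acc"])
    return best
```

Passing the lines of one log file to best_val_acc should reproduce the value reported on the "Best Val Acc" summary line for each run, provided the assumed format holds.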
Expected directory structure:
/data_root/
├── train/
│   ├── class1/
│   │   ├── img1.jpg
│   │   └── ...
│   ├── class2/
│   └── ...
└── test/
    ├── class1/
    ├── class2/
    └── ...
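With one folder per class, a k-shot labeled subset can be drawn directly from train/. The sketch below is an illustration only: how this repository actually selects its 1-shot to 4-shot samples is not stated here, and the function name sample_k_shot, the accepted file suffixes, and the seed argument are all assumptions.

```python
import random
from pathlib import Path

# Assumed set of image extensions; adjust to the dataset at hand.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def sample_k_shot(train_dir, k, seed=0):
    """Pick k image paths per class from a class-per-folder directory.

    Returns a dict mapping class name to a list of k Path objects.
    """
    rng = random.Random(seed)  # local RNG so the draw is reproducible
    subset = {}
    for class_dir in sorted(Path(train_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        # Sort first so the same seed always yields the same subset.
        images = sorted(
            p for p in class_dir.iterdir()
            if p.suffix.lower() in IMAGE_SUFFIXES
        )
        if len(images) < k:
            raise ValueError(
                f"{class_dir.name} has only {len(images)} images, need {k}"
            )
        subset[class_dir.name] = rng.sample(images, k)
    return subset
```

For example, sample_k_shot("/data_root/train", k=4) would return four image paths for each class folder. The same layout is the one torchvision.datasets.ImageFolder reads, which is a convenient option for loading the full train/ and test/ splits.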
- Support for ViT and ResNet backbones
- Easy integration of pretrained models (e.g., CLIP, MoCo)
- Few-shot training and evaluation
- Separate training modes: linear probing and full finetuning
- Configurable optimizer, learning rate, and more
The datasets used in this project are derived from well-established benchmarks and studies in the field of industrial surface defect detection:
- A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects
- X-SDD: A new benchmark for hot rolled steel strip surface defects detection
- Deep metallic surface defect detection: The new benchmark and detection network
This repository is inspired by and partially built upon: