
Commit 83c49e6

[PaddlePaddle Paper Reproduction Challenge (Round 6)] (102) Dense Contrastive Learning for Self-Supervised Visual Pre-Training (#118)

* add densecl
* add doc
* add tipc
* fix bug in tipc
* update doc & densecl result
1 parent 1fc341a commit 83c49e6

File tree

20 files changed: +947 −6 lines


README.md

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ PASSL implements a series of self-supervised learning algorithms, See **Document
 | BYOL | 300 | 72.50 | 71.62 | ResNet-50 | [download](https://passl.bj.bcebos.com/models/byol_r50_300.pdparams) | [Train BYOL](docs/Train_BYOL_model.md) |
 | PixPro | 100 | 55.1(fp16) | 57.2(fp32) | ResNet-50 | [download](https://passl.bj.bcebos.com/models/pixpro_r50_ep100_no_instance_with_linear.pdparams) | [Train PixPro](docs/Train_PixPro_model.md) |
 | SimSiam | 100 | 68.3 | 68.4 | ResNet-50 | [download](https://drive.google.com/file/d/1kaAm8-tlvB570kzI4fo9h4dwGQFf_4FE/view?usp=sharing) | [Train SimSiam](docs/Train_SimSiam_model.md) |
+| DenseCL | 200 | 63.62 | 63.37 | ResNet-50 | [download](https://drive.google.com/file/d/1RWPO_g-fNJv8FsmCZ3LUbPTgPwtx-ybZ/view?usp=sharing) | [Train DenseCL](docs/Train_DenseCL_model.md) |
 | SwAV | 100 | 72.1 | 72.4 | ResNet-50 | [download](https://drive.google.com/file/d/1budFSoQqZz1Idyej-R4E6kUnL8CGtdyu/view?usp=sharing) | [Train SwAV](docs/Train_SwAV_model.md) |

 > Benchmark Linear Image Classification on ImageNet-1K.

README_cn.md

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ PASSL implements a series of self-supervised learning algorithms; for more detailed usage documentation, please refer
 | BYOL | 300 | 72.50 | 71.62 | ResNet-50 | [download](https://passl.bj.bcebos.com/models/byol_r50_300.pdparams) | [Train BYOL](docs/Train_BYOL_model.md) |
 | PixPro | 100 | 55.1(fp16) | 57.2(fp32) | ResNet-50 | [download](https://passl.bj.bcebos.com/models/pixpro_r50_ep100_no_instance_with_linear.pdparams) | [Train PixPro](docs/Train_PixPro_model.md) |
 | SimSiam | 100 | 68.3 | 68.4 | ResNet-50 | [download](https://drive.google.com/file/d/1kaAm8-tlvB570kzI4fo9h4dwGQFf_4FE/view?usp=sharing) | [Train SimSiam](docs/Train_SimSiam_model.md) |
+| DenseCL | 200 | 63.62 | 63.37 | ResNet-50 | [download](https://drive.google.com/file/d/1RWPO_g-fNJv8FsmCZ3LUbPTgPwtx-ybZ/view?usp=sharing) | [Train DenseCL](docs/Train_DenseCL_model.md) |
 | SwAV | 100 | 72.1 | 72.4 | ResNet-50 | [download](https://drive.google.com/file/d/1budFSoQqZz1Idyej-R4E6kUnL8CGtdyu/view?usp=sharing) | [Train SwAV](docs/Train_SwAV_model.md) |

 > Benchmark Linear Image Classification on ImageNet-1K.

configs/densecl/README.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
[简体中文](README_ch.md) | English

# Dense Contrastive Learning for Self-Supervised Visual Pre-Training ([arXiv](https://arxiv.org/abs/2011.09157))

## Introduction

To date, most existing self-supervised learning methods are designed and optimized for image classification. These pre-trained models can be sub-optimal for dense prediction tasks due to the discrepancy between image-level prediction and pixel-level prediction. To fill this gap, we aim to design an effective, dense self-supervised learning method that directly works at the level of pixels (or local features) by taking into account the correspondence between local features. We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images. Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only <1% slower), but demonstrates consistently superior performance when transferring to downstream dense prediction tasks including object detection, semantic segmentation and instance segmentation; and outperforms the state-of-the-art methods by a large margin. Specifically, over the strong MoCo-v2 baseline, our method achieves significant improvements of 2.0% AP on PASCAL VOC object detection, 1.1% AP on COCO object detection, 0.9% AP on COCO instance segmentation, 3.0% mIoU on PASCAL VOC semantic segmentation and 1.8% mIoU on Cityscapes semantic segmentation.

<p align="center">
<img src="../../docs/imgs/densecl.png" width="100%" height="100%"/>
</p>
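To make the pixel-level objective concrete, the following is a minimal Paddle sketch of a dense InfoNCE loss. It is illustrative rather than PASSL's actual implementation: DenseCL derives the cross-view correspondence from the backbone features, while this sketch matches on the projected grids for brevity, and the queue bookkeeping (enqueue/dequeue) is omitted. All tensor names and shapes are assumptions.

```python
import paddle
import paddle.nn.functional as F

def dense_contrastive_loss(q_grid, k_grid, queue, temperature=0.2):
    # q_grid: [N, C, S] l2-normalized local features of view 1 (query encoder)
    # k_grid: [N, C, S] l2-normalized local features of view 2 (key encoder)
    # queue:  [C, K]    l2-normalized dense features of other images (negatives)
    n, c, s = q_grid.shape
    q = q_grid.transpose([0, 2, 1])                                # [N, S, C]
    # Cross-view correspondence: each query location is paired with the
    # most similar location of the other view as its positive.
    match = F.one_hot(paddle.matmul(q, k_grid).argmax(axis=2), s)  # [N, S, S]
    k_pos = paddle.matmul(match, k_grid.transpose([0, 2, 1]))      # [N, S, C]
    l_pos = (q * k_pos).sum(axis=-1).reshape([-1, 1])              # [N*S, 1]
    l_neg = paddle.matmul(q.reshape([-1, c]), queue)               # [N*S, K]
    logits = paddle.concat([l_pos, l_neg], axis=1) / temperature
    labels = paddle.zeros([n * s], dtype='int64')  # positive is always index 0
    return F.cross_entropy(logits, labels)
```

In the paper this dense term is combined with the standard global MoCo-v2 loss, with the two terms weighted equally.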
## Getting Started
### 1. Train DenseCL

#### Single GPU

```
python tools/train.py -c configs/densecl/densecl_r50.yaml
```

#### Multiple GPUs

```
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/densecl/densecl_r50.yaml
```
The model pre-trained for 200 epochs can be found at [densecl](https://drive.google.com/file/d/1RWPO_g-fNJv8FsmCZ3LUbPTgPwtx-ybZ/view?usp=sharing).

Note: the default learning rate in the config files is for 8 GPUs. With a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate following ```new_lr = old_lr * new_ngpus / old_ngpus```.
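For instance, training on 2 GPUs instead of the default 8 quarters the total batch size, so the base learning rate 0.03 from `densecl_r50.yaml` scales down to 0.0075:

```python
# Linear learning-rate scaling rule from the note above.
old_lr, old_ngpus = 0.03, 8   # defaults in densecl_r50.yaml (batch_size 32 per GPU, 8 GPUs)
new_ngpus = 2                 # e.g. a 2-GPU machine
new_lr = old_lr * new_ngpus / old_ngpus
print(new_lr)                 # 0.0075
```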
### 2. Extract backbone weights

```
python tools/extract_weight.py ${CHECKPOINT} --output ${WEIGHT_FILE} --remove_prefix
```
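Conceptually, this step filters the pre-training checkpoint down to the backbone parameters and strips their name prefix so the result loads into a plain ResNet-50. A minimal sketch of the idea, assuming a `backbone.` prefix and a flat parameter dict (the real `tools/extract_weight.py` may differ in key names and options):

```python
import paddle

def extract_backbone(checkpoint_path, output_path, prefix="backbone."):
    state = paddle.load(checkpoint_path)
    # Some checkpoints nest the weights under a "state_dict" key; unwrap if so.
    if isinstance(state, dict) and "state_dict" in state:
        state = state["state_dict"]
    # Keep only backbone parameters and drop the prefix from their names.
    backbone = {k[len(prefix):]: v for k, v in state.items() if k.startswith(prefix)}
    paddle.save(backbone, output_path)
```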
### 3. Evaluation on ImageNet Linear Classification

#### Train:
```
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/moco/moco_clas_r50.yaml --pretrained ${WEIGHT_FILE}
```

#### Evaluate:
```
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/moco/moco_clas_r50.yaml --load ${CLS_WEIGHT_FILE} --evaluate-only
```
The trained linear weights, in conjunction with the backbone weights, can be found at [densecl linear](https://drive.google.com/file/d/1XJeDY8clKfhUeXw4JcCa1QgG2G-Ibr4m/view?usp=sharing).

## Reference
```
@inproceedings{wang2021dense,
  title={Dense contrastive learning for self-supervised visual pre-training},
  author={Wang, Xinlong and Zhang, Rufeng and Shen, Chunhua and Kong, Tao and Li, Lei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3024--3033},
  year={2021}
}
```

configs/densecl/README_ch.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
Simplified Chinese | [English](README.md)

# Dense Contrastive Learning for Self-Supervised Visual Pre-Training ([arXiv](https://arxiv.org/abs/2011.09157))

## Introduction

To date, most existing self-supervised learning methods have been designed and optimized for image classification. Because of the discrepancy between image-level and pixel-level prediction, these pre-trained models can be sub-optimal for dense prediction tasks. To fill this gap, we aim to design an effective, dense self-supervised learning method that performs contrastive learning directly at the level of pixels (or local features) by taking the correspondence between local features into account. Specifically, we propose dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of an input image. Compared with the MoCo-v2 baseline, our method introduces negligible computation overhead (only <1% slower) but shows consistently superior performance when transferred to downstream dense prediction tasks, including object detection, semantic segmentation, and instance segmentation, outperforming the state of the art by a large margin. Specifically, over the strong MoCo-v2 baseline, our method achieves significant improvements of 2.0% AP on PASCAL VOC object detection, 1.1% AP on COCO object detection, 0.9% AP on COCO instance segmentation, 3.0% mIoU on PASCAL VOC semantic segmentation, and 1.8% mIoU on Cityscapes semantic segmentation.

<p align="center">
<img src="../../docs/imgs/densecl.png" width="100%" height="100%"/>
</p>
## Getting Started

### 1. Train DenseCL

Single-GPU training:

```bash
python tools/train.py -c configs/densecl/densecl_r50.yaml
```

Multi-GPU training:

```bash
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/densecl/densecl_r50.yaml
```

Weights of the model pre-trained for 200 epochs: [densecl](https://drive.google.com/file/d/1RWPO_g-fNJv8FsmCZ3LUbPTgPwtx-ybZ/view?usp=sharing)

### 2. Extract backbone weights

```bash
python tools/extract_weight.py ${CHECKPOINT} --output ${WEIGHT_FILE} --remove_prefix
```

### 3. ImageNet Linear Classification Evaluation

Train:

```bash
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/moco/moco_clas_r50.yaml --pretrained ${WEIGHT_FILE}
```

Evaluate:

```bash
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/moco/moco_clas_r50.yaml --load ${CLS_WEIGHT_FILE} --evaluate-only
```
Backbone plus linear weights: [densecl linear](https://drive.google.com/file/d/1XJeDY8clKfhUeXw4JcCa1QgG2G-Ibr4m/view?usp=sharing)

### Reference

```
@inproceedings{wang2021dense,
  title={Dense contrastive learning for self-supervised visual pre-training},
  author={Wang, Xinlong and Zhang, Rufeng and Shen, Chunhua and Kong, Tao and Li, Lei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3024--3033},
  year={2021}
}
```

configs/densecl/densecl_r50.yaml

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
epochs: 200
output_dir: output_dir
seed: 0
device: gpu

# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference

model:
  name: DenseCL
  backbone:
    name: ResNet
    depth: 50
  neck:
    name: DenseCLNeck
    in_channels: 2048
    hid_channels: 2048
    out_channels: 128
    num_grid: None
  head:
    name: ContrastiveHead
    temperature: 0.2
    return_accuracy: False

dataloader:
  train:
    loader:
      num_workers: 8
      use_shared_memory: True
    sampler:
      batch_size: 32
      shuffle: true
      drop_last: true
    dataset:
      name: ImageNet
      dataroot: data/ILSVRC2012/train
      return_label: False
      return_two_sample: True
      transforms:
        - name: RandomResizedCrop
          size: 224
          scale: [0.2, 1.]
      view_trans1:
        - name: RandomApply
          transforms:
            - name: ColorJitter
              brightness: 0.4
              contrast: 0.4
              saturation: 0.4
              hue: 0.1
          p: 0.8
        - name: RandomGrayscale
          p: 0.2
        - name: RandomApply
          transforms:
            - name: GaussianBlur
              sigma: [0.1, 2.0]
          p: 0.5
        - name: RandomHorizontalFlip
        - name: Transpose
        - name: NormalizeImage
          scale: 1.0/255.0
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
      view_trans2:
        - name: RandomApply
          transforms:
            - name: ColorJitter
              brightness: 0.4
              contrast: 0.4
              saturation: 0.4
              hue: 0.1
          p: 0.8
        - name: RandomGrayscale
          p: 0.2
        - name: RandomApply
          transforms:
            - name: GaussianBlur
              sigma: [0.1, 2.0]
          p: 0.5
        - name: RandomHorizontalFlip
        - name: Transpose
        - name: NormalizeImage
          scale: 1.0/255.0
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]

lr_scheduler:
  name: CosineAnnealingDecay
  learning_rate: 0.03
  T_max: 200

optimizer:
  name: Momentum
  weight_decay: 0.0001

log_config:
  name: LogHook
  interval: 50
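The `DenseCLNeck` settings above (in_channels 2048, hid_channels 2048, out_channels 128) correspond to a two-branch projection: a global MLP over the pooled backbone feature for the instance-level loss, and a 1×1-convolution MLP that projects every spatial location for the dense loss. Below is a minimal Paddle sketch of such a neck, with the class name and forward signature assumed for illustration (PASSL's actual `DenseCLNeck` may differ, e.g. in how `num_grid` pooling is handled):

```python
import paddle.nn as nn

class DenseCLNeckSketch(nn.Layer):
    # Illustrative two-branch projection neck matching the config above.
    def __init__(self, in_channels=2048, hid_channels=2048, out_channels=128):
        super().__init__()
        self.avgpool = nn.AdaptiveAvgPool2D(1)
        # Global branch: MLP over the pooled feature (instance-level loss).
        self.mlp = nn.Sequential(
            nn.Linear(in_channels, hid_channels), nn.ReLU(),
            nn.Linear(hid_channels, out_channels))
        # Dense branch: 1x1-conv MLP projecting every spatial location.
        self.mlp2 = nn.Sequential(
            nn.Conv2D(in_channels, hid_channels, 1), nn.ReLU(),
            nn.Conv2D(hid_channels, out_channels, 1))

    def forward(self, x):                          # x: [N, 2048, 7, 7] from ResNet-50
        g = self.mlp(self.avgpool(x).flatten(1))   # [N, 128] global projection
        d = self.mlp2(x)                           # [N, 128, 7, 7] dense projection
        return g, d.flatten(2)                     # dense grid flattened to [N, 128, 49]
```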

configs/simsiam/README.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
[简体中文](README_ch.md) | English

# Exploring Simple Siamese Representation Learning ([arXiv](https://arxiv.org/abs/2011.10566))

## Introduction

Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions for avoiding collapsing solutions. In this paper, we report surprising empirical results that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders. Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapsing. We provide a hypothesis on the implication of stop-gradient, and further show proof-of-concept experiments verifying it. Our "SimSiam" method achieves competitive results on ImageNet and downstream tasks. We hope this simple baseline will motivate people to rethink the roles of Siamese architectures for unsupervised representation learning.

<p align="center">
<img src="../../docs/imgs/simsiam.png" width="60%" height="60%"/>
</p>
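The role of the stop-gradient is easy to see in code. Below is a minimal sketch of the symmetrized SimSiam objective (illustrative, not PASSL's implementation), where `p1, p2` are the predictor outputs and `z1, z2` the projector outputs for the two augmented views:

```python
import paddle.nn.functional as F

def simsiam_loss(p1, p2, z1, z2):
    def neg_cosine(p, z):
        z = z.detach()  # stop-gradient: the target branch receives no gradient
        return -(F.normalize(p, axis=1) * F.normalize(z, axis=1)).sum(axis=1).mean()
    # Symmetrized over both view pairings, as in the paper.
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
```

Removing the `detach()` calls recovers the collapsing variant studied in the paper's ablations.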
## Getting Started
### 1. Train SimSiam

#### Single GPU

```
python tools/train.py -c configs/simsiam/simsiam_r50.yaml
```

#### Multiple GPUs

```
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/simsiam/simsiam_r50.yaml
```
The model pre-trained for 100 epochs can be found at [simsiam](https://drive.google.com/file/d/1kaAm8-tlvB570kzI4fo9h4dwGQFf_4FE/view?usp=sharing).

Note: the default learning rate in the config files is for 8 GPUs. With a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate following ```new_lr = old_lr * new_ngpus / old_ngpus```.
### 2. Extract backbone weights

```
python tools/extract_weight.py ${CHECKPOINT} --output ${WEIGHT_FILE} --prefix encoder --remove_prefix
```
### 3. Evaluation on ImageNet Linear Classification

#### Train:
```
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/simsiam/simsiam_clas_r50.yaml --pretrained ${WEIGHT_FILE}
```

#### Evaluate:
```
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/simsiam/simsiam_clas_r50.yaml --load ${CLS_WEIGHT_FILE} --evaluate-only
```
The trained linear weights, in conjunction with the backbone weights, can be found at [simsiam linear](https://drive.google.com/file/d/19smHZGhBEPWeyLjKIGhM7KPngr-8BOUl/view?usp=sharing).

## Reference
```
@inproceedings{chen2021exploring,
  title={Exploring simple siamese representation learning},
  author={Chen, Xinlei and He, Kaiming},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15750--15758},
  year={2021}
}
```

configs/simsiam/README_ch.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
Simplified Chinese | [English](README.md)

# Exploring Simple Siamese Representation Learning ([arXiv](https://arxiv.org/abs/2011.10566))

## Introduction

Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions that avoid collapsed solutions. In this paper, we report the surprising empirical result that simple Siamese networks can learn meaningful representations even without using any of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders. Our experiments show that collapsed solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapse. We provide a hypothesis on the implication of stop-gradient and further show proof-of-concept experiments verifying it. Our SimSiam method achieves competitive results on ImageNet and downstream tasks. We hope this simple baseline will motivate people to rethink the role of Siamese architectures in unsupervised representation learning.

<p align="center">
<img src="../../docs/imgs/simsiam.png" width="60%" height="60%"/>
</p>
## Getting Started

### 1. Train SimSiam

Single-GPU training:

```bash
python tools/train.py -c configs/simsiam/simsiam_r50.yaml
```

Multi-GPU training:

```bash
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/simsiam/simsiam_r50.yaml
```

Weights of the model pre-trained for 100 epochs: [simsiam](https://drive.google.com/file/d/1kaAm8-tlvB570kzI4fo9h4dwGQFf_4FE/view?usp=sharing)

### 2. Extract backbone weights

```bash
python tools/extract_weight.py ${CHECKPOINT} --output ${WEIGHT_FILE} --prefix encoder --remove_prefix
```

### 3. ImageNet Linear Classification Evaluation

Train:

```bash
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/simsiam/simsiam_clas_r50.yaml --pretrained ${WEIGHT_FILE}
```

Evaluate:

```bash
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/simsiam/simsiam_clas_r50.yaml --load ${CLS_WEIGHT_FILE} --evaluate-only
```
Backbone plus linear weights: [simsiam linear](https://drive.google.com/file/d/19smHZGhBEPWeyLjKIGhM7KPngr-8BOUl/view?usp=sharing)

### Reference

```
@inproceedings{chen2021exploring,
  title={Exploring simple siamese representation learning},
  author={Chen, Xinlei and He, Kaiming},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15750--15758},
  year={2021}
}
```
