
Commit 83c49e6

[PaddlePaddle Paper Reproduction Challenge (Round 6)] (102) Dense Contrastive Learning for Self-Supervised Visual Pre-Training (#118)

* add densecl
* add doc
* add tipc
* fix bug in tipc
* update doc & densecl result
1 parent 1fc341a commit 83c49e6

File tree

20 files changed: +947 −6 lines


README.md

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ PASSL implements a series of self-supervised learning algorithms, See **Document
 | BYOL | 300 | 72.50 | 71.62 | ResNet-50 | [download](https://passl.bj.bcebos.com/models/byol_r50_300.pdparams) | [Train BYOL](docs/Train_BYOL_model.md) |
 | PixPro | 100 | 55.1(fp16) | 57.2(fp32) | ResNet-50 | [download](https://passl.bj.bcebos.com/models/pixpro_r50_ep100_no_instance_with_linear.pdparams) | [Train PixPro](docs/Train_PixPro_model.md) |
 | SimSiam | 100 | 68.3 | 68.4 | ResNet-50 | [download](https://drive.google.com/file/d/1kaAm8-tlvB570kzI4fo9h4dwGQFf_4FE/view?usp=sharing) | [Train SimSiam](docs/Train_SimSiam_model.md) |
+| DenseCL | 200 | 63.62 | 63.37 | ResNet-50 | [download](https://drive.google.com/file/d/1RWPO_g-fNJv8FsmCZ3LUbPTgPwtx-ybZ/view?usp=sharing) | [Train DenseCL](docs/Train_DenseCL_model.md) |
 | SwAV | 100 | 72.1 | 72.4 | ResNet-50 | [download](https://drive.google.com/file/d/1budFSoQqZz1Idyej-R4E6kUnL8CGtdyu/view?usp=sharing) | [Train SwAV](docs/Train_SwAV_model.md) |

 > Benchmark Linear Image Classification on ImageNet-1K.

README_cn.md

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ PASSL implements a series of self-supervised learning algorithms; for more detailed usage documentation, please refer
 | BYOL | 300 | 72.50 | 71.62 | ResNet-50 | [download](https://passl.bj.bcebos.com/models/byol_r50_300.pdparams) | [Train BYOL](docs/Train_BYOL_model.md) |
 | PixPro | 100 | 55.1(fp16) | 57.2(fp32) | ResNet-50 | [download](https://passl.bj.bcebos.com/models/pixpro_r50_ep100_no_instance_with_linear.pdparams) | [Train PixPro](docs/Train_PixPro_model.md) |
 | SimSiam | 100 | 68.3 | 68.4 | ResNet-50 | [download](https://drive.google.com/file/d/1kaAm8-tlvB570kzI4fo9h4dwGQFf_4FE/view?usp=sharing) | [Train SimSiam](docs/Train_SimSiam_model.md) |
+| DenseCL | 200 | 63.62 | 63.37 | ResNet-50 | [download](https://drive.google.com/file/d/1RWPO_g-fNJv8FsmCZ3LUbPTgPwtx-ybZ/view?usp=sharing) | [Train DenseCL](docs/Train_DenseCL_model.md) |
 | SwAV | 100 | 72.1 | 72.4 | ResNet-50 | [download](https://drive.google.com/file/d/1budFSoQqZz1Idyej-R4E6kUnL8CGtdyu/view?usp=sharing) | [Train SwAV](docs/Train_SwAV_model.md) |

 > Benchmark Linear Image Classification on ImageNet-1K.

configs/densecl/README.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
[简体中文](README_ch.md) | English

# Dense Contrastive Learning for Self-Supervised Visual Pre-Training ([arXiv](https://arxiv.org/abs/2011.09157))

## Introduction

To date, most existing self-supervised learning methods are designed and optimized for image classification. These pre-trained models can be sub-optimal for dense prediction tasks due to the discrepancy between image-level prediction and pixel-level prediction. To fill this gap, we aim to design an effective, dense self-supervised learning method that directly works at the level of pixels (or local features) by taking into account the correspondence between local features. We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images. Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only <1% slower), but demonstrates consistently superior performance when transferring to downstream dense prediction tasks including object detection, semantic segmentation and instance segmentation; and outperforms the state-of-the-art methods by a large margin. Specifically, over the strong MoCo-v2 baseline, our method achieves significant improvements of 2.0% AP on PASCAL VOC object detection, 1.1% AP on COCO object detection, 0.9% AP on COCO instance segmentation, 3.0% mIoU on PASCAL VOC semantic segmentation and 1.8% mIoU on Cityscapes semantic segmentation.

<p align="center">
<img src="../../docs/imgs/densecl.png" width="100%" height="100%"/>
</p>
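To make the pixel-level objective concrete, the following is a minimal Paddle sketch of a dense InfoNCE loss. It is illustrative rather than PASSL's actual implementation: DenseCL derives the cross-view correspondence from the backbone features, while this sketch matches on the projected grids for brevity, and the queue bookkeeping (enqueue/dequeue) is omitted. All tensor names and shapes are assumptions.

```python
import paddle
import paddle.nn.functional as F

def dense_contrastive_loss(q_grid, k_grid, queue, temperature=0.2):
    # q_grid: [N, C, S] l2-normalized local features of view 1 (query encoder)
    # k_grid: [N, C, S] l2-normalized local features of view 2 (key encoder)
    # queue:  [C, K]    l2-normalized dense features of other images (negatives)
    n, c, s = q_grid.shape
    q = q_grid.transpose([0, 2, 1])                                # [N, S, C]
    # Cross-view correspondence: each query location is paired with the
    # most similar location of the other view as its positive.
    match = F.one_hot(paddle.matmul(q, k_grid).argmax(axis=2), s)  # [N, S, S]
    k_pos = paddle.matmul(match, k_grid.transpose([0, 2, 1]))      # [N, S, C]
    l_pos = (q * k_pos).sum(axis=-1).reshape([-1, 1])              # [N*S, 1]
    l_neg = paddle.matmul(q.reshape([-1, c]), queue)               # [N*S, K]
    logits = paddle.concat([l_pos, l_neg], axis=1) / temperature
    labels = paddle.zeros([n * s], dtype='int64')  # positive is always index 0
    return F.cross_entropy(logits, labels)
```

In the paper this dense term is combined with the standard global MoCo-v2 loss, with the two terms weighted equally.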
## Getting Started
### 1. Train DenseCL

#### Single GPU

```
python tools/train.py -c configs/densecl/densecl_r50.yaml
```

#### Multiple GPUs

```
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/densecl/densecl_r50.yaml
```
The model pre-trained for 200 epochs can be found at [densecl](https://drive.google.com/file/d/1RWPO_g-fNJv8FsmCZ3LUbPTgPwtx-ybZ/view?usp=sharing).

Note: the default learning rate in the config files is for 8 GPUs. With a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate following ```new_lr = old_lr * new_ngpus / old_ngpus```.
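For instance, training on 2 GPUs instead of the default 8 quarters the total batch size, so the base learning rate 0.03 from `densecl_r50.yaml` scales down to 0.0075:

```python
# Linear learning-rate scaling rule from the note above.
old_lr, old_ngpus = 0.03, 8   # defaults in densecl_r50.yaml (batch_size 32 per GPU, 8 GPUs)
new_ngpus = 2                 # e.g. a 2-GPU machine
new_lr = old_lr * new_ngpus / old_ngpus
print(new_lr)                 # 0.0075
```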
### 2. Extract backbone weights

```
python tools/extract_weight.py ${CHECKPOINT} --output ${WEIGHT_FILE} --remove_prefix
```
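Conceptually, this step filters the pre-training checkpoint down to the backbone parameters and strips their name prefix so the result loads into a plain ResNet-50. A minimal sketch of the idea, assuming a `backbone.` prefix and a flat parameter dict (the real `tools/extract_weight.py` may differ in key names and options):

```python
import paddle

def extract_backbone(checkpoint_path, output_path, prefix="backbone."):
    state = paddle.load(checkpoint_path)
    # Some checkpoints nest the weights under a "state_dict" key; unwrap if so.
    if isinstance(state, dict) and "state_dict" in state:
        state = state["state_dict"]
    # Keep only backbone parameters and drop the prefix from their names.
    backbone = {k[len(prefix):]: v for k, v in state.items() if k.startswith(prefix)}
    paddle.save(backbone, output_path)
```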
### 3. Evaluation on ImageNet Linear Classification

#### Train:
```
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/moco/moco_clas_r50.yaml --pretrained ${WEIGHT_FILE}
```

#### Evaluate:
```
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/moco/moco_clas_r50.yaml --load ${CLS_WEIGHT_FILE} --evaluate-only
```
The trained linear weights, in conjunction with the backbone weights, can be found at [densecl linear](https://drive.google.com/file/d/1XJeDY8clKfhUeXw4JcCa1QgG2G-Ibr4m/view?usp=sharing).

## Reference
```
@inproceedings{wang2021dense,
  title={Dense contrastive learning for self-supervised visual pre-training},
  author={Wang, Xinlong and Zhang, Rufeng and Shen, Chunhua and Kong, Tao and Li, Lei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3024--3033},
  year={2021}
}
```

configs/densecl/README_ch.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
Simplified Chinese | [English](README.md)

# Dense Contrastive Learning for Self-Supervised Visual Pre-Training ([arXiv](https://arxiv.org/abs/2011.09157))

## Introduction

To date, most existing self-supervised learning methods have been designed and optimized for image classification. Because of the discrepancy between image-level and pixel-level prediction, these pre-trained models can be sub-optimal for dense prediction tasks. To fill this gap, we aim to design an effective, dense self-supervised learning method that performs contrastive learning directly at the level of pixels (or local features) by taking the correspondence between local features into account. Specifically, we propose dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of an input image. Compared with the MoCo-v2 baseline, our method introduces negligible computation overhead (only <1% slower) but shows consistently superior performance when transferred to downstream dense prediction tasks, including object detection, semantic segmentation, and instance segmentation, outperforming the state of the art by a large margin. Specifically, over the strong MoCo-v2 baseline, our method achieves significant improvements of 2.0% AP on PASCAL VOC object detection, 1.1% AP on COCO object detection, 0.9% AP on COCO instance segmentation, 3.0% mIoU on PASCAL VOC semantic segmentation, and 1.8% mIoU on Cityscapes semantic segmentation.

<p align="center">
<img src="../../docs/imgs/densecl.png" width="100%" height="100%"/>
</p>
## Getting Started

### 1. Train DenseCL

Single-GPU training:

```bash
python tools/train.py -c configs/densecl/densecl_r50.yaml
```

Multi-GPU training:

```bash
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/densecl/densecl_r50.yaml
```

Weights of the model pre-trained for 200 epochs: [densecl](https://drive.google.com/file/d/1RWPO_g-fNJv8FsmCZ3LUbPTgPwtx-ybZ/view?usp=sharing)

### 2. Extract backbone weights

```bash
python tools/extract_weight.py ${CHECKPOINT} --output ${WEIGHT_FILE} --remove_prefix
```

### 3. ImageNet Linear Classification Evaluation

Train:

```bash
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/moco/moco_clas_r50.yaml --pretrained ${WEIGHT_FILE}
```

Evaluate:

```bash
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/moco/moco_clas_r50.yaml --load ${CLS_WEIGHT_FILE} --evaluate-only
```
Backbone plus linear weights: [densecl linear](https://drive.google.com/file/d/1XJeDY8clKfhUeXw4JcCa1QgG2G-Ibr4m/view?usp=sharing)

### Reference

```
@inproceedings{wang2021dense,
  title={Dense contrastive learning for self-supervised visual pre-training},
  author={Wang, Xinlong and Zhang, Rufeng and Shen, Chunhua and Kong, Tao and Li, Lei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={3024--3033},
  year={2021}
}
```

configs/densecl/densecl_r50.yaml

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
epochs: 200
output_dir: output_dir
seed: 0
device: gpu

# used for static mode and model export
image_shape: [3, 224, 224]
save_inference_dir: ./inference

model:
  name: DenseCL
  backbone:
    name: ResNet
    depth: 50
  neck:
    name: DenseCLNeck
    in_channels: 2048
    hid_channels: 2048
    out_channels: 128
    num_grid: None
  head:
    name: ContrastiveHead
    temperature: 0.2
    return_accuracy: False

dataloader:
  train:
    loader:
      num_workers: 8
      use_shared_memory: True
    sampler:
      batch_size: 32
      shuffle: true
      drop_last: true
    dataset:
      name: ImageNet
      dataroot: data/ILSVRC2012/train
      return_label: False
      return_two_sample: True
      transforms:
        - name: RandomResizedCrop
          size: 224
          scale: [0.2, 1.]
      view_trans1:
        - name: RandomApply
          transforms:
            - name: ColorJitter
              brightness: 0.4
              contrast: 0.4
              saturation: 0.4
              hue: 0.1
          p: 0.8
        - name: RandomGrayscale
          p: 0.2
        - name: RandomApply
          transforms:
            - name: GaussianBlur
              sigma: [0.1, 2.0]
          p: 0.5
        - name: RandomHorizontalFlip
        - name: Transpose
        - name: NormalizeImage
          scale: 1.0/255.0
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]
      view_trans2:
        - name: RandomApply
          transforms:
            - name: ColorJitter
              brightness: 0.4
              contrast: 0.4
              saturation: 0.4
              hue: 0.1
          p: 0.8
        - name: RandomGrayscale
          p: 0.2
        - name: RandomApply
          transforms:
            - name: GaussianBlur
              sigma: [0.1, 2.0]
          p: 0.5
        - name: RandomHorizontalFlip
        - name: Transpose
        - name: NormalizeImage
          scale: 1.0/255.0
          mean: [0.485, 0.456, 0.406]
          std: [0.229, 0.224, 0.225]

lr_scheduler:
  name: CosineAnnealingDecay
  learning_rate: 0.03
  T_max: 200

optimizer:
  name: Momentum
  weight_decay: 0.0001

log_config:
  name: LogHook
  interval: 50
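The `DenseCLNeck` settings above (in_channels 2048, hid_channels 2048, out_channels 128) correspond to a two-branch projection: a global MLP over the pooled backbone feature for the instance-level loss, and a 1×1-convolution MLP that projects every spatial location for the dense loss. Below is a minimal Paddle sketch of such a neck, with the class name and forward signature assumed for illustration (PASSL's actual `DenseCLNeck` may differ, e.g. in how `num_grid` pooling is handled):

```python
import paddle.nn as nn

class DenseCLNeckSketch(nn.Layer):
    # Illustrative two-branch projection neck matching the config above.
    def __init__(self, in_channels=2048, hid_channels=2048, out_channels=128):
        super().__init__()
        self.avgpool = nn.AdaptiveAvgPool2D(1)
        # Global branch: MLP over the pooled feature (instance-level loss).
        self.mlp = nn.Sequential(
            nn.Linear(in_channels, hid_channels), nn.ReLU(),
            nn.Linear(hid_channels, out_channels))
        # Dense branch: 1x1-conv MLP projecting every spatial location.
        self.mlp2 = nn.Sequential(
            nn.Conv2D(in_channels, hid_channels, 1), nn.ReLU(),
            nn.Conv2D(hid_channels, out_channels, 1))

    def forward(self, x):                          # x: [N, 2048, 7, 7] from ResNet-50
        g = self.mlp(self.avgpool(x).flatten(1))   # [N, 128] global projection
        d = self.mlp2(x)                           # [N, 128, 7, 7] dense projection
        return g, d.flatten(2)                     # dense grid flattened to [N, 128, 49]
```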

configs/simsiam/README.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
[简体中文](README_ch.md) | English

# Exploring Simple Siamese Representation Learning ([arXiv](https://arxiv.org/abs/2011.10566))

## Introduction

Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions for avoiding collapsing solutions. In this paper, we report surprising empirical results that simple Siamese networks can learn meaningful representations even using none of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders. Our experiments show that collapsing solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapsing. We provide a hypothesis on the implication of stop-gradient, and further show proof-of-concept experiments verifying it. Our "SimSiam" method achieves competitive results on ImageNet and downstream tasks. We hope this simple baseline will motivate people to rethink the roles of Siamese architectures for unsupervised representation learning.

<p align="center">
<img src="../../docs/imgs/simsiam.png" width="60%" height="60%"/>
</p>
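The role of the stop-gradient is easy to see in code. Below is a minimal sketch of the symmetrized SimSiam objective (illustrative, not PASSL's implementation), where `p1, p2` are the predictor outputs and `z1, z2` the projector outputs for the two augmented views:

```python
import paddle.nn.functional as F

def simsiam_loss(p1, p2, z1, z2):
    def neg_cosine(p, z):
        z = z.detach()  # stop-gradient: the target branch receives no gradient
        return -(F.normalize(p, axis=1) * F.normalize(z, axis=1)).sum(axis=1).mean()
    # Symmetrized over both view pairings, as in the paper.
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
```

Removing the `detach()` calls recovers the collapsing variant studied in the paper's ablations.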
## Getting Started
### 1. Train SimSiam

#### Single GPU

```
python tools/train.py -c configs/simsiam/simsiam_r50.yaml
```

#### Multiple GPUs

```
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/simsiam/simsiam_r50.yaml
```
The model pre-trained for 100 epochs can be found at [simsiam](https://drive.google.com/file/d/1kaAm8-tlvB570kzI4fo9h4dwGQFf_4FE/view?usp=sharing).

Note: the default learning rate in the config files is for 8 GPUs. With a different number of GPUs, the total batch size changes in proportion, so you have to scale the learning rate following ```new_lr = old_lr * new_ngpus / old_ngpus```.
### 2. Extract backbone weights

```
python tools/extract_weight.py ${CHECKPOINT} --output ${WEIGHT_FILE} --prefix encoder --remove_prefix
```
### 3. Evaluation on ImageNet Linear Classification

#### Train:
```
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/simsiam/simsiam_clas_r50.yaml --pretrained ${WEIGHT_FILE}
```

#### Evaluate:
```
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/simsiam/simsiam_clas_r50.yaml --load ${CLS_WEIGHT_FILE} --evaluate-only
```
The trained linear weights, in conjunction with the backbone weights, can be found at [simsiam linear](https://drive.google.com/file/d/19smHZGhBEPWeyLjKIGhM7KPngr-8BOUl/view?usp=sharing).

## Reference
```
@inproceedings{chen2021exploring,
  title={Exploring simple siamese representation learning},
  author={Chen, Xinlei and He, Kaiming},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15750--15758},
  year={2021}
}
```

configs/simsiam/README_ch.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
Simplified Chinese | [English](README.md)

# Exploring Simple Siamese Representation Learning ([arXiv](https://arxiv.org/abs/2011.10566))

## Introduction

Siamese networks have become a common structure in various recent models for unsupervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions that avoid collapsed solutions. In this paper, we report the surprising empirical result that simple Siamese networks can learn meaningful representations even without using any of the following: (i) negative sample pairs, (ii) large batches, (iii) momentum encoders. Our experiments show that collapsed solutions do exist for the loss and structure, but a stop-gradient operation plays an essential role in preventing collapse. We provide a hypothesis on the implication of stop-gradient and further show proof-of-concept experiments verifying it. Our SimSiam method achieves competitive results on ImageNet and downstream tasks. We hope this simple baseline will motivate people to rethink the role of Siamese architectures in unsupervised representation learning.

<p align="center">
<img src="../../docs/imgs/simsiam.png" width="60%" height="60%"/>
</p>
## Getting Started

### 1. Train SimSiam

Single-GPU training:

```bash
python tools/train.py -c configs/simsiam/simsiam_r50.yaml
```

Multi-GPU training:

```bash
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/simsiam/simsiam_r50.yaml
```

Weights of the model pre-trained for 100 epochs: [simsiam](https://drive.google.com/file/d/1kaAm8-tlvB570kzI4fo9h4dwGQFf_4FE/view?usp=sharing)

### 2. Extract backbone weights

```bash
python tools/extract_weight.py ${CHECKPOINT} --output ${WEIGHT_FILE} --prefix encoder --remove_prefix
```

### 3. ImageNet Linear Classification Evaluation

Train:

```bash
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/simsiam/simsiam_clas_r50.yaml --pretrained ${WEIGHT_FILE}
```

Evaluate:

```bash
python -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" tools/train.py -c configs/simsiam/simsiam_clas_r50.yaml --load ${CLS_WEIGHT_FILE} --evaluate-only
```
Backbone plus linear weights: [simsiam linear](https://drive.google.com/file/d/19smHZGhBEPWeyLjKIGhM7KPngr-8BOUl/view?usp=sharing)

### Reference

```
@inproceedings{chen2021exploring,
  title={Exploring simple siamese representation learning},
  author={Chen, Xinlei and He, Kaiming},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={15750--15758},
  year={2021}
}
```
