Commit a3a660f — init commit (0 parents)

File tree: 1,628 files changed, +126,741 −0 lines changed


LICENSE

Lines changed: 21 additions & 0 deletions
MIT License

Copyright (c) 2021 Qi Han

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 77 additions & 0 deletions
# Demystifying Local Vision Transformer, [arXiv](https://arxiv.org/pdf/2106.04263.pdf)

This is the official PyTorch implementation of our paper. We simply replace local self-attention with (dynamic) depth-wise convolution at lower computational cost. The performance is on par with the Swin Transformer.

Beyond that, the main contribution of our paper is a theoretical and detailed comparison between depth-wise convolution and local self-attention from three aspects: sparse connectivity, weight sharing, and dynamic weight. With this paper, we want the community to rethink local self-attention and depth-wise convolution, as well as the basic rules of model architecture design.

<p align="center">
<img width="600" height="300" src="figures/relation.png">
</p>

Codes and models for object detection and semantic segmentation are available in `downstreams`.

We also provide MLP-based Swin Transformer models and inhomogeneous dynamic convolution in the ablation studies. These codes and models will come soon.
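The replacement described above is easy to picture: a depth-wise convolution aggregates each channel over a local window with one kernel shared across all positions, whereas local self-attention computes position-specific (dynamic) weights. A minimal NumPy sketch of the static depth-wise case, purely illustrative and not the repository's implementation:

```python
import numpy as np

def depthwise_conv2d(x, w):
    """Depth-wise 2D convolution ('same' zero padding, stride 1).

    x: (C, H, W) feature map; w: (C, k, k), one kernel per channel.
    Each channel is filtered independently -- the sparse-connectivity
    pattern the paper compares against local self-attention.
    """
    C, H, W = x.shape
    k = w.shape[1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.empty_like(x, dtype=float)
    for c in range(C):                      # no cross-channel mixing
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(xp[c, i:i + k, j:j + k] * w[c])
    return out

x = np.ones((3, 7, 7))
w = np.full((3, 3, 3), 1 / 9)              # 3x3 averaging kernel per channel
y = depthwise_conv2d(x, w)
```

The dynamic variant in the paper additionally predicts the per-channel kernels from the input instead of learning them as fixed parameters.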
## Reference
```
@article{han2021demystifying,
  title={Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight},
  author={Han, Qi and Fan, Zejia and Dai, Qi and Sun, Lei and Cheng, Ming-Ming and Liu, Jiaying and Wang, Jingdong},
  journal={arXiv preprint arXiv:2106.04263},
  year={2021}
}
```
## 1. Requirements
torch>=1.5.0, torchvision, [timm](https://github.com/rwightman/pytorch-image-models), pyyaml; apex-amp

Data preparation: ImageNet dataset with the following structure:
```
│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
```
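This is the standard one-subdirectory-per-class layout. Before launching a long training run, it can be worth sanity-checking the directory tree; a small stdlib sketch (the helper name and the check itself are ours, not part of the repo):

```python
from pathlib import Path

def check_imagefolder_layout(root):
    """Verify the train/val class-folder layout described above.

    Returns {split: number_of_class_directories}; raises if a split
    is missing or contains no class directories at all.
    """
    root = Path(root)
    counts = {}
    for split in ("train", "val"):
        d = root / split
        if not d.is_dir():
            raise FileNotFoundError(f"missing split directory: {d}")
        classes = [p for p in d.iterdir() if p.is_dir()]
        if not classes:
            raise ValueError(f"no class directories under {d}")
        counts[split] = len(classes)
    return counts
```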
## 2. Training
For the tiny model, we train with batch size 128 on 8 GPUs. When training the base model, we use batch size 64 on 16 GPUs with OpenMPI to keep the total batch size unchanged. (With the same training setting, the base model could not be trained with AMP due to anomalous gradient values.)

Please change the data path in the sh scripts first.

For the tiny model:
```bash
bash scripts/run_dwnet_tiny_patch4_window7_224.sh
bash scripts/run_dynamic_dwnet_tiny_patch4_window7_224.sh
```
For the base model, use multiple nodes with OpenMPI:
```bash
bash scripts/run_dwnet_base_patch4_window7_224.sh
bash scripts/run_dynamic_dwnet_base_patch4_window7_224.sh
```
## 3. Evaluation
```bash
python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --cfg configs/change_to_config_file --resume /path/to/model --data-path /path/to/imagenet --eval
```
## 4. Models
Models are trained on ImageNet at resolution 224.

| Model | #params | FLOPs | Top-1 Acc. | Download |
| :--- | :---: | :---: | :---: | :---: |
| dwnet_tiny | 24M | 3.8G | 81.2 | [github](https://github.com/Atten4Vis/DemystifyLocalViT/releases/download/prerelease/dwnet_tiny_224.pth) |
| dynamic_dwnet_tiny | 51M | 3.8G | 81.8 | [github](https://github.com/Atten4Vis/DemystifyLocalViT/releases/download/prerelease/dynamic_dwnet_tiny_224.pth) |
| dwnet_base | 74M | 12.9G | 83.2 | [github](https://github.com/Atten4Vis/DemystifyLocalViT/releases/download/prerelease/dwnet_base_224.pth) |
| dynamic_dwnet_base | 162M | 13.0G | 83.2 | [github](https://github.com/Atten4Vis/DemystifyLocalViT/releases/download/prerelease/dynamic_dwnet_base_224.pth) |
75+
76+
## LICENSE
77+
This repo is under the MIT license. Some codes are borrow from [Swin Transformer](https://github.com/microsoft/Swin-Transformer).

config.py

Lines changed: 252 additions & 0 deletions
```python
import os
import yaml
from yacs.config import CfgNode as CN

_C = CN()

# Base config files
_C.BASE = ['']

# -----------------------------------------------------------------------------
# Data settings
# -----------------------------------------------------------------------------
_C.DATA = CN()
# Batch size for a single GPU, could be overwritten by command line argument
_C.DATA.BATCH_SIZE = 128
# Path to dataset, could be overwritten by command line argument
_C.DATA.DATA_PATH = ''
# Dataset name
_C.DATA.DATASET = 'imagenet'
# Input image size
_C.DATA.IMG_SIZE = 224
# Interpolation to resize image (random, bilinear, bicubic)
_C.DATA.INTERPOLATION = 'bicubic'
# Use zipped dataset instead of folder dataset
# could be overwritten by command line argument
_C.DATA.ZIP_MODE = False
# Cache data in memory, could be overwritten by command line argument
_C.DATA.CACHE_MODE = 'part'
# Pin CPU memory in DataLoader for more efficient (sometimes) transfer to GPU
_C.DATA.PIN_MEMORY = True
# Number of data loading threads
_C.DATA.NUM_WORKERS = 8

# -----------------------------------------------------------------------------
# Model settings
# -----------------------------------------------------------------------------
_C.MODEL = CN()
# Model type
_C.MODEL.TYPE = 'swin'
# Model name
_C.MODEL.NAME = 'swin_tiny_patch4_window7_224'
# Checkpoint to resume, could be overwritten by command line argument
_C.MODEL.RESUME = ''
# Number of classes, overwritten in data preparation
_C.MODEL.NUM_CLASSES = 1000
# Dropout rate
_C.MODEL.DROP_RATE = 0.0
# Drop path rate
_C.MODEL.DROP_PATH_RATE = 0.1
# Label smoothing
_C.MODEL.LABEL_SMOOTHING = 0.1

# DWNet parameters
_C.MODEL.DWNET = CN()
_C.MODEL.DWNET.PATCH_SIZE = 4
_C.MODEL.DWNET.IN_CHANS = 3
_C.MODEL.DWNET.EMBED_DIM = 96
_C.MODEL.DWNET.DEPTHS = [2, 2, 6, 2]
_C.MODEL.DWNET.WINDOW_SIZE = 7
_C.MODEL.DWNET.MLP_RATIO = 4.
_C.MODEL.DWNET.APE = False
_C.MODEL.DWNET.PATCH_NORM = True
_C.MODEL.DWNET.CONV_TYPE = "v1"
_C.MODEL.DWNET.DYNAMIC = False

# Halo Transformer parameters
_C.MODEL.HALO = CN()
_C.MODEL.HALO.PATCH_SIZE = 4
_C.MODEL.HALO.IN_CHANS = 3
_C.MODEL.HALO.EMBED_DIM = 96
_C.MODEL.HALO.DEPTHS = [2, 2, 6, 2]
_C.MODEL.HALO.NUM_HEADS = [3, 6, 12, 24]
_C.MODEL.HALO.WINDOW_SIZE = [7, 7, 7, 7]
_C.MODEL.HALO.HALO_SIZE = [3, 3, 3, 3]
_C.MODEL.HALO.MLP_RATIO = 4.
_C.MODEL.HALO.QKV_BIAS = True
_C.MODEL.HALO.QK_SCALE = None
_C.MODEL.HALO.APE = False
_C.MODEL.HALO.PATCH_NORM = True

# -----------------------------------------------------------------------------
# Training settings
# -----------------------------------------------------------------------------
_C.TRAIN = CN()
_C.TRAIN.START_EPOCH = 0
_C.TRAIN.EPOCHS = 300
_C.TRAIN.WARMUP_EPOCHS = 20
_C.TRAIN.WEIGHT_DECAY = 0.05
_C.TRAIN.BASE_LR = 5e-4
_C.TRAIN.WARMUP_LR = 5e-7
_C.TRAIN.MIN_LR = 5e-6
# Clip gradient norm
_C.TRAIN.CLIP_GRAD = 5.0
# Auto resume from latest checkpoint
_C.TRAIN.AUTO_RESUME = False
# Gradient accumulation steps
# could be overwritten by command line argument
_C.TRAIN.ACCUMULATION_STEPS = 0
# Whether to use gradient checkpointing to save memory
# could be overwritten by command line argument
_C.TRAIN.USE_CHECKPOINT = False

# LR scheduler
_C.TRAIN.LR_SCHEDULER = CN()
_C.TRAIN.LR_SCHEDULER.NAME = 'cosine'
# Epoch interval to decay LR, used in StepLRScheduler
_C.TRAIN.LR_SCHEDULER.DECAY_EPOCHS = 30
# LR decay rate, used in StepLRScheduler
_C.TRAIN.LR_SCHEDULER.DECAY_RATE = 0.1

# Optimizer
_C.TRAIN.OPTIMIZER = CN()
_C.TRAIN.OPTIMIZER.NAME = 'adamw'
# Optimizer epsilon
_C.TRAIN.OPTIMIZER.EPS = 1e-8
# Optimizer betas
_C.TRAIN.OPTIMIZER.BETAS = (0.9, 0.999)
# SGD momentum
_C.TRAIN.OPTIMIZER.MOMENTUM = 0.9

# -----------------------------------------------------------------------------
# Augmentation settings
# -----------------------------------------------------------------------------
_C.AUG = CN()
# Color jitter factor
_C.AUG.COLOR_JITTER = 0.4
# AutoAugment policy, e.g. "v0" or "original"
_C.AUG.AUTO_AUGMENT = 'rand-m9-mstd0.5-inc1'
# Random erase prob
_C.AUG.REPROB = 0.25
# Random erase mode
_C.AUG.REMODE = 'pixel'
# Random erase count
_C.AUG.RECOUNT = 1
# Mixup alpha, mixup enabled if > 0
_C.AUG.MIXUP = 0.8
# Cutmix alpha, cutmix enabled if > 0
_C.AUG.CUTMIX = 1.0
# Cutmix min/max ratio, overrides alpha and enables cutmix if set
_C.AUG.CUTMIX_MINMAX = None
# Probability of performing mixup or cutmix when either/both is enabled
_C.AUG.MIXUP_PROB = 1.0
# Probability of switching to cutmix when both mixup and cutmix enabled
_C.AUG.MIXUP_SWITCH_PROB = 0.5
# How to apply mixup/cutmix params. Per "batch", "pair", or "elem"
_C.AUG.MIXUP_MODE = 'batch'

# -----------------------------------------------------------------------------
# Testing settings
# -----------------------------------------------------------------------------
_C.TEST = CN()
# Whether to use center crop when testing
_C.TEST.CROP = True

# -----------------------------------------------------------------------------
# Misc
# -----------------------------------------------------------------------------
# Mixed precision opt level; if 'O0', no amp is used ('O0', 'O1', 'O2')
# overwritten by command line argument
_C.AMP_OPT_LEVEL = ''
# Path to output folder, overwritten by command line argument
_C.OUTPUT = ''
# Tag of experiment, overwritten by command line argument
_C.TAG = 'default'
# Frequency to save checkpoint
_C.SAVE_FREQ = 1
# Frequency to log info
_C.PRINT_FREQ = 10
# Fixed random seed
_C.SEED = 0
# Perform evaluation only, overwritten by command line argument
_C.EVAL_MODE = False
# Test throughput only, overwritten by command line argument
_C.THROUGHPUT_MODE = False
# Local rank for DistributedDataParallel, given by command line argument
_C.LOCAL_RANK = 0


def _update_config_from_file(config, cfg_file):
    config.defrost()
    with open(cfg_file, 'r') as f:
        yaml_cfg = yaml.load(f, Loader=yaml.FullLoader)

    # Merge any base config files listed under BASE first,
    # so that values in cfg_file override them.
    for cfg in yaml_cfg.setdefault('BASE', ['']):
        if cfg:
            _update_config_from_file(
                config, os.path.join(os.path.dirname(cfg_file), cfg)
            )
    print('=> merge config from {}'.format(cfg_file))
    config.merge_from_file(cfg_file)
    config.freeze()


def update_config(config, args):
    _update_config_from_file(config, args.cfg)

    config.defrost()
    if args.opts:
        config.merge_from_list(args.opts)

    # merge from specific arguments
    if args.batch_size:
        config.DATA.BATCH_SIZE = args.batch_size
    if args.data_path:
        config.DATA.DATA_PATH = args.data_path
    if args.zip:
        config.DATA.ZIP_MODE = True
    if args.cache_mode:
        config.DATA.CACHE_MODE = args.cache_mode
    if args.resume:
        config.MODEL.RESUME = args.resume
    if args.accumulation_steps:
        config.TRAIN.ACCUMULATION_STEPS = args.accumulation_steps
    if args.use_checkpoint:
        config.TRAIN.USE_CHECKPOINT = True
    if args.amp_opt_level:
        config.AMP_OPT_LEVEL = args.amp_opt_level
    if args.output:
        config.OUTPUT = args.output
    if args.tag:
        config.TAG = args.tag
    if args.eval:
        config.EVAL_MODE = True
    if args.throughput:
        config.THROUGHPUT_MODE = True
    if args.data_set == 'CIFAR':
        config.DATA.DATASET = 'cifar'
    elif args.data_set == 'IMNET':
        config.DATA.DATASET = 'imagenet'
    if args.epoch != 300:
        config.TRAIN.EPOCHS = args.epoch

    # set local rank for distributed training
    config.LOCAL_RANK = args.local_rank

    # output folder
    config.OUTPUT = os.path.join(config.OUTPUT, config.MODEL.NAME, config.TAG)

    config.freeze()


def get_config(args):
    """Get a yacs CfgNode object with default values."""
    # Return a clone so that the defaults will not be altered
    # This is for the "local variable" use pattern
    config = _C.clone()
    update_config(config, args)
    return config
```
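The `BASE` mechanism in `_update_config_from_file` is a recursive merge: any base files are applied first, then the current file overrides them. A dependency-free sketch of the same pattern over plain dicts (the helper names and the in-memory `files` map are hypothetical, for illustration only):

```python
def deep_merge(dst, src):
    """Recursively merge src into dst; src values win on conflicts."""
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            deep_merge(dst[key], value)
        else:
            dst[key] = value
    return dst

def load_config(name, files):
    """Mimic the BASE-file recursion over an in-memory {name: cfg} map."""
    cfg = files[name]
    merged = {}
    for base in cfg.get('BASE', []):
        if base:
            deep_merge(merged, load_config(base, files))
    # the current file's own keys override everything inherited
    deep_merge(merged, {k: v for k, v in cfg.items() if k != 'BASE'})
    return merged

files = {
    'base.yaml': {'MODEL': {'TYPE': 'swin', 'DROP_PATH_RATE': 0.1}},
    'tiny.yaml': {'BASE': ['base.yaml'],
                  'MODEL': {'TYPE': 'dwnet', 'NAME': 'dwnet_tiny'}},
}
cfg = load_config('tiny.yaml', files)
# The derived file overrides TYPE but inherits DROP_PATH_RATE.
```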
Lines changed: 8 additions & 0 deletions
```yaml
MODEL:
  TYPE: dwnet
  NAME: dwnet_base_patch4_window7_224
  DROP_PATH_RATE: 0.5
  DWNET:
    EMBED_DIM: 128
    DEPTHS: [ 2, 2, 18, 2 ]
    WINDOW_SIZE: 7
```
Lines changed: 8 additions & 0 deletions
```yaml
MODEL:
  TYPE: dwnet
  NAME: dwnet_tiny_patch4_window7_224
  DROP_PATH_RATE: 0.2
  DWNET:
    EMBED_DIM: 96
    DEPTHS: [ 2, 2, 6, 2 ]
    WINDOW_SIZE: 7
```
Lines changed: 9 additions & 0 deletions
```yaml
MODEL:
  TYPE: ddwnet
  NAME: ddwnet_base_patch4_window7_224
  DROP_PATH_RATE: 0.5
  DWNET:
    EMBED_DIM: 128
    DEPTHS: [ 2, 2, 18, 2 ]
    WINDOW_SIZE: 7
    DYNAMIC: True
```
Lines changed: 9 additions & 0 deletions
```yaml
MODEL:
  TYPE: ddwnet
  NAME: ddwnet_tiny_patch4_window7_224
  DROP_PATH_RATE: 0.2
  DWNET:
    EMBED_DIM: 96
    DEPTHS: [ 2, 2, 6, 2 ]
    WINDOW_SIZE: 7
    DYNAMIC: True
```

data/__init__.py

Lines changed: 1 addition & 0 deletions
```python
from .build import build_loader
```
