
Commit 0078027

Merge pull request #1 from AgentMaker/main
Add the PVT model
New version 1.0.6
Update codes
2 parents c52d481 + aa6bce5 commit 0078027

File tree

29 files changed: +1241 −448 lines

README.md

Lines changed: 65 additions & 159 deletions
@@ -8,6 +8,8 @@ English | [简体中文](README_CN.md)
 
 A PaddlePaddle version image model zoo.
 
+![](https://ai-studio-static-online.cdn.bcebos.com/34e7bbbc80d24412b3c21efb56778ad43b53f9b1be104e499e0ff8b663a64a53)
+
 ## Install Package
 * Install by pip:
 
@@ -22,10 +24,10 @@ A PaddlePaddle version image model zoo.
 
 ```python
 import paddle
-from ppim import rednet26
+from ppim import rednet_26
 
 # Load the model
-model, val_transforms = rednet26(pretrained=True)
+model, val_transforms = rednet_26(pretrained=True)
 
 # Model summary
 paddle.summary(model, input_size=(1, 3, 224, 224))
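For orientation, the renamed quick-start entry point can be exercised end to end roughly as below. This is a minimal sketch, assuming `rednet_26` accepts a standard NCHW float tensor and returns ImageNet-style logits; the dummy input merely stands in for an image preprocessed with `val_transforms`.

```python
import paddle
from ppim import rednet_26

# Load the pretrained model together with its evaluation transforms
model, val_transforms = rednet_26(pretrained=True)
model.eval()

# Dummy 224x224 RGB batch; a real image would first go through val_transforms
x = paddle.randn([1, 3, 224, 224])
with paddle.no_grad():
    logits = model(x)
print(logits.shape)  # expected to be [1, 1000] for ImageNet-pretrained weights
```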
@@ -45,10 +47,10 @@ A PaddlePaddle version image model zoo.
 import paddle.vision.transforms as T
 from paddle.vision import Cifar100
 
-from ppim import rexnet_100
+from ppim import rexnet_1_0
 
 # Load the model
-model, val_transforms = rexnet_100(pretrained=True)
+model, val_transforms = rexnet_1_0(pretrained=True)
 
 # Use the PaddleHapi Model
 model = paddle.Model(model)
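The hunk above only shows the import rename; the surrounding Cifar100 example presumably continues with Paddle's high-level API. A hedged sketch of that continuation follows; the optimizer, loss, metric, batch size, and epoch count are illustrative assumptions rather than values taken from the README, and the pretrained classification head is reused as-is.

```python
import paddle
from paddle.vision import Cifar100

from ppim import rexnet_1_0

# Load the pretrained model and its evaluation transforms
model, val_transforms = rexnet_1_0(pretrained=True)

# Use the PaddleHapi Model
model = paddle.Model(model)

# Illustrative training configuration (assumed, not from the diff)
model.prepare(
    optimizer=paddle.optimizer.Adam(learning_rate=1e-4, parameters=model.parameters()),
    loss=paddle.nn.CrossEntropyLoss(),
    metrics=paddle.metric.Accuracy()
)

# Reuse the model's own transforms for both splits (a simplification)
train_set = Cifar100(mode='train', transform=val_transforms)
test_set = Cifar100(mode='test', transform=val_transforms)
model.fit(train_set, test_set, epochs=1, batch_size=64, verbose=1)
```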
@@ -95,161 +97,24 @@ A PaddlePaddle version image model zoo.
 ```
 
 ## Model Zoo
-### ReXNet
-* Paper:[ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network](https://arxiv.org/abs/2007.00992)
-* Origin Repo:[clovaai/rexnet](https://github.com/clovaai/rexnet)
-* Evaluate Transforms:
-
-```python
-# backend: pil
-# input_size: 224x224
-transforms = T.Compose([
-    T.Resize(256, interpolation='bicubic'),
-    T.CenterCrop(224),
-    T.ToTensor(),
-    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
-])
-```
 
-* Model Details:
+* [DLA](./docs/en/model_zoo/dla.md)
 
-| Model | Params(M) | FLOPs(G) | Top-1 (%) | Top-5 (%) |
-|:---------------------:|:---------:|:--------:|:---------:|:---------:|
-| ReXNet-1.0 | 4.8 | 0.40 | 77.9 | 93.9 |
-| ReXNet-1.3 | 7.6 | 0.66 | 79.5 | 94.7 |
-| ReXNet-1.5 | 7.6 | 0.66 | 80.3 | 95.2 |
-| ReXNet-2.0 | 16 | 1.5 | 81.6 | 95.7 |
-| ReXNet-3.0 | 34 | 3.4 | 82.8 | 96.2 |
+* [ReXNet](./docs/en/model_zoo/rexnet.md)
 
-### RedNet
-* Paper:[Involution: Inverting the Inherence of Convolution for Visual Recognition](https://arxiv.org/abs/2103.06255)
-* Origin Repo:[d-li14/involution](https://github.com/d-li14/involution)
-* Evaluate Transforms:
+* [RedNet](./docs/en/model_zoo/rednet.md)
 
-```python
-# backend: cv2
-# input_size: 224x224
-transforms = T.Compose([
-    T.Resize(256),
-    T.CenterCrop(224),
-    T.Normalize(
-        mean=[123.675, 116.28, 103.53],
-        std=[58.395, 57.12, 57.375],
-        to_rgb=True,
-        data_format='HWC'
-    ),
-    T.ToTensor(),
-])
-```
+* [RepVGG](./docs/en/model_zoo/repvgg.md)
 
-* Model Details:
+* [HarDNet](./docs/en/model_zoo/hardnet.md)
 
-| Model | Params(M) | FLOPs(G) | Top-1 (%) | Top-5 (%) |
-|:---------------------:|:---------:|:--------:|:---------:|:---------:|
-| RedNet-26 | 9.23 | 1.73 | 75.96 | 93.19 |
-| RedNet-38 | 12.39 | 2.22 | 77.48 | 93.57 |
-| RedNet-50 | 15.54 | 2.71 | 78.35 | 94.13 |
-| RedNet-101 | 25.65 | 4.74 | 78.92 | 94.35 |
-| RedNet-152 | 33.99 | 6.79 | 79.12 | 94.38 |
-
-### RepVGG
-* Paper:[RepVGG: Making VGG-style ConvNets Great Again](https://arxiv.org/abs/2101.03697)
-* Origin Repo:[DingXiaoH/RepVGG](https://github.com/DingXiaoH/RepVGG)
-* Evaluate Transforms:
-
-```python
-# backend: pil
-# input_size: 224x224
-transforms = T.Compose([
-    T.Resize(256),
-    T.CenterCrop(224),
-    T.ToTensor(),
-    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
-])
-```
-
-* Model Details:
-
-| Model | Params(M) | FLOPs(G) | Top-1 (%) | Top-5 (%) |
-|:---------------------:|:---------:|:--------:|:---------:|:---------:|
-| RepVGG-A0 | 8.30 | 1.4 | 72.41 | |
-| RepVGG-A1 | 12.78 | 2.4 | 74.46 | |
-| RepVGG-A2 | 25.49 | 5.1 | 76.48 | |
-| RepVGG-B0 | 14.33 | 3.1 | 75.14 | |
-| RepVGG-B1 | 51.82 | 11.8 | 78.37 | |
-| RepVGG-B2 | 80.31 | 18.4 | 78.78 | |
-| RepVGG-B3 | 110.96 | 26.2 | 80.52 | |
-| RepVGG-B1g2 | 41.36 | 8.8 | 77.78 | |
-| RepVGG-B1g4 | 36.12 | 7.3 | 77.58 | |
-| RepVGG-B2g4 | 55.77 | 11.3 | 79.38 | |
-| RepVGG-B3g4 | 75.62 | 16.1 | 80.21 | |
-
-### PiT
-* Paper:[Rethinking Spatial Dimensions of Vision Transformers](https://arxiv.org/abs/2103.16302)
-* Origin Repo:[naver-ai/pit](https://github.com/naver-ai/pit)
-* Evaluate Transforms:
-
-```python
-# backend: pil
-# input_size: 224x224
-transforms = T.Compose([
-    T.Resize(248, interpolation='bicubic'),
-    T.CenterCrop(224),
-    T.ToTensor(),
-    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
-])
-```
-
-* Model Details:
-
-| Model | Params(M) | FLOPs(G) | Top-1 (%) | Top-5 (%) |
-|:---------------------:|:---------:|:--------:|:---------:|:---------:|
-| PiT-Ti | 4.9 | 0.71 | 73.0 | |
-| PiT-XS | 10.6 | 1.4 | 78.1 | |
-| PiT-S | 23.5 | 2.9 | 80.9 | |
-| PiT-B | 73.8 | 12.5 | 82.0 | |
-| PiT-Ti distilled | 4.9 | 0.71 | 74.6 | |
-| PiT-XS distilled | 10.6 | 1.4 | 79.1 | |
-| PiT-S distilled | 23.5 | 2.9 | 81.9 | |
-| PiT-B distilled | 73.8 | 12.5 | 84.0 | |
-
-### DeiT
-* Paper:[Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877)
-* Origin Repo:[facebookresearch/deit](https://github.com/facebookresearch/deit)
-* Evaluate Transforms:
-
-```python
-# backend: pil
-# input_size: 224x224
-transforms = T.Compose([
-    T.Resize(248, interpolation='bicubic'),
-    T.CenterCrop(224),
-    T.ToTensor(),
-    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
-])
+* [PiT](./docs/en/model_zoo/pit.md)
 
-# backend: pil
-# input_size: 384x384
-transforms = T.Compose([
-    T.Resize(384, interpolation='bicubic'),
-    T.CenterCrop(384),
-    T.ToTensor(),
-    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
-])
-```
+* [PVT](./docs/en/model_zoo/pvt.md)
 
-* Model Details:
+* [TNT](./docs/en/model_zoo/tnt.md)
 
-| Model | Params(M) | FLOPs(G) | Top-1 (%) | Top-5 (%) |
-|:---------------------:|:---------:|:--------:|:---------:|:---------:|
-| DeiT-tiny | 5 | | 72.2 | 91.1 |
-| DeiT-small | 22 | | 79.9 | 95.0 |
-| DeiT-base | 86 | | 81.8 | 95.6 |
-| DeiT-tiny distilled | 6 | | 74.5 | 91.9 |
-| DeiT-small distilled | 22 | | 81.2 | 95.4 |
-| DeiT-base distilled | 87 | | 83.4 | 96.5 |
-| DeiT-base 384 | 87 | | 82.9 | 96.2 |
-| DeiT-base distilled 384 | 88 | | 85.2 | 97.2 |
+* [DeiT](./docs/en/model_zoo/deit.md)
 
 ## Citation
 ```
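Since this commit's stated purpose is to add the PVT model, and the restructured list above now links its documentation page, here is a usage sketch by analogy with the `rednet_26` / `rexnet_1_0` snippets. The entry-point name `pvt_tiny` is a guess for illustration only; the actual names should be checked in `./docs/en/model_zoo/pvt.md`.

```python
import paddle
# Hypothetical entry point; consult docs/en/model_zoo/pvt.md for the real function names
from ppim import pvt_tiny

# By analogy with the other model families, each entry point is assumed to return
# the pretrained model plus its evaluation transforms
model, val_transforms = pvt_tiny(pretrained=True)

# Model summary for a 224x224 RGB input
paddle.summary(model, input_size=(1, 3, 224, 224))
```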
@@ -259,37 +124,78 @@
 journal = {arXiv preprint arXiv:2007.00992},
 year = {2020},
 }
-```
-```
+
 @InProceedings{Li_2021_CVPR,
 title = {Involution: Inverting the Inherence of Convolution for Visual Recognition},
 author = {Li, Duo and Hu, Jie and Wang, Changhu and Li, Xiangtai and She, Qi and Zhu, Lei and Zhang, Tong and Chen, Qifeng},
 booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
 month = {June},
 year = {2021}
 }
-```
-```
+
 @article{ding2021repvgg,
 title={RepVGG: Making VGG-style ConvNets Great Again},
 author={Ding, Xiaohan and Zhang, Xiangyu and Ma, Ningning and Han, Jungong and Ding, Guiguang and Sun, Jian},
 journal={arXiv preprint arXiv:2101.03697},
 year={2021}
 }
-```
-```
+
 @article{heo2021pit,
 title={Rethinking Spatial Dimensions of Vision Transformers},
 author={Byeongho Heo and Sangdoo Yun and Dongyoon Han and Sanghyuk Chun and Junsuk Choe and Seong Joon Oh},
 journal={arXiv: 2103.16302},
 year={2021},
 }
-```
-```
+
 @article{touvron2020deit,
 title = {Training data-efficient image transformers & distillation through attention},
 author = {Hugo Touvron and Matthieu Cord and Matthijs Douze and Francisco Massa and Alexandre Sablayrolles and Herv'e J'egou},
 journal = {arXiv preprint arXiv:2012.12877},
 year = {2020}
 }
-```
+
+@misc{han2021transformer,
+title={Transformer in Transformer},
+author={Kai Han and An Xiao and Enhua Wu and Jianyuan Guo and Chunjing Xu and Yunhe Wang},
+year={2021},
+eprint={2103.00112},
+archivePrefix={arXiv},
+primaryClass={cs.CV}
+}
+
+@misc{chao2019hardnet,
+title={HarDNet: A Low Memory Traffic Network},
+author={Ping Chao and Chao-Yang Kao and Yu-Shan Ruan and Chien-Hsiang Huang and Youn-Long Lin},
+year={2019},
+eprint={1909.00948},
+archivePrefix={arXiv},
+primaryClass={cs.CV}
+}
+
+@misc{yu2019deep,
+title={Deep Layer Aggregation},
+author={Fisher Yu and Dequan Wang and Evan Shelhamer and Trevor Darrell},
+year={2019},
+eprint={1707.06484},
+archivePrefix={arXiv},
+primaryClass={cs.CV}
+}
+
+@misc{dosovitskiy2020image,
+title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
+author={Alexey Dosovitskiy and Lucas Beyer and Alexander Kolesnikov and Dirk Weissenborn and Xiaohua Zhai and Thomas Unterthiner and Mostafa Dehghani and Matthias Minderer and Georg Heigold and Sylvain Gelly and Jakob Uszkoreit and Neil Houlsby},
+year={2020},
+eprint={2010.11929},
+archivePrefix={arXiv},
+primaryClass={cs.CV}
+}
+
+@misc{wang2021pyramid,
+title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions},
+author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
+year={2021},
+eprint={2102.12122},
+archivePrefix={arXiv},
+primaryClass={cs.CV}
+}
+```
