
Commit 8baba51

update rec models Readme and fix master_resnet bug (#794)
1 parent d1c0db0 commit 8baba51


20 files changed (+652, -665 lines)


configs/rec/abinet/README.md

Lines changed: 51 additions & 46 deletions
@@ -5,7 +5,7 @@
 > [Read Like Humans: Autonomous, Bidirectional and Iterative Language
 Modeling for Scene Text Recognition](https://arxiv.org/pdf/2103.06495)

-## 1. Abstract
+## Abstract
 <!--- Guideline: Introduce the model and architectures. Cite if you use/adopt paper explanation from others. -->
 Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from: 1) implicitly language modeling; 2) unidirectional feature representation; and 3) language model with noise input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet for scene text recognition. Firstly, the autonomous suggests to block gradient flow between vision and language models to enforce explicitly language modeling. Secondly, a novel bidirectional cloze network (BCN) as the language model is proposed based on bidirectional feature representation. Thirdly, we propose an execution manner of iterative correction for language model which can effectively alleviate the impact of noise input. Additionally, based on the ensemble of iterative predictions, we propose a self-training method which can learn from unlabeled images effectively. Extensive experiments indicate that ABINet has superiority on low-quality images and achieves state-of-the-art results on several mainstream benchmarks. Besides, the ABINet trained with ensemble self-training shows promising improvement in realizing human-level recognition. [<a href="#references">1</a>]

@@ -18,51 +18,19 @@ Linguistic knowledge is of great benefit to scene text recognition. However, how
 <em> Figure 1. Architecture of ABINet [<a href="#references">1</a>] </em>
 </p>

-## 2. Results
-<!--- Guideline:
-Table Format:
-- Model: model name in lower case with _ separator.
-- Top-1 and Top-5: Keep 2 digits after the decimal point.
-- Params (M): # of model parameters in millions (10^6). Keep 2 digits after the decimal point
-- Recipe: Training recipe/configuration linked to a yaml config file. Use absolute url path.
-- Download: url of the pretrained model weights. Use absolute url path.
--->
-
-### Accuracy
-
-According to our experiments, the evaluation results on public benchmark datasets (IC13, IC15, IIIT, SVT, SVTP, CUTE) are as follows:
+## Requirements

-<details>
-<summary>Performance tested on Ascend 910 with graph mode</summary>
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:----------:|:--------------:|:-------------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |

-<div align="center">
+## Quick Start
+### Preparation

-| **Model** | **Device** | **Avg Accuracy** | **Train T.** | **FPS** | **Recipe** | **Download** |
-| :-----: |:----------:| :--------------: | :----------: | :--------: | :--------: |:----------: |
-| ABINet | 8p | 91.35% | 14,867 s/epoch | 628.11 | [yaml](abinet_resnet45_en.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_resnet45_en-7efa1184.ckpt) |
-</div>
-
-Detailed accuracy results for each benchmark dataset
-<div align="center">
-
-| **Model** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** |
-| :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |
-| ABINet | 96.22% | 95.83% | 96.48% | 94.90% | 84.38% | 80.56% | 95.83% | 92.36% | 87.33% | 89.58% | 91.35% |
-</div>
-</details>
-
-
-**Notes:**
-- The input shape of the ABINet MindIR is (1, 3, 32, 128).
-
-
-## 3. Quick Start
-### 3.1 Preparation
-
-#### 3.1.1 Installation
+#### Installation
 Please refer to the [installation instruction](https://github.com/mindspore-lab/mindocr#installation) in MindOCR.

-#### 3.1.2 Dataset Download
+#### Dataset Download
 Please download the LMDB datasets for training and evaluation from the following links:
 - `training` contains two datasets: [MJSynth (MJ)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ) and [SynthText (ST)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ)
 - `evaluation` contains several benchmark datasets: [IIIT](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html), [SVT](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset), [IC13](http://rrc.cvc.uab.es/?ch=2), [IC15](http://rrc.cvc.uab.es/?ch=4), [SVTP](http://openaccess.thecvf.com/content_iccv_2013/papers/Phan_Recognizing_Text_with_2013_ICCV_paper.pdf), and [CUTE](http://cs-chan.com/downloads_CUTE80_dataset.html).
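The Requirements table added in this hunk pins MindSpore 2.3.1. Below is a minimal sketch, assuming MindSpore is installed under the distribution name `mindspore`, that compares the installed release against the pinned one using only the standard library. `parse_release` and `check_mindspore` are hypothetical helpers, not part of MindOCR, and the Ascend driver, firmware and CANN versions are not checked.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned MindSpore version, copied from the Requirements table.
REQUIRED_MINDSPORE = "2.3.1"


def parse_release(ver: str) -> tuple:
    """Turn '2.3.1' into (2, 3, 1), keeping only leading integers.

    Pre-release suffixes such as 'rc1' are ignored, and parsing stops at the
    first component that does not start with a digit.
    """
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_mindspore(required: str = REQUIRED_MINDSPORE) -> str:
    """Report whether the installed mindspore release matches the pinned one."""
    try:
        installed = version("mindspore")
    except PackageNotFoundError:
        return "mindspore is not installed"
    if parse_release(installed) == parse_release(required):
        return f"mindspore {installed} matches {required}"
    return f"mindspore {installed} differs from the tested {required}"


if __name__ == "__main__":
    print(check_mindspore())
```

A mismatch is only reported, not treated as an error, since other versions may still work; the table only records the combination that was tested.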
@@ -99,7 +67,7 @@ data_lmdb_release/
 │ └── lock.mdb
 ```

-#### 3.1.3 Dataset Usage
+#### Dataset Usage

 Here we use the datasets under the `train/` folder for **training**. After training, we use the datasets under `evaluation/` to evaluate model accuracy.

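Before launching a run, it can help to confirm which LMDB datasets actually sit under the `train/` and `evaluation/` folders. This is a minimal sketch that assumes each LMDB dataset is a folder holding a `data.mdb` file (alongside `lock.mdb`) and that the root is `data_lmdb_release/`. Both the folder names and `find_lmdb_dirs` are assumptions for illustration, not MindOCR code.

```python
import os
from pathlib import Path


def find_lmdb_dirs(root: str) -> list:
    """Return sorted paths, relative to `root`, of every folder containing a data.mdb file."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "data.mdb" in filenames:
            found.append(Path(dirpath).relative_to(root).as_posix())
    return sorted(found)


if __name__ == "__main__":
    root = "data_lmdb_release"  # assumed dataset root; adjust to your layout
    for split in ("train", "evaluation"):
        # os.walk yields nothing if the folder does not exist, so this prints [].
        print(split, find_lmdb_dirs(os.path.join(root, split)))
```

Walking the tree rather than hard-coding dataset names means nested layouts such as `train/MJ/MJ_train/` are picked up as well.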
@@ -200,7 +168,7 @@ data_lmdb_release/
 then you can evaluate on each dataset by modifying the config yaml as follows, and execute the script `tools/benchmarking/multi_dataset_eval.py`.


-#### 3.1.4 Check YAML Config Files
+#### Check YAML Config Files
 Apart from the dataset settings, please also check the following important args: `system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`,
 `eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, `eval.loader.batch_size`. Explanations of these important args:

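The list of important args above can be turned into a quick pre-flight check. This is a minimal sketch that assumes the YAML file has already been parsed into nested dicts (for example with PyYAML's `yaml.safe_load`, left out so the snippet stays standard-library only). `get_by_path` and `missing_args` are hypothetical helpers, not MindOCR functions. Absent keys are only reported: a key such as `label_file` may legitimately be left empty when the dataset is in LMDB format.

```python
# Dotted keys copied from the "Check YAML Config Files" list.
IMPORTANT_ARGS = [
    "system.distribute", "system.val_while_train", "common.batch_size",
    "train.ckpt_save_dir", "train.dataset.dataset_root", "train.dataset.data_dir",
    "train.dataset.label_file", "eval.ckpt_load_path", "eval.dataset.dataset_root",
    "eval.dataset.data_dir", "eval.dataset.label_file", "eval.loader.batch_size",
]

# Sentinel that distinguishes "key absent" from "key present but set to None".
_MISSING = object()


def get_by_path(cfg: dict, dotted: str, default=_MISSING):
    """Follow a dotted key such as 'train.dataset.data_dir' through nested dicts."""
    node = cfg
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


def missing_args(cfg: dict, keys=IMPORTANT_ARGS) -> list:
    """Return the dotted keys that do not appear in the parsed config."""
    return [k for k in keys if get_by_path(cfg, k) is _MISSING]


if __name__ == "__main__":
    # Toy config standing in for a parsed abinet_resnet45_en.yaml.
    cfg = {"system": {"distribute": True, "val_while_train": False},
           "common": {"batch_size": 96}}
    print(missing_args(cfg))
```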
@@ -244,7 +212,7 @@ eval:
 - Dataset: The MJSynth and SynthText datasets come from [ABINet_repo](https://github.com/FangShancheng/ABINet).


-### 3.2 Model Training
+### Model Training
 <!--- Guideline: Avoid using shell script in the command line. Python script preferred. -->

 * Distributed Training
@@ -256,7 +224,7 @@ It is easy to reproduce the reported results with the pre-defined training recip
 mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml
 ```
 The pretrained model must be loaded during ABINet training, and its weights can be downloaded
-from https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_pretrain_en-821ca20b.ckpt. Add the path of the pretrained weights to `pretrained` under `model` in "configs/rec/abinet/abinet_resnet45_en.yaml".
+from [abinet_pretrain_en.ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_pretrain_en-821ca20b.ckpt). Add the path of the pretrained weights to `pretrained` under `model` in "configs/rec/abinet/abinet_resnet45_en.yaml".

 * Standalone Training

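The config edits this README asks for come down to a few nested-key assignments: setting `model.pretrained` to the downloaded pretrained checkpoint before training, and, before evaluation, setting `eval.ckpt_load_path` and turning `distribute` off. This is a minimal sketch on an already-parsed config dict. `set_by_path` and the two `prepare_*` helpers are hypothetical, the checkpoint paths are placeholders, and it assumes the `distribute` flag meant by the evaluation instructions is `system.distribute` from the important-args list.

```python
def set_by_path(cfg: dict, dotted: str, value) -> None:
    """Assign `value` at a dotted key, creating any parent section that is missing or not a mapping."""
    *parents, leaf = dotted.split(".")
    node = cfg
    for key in parents:
        # A YAML section written as `model:` with no children parses to None.
        if not isinstance(node.get(key), dict):
            node[key] = {}
        node = node[key]
    node[leaf] = value


def prepare_train_cfg(cfg: dict, pretrained_path: str) -> dict:
    """Point model.pretrained at the downloaded pretrained checkpoint."""
    set_by_path(cfg, "model.pretrained", pretrained_path)
    return cfg


def prepare_eval_cfg(cfg: dict, ckpt_path: str) -> dict:
    """Set the checkpoint to evaluate and switch to single-device evaluation."""
    set_by_path(cfg, "eval.ckpt_load_path", ckpt_path)
    # Assumption: the README's "distribute" flag is system.distribute.
    set_by_path(cfg, "system.distribute", False)
    return cfg


if __name__ == "__main__":
    cfg = {"system": {"distribute": True}, "model": None}
    prepare_train_cfg(cfg, "./abinet_pretrain_en-821ca20b.ckpt")  # placeholder local path
    prepare_eval_cfg(cfg, "./tmp_rec/your_trained.ckpt")  # hypothetical checkpoint file
    print(cfg)
```

Writing the modified dict back to YAML (or editing the file by hand) is left out; the sketch only shows which keys change.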
@@ -269,14 +237,51 @@ python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml

 The training results (including checkpoints, per-epoch performance and curves) will be saved in the directory specified by the arg `ckpt_save_dir`. The default directory is `./tmp_rec`.

-### 3.3 Model Evaluation
+### Model Evaluation

 To evaluate the accuracy of the trained model, you can use `eval.py`. Please set the arg `ckpt_load_path` in the `eval` section of the yaml config file to the checkpoint path, set `distribute` to False, and then run:

 ```
 python tools/eval.py --config configs/rec/abinet/abinet_resnet45_en.yaml
 ```

+
+## Results
+<!--- Guideline:
+Table Format:
+- Model: model name in lower case with _ separator.
+- Top-1 and Top-5: Keep 2 digits after the decimal point.
+- Params (M): # of model parameters in millions (10^6). Keep 2 digits after the decimal point
+- Recipe: Training recipe/configuration linked to a yaml config file. Use absolute url path.
+- Download: url of the pretrained model weights. Use absolute url path.
+-->
+
+### Accuracy
+
+According to our experiments, the evaluation results on public benchmark datasets (IC13, IC15, IIIT, SVT, SVTP, CUTE) are as follows:
+
+<summary>Performance tested on Ascend 910* with graph mode</summary>
+
+<div align="center">
+
+| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
+|:--------------:| :----------: |:-----------------:| :-----------: |:---------:| :------------: |:-------------:|:-----------------:|:-----------:|:---------:| :----------: |:--------------------------------:|:------------------------------------------------------------------------------------------------:|
+| ABINet | Resnet45 | MJ+ST | 36.93 | 8 | 96 | O2 | 680.51 s | 115.56 | 6646.07 | 91.35% | [yaml](abinet_resnet45_en.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_resnet45_en-7efa1184.ckpt) |
+</div>
+
+Detailed accuracy results for each benchmark dataset:
+<div align="center">
+
+| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** |
+|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:|
+| ABINet | Resnet45 | 1 | 96.22% | 95.83% | 96.48% | 94.90% | 84.38% | 80.56% | 95.83% | 92.36% | 87.33% | 89.58% | 91.35% |
+</div>
+
+
+**Notes:**
+- The input shape of the ABINet MindIR is (1, 3, 32, 128).
+
+
 ## References
 <!--- Guideline: Citation format GB/T 7714 is suggested. -->

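Two figures in the new Results tables can be re-derived from the others. The "Average" column equals the unweighted mean of the ten per-dataset accuracies. The img/s value is consistent with cards × batch size ÷ step time: 8 × 96 images every 115.56 ms gives about 6645.9 img/s, within rounding of the reported 6646.07. This sketch only reproduces that arithmetic from values copied out of the tables. It is not a benchmarking tool, and the throughput formula is our reading of the table, not a documented definition.

```python
# Per-dataset accuracies (%) copied from the detailed results table.
PER_DATASET = {
    "IC03_860": 96.22, "IC03_867": 95.83, "IC13_857": 96.48, "IC13_1015": 94.90,
    "IC15_1811": 84.38, "IC15_2077": 80.56, "IIIT5k_3000": 95.83,
    "SVT": 92.36, "SVTP": 87.33, "CUTE80": 89.58,
}


def average_accuracy(scores: dict) -> float:
    """Unweighted mean over benchmark datasets, rounded to 2 decimals like the table."""
    return round(sum(scores.values()) / len(scores), 2)


def images_per_second(cards: int, batch_size: int, ms_per_step: float) -> float:
    """Global throughput, assuming every card processes one full batch per step."""
    return cards * batch_size * 1000.0 / ms_per_step


if __name__ == "__main__":
    print(average_accuracy(PER_DATASET))               # prints 91.35
    print(round(images_per_second(8, 96, 115.56), 1))  # prints 6645.9 (table: 6646.07)
```

The small gap in throughput comes from ms/step being rounded to two decimals in the table. Note that the mean is unweighted, so small sets such as IC03_860 count as much as IC15_2077.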
configs/rec/abinet/README_CN.md

Lines changed: 52 additions & 41 deletions
@@ -5,7 +5,7 @@
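 > [Read Like Humans: Autonomous, Bidirectional and Iterative Language
 Modeling for Scene Text Recognition](https://arxiv.org/pdf/2103.06495)

-## 1. Model Description
+## Model Description
 <!--- Guideline: Introduce the model and architectures. Cite if you use/adopt paper explanation from others. -->
 Semantic knowledge is of great help to scene text recognition. However, how to effectively model semantic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capability of language models stems from: 1) implicit language modeling; 2) unidirectional feature representation; and 3) language models with noisy input. Correspondingly, we propose ABINet, an autonomous, bidirectional and iterative network for scene text recognition. First, autonomy blocks the gradient flow between the vision and language models to enforce explicit language modeling. Second, a novel bidirectional cloze network based on bidirectional feature representation is proposed as the language model. Third, an iterative-correction execution manner for the language model is proposed, which effectively alleviates the impact of noisy input. In addition, we propose a self-training method based on the ensemble of iterative predictions, which learns effectively from unlabeled images. Extensive experiments show that ABINet has an advantage on low-quality images and achieves state-of-the-art results on several mainstream benchmarks. Moreover, ABINet trained with ensemble self-training also makes significant progress toward human-level recognition [<a href="#references">1</a>]
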
@@ -18,48 +18,21 @@ Modeling for Scene Text Recognition](https://arxiv.org/pdf/2103.06495)
 <em> Figure 1. ABINet architecture [<a href="#references">1</a>] </em>
 </p>

-## 2. Evaluation Results
-<!--- Guideline:
-Table Format:
-- Model: model name in lower case with _ separator.
-- Context: Training context denoted as {device}x{pieces}-{MS mode}, where mindspore mode can be G - graph mode or F - pynative mode with ms function. For example, D910x8-G is for training on 8 pieces of Ascend 910 NPU using graph mode.
-- Top-1 and Top-5: Keep 2 digits after the decimal point.
-- Params (M): # of model parameters in millions (10^6). Keep 2 digits after the decimal point
-- Recipe: Training recipe/configuration linked to a yaml config file. Use absolute url path.
-- Download: url of the pretrained model weights. Use absolute url path.
--->
+## Requirements

-### Accuracy
-According to our experiments, the evaluation results on public benchmark datasets (IC13, IC15, IIIT, SVT, SVTP, CUTE) are as follows:
+| mindspore | ascend driver | firmware | cann toolkit/kernel |
+|:----------:|:--------------:|:-------------:|:-------------------:|
+| 2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |

-<div align="center">

-| **Model** | **Context** | **Avg Accuracy** | **Train T.** | **FPS** | **Recipe** | **Download** |
-| :-----: | :-----------: | :--------------: | :----------: | :--------: | :--------: |:----------: |
-| ABINet | D910x8-MS2.1-G | 91.35% | 14,867 s/epoch | 628.11 | [yaml](abinet_resnet45_en.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_resnet45_en-7efa1184.ckpt) |
-</div>
+## Quick Start
+### Environment and Data Preparation

-<details open>
-<div align="center">
-<summary>Detailed accuracy results for each benchmark dataset</summary>
-
-| **Model** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** |
-| :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: | :------: |
-| ABINet | 96.22% | 95.83% | 96.48% | 94.90% | 84.38% | 80.56% | 95.83% | 92.36% | 87.33% | 89.58% | 91.35% |
-</div>
-</details>
-
-
-
-
-## 3. Quick Start
-### 3.1 Environment and Data Preparation
-
-#### 3.1.1 Installation
+#### Installation
 For environment setup, please refer to the [installation instruction](https://github.com/mindspore-lab/mindocr#installation) in MindOCR.


-#### 3.1.2 Dataset Download
+#### Dataset Download
 Please download the LMDB datasets for training and evaluation:
 - `training` contains two datasets: [MJSynth (MJ)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ) and [SynthText (ST)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ)
 - `evaluation` contains several benchmark datasets: [IIIT](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html), [SVT](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset), [IC13](http://rrc.cvc.uab.es/?ch=2), [IC15](http://rrc.cvc.uab.es/?ch=4), [SVTP](http://openaccess.thecvf.com/content_iccv_2013/papers/Phan_Recognizing_Text_with_2013_ICCV_paper.pdf), and [CUTE](http://cs-chan.com/downloads_CUTE80_dataset.html).
@@ -96,7 +69,7 @@ data_lmdb_release/
 │ └── lock.mdb
 ```

-#### 3.1.3 Dataset Usage
+#### Dataset Usage

 Here we use the datasets under the `train/` folder for training, and the datasets under `evaluation/` to evaluate model accuracy.

@@ -213,7 +186,7 @@ eval:
 # label_file: # path to the label file of the validation or evaluation dataset; it is joined with `dataset_root` to form the full label-file path. Not required when the dataset is in LMDB format
 ...
 ```
-#### 3.1.4 Check YAML Config Files
+#### Check YAML Config Files
 Apart from the dataset settings, please also pay close attention to the following variables: `system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`,
 `eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, `eval.loader.batch_size`. Explanations are as follows:

@@ -257,7 +230,7 @@ eval:
 - Dataset: The MJSynth and SynthText datasets come from the authors' released code repository [ABINet_repo](https://github.com/FangShancheng/ABINet).


-### 3.2 Model Training
+### Model Training
 <!--- Guideline: Avoid using shell script in the command line. Python script preferred. -->

 * Distributed Training
@@ -268,7 +241,8 @@ eval:
 # Distributed training on multiple Ascend devices
 mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml
 ```
-ABINet training requires loading a pretrained model. The pretrained weights are available at <https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_pretrain_en-821ca20b.ckpt>; add the path of the pretrained weights to `pretrained` under `model` in "configs/rec/abinet/abinet_resnet45_en.yaml".
+ABINet training requires loading a pretrained model. The pretrained weights are available at [abinet_pretrain_en.ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_pretrain_en-821ca20b.ckpt); add the path of the pretrained weights to `pretrained` under `model` in "configs/rec/abinet/abinet_resnet45_en.yaml".
+

 * Standalone Training

@@ -283,14 +257,51 @@ python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml

 The training results (including checkpoints, per-epoch performance and curves) will be saved in the directory configured by the `ckpt_save_dir` arg in the yaml config file. The default is `./tmp_rec`.

-### 3.3 Model Evaluation
+### Model Evaluation

 To evaluate the accuracy of the trained model, you can use `eval.py`. Please set the arg `ckpt_load_path` in the `eval` section of the yaml config file to the checkpoint file path, set `distribute` to False, and then run:

 ```
 python tools/eval.py --config configs/rec/abinet/abinet_resnet45_en.yaml
 ```

+
+
+## Evaluation Results
+<!--- Guideline:
+Table Format:
+- Model: model name in lower case with _ separator.
+- Top-1 and Top-5: Keep 2 digits after the decimal point.
+- Params (M): # of model parameters in millions (10^6). Keep 2 digits after the decimal point
+- Recipe: Training recipe/configuration linked to a yaml config file. Use absolute url path.
+- Download: url of the pretrained model weights. Use absolute url path.
+-->
+
+### Accuracy
+According to our experiments, the evaluation results on public benchmark datasets (IC13, IC15, IIIT, SVT, SVTP, CUTE) are as follows:
+
+<div align="center">
+
+| **model name** | **backbone** | **train dataset** | **params(M)** | **cards** | **batch size** | **jit level** | **graph compile** | **ms/step** | **img/s** | **accuracy** | **recipe** | **weight** |
+|:--------------:| :----------: |:-----------------:| :-----------: |:---------:| :------------: |:-------------:|:-----------------:|:-----------:|:---------:| :----------: |:--------------------------------:|:------------------------------------------------------------------------------------------------:|
+| ABINet | Resnet45 | MJ+ST | 36.93 | 8 | 96 | O2 | 680.51 s | 115.56 | 6646.07 | 91.35% | [yaml](abinet_resnet45_en.yaml) | [ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_resnet45_en-7efa1184.ckpt) |
+
+</div>
+
+
+<details open>
+<div align="center">
+<summary>Detailed accuracy results for each benchmark dataset</summary>
+
+| **model name** | **backbone** | **cards** | **IC03_860** | **IC03_867** | **IC13_857** | **IC13_1015** | **IC15_1811** | **IC15_2077** | **IIIT5k_3000** | **SVT** | **SVTP** | **CUTE80** | **Average** |
+|:--------------:|:------------:|:---------:|:------------:|:------------:|:------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------:|:--------:|:----------:|:-----------:|
+| ABINet | Resnet45 | 1 | 96.22% | 95.83% | 96.48% | 94.90% | 84.38% | 80.56% | 95.83% | 92.36% | 87.33% | 89.58% | 91.35% |
+
+</div>
+</details>
+
+
+
 ## References
 <!--- Guideline: Citation format GB/T 7714 is suggested. -->
