> [Read Like Humans: Autonomous, Bidirectional and Iterative Language
Modeling for Scene Text Recognition](https://arxiv.org/pdf/2103.06495)

## Abstract

<!--- Guideline: Introduce the model and architectures. Cite if you use/adopt paper explanation from others. -->
Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively model linguistic rules in end-to-end deep networks remains a research challenge. In this paper, we argue that the limited capacity of language models comes from: 1) implicit language modeling; 2) unidirectional feature representation; and 3) a language model with noisy input. Correspondingly, we propose an autonomous, bidirectional and iterative ABINet for scene text recognition. Firstly, the autonomous principle blocks gradient flow between the vision and language models to enforce explicit language modeling. Secondly, a novel bidirectional cloze network (BCN) is proposed as the language model, based on bidirectional feature representation. Thirdly, we propose an iterative correction scheme for the language model, which can effectively alleviate the impact of noisy input. Additionally, based on the ensemble of iterative predictions, we propose a self-training method which can learn from unlabeled images effectively. Extensive experiments indicate that ABINet is superior on low-quality images and achieves state-of-the-art results on several mainstream benchmarks. Besides, the ABINet trained with ensemble self-training shows promising improvement toward human-level recognition. [<a href="#references">1</a>]
<em> Figure 1. Architecture of ABINet [<a href="#references">1</a>] </em>
</p>
## Requirements

## Quick Start

### Preparation

#### Installation

Please refer to the [installation instruction](https://github.com/mindspore-lab/mindocr#installation) in MindOCR.

#### Dataset Download

Please download the LMDB datasets for training and evaluation from the following sources:
- `training` contains two datasets: [MJSynth (MJ)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ) and [SynthText (ST)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ)
- `evaluation` contains several benchmarking datasets, which are [IIIT](http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html), [SVT](http://www.iapr-tc11.org/mediawiki/index.php/The_Street_View_Text_Dataset), [IC13](http://rrc.cvc.uab.es/?ch=2), [IC15](http://rrc.cvc.uab.es/?ch=4), [SVTP](http://openaccess.thecvf.com/content_iccv_2013/papers/Phan_Recognizing_Text_with_2013_ICCV_paper.pdf), and [CUTE](http://cs-chan.com/downloads_CUTE80_dataset.html).

```
data_lmdb_release/
├── ...
│   └── lock.mdb
```

#### Dataset Usage

Here we use the datasets under `train/` for **training**. After training, we use the datasets under `evaluation/` to evaluate model accuracy.
Then you can evaluate on each dataset by modifying the config YAML as shown below, and execute the script `tools/benchmarking/multi_dataset_eval.py`.
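
For example, to point evaluation at a single benchmark (a minimal sketch; the folder name is illustrative):

```yaml
eval:
  dataset:
    # evaluate one benchmark at a time, e.g. the CUTE80 subset
    data_dir: evaluation/CUTE80
```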

#### Check YAML Config Files

Apart from the dataset setting, please also check the following important args: `system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`, `eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, and `eval.loader.batch_size`.
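
A minimal sketch of where these args sit in the config (values are illustrative, not the recipe defaults):

```yaml
system:
  distribute: True            # enable distributed training
  val_while_train: True       # run evaluation during training
common:
  batch_size: 96              # illustrative value
train:
  ckpt_save_dir: ./tmp_rec    # where checkpoints and logs are written
  dataset:
    dataset_root: path/to/data_lmdb_release
    data_dir: train/          # training LMDB folders
eval:
  ckpt_load_path: ./tmp_rec/best.ckpt
  dataset:
    dataset_root: path/to/data_lmdb_release
    data_dir: evaluation/     # benchmark LMDB folders
  loader:
    batch_size: 64            # illustrative value
```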
- Dataset: The MJSynth and SynthText datasets come from [ABINet_repo](https://github.com/FangShancheng/ABINet).
### Model Training
<!--- Guideline: Avoid using shell script in the command line. Python script preferred. -->
* Distributed Training

It is easy to reproduce the reported results with the pre-defined training recipes.
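
A sketch of a distributed launch, assuming 8 devices and the repo's `tools/train.py` entry point (the exact launcher flags may differ in your environment):

```shell
# distributed training on 8 devices (illustrative launcher)
mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/abinet/abinet_resnet45_en.yaml
```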
The pre-trained model needs to be loaded during ABINet training; the weight is available from [abinet_pretrain_en.ckpt](https://download.mindspore.cn/toolkits/mindocr/abinet/abinet_pretrain_en-821ca20b.ckpt). Add the path of the pretrained weight to the `model.pretrained` field in `configs/rec/abinet/abinet_resnet45_en.yaml`.
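
A sketch of the corresponding config entry (assuming the checkpoint has been downloaded to a local path):

```yaml
model:
  # local path to the downloaded pre-trained checkpoint
  pretrained: path/to/abinet_pretrain_en-821ca20b.ckpt
```
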
The training result (including checkpoints, per-epoch performance, and curves) will be saved in the directory specified by the arg `ckpt_save_dir`. The default directory is `./tmp_rec`.

### Model Evaluation

To evaluate the accuracy of the trained model, you can use `eval.py`. Please set the checkpoint path in the arg `ckpt_load_path` in the `eval` section of the YAML config file, set `distribute` to `False`, and then run the evaluation script.
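
Assuming the script lives at `tools/eval.py`, consistent with the other tool paths in this repo:

```shell
python tools/eval.py --config configs/rec/abinet/abinet_resnet45_en.yaml
```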