Commit 45ec799

Merge pull request #2917 from jnzw/handwritten-english-recognition-0001
Contribute the handwritten-english-recognition-0001 model
2 parents: f67162b + 0a9d29a

14 files changed (+313, −9 lines)


data/dataset_classes/gnhk.txt

Lines changed: 94 additions & 0 deletions

New file (`@@ -0,0 +1,94 @@`): the decoding character list for GNHK, one character per line in ascending code-point order — a space, then `!` `"` `#` `$` `%` `&` `'` `(` `)` `*` `+` `,` `-` `.` `/`, the digits `0`–`9`, `:` `;` `<` `=` `>` `?` `@`, the uppercase letters `A`–`Z`, `[` `]` `^` `_`, the lowercase letters `a`–`z`, `{` `|` `}` `~`, and `£` — 94 characters in total (note that `\` and the backtick are absent).
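For context, a decoding char list like this is read one character per line into an index-to-character table for CTC decoding. A minimal loading sketch (not the demo's own loader; the path is relative to the repository root):

```python
# Strip only the trailing newline: the first entry is a single space
# and must survive the read.
with open("data/dataset_classes/gnhk.txt", encoding="utf-8") as f:
    characters = [line.rstrip("\n") for line in f]

assert len(characters) == 94  # the model's 95 output classes presumably add a CTC blank
```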

data/dataset_definitions.yml

Lines changed: 9 additions & 0 deletions
```diff
@@ -1200,6 +1200,15 @@ datasets:
     annotation: scut_ept_recognition.pickle
     dataset_meta: scut_ept_recognition.json
 
+  - name: GNHK
+    data_source: gnhk
+    annotation_conversion:
+      converter: unicode_character_recognition
+      decoding_char_file: gnhk_char_list.txt
+      annotation_file: test_img_id_gt.txt
+    annotation: gnhk_recognition.pickle
+    dataset_meta: gnhk_recognition.json
+
   - name: ADEChallengeData2016
     annotation_conversion:
       converter: ade20k
```

demos/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -27,7 +27,7 @@ The Open Model Zoo includes the following demos:
 - [Gaze Estimation C++ G-API Demo](./gaze_estimation_demo/cpp_gapi/README.md) - Face detection followed by gaze estimation, head pose estimation and facial landmarks regression. G-API version.
 - [Gesture Recognition Python\* Demo](./gesture_recognition_demo/python/README.md) - Demo application for Gesture Recognition algorithm (e.g. American Sign Language gestures), which classifies gesture actions that are being performed on input video.
 - [GPT-2 Text Prediction Python\* Demo](./gpt2_text_prediction_demo/python/README.md) - GPT-2 text prediction demo.
-- [Handwritten Text Recognition Python\* Demo](./handwritten_text_recognition_demo/python/README.md) - The demo demonstrates how to run Handwritten Japanese Recognition models and Handwritten Simplified Chinese Recognition models.
+- [Handwritten Text Recognition Python\* Demo](./handwritten_text_recognition_demo/python/README.md) - The demo demonstrates how to run Handwritten Text Recognition models for Japanese, Simplified Chinese and English.
 - [Human Pose Estimation C++ Demo](./human_pose_estimation_demo/cpp/README.md) - Human pose estimation demo.
 - [Human Pose Estimation Python\* Demo](./human_pose_estimation_demo/python/README.md) - Human pose estimation demo.
 - [Image Inpainting Python\* Demo](./image_inpainting_demo/python/README.md) - Demo application for GMCNN inpainting network.
```

demos/handwritten_text_recognition_demo/python/README.md

Lines changed: 5 additions & 3 deletions
````diff
@@ -1,7 +1,6 @@
 # Handwritten Text Recognition Demo
 
-This example demonstrates an approach to recognize handwritten Japanese and simplified Chinese text lines using OpenVINO™. For Japanese, this demo supports all the characters in datasets [Kondate](http://web.tuat.ac.jp/~nakagawa/database/en/kondate_about.html) and [Nakayosi](http://web.tuat.ac.jp/~nakagawa/database/en/about_nakayosi.html). For simplified Chinese, it supports the characters in [SCUT-EPT](https://github.com/HCIILAB/SCUT-EPT_Dataset_Release).
-
+This example demonstrates an approach to recognize handwritten Japanese, simplified Chinese, and English text lines using OpenVINO™. For Japanese, this demo supports all the characters in datasets [Kondate](http://web.tuat.ac.jp/~nakagawa/database/en/kondate_about.html) and [Nakayosi](http://web.tuat.ac.jp/~nakagawa/database/en/about_nakayosi.html). For simplified Chinese, it supports the characters in [SCUT-EPT](https://github.com/HCIILAB/SCUT-EPT_Dataset_Release). For English, it supports the characters in [GNHK](https://goodnotes.com/gnhk/).
 ## How It Works
 
 The demo workflow is the following:
@@ -29,6 +28,7 @@ omz_converter --list models.lst
 
 * handwritten-japanese-recognition-0001
 * handwritten-simplified-chinese-recognition-0001
+* handwritten-english-recognition-0001
 
 > **NOTE**: Refer to the tables [Intel's Pre-Trained Models Device Support](../../../models/intel/device_support.md) and [Public Pre-Trained Models Device Support](../../../models/public/device_support.md) for the details on models inference support at different devices.
@@ -76,9 +76,11 @@ Options:
   -tk TOP_K, --top_k TOP_K
                         Optional. Top k steps in looking up the decoded
                         character, until a designated one is found
+  -ob OUTPUT_BLOB, --output_blob OUTPUT_BLOB
+                        Optional. Name of the output layer of the model. Default is 'output'
 ```
 
-The decoding char list files provided within Open Model Zoo and for Japanese it is the `<omz_dir>/data/dataset_classes/kondate_nakayosi.txt`file, while for Simplified Chinese it is the `<omz_dir>/data/dataset_classes/scut_ept.txt` file. For example, to do inference on a CPU with the OpenVINO&trade; toolkit pre-trained `handwritten-japanese-recognition-0001` model, run the following command:
+The decoding char list files are provided within Open Model Zoo: for Japanese it is the `<omz_dir>/data/dataset_classes/kondate_nakayosi.txt` file, for Simplified Chinese it is the `<omz_dir>/data/dataset_classes/scut_ept.txt` file, and for English it is the `<omz_dir>/data/dataset_classes/gnhk.txt` file. For example, to do inference on a CPU with the OpenVINO&trade; toolkit pre-trained `handwritten-japanese-recognition-0001` model, run the following command:
 
 ```sh
 python handwritten_text_recognition_demo.py \
````
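A hypothetical invocation exercising the new `-ob` option (`-m` and `-i` are assumed here to be the demo's usual model and input flags; this diff does not show them):

```sh
python handwritten_text_recognition_demo.py \
    -m <path_to>/handwritten-english-recognition-0001.xml \
    -i <path_to_line_image> \
    -ob output
```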
(added binary file: image, 119 KB; preview not shown)

demos/handwritten_text_recognition_demo/python/handwritten_text_recognition_demo.py

Lines changed: 8 additions & 3 deletions
```diff
@@ -48,6 +48,7 @@ def build_argparser():
                       help="Path to the decoding char list file. Default is for Japanese")
     args.add_argument("-dc", "--designated_characters", type=str, default=None, help="Optional. Path to the designated character file")
     args.add_argument("-tk", "--top_k", type=int, default=20, help="Optional. Top k steps in looking up the decoded character, until a designated one is found")
+    args.add_argument("-ob", "--output_blob", type=str, default=None, help="Optional. Name of the output layer of the model. Default is None, in which case the demo will read the output name from the model, assuming there is only 1 output layer")
     return parser
 
 
@@ -77,16 +78,20 @@ def main():
     log.info('OpenVINO Inference Engine')
     log.info('\tbuild: {}'.format(get_version()))
     ie = IECore()
+    ie.set_config(config={"GPU_ENABLE_LOOP_UNROLLING": "NO", "CACHE_DIR": "./"}, device_name="GPU")
 
     # Read IR
     log.info('Reading model {}'.format(args.model))
     net = ie.read_network(args.model, os.path.splitext(args.model)[0] + ".bin")
 
     assert len(net.input_info) == 1, "Demo supports only single input topologies"
-    assert len(net.outputs) == 1, "Demo supports only single output topologies"
-
     input_blob = next(iter(net.input_info))
-    out_blob = next(iter(net.outputs))
+
+    if args.output_blob is not None:
+        out_blob = args.output_blob
+    else:
+        assert len(net.outputs) == 1, "Demo supports only single output topologies"
+        out_blob = next(iter(net.outputs))
 
     characters = get_characters(args)
     codec = CTCCodec(characters, args.designated_characters, args.top_k)
```
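For context, the resolved `out_blob` name is what later selects the prediction tensor from the inference results. A self-contained sketch under the same IECore API (model paths and device are placeholders; `output` is the logits output name documented in the model's README):

```python
import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network("handwritten-english-recognition-0001.xml",
                      "handwritten-english-recognition-0001.bin")  # placeholder paths
input_blob = next(iter(net.input_info))
out_blob = "output"  # as if passed via -ob; this model has extra LSTM-state outputs
exec_net = ie.load_network(network=net, device_name="CPU")
dummy = np.zeros((1, 1, 96, 2000), dtype=np.float32)  # B, C, H, W per the model spec
preds = exec_net.infer(inputs={input_blob: dummy})[out_blob]  # shape (250, 1, 95)
```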
demos/handwritten_text_recognition_demo/python/models.lst

Lines changed: 1 addition & 0 deletions

```diff
@@ -1,3 +1,4 @@
 # This file can be used with the --list option of the model downloader.
 handwritten-japanese-recognition-0001
 handwritten-simplified-chinese-recognition-0001
+handwritten-english-recognition-0001
```
models/intel/handwritten-english-recognition-0001/README.md

Lines changed: 50 additions & 0 deletions

New file (`@@ -0,0 +1,50 @@`):
# handwritten-english-recognition-0001

## Use Case and High-Level Description

This is a network for the handwritten English text recognition scenario. It consists of a CNN followed by a Bi-LSTM, a reshape layer, and a fully connected layer.
The network is able to recognize English text consisting of characters in the [GNHK](https://goodnotes.com/gnhk/) dataset.

## Example

![](./assets/handwritten-english-recognition-0001.jpg) -> 'Picture ID. and Passport photo'

## Specification

| Metric                    | Value     |
| ------------------------- | --------- |
| GFlops                    | 1.3182    |
| MParams                   | 0.1413    |
| Accuracy on the GNHK test subset (excluding images wider than 2000 px after being resized to a height of 96 px with preserved aspect ratio) | 81.5% |
| Source framework          | PyTorch\* |

> **NOTE**: To reproduce the reported accuracy, images from the GNHK test set should be binarized using adaptive thresholding and preprocessed into single-line text images using the coordinates from the accompanying JSON annotation files of the GNHK dataset. See [preprocess_gnhk.py](./preprocess_gnhk.py).

This model adopts the [label error rate](https://dl.acm.org/doi/abs/10.1145/1143844.1143891) as its accuracy metric.

## Inputs

Grayscale image, name - `actual_input`, shape - `1, 1, 96, 2000`, format is `B, C, H, W`, where:

- `B` - batch size
- `C` - number of channels
- `H` - image height
- `W` - image width

> **NOTE**: The source image should be resized to a fixed height (such as 96) while keeping the aspect ratio; the width after resizing should be no larger than 2000, and the image should then be right-bottom padded to a width of 2000 with edge values.
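The note above, together with the preprocessing steps in the accompanying accuracy-check config (AREA resize, edge padding on the right), translates into a few lines of OpenCV/NumPy. A minimal sketch, with the image path as a placeholder:

```python
import cv2
import numpy as np

img = cv2.imread("line.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
h, w = img.shape
new_w = round(w * 96 / h)  # keep aspect ratio at target height 96
assert new_w <= 2000, "images wider than 2000 px after resizing are excluded"
img = cv2.resize(img, (new_w, 96), interpolation=cv2.INTER_AREA)
img = np.pad(img, ((0, 0), (0, 2000 - new_w)), mode="edge")  # right edge padding
blob = img[None, None].astype(np.float32)  # 1, 1, 96, 2000 (B, C, H, W)
```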
## Outputs

Name - `output`, shape - `250, 1, 95`, format is `W, B, L`, where:

- `W` - output sequence length
- `B` - batch size
- `L` - confidence distribution across the supported symbols in [GNHK](https://goodnotes.com/gnhk/)

The network output can be decoded by a CTC greedy decoder.
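A minimal sketch of such greedy decoding, under the assumption (not stated in this commit) that class 0 is the CTC blank and classes 1–94 map to the characters in `gnhk.txt`:

```python
import numpy as np

def ctc_greedy_decode(logits: np.ndarray, characters: list) -> str:
    """Collapse repeats and drop blanks; logits has shape (W, B, L) = (250, 1, 95)."""
    best = logits[:, 0, :].argmax(axis=1)  # best class index per timestep
    text, prev = [], -1
    for idx in best:
        if idx != prev and idx != 0:  # 0 assumed to be the CTC blank
            text.append(characters[idx - 1])
        prev = idx
    return "".join(text)
```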
The network also outputs 10 LSTM hidden states of shape `2, 1, 256`, which can simply be ignored.

## Legal Information

[*] Other names and brands may be claimed as the property of others.
models/intel/handwritten-english-recognition-0001/accuracy-check.yml

Lines changed: 31 additions & 0 deletions

New file (`@@ -0,0 +1,31 @@`):

```yaml
models:
  - name: handwritten-english-recognition-0001

    launchers:
      - framework: dlsdk
        adapter:
          type: ctc_greedy_search_decoder
          logits_output: output

    datasets:
      - name: GNHK
        # In order to be used by the model, images must be:
        # 1) Resized to fixed height 96, using AREA interpolation
        # 2) Padded with edge padding, right padding only
        preprocessing:
          - type: bgr_to_gray
          - type: resize
            interpolation: AREA
            aspect_ratio_scale: width
            size: 96
          - type: padding
            use_numpy: True
            numpy_pad_mode: edge
            dst_height: 96
            dst_width: 2000
            pad_type: right_bottom

        metrics:
          - type: label_level_recognition_accuracy
            reference: 0.815
```
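A config like this is consumed by the OMZ Accuracy Checker. A typical invocation might look like the following sketch (paths are placeholders; `-c`, `-m`, `-s`, and `-a` are the checker's config, models, source-data, and annotation options):

```sh
accuracy_check -c models/intel/handwritten-english-recognition-0001/accuracy-check.yml \
               -m <path_to_models> \
               -s <path_to_gnhk_data> \
               -a <path_to_annotations>
```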
(added binary file: image, 119 KB; preview not shown)
