configs/det/fcenet/README.md (3 additions, 3 deletions)
@@ -16,7 +16,7 @@ FCENet is a segmentation-based text detection algorithm. In the text detection s
The idea of deformable convolution is simple: the fixed sampling shape of the convolution kernel is made variable. Starting from the original convolution positions, deformable convolution generates position offsets in arbitrary directions, as shown in the following figure:
Figure (a) shows the original convolution kernel; Figure (b) shows a deformable convolution kernel whose sampling points are shifted in arbitrary directions; Figures (c) and (d) are two special cases of (b). The advantage is a stronger geometric transformation ability: the kernel is no longer restricted to its original rectangular shape and can cover richer, irregular shapes. Deformable convolution is better at extracting features of irregular shapes [[1](#references)] and is therefore well suited to text detection in natural scenes.
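As a rough, self-contained sketch of this sampling idea (not the code used in this repository, and with made-up offsets; in real deformable convolution the offsets are predicted per location by an extra convolution layer and learned end to end), the positions a 3×3 kernel reads from can be shifted point by point like this:

```python
import numpy as np

# Regular 3x3 sampling grid around a center pixel (standard convolution).
base_grid = np.array([(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)], dtype=np.float32)

# Deformable convolution adds a 2D offset to each of the 9 sampling points.
# These offsets are made-up values purely for illustration.
offsets = np.array([[0.3, -0.2], [0.0, 0.5], [-0.4, 0.1],
                    [0.2, 0.2], [0.0, 0.0], [-0.1, -0.3],
                    [0.5, 0.0], [-0.2, 0.4], [0.1, -0.1]], dtype=np.float32)

center = np.array([10.0, 10.0], dtype=np.float32)   # (y, x) of the pixel being convolved
regular_points = center + base_grid                  # where a normal 3x3 kernel samples
deformed_points = center + base_grid + offsets       # where the deformable kernel samples

# The deformed positions are fractional, so feature values at these points are
# obtained by bilinear interpolation before being multiplied with the kernel weights.
print(regular_points)
print(deformed_points)
```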
@@ -25,7 +25,7 @@ Figure (a) is the original convolutional kernel, Figure (b) is a deformable conv
Fourier contour is a curve-fitting method based on the Fourier transform. As the Fourier degree k increases, more high-frequency components are introduced and the contour description becomes more accurate. The following figure shows how well irregular curves can be described under different Fourier degrees:
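As a small illustration of this idea (toy coefficients, not values predicted by the model), a closed contour can be written as f(t) = Σ_{n=-k}^{k} c_n e^{2πint} with complex coefficients c_n, and reconstructed by sampling t in [0, 1):

```python
import numpy as np

def contour_from_fourier(coeffs, num_points=50):
    """Reconstruct a closed contour from complex Fourier coefficients.

    `coeffs` maps a degree n (from -k to k) to the complex coefficient c_n;
    the contour point at parameter t is f(t) = sum_n c_n * exp(2j * pi * n * t).
    """
    t = np.linspace(0.0, 1.0, num_points, endpoint=False)
    points = np.zeros(num_points, dtype=complex)
    for n, c_n in coeffs.items():
        points += c_n * np.exp(2j * np.pi * n * t)
    return np.stack([points.real, points.imag], axis=1)  # (num_points, 2) as (x, y)

# Toy example: c_0 places the contour center, degrees +-1 give an ellipse-like
# base shape, and higher degrees add finer wiggles to the outline.
coeffs = {0: 5 + 5j, 1: 2.0, -1: 0.8j, 2: 0.15, -2: 0.1j}
print(contour_from_fourier(coeffs, num_points=8))
```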
Like most OCR algorithms, FCENet can be roughly divided into three parts: backbone, neck, and head. The backbone is a deformable-convolution version of ResNet-50 used for feature extraction. The neck is a feature pyramid [[2](#references)], a set of convolution kernels of different sizes that extracts features at different scales from the original image; this improves detection accuracy and suits images that contain text boxes of several different sizes. The head has two branches. The classification branch predicts heat maps of text regions and of text center regions; their pixel-wise product gives the classification score map, and the classification loss is the cross entropy between the predicted heat maps and the ground truth. The regression branch predicts the Fourier signature vectors, which are used to reconstruct text contours via the Inverse Fourier Transform (IFT); the smooth-L1 loss between the reconstructed contour and the ground-truth contour in image space is the loss of the regression branch.
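The following is a simplified, NumPy-only sketch of the two branches' outputs and losses described above; it illustrates the idea only and is not the repository's implementation (which handles per-pixel weighting, multiple pyramid levels, and other details):

```python
import numpy as np

def classification_score_map(text_region_prob, text_center_prob):
    """Pixel-wise product of the text-region and text-center heat maps (H x W)."""
    return text_region_prob * text_center_prob

def cross_entropy(pred_prob, target, eps=1e-6):
    """Per-pixel binary cross entropy between a predicted heat map and ground truth."""
    pred_prob = np.clip(pred_prob, eps, 1.0 - eps)
    return float(-(target * np.log(pred_prob) + (1 - target) * np.log(1 - pred_prob)).mean())

def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 loss between reconstructed contour points and the ground-truth contour."""
    diff = np.abs(pred - target)
    return float(np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta).mean())

# Tiny example with random maps and contours, purely to show the shapes involved.
rng = np.random.default_rng(0)
region, center_map, gt_mask = rng.random((3, 8, 8))
score = classification_score_map(region, center_map)
cls_loss = cross_entropy(score, (gt_mask > 0.5).astype(np.float32))
reg_loss = smooth_l1(rng.random((50, 2)), rng.random((50, 2)))
print(cls_loss, reg_loss)
```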
Context for val_while_train: since mindspore.nn.transformer requires a fixed batch size when it is defined, when setting val_while_train=True it is necessary to ensure that the batch size of the validation set is the same as the one the model was built with.

So, lines 179-185 in mindocr/data/builder.py

```
if not is_train:
    if drop_remainder and is_main_device:
        _logger.warning(
            "`drop_remainder` is forced to be False for evaluation "
            "to include the last batch for accurate evaluation."
        )
        drop_remainder = False
```

should be changed to

```
if not is_train:
    # if drop_remainder and is_main_device:
    _logger.warning(
        "`drop_remainder` is forced to be False for evaluation "
        "to include the last batch for accurate evaluation."
    )
    drop_remainder = True
```
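Why the last batch matters: a minimal arithmetic sketch (the sample count and batch size below are made-up numbers, not values from this repository):

```python
# Hypothetical numbers: a validation set of 1003 samples batched with batch size 8.
num_val_samples = 1003
batch_size = 8

full_batches, remainder = divmod(num_val_samples, batch_size)
print(full_batches, remainder)  # 125 3

# With drop_remainder=False the loader also yields a final batch of 3 samples,
# whose shape no longer matches the fixed batch size the network was defined
# with. Keeping drop_remainder=True discards that partial batch, so every batch
# fed to the model during validation has exactly batch_size samples.
```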
## References
<!--- Guideline: Citation format GB/T 7714 is suggested. -->
configs/rec/crnn/README.md (3 additions, 3 deletions)
@@ -159,7 +159,7 @@ eval:
...
```
- By running `tools/eval.py` as noted in section [Model Evaluation](#33-model-evaluation) with the above config yaml, you can get the accuracy performance on dataset CUTE80.
+ By running `tools/eval.py` as noted in section [Model Evaluation](#model-evaluation) with the above config yaml, you can get the accuracy performance on dataset CUTE80.
2. Evaluate on multiple datasets under the same folder
@@ -388,8 +388,8 @@ Experiments are tested on ascend 310P with mindspore lite 2.3.1 graph mode.
### Notes
- To reproduce the result on other contexts, please ensure the global batch size is the same.
- - The characters supported by model are lowercase English characters from a to z and numbers from 0 to 9. More explanation on dictionary, please refer to [4. Character Dictionary](#4-character-dictionary).
- - The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#312-dataset-download) section.
+ - The characters supported by model are lowercase English characters from a to z and numbers from 0 to 9. More explanation on dictionary, please refer to [Character Dictionary](#character-dictionary).
+ - The models are trained from scratch without any pre-training. For more dataset details of training and evaluation, please refer to [Dataset Download & Dataset Usage](#dataset-download) section.
- The input Shapes of MindIR of CRNN_VGG7 and CRNN_ResNet34_vd are both (1, 3, 32, 100).
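For reference, a minimal sketch of shaping an image into that (1, 3, 32, 100) input; the naive resize and the simple /255 normalization here are illustrative placeholders, not necessarily the exact preprocessing pipeline used by this repository:

```python
import numpy as np

def to_crnn_input(image_hwc_uint8):
    """Turn an H x W x 3 uint8 RGB image into a (1, 3, 32, 100) float32 NCHW tensor."""
    h, w, _ = image_hwc_uint8.shape
    # Naive nearest-neighbour resize to 32 x 100 so the sketch stays dependency-free.
    ys = np.arange(32) * h // 32
    xs = np.arange(100) * w // 100
    resized = image_hwc_uint8[ys][:, xs]                          # (32, 100, 3)
    chw = resized.astype(np.float32).transpose(2, 0, 1) / 255.0   # (3, 32, 100), scaled to [0, 1]
    return chw[None, ...]                                         # add batch dim -> (1, 3, 32, 100)

print(to_crnn_input(np.zeros((48, 160, 3), dtype=np.uint8)).shape)
```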