# Offline handwritten mathematical expression recognition via stroke extraction

This repository provides a proof-of-concept stroke extractor that recovers strokes from clean
bitmap images. Combined with an online recognizer, the stroke extractor can be used to recognize
offline handwritten mathematical expressions. For example, when combined
with MyScript, the resulting offline recognition system was **ranked #3 in the offline
task in CROHME 2019.**

## Accuracy

Dataset|Correct|Up to 1 error|Up to 2 errors|Structurally correct
---|---|---|---|---
CROHME 2014|58.22%|71.60%|75.15%|77.38%
CROHME 2016|65.65%|77.68%|82.56%|85.00%
CROHME 2019|65.05%|||

Although good accuracy is achieved on the CROHME datasets, the program
may produce poor results on real-world images. For example, the procedure does not
work well on the following kinds of images:
- Images containing other objects. An image should contain exactly one formula and nothing else.
  Ordinary text and grid lines are not allowed.
- Images with low contrast. The strokes may not be properly distinguished from the background.
- Images with low resolution. The stroke extractor may not segment touching symbols correctly.
- Printed mathematical expressions. Serifs can confuse the stroke extractor.

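The low-contrast and low-resolution cases in the list above can be screened for before an image is sent to the recognizer. The following is a minimal sketch and not part of mathocr-myscript: the class `ImagePrecheck`, its methods, and the thresholds in the usage note are hypothetical names and illustrative numbers. The min/max luminance range is only a crude contrast measure, and a single noisy pixel can inflate it.

```java
import java.awt.image.BufferedImage;

/** Hypothetical pre-check helper; it is NOT part of mathocr-myscript. */
class ImagePrecheck {
    /** Converts a packed RGB pixel to a luminance value in 0..255 (Rec. 601 weights). */
    static int luminance(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (299 * r + 587 * g + 114 * b) / 1000;
    }

    /** Returns the difference between the brightest and the darkest pixel (0..255). */
    static int luminanceRange(BufferedImage image) {
        int min = 255;
        int max = 0;
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int l = luminance(image.getRGB(x, y));
                min = Math.min(min, l);
                max = Math.max(max, l);
            }
        }
        return max - min;
    }

    /** Returns true if the image passes both the contrast and the height threshold (assumed values). */
    static boolean looksUsable(BufferedImage image, int minRange, int minHeight) {
        return luminanceRange(image) >= minRange && image.getHeight() >= minHeight;
    }
}
```

For example, `ImagePrecheck.looksUsable(image, 100, 50)` would reject images whose luminance range is below 100 or whose height is below 50 pixels. Suitable thresholds depend on your images and would need tuning.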
## Usage

To use the MyScript Cloud recognition engine, you need to [create an account](https://sso.myscript.com/register)
and create an application.

### Graphical interface

1. Run the JAR by double-clicking it or with a command like `java -jar mathocr-myscript.jar`
2. Choose `Image file` from the `Recognize` menu
3. Choose the image file
4. Click the `Recognize` button under the stroke preview

### API

First add the JAR file to the classpath. If you are using Maven, add the following
to your `pom.xml`:

```xml
<dependency>
    <groupId>com.github.chungkwong</groupId>
    <artifactId>mathocr-myscript</artifactId>
    <version>1.0</version>
</dependency>
```

Then you can recognize images of mathematical expressions with code like:

```java
import java.io.File;
import javax.imageio.ImageIO;
// MyscriptRecognizer, Extractor, EncodedExpression and LatexFormat come from the mathocr-myscript JAR.

// Credentials of your MyScript Cloud application
String applicationKey="your application key for MyScript";
String hmacKey="hmac key of your MyScript account";
String grammarId="an uploaded grammar of your MyScript account";
int dpi=96;
MyscriptRecognizer myscriptRecognizer=new MyscriptRecognizer(applicationKey,hmacKey,grammarId,dpi);
Extractor extractor=new Extractor(myscriptRecognizer);

// ImageIO.read throws IOException, so call this from code that handles it
File file=new File("Path to file to be recognized");
EncodedExpression expression=extractor.recognize(ImageIO.read(file));
String latexCode=expression.getCodes(new LatexFormat());
```

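The snippet above passes the result of `ImageIO.read` directly to the extractor. `ImageIO.read` returns `null` when no registered reader can decode the file, so it may be worth checking the result first. The helper below is a small sketch of such a check; `ImageLoading` and `loadImage` are hypothetical names and not part of mathocr-myscript.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Hypothetical loading helper; it is NOT part of mathocr-myscript. */
class ImageLoading {
    /**
     * Reads an image file and fails with an IOException instead of returning null
     * when the file is not in a format that ImageIO can decode.
     */
    static BufferedImage loadImage(File file) throws IOException {
        BufferedImage image = ImageIO.read(file);
        if (image == null) {
            throw new IOException("Not a readable image: " + file);
        }
        return image;
    }
}
```

With this helper, `ImageIO.read(file)` in the snippet above could be replaced by `ImageLoading.loadImage(file)`, so that an unsupported file is reported before recognition starts.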
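# Offline handwritten mathematical expression recognition based on stroke extraction

This project provides a prototype program that recovers stroke information from clean images. Combined with an online handwritten mathematical expression recognizer,
it can be used to build an offline mathematical expression recognition system. For example, when combined with MyScript, it **ranked 3rd in the offline task of CROHME 2019**.
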
## Accuracy

Dataset|Correct|At most one error|At most two errors|Structurally correct
---|---|---|---|---
CROHME 2014|58.22%|71.60%|75.15%|77.38%
CROHME 2016|65.65%|77.68%|82.56%|85.00%
CROHME 2019|65.05%|||

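Although it performs well on the CROHME datasets, the program may still perform poorly on real-world images.
For example, it may give poor results on the following kinds of images:

- Images containing other objects. An image should contain exactly one formula and nothing else; ordinary text, grid lines and the like are not allowed.
- Low-contrast images. The strokes are hard to distinguish from the background.
- Low-resolution images. Touching symbols are hard to segment.
- Printed mathematical expressions. Serifs interfere with stroke extraction.
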
## Usage

If you use MyScript Cloud as the online handwritten mathematical expression recognizer, [register an account](https://sso.myscript.com/register) and create an application.

### Graphical user interface

1. Run the JAR file by double-clicking it or with a command such as `java -jar mathocr-myscript.jar`
2. Choose `Image file` from the `Recognize` menu
3. Choose the image file
4. Click the `Recognize` button under the stroke preview (on first use you need to enter your MyScript Cloud application identifier and key)

### API

First add the JAR file to the classpath. If you use Maven, add the following dependency under `dependencies` in `pom.xml` (other build systems are similar):

```xml
<dependency>
    <groupId>com.github.chungkwong</groupId>
    <artifactId>mathocr-myscript</artifactId>
    <version>1.0</version>
</dependency>
```

Then you can recognize offline handwritten mathematical expressions with code like the following:

```java
import java.io.File;
import javax.imageio.ImageIO;
// MyscriptRecognizer, Extractor, EncodedExpression and LatexFormat come from the mathocr-myscript JAR.

// Credentials of your MyScript Cloud application
String applicationKey="your application key for MyScript";
String hmacKey="hmac key of your MyScript account";
String grammarId="an uploaded grammar of your MyScript account";
int dpi=96;
MyscriptRecognizer myscriptRecognizer=new MyscriptRecognizer(applicationKey,hmacKey,grammarId,dpi);
Extractor extractor=new Extractor(myscriptRecognizer);

// ImageIO.read throws IOException, so call this from code that handles it
File file=new File("Path to file to be recognized");
EncodedExpression expression=extractor.recognize(ImageIO.read(file));
String latexCode=expression.getCodes(new LatexFormat());
```