added markdown document for ocr engine comparison #577
* **Bad**, because increases support complexity with multiple engines

### Confirmation
Elaborate on how this is done. I would assume that you have the 100+ PDFs at hand and wrote a test suite?
No, I wrote this in advance, assuming that I would have that many tested later on, but I have now deleted that section. Looking at the level of detail and sophistication of the other markdowns (very little), I decided it's not needed.
An ADR can also have TODOs and links to existing drafts of the test suite.
* Current implementation uses Tesseract 4.x with LSTM engine
* In benchmarks, Google Cloud Vision shows the highest overall accuracy
* Handwriting (categories 2 & 3) is the main differentiator among engines
Where are these categories mentioned?
Same as above; I deleted this section.
The web resources that informed this ADR:

1. <https://www.mdpi.com/2073-8994/12/5/715>
Link that to each pro/con argument.
Did that. See JabRef/jabref#13573.
@@ -0,0 +1,153 @@
# ADR-002: OCR Engine Selection for JabRef
Try to follow the format given in JabRef's repo, and place it in the JabRef folder: https://github.com/JabRef/jabref/tree/main/docs/decisions
I think this is AI-generated, because I cannot otherwise explain why this takes the number 0002, and why the number appears in the heading.
(It also does not follow the MADR format.)
I adjusted the format a little bit, but it was already very similar to the other md files in the folder. I restructured the heading a little bit to make it even more similar.
See the new PR here: JabRef/jabref#13573
This should go to devdocs: jabref/docs/decisions
The follow-up PR is JabRef/jabref#13573. Therefore, I close this one.
This is related to the GSoC OCR project by Kaan Erdem.
JabRef/jabref#13313