Proposal: WFGY “OCR→LLM robustness clinic” demo on top of Tesseract.js #1053
onestardao started this conversation in Show and tell
Hi Guillermo and Tesseract.js community,
First of all, thank you for starring my WFGY repo from your personal account a few months ago. I really appreciate it.
That small signal actually kept me going. Since then I finished a 3.0 “Singularity demo” and a full Problem Map / Global Fix Map around it, all MIT-licensed and text-based.
I know maintainers get flooded with emails, so instead of pinging your inbox again, I wanted to share one focused proposal here that is directly related to Tesseract.js.
What I am working on
Very short description so you can quickly decide if this is worth any attention:
Important scope note:
So you can think of it less as a new model, and more as an experiment language + measurement protocol that sits on top of existing tools.
How this touches Tesseract.js
Several of the earliest, most painful failures I studied came from OCR-heavy pipelines:
Small OCR quirks in those setups often get amplified by downstream RAG and LLM layers.
Over time, those cases became entries in the WFGY Problem Map (e.g. hallucination & chunk drift, interpretation collapse, embedding≠semantic meaning, retrieval traceability, etc.).
Today, the map has grown beyond OCR, but OCR→LLM chains are still one of the most important “stress labs” behind it.
Because of that, I think there is an opportunity for a very small, very concrete demo that might be useful for the Tesseract.js ecosystem.
Proposal: “OCR→LLM robustness clinic” demo
I would like to build a minimal, fully reproducible demo on top of Tesseract.js that does the following:
Input
(e.g. “What is the total amount on this invoice?”).
Pipeline
(layout confusion, hallucination, semantic drift, etc.),
Output
no model weights are modified and no proprietary code is required.
For the first version, I would probably deliver this as a Colab-style notebook so people can:
If this turns out to be useful, it would be straightforward to later adapt the same idea into a small browser demo using Tesseract.js directly.
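To make the idea a bit more tangible, here is a minimal Python sketch of the kind of check such a notebook could run. It is only an illustration under stated assumptions: it does not call Tesseract.js or any LLM, and it is not WFGY itself. The confusion table, `perturb`, `extract_total`, and `robustness_report` are all hypothetical names, and `extract_total` is a plain regex standing in for the downstream LLM step. The sketch swaps look-alike characters in an OCR transcript and reports whether the answer to "What is the total amount on this invoice?" stays the same.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical table of look-alike substitutions (illustrative only,
# not taken from Tesseract.js or WFGY).
CONFUSIONS: Dict[str, str] = {"0": "O", "1": "l", "5": "S", "2": "Z"}


def perturb(text: str, src: str, dst: str) -> str:
    """Replace every occurrence of one character with its look-alike."""
    return text.replace(src, dst)


def extract_total(text: str) -> Optional[str]:
    """Stand-in for the LLM step: return the number that follows 'Total'."""
    match = re.search(r"Total\D*(\d+(?:\.\d{2})?)", text, flags=re.IGNORECASE)
    return match.group(1) if match else None


def robustness_report(
    ocr_text: str, answer_fn: Callable[[str], Optional[str]]
) -> List[dict]:
    """Compare the answer on clean text with the answer after each swap."""
    baseline = answer_fn(ocr_text)
    rows = []
    for src, dst in CONFUSIONS.items():
        noisy = perturb(ocr_text, src, dst)
        if noisy == ocr_text:
            continue  # this swap does not occur in the text
        answer = answer_fn(noisy)
        rows.append(
            {"swap": f"{src}->{dst}", "answer": answer, "stable": answer == baseline}
        )
    return rows


if __name__ == "__main__":
    sample = "Invoice 2231\nTotal: 180.50"
    for row in robustness_report(sample, extract_total):
        print(row)
```

In a real version, `ocr_text` would come from an actual OCR run and `answer_fn` would wrap an LLM call; the comparison loop would stay the same.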
Why this might be useful (or not)
From your perspective, the potential benefits I see are:
From my side, the main benefit is being able to ground the WFGY “tension” language in real, non-trivial pipelines that many developers actually run.
If this feels off-topic for Tesseract.js or you prefer to keep Discussions focused on more concrete usage questions, feel free to close this. I completely understand.
If it seems even mildly interesting, I’m happy to:
Either way, thank you again for Tesseract.js and for the early star on WFGY.
A lot of the “semantic tension” ideas only exist because people kept pushing OCR pipelines much further than they were originally designed for.