Proposal: WFGY “OCR→LLM robustness clinic” demo on top of Tesseract.js #1053

onestardao · 2026-02-06T07:28:52Z

Hi Guillermo and Tesseract.js community,

First of all, thank you for starring my WFGY repo from your personal account a few months ago. I really appreciate it.
That small signal actually kept me going. Since then I have finished a 3.0 “Singularity demo” and a full Problem Map / Global Fix Map around it, all MIT-licensed and text-based.

I know maintainers get flooded with emails, so instead of pinging your inbox again, I wanted to share one focused proposal here that is directly related to Tesseract.js.

What I am working on

A very short description, so you can quickly decide whether this is worth your attention:

WFGY is an open, text-only “semantic firewall + lab” for AI systems.
It started from debugging real OCR → RAG → LLM pipelines, many of them using Tesseract or Tesseract.js.
Its main outputs today are:
- a Problem Map of failure modes (hallucination, chunk drift, layout collapse, embedding≠meaning, etc.), and
- a small set of scalar “tension” metrics at the effective layer that measure how badly a pipeline drifts or recovers on stress tests.

Important scope note:

WFGY does not touch model weights or real embedding vectors.
All metrics live at the behavioural/effective layer, using LLM outputs and structured prompts only.

So you can think of it less as a new model, and more as an experiment language + measurement protocol that sits on top of existing tools.

How this touches Tesseract.js

Several of the earliest, most painful failures I studied came from OCR-heavy pipelines:

- PDFs and scanned documents with complex layout,
- math, tables, multilingual text,
- prompt injection and “hidden” instructions in screenshots.

Small OCR quirks in those setups often get amplified by downstream RAG and LLM layers.
Over time, those cases became entries in the WFGY Problem Map (e.g. hallucination & chunk drift, interpretation collapse, embedding≠semantic meaning, retrieval traceability, etc.).

Today, the map has grown beyond OCR, but OCR→LLM chains are still one of the most important “stress labs” behind it.

Because of that, I think there is an opportunity for a very small, very concrete demo that might be useful for the Tesseract.js ecosystem.

Proposal: “OCR→LLM robustness clinic” demo

I would like to build a minimal, fully reproducible demo on top of Tesseract.js that does the following:

Input
- A small set of public example images / PDFs (could reuse existing Tesseract.js examples).
- For each image, a short ground-truth text and 1–2 questions
  (e.g. “What is the total amount on this invoice?”).
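To make the input set concrete, here is a minimal TypeScript sketch of a manifest entry and a loader. The loader enforces the constraints above: non-empty ground truth and 1–2 questions per image. The field names (`imageId`, `imagePath`, `groundTruth`, `expectedAnswer`), the JSON manifest format and `parseManifest` are hypothetical choices for illustration. They are not an existing WFGY or Tesseract.js format.

```typescript
// Hypothetical manifest format for the clinic's input set.
interface ClinicQuestion {
  question: string;       // e.g. "What is the total amount on this invoice?"
  expectedAnswer: string; // reference answer used for scoring
}

interface ClinicExample {
  imageId: string;     // stable id reused in every output row
  imagePath: string;   // local path or URL of a public example image / PDF page
  groundTruth: string; // short reference transcription of the image
  questions: ClinicQuestion[];
}

// Parse a JSON manifest and enforce the proposal's constraints:
// required fields present and 1-2 questions per image.
function parseManifest(json: string): ClinicExample[] {
  const raw: unknown = JSON.parse(json);
  if (!Array.isArray(raw)) throw new Error("manifest must be a JSON array");
  return raw.map((item, i) => {
    const ex = item as ClinicExample;
    if (!ex.imageId || !ex.imagePath || !ex.groundTruth) {
      throw new Error(`example ${i}: missing imageId, imagePath or groundTruth`);
    }
    if (!Array.isArray(ex.questions) || ex.questions.length < 1 || ex.questions.length > 2) {
      throw new Error(`example ${i}: expected 1-2 questions`);
    }
    for (const q of ex.questions) {
      if (!q.question || !q.expectedAnswer) {
        throw new Error(`example ${i}: question and expectedAnswer are required`);
      }
    }
    return ex;
  });
}
```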
Pipeline
- Run Tesseract.js (or standard Tesseract in a Colab-style notebook for the first prototype) to get OCR text.
- Feed the OCR text plus the question into an LLM.
- Use a fixed WFGY 3.0 TXT pack at the effective layer to:
  - classify the failure mode when things go wrong
    (layout confusion, hallucination, semantic drift, etc.),
  - assign a small set of “tension” scores (drift, stability, self-recovery),
  - and log everything in a simple, auditable format.
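The pipeline steps above could be wired together roughly as follows. This is a minimal sketch that assumes the OCR engine, the LLM and the WFGY classification step are passed in as plain async functions. `OcrFn`, `LlmFn`, `ClassifyFn`, `buildPrompt`, `runExample`, `toJsonl` and the prompt wording are all hypothetical. The real interface of the WFGY 3.0 TXT pack is not specified here. The comment shows how the OCR function might wrap a Tesseract.js (v5 or later) worker.

```typescript
// Injected dependencies keep the sketch independent of any specific OCR engine or LLM SDK.
type OcrFn = (imagePath: string) => Promise<string>;
type LlmFn = (prompt: string) => Promise<string>;

// Hypothetical stand-in for the WFGY 3.0 TXT pack step (its real interface is not specified here).
type ClassifyFn = (input: {
  ocrText: string;
  question: string;
  answer: string;
  expected: string;
}) => Promise<{ failureCategory: string | null; tension: Record<string, number> }>;

interface LogRow {
  imageId: string;
  question: string;
  ocrText: string;
  answer: string;
  expected: string;
  failureCategory: string | null; // null when the classifier judges the answer correct
  tension: Record<string, number>;
}

// Assumed prompt wording: restrict the LLM to the OCR text it was given.
function buildPrompt(ocrText: string, question: string): string {
  return `Answer using only the OCR text below.\n\nOCR text:\n${ocrText}\n\nQuestion: ${question}\nAnswer:`;
}

// With Tesseract.js v5 or later, an OcrFn could wrap a worker like this:
//   import { createWorker } from "tesseract.js";
//   const worker = await createWorker("eng");
//   const ocr: OcrFn = async (img) => (await worker.recognize(img)).data.text;
//   // ...and call `await worker.terminate()` once the batch is done.
async function runExample(
  example: {
    imageId: string;
    imagePath: string;
    questions: { question: string; expectedAnswer: string }[];
  },
  ocr: OcrFn,
  llm: LlmFn,
  classify: ClassifyFn,
): Promise<LogRow[]> {
  const ocrText = await ocr(example.imagePath); // OCR once per image, reuse for every question
  const rows: LogRow[] = [];
  for (const { question, expectedAnswer } of example.questions) {
    const answer = (await llm(buildPrompt(ocrText, question))).trim();
    const { failureCategory, tension } = await classify({
      ocrText,
      question,
      answer,
      expected: expectedAnswer,
    });
    rows.push({ imageId: example.imageId, question, ocrText, answer, expected: expectedAnswer, failureCategory, tension });
  }
  return rows;
}

// JSON Lines: one object per line, easy to diff, grep and audit.
function toJsonl(rows: LogRow[]): string {
  return rows.map((row) => JSON.stringify(row)).join("\n");
}
```

Running OCR once per image and reusing the text for every question keeps the OCR step fixed. Any difference between questions on the same image then comes from the LLM side.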
Output
- A single notebook / script that:
  - prints per-example rows with:
    - image id,
    - OCR text vs ground truth,
    - LLM answer vs expected answer,
    - failure category (from the Problem Map),
    - tension metrics;
  - plus a short summary: which kinds of failures dominate this batch.
- Everything runs on top of Tesseract.js / Tesseract only;
  no model weights are modified and no proprietary code is required.
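The batch summary only needs the failure label of each row. Here is a minimal sketch, assuming each row carries a Problem Map label, or `null` when the answer was judged correct. `ResultRow`, `BatchSummary`, `summarize` and `formatSummary` are hypothetical names, and the category strings are only examples.

```typescript
// Hypothetical row shape: one row per (image, question) pair.
interface ResultRow {
  imageId: string;
  failureCategory: string | null; // Problem Map label, or null if the answer was correct
}

interface BatchSummary {
  total: number;
  failed: number;
  byCategory: Array<[string, number]>; // sorted, most frequent first
}

// Count failures per category so the dominant failure modes come first.
function summarize(rows: ResultRow[]): BatchSummary {
  const counts = new Map<string, number>();
  for (const row of rows) {
    if (row.failureCategory === null) continue;
    counts.set(row.failureCategory, (counts.get(row.failureCategory) ?? 0) + 1);
  }
  // Sort by count (descending), then by name for a deterministic order on ties.
  const byCategory = Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0]),
  );
  const failed = byCategory.reduce((sum, [, n]) => sum + n, 0);
  return { total: rows.length, failed, byCategory };
}

// Render the summary as plain text for a notebook cell or terminal.
function formatSummary(s: BatchSummary): string {
  const lines = [`${s.failed}/${s.total} answers failed`];
  for (const [category, n] of s.byCategory) lines.push(`  ${category}: ${n}`);
  return lines.join("\n");
}
```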

For the first version, I would probably deliver this as a Colab-style notebook so people can:

- see the entire pipeline end-to-end,
- change images or questions,
- and verify that the metrics are stable under different LLM backends.
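One simple way to check stability across LLM backends is to re-run the same examples on each backend. Then measure how often the failure label agrees, and how far each score spreads. This is a sketch under those assumptions. The score names inside `tension` are placeholders, not the actual WFGY metric definitions. `BackendResult`, `categoryAgreement` and `metricSpread` are hypothetical helpers, and the threshold for "stable" is left open.

```typescript
// Result of running one example on one LLM backend (score names are placeholders).
interface BackendResult {
  failureCategory: string | null;
  tension: Record<string, number>;
}

// Fraction of examples where every backend assigned the same failure label.
// Examples with no results at all count as not agreeing.
function categoryAgreement(perExample: BackendResult[][]): number {
  if (perExample.length === 0) return 1;
  let agreeing = 0;
  for (const results of perExample) {
    if (results.length === 0) continue;
    const first = results[0].failureCategory;
    if (results.every((r) => r.failureCategory === first)) agreeing++;
  }
  return agreeing / perExample.length;
}

// Per-metric spread (max - min) across backends for a single example.
function metricSpread(results: BackendResult[]): Record<string, number> {
  const min: Record<string, number> = {};
  const max: Record<string, number> = {};
  for (const r of results) {
    for (const key of Object.keys(r.tension)) {
      const v = r.tension[key];
      min[key] = key in min ? Math.min(min[key], v) : v;
      max[key] = key in max ? Math.max(max[key], v) : v;
    }
  }
  const spread: Record<string, number> = {};
  for (const key of Object.keys(min)) spread[key] = max[key] - min[key];
  return spread;
}
```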

If this turns out to be useful, it would be straightforward to later adapt the same idea into a small browser demo using Tesseract.js directly.

Why this might be useful (or not)

From your perspective, the potential benefits I see are:

- A public, vendor-neutral “error atlas” for OCR→LLM pipelines that your users can experiment with.
- A way to separate “OCR issues” from “downstream semantic issues” in a more structured way, without changing Tesseract.js itself.
- An additional story for Tesseract.js in the LLM era:
  - not only “it gets words out of images”,
  - but also “we have an open lab that shows how those words behave when fed into large models”.
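A common heuristic for separating OCR issues from downstream issues is to measure the character error rate (CER) of the OCR text against the ground truth. This heuristic is independent of WFGY. If the answer is wrong and the CER is high, suspect OCR first. If the CER is low, the failure most likely happened downstream. Below is a minimal sketch using plain Levenshtein distance. The 0.05 threshold, the whitespace normalization and the three-way `Attribution` label are arbitrary assumptions, not part of the WFGY Problem Map.

```typescript
// Classic Levenshtein edit distance: insertions, deletions and substitutions each cost 1.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// CER = edit distance / reference length, after collapsing whitespace (an assumption).
function charErrorRate(ocrText: string, groundTruth: string): number {
  const norm = (s: string) => s.replace(/\s+/g, " ").trim();
  const ref = norm(groundTruth);
  return levenshtein(norm(ocrText), ref) / Math.max(1, ref.length);
}

type Attribution = "ok" | "ocr" | "downstream";

// Hypothetical triage rule; tune the threshold per dataset.
function attribute(
  ocrText: string,
  groundTruth: string,
  answerCorrect: boolean,
  cerThreshold = 0.05,
): Attribution {
  if (answerCorrect) return "ok";
  return charErrorRate(ocrText, groundTruth) > cerThreshold ? "ocr" : "downstream";
}
```

For example, OCR output `Tota1 4Z.00` against the ground truth `Total 42.00` has two substitutions over 11 characters (CER about 0.18). A wrong answer on that image would therefore be attributed to OCR rather than to the LLM.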

From my side, the main benefit is being able to ground the WFGY “tension” language in real, non-trivial pipelines that many developers actually run.

If this feels off-topic for Tesseract.js or you prefer to keep Discussions focused on more concrete usage questions, feel free to close this. I completely understand.

If it seems even mildly interesting, I’m happy to:

- prepare one small, Tesseract-focused notebook demo and share it here, and/or
- adapt some existing WFGY OCR-related cases into a format that is easier for Tesseract.js users to inspect and critique.

Either way, thank you again for Tesseract.js and for the early star on WFGY.
A lot of the “semantic tension” ideas only exist because people kept pushing OCR pipelines much further than they were originally designed for.
