OCR #790

ym · 2025-09-10T12:12:24Z

Pure front-end implementation of OCR using tesseract.js

Copilot

Pull Request Overview

This PR adds OCR (Optical Character Recognition) functionality to the web application, allowing users to extract text from video content using Tesseract.js.

Key changes include:

Integration of Tesseract.js OCR library with external CDN configuration
New OCR hook and modal component for text extraction from video frames
Addition of OCR button to the action bar interface

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.


File	Description
ui/vite.config.ts	Configures Tesseract.js as external dependency with CDN path
ui/src/hooks/useOCR.ts	Creates OCR hook with image processing functionality
ui/src/components/popovers/OCRModal.tsx	Implements OCR modal UI with progress tracking and error handling
ui/src/components/WebRTCVideo.tsx	Passes video element reference to action bar
ui/src/components/ActionBar.tsx	Adds OCR button and integrates OCR modal
ui/package.json	Updates Node.js version constraint and adds Tesseract.js dependency


Copilot · 2025-09-10T16:22:18Z

ui/src/hooks/useOCR.ts

+export type ImageLike = string | HTMLImageElement | HTMLCanvasElement | HTMLVideoElement
+  | CanvasRenderingContext2D | File | Blob | OffscreenCanvas;
+
+// tesseract.js is h


Incomplete comment. Should either be completed or removed.

Suggested change (delete the line):

- // tesseract.js is h

Copilot · 2025-09-10T16:22:18Z

ui/src/components/popovers/OCRModal.tsx

+import { Button } from "@components/Button";
+import { GridCard } from "@components/Card";
+import { TextAreaWithLabel } from "@components/TextArea";
+import { SettingsPageHeader } from "@components/SettingsPageheader";


Import path has incorrect capitalization. Should be 'SettingsPageHeader' not 'SettingsPageheader'.

Suggested change

- import { SettingsPageHeader } from "@components/SettingsPageheader";
+ import { SettingsPageHeader } from "@components/SettingsPageHeader";

IDisposable · 2025-09-10T19:03:39Z

ui/src/components/ActionBar.tsx

+                text="OCR"
+                LeadingIcon={MdOutlineDocumentScanner}
+                onClick={() => {
+                  setDisableVideoFocusTrap(true);


Not related to this change specifically, but I wonder if we should have this trap logic fire on the open/close of the popover itself (e.g. tied to the popover being visible on screen) rather than hoping we caught the click.
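A minimal, framework-agnostic sketch of that idea, under assumptions: `createFocusTrapSync` is a hypothetical helper, and it presumes the popover exposes some open/close notification (an `onOpenChange`-style callback) that is not shown in the PR. The trap flag then follows the popover's visibility instead of the button's `onClick`.

```typescript
// Hypothetical setter shape, matching how the PR calls setDisableVideoFocusTrap(true).
type SetDisableVideoFocusTrap = (disabled: boolean) => void;

/**
 * Hypothetical helper: keeps the video focus trap in step with a popover's
 * visibility. Wire `onOpenChange` to whatever open/close notification the
 * popover component provides (assumed, not part of the PR).
 */
function createFocusTrapSync(setDisableVideoFocusTrap: SetDisableVideoFocusTrap) {
  let open = false;

  return {
    onOpenChange(next: boolean): void {
      if (next === open) return; // ignore repeated notifications for the same state
      open = next;
      // Disable the trap while the popover is visible, restore it when hidden.
      setDisableVideoFocusTrap(next);
    },
    // Call on unmount so a popover torn down while open cannot leave the trap disabled.
    dispose(): void {
      if (!open) return;
      open = false;
      setDisableVideoFocusTrap(false);
    },
  };
}
```

In React the same idea would typically live in a `useEffect` keyed on the popover's open state, with `dispose` as the effect cleanup.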

IDisposable · 2025-09-10T19:04:54Z

ui/src/components/popovers/OCRModal.tsx

+
+    // create a canvas from the video element then capture the image from the canvas
+    const video = videoElmRef?.current;
+    const canvas = document.createElement("canvas");


Do we need to null-check the video element and bail out here?
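One way to handle the null-check, sketched under assumptions: `captureVideoFrame` is a hypothetical helper that folds the PR's canvas code into a function returning `null` when there is nothing to capture. It uses structural types instead of DOM typings so it can run outside a browser; a real `HTMLVideoElement`/`HTMLCanvasElement` pair fits these shapes. Besides a missing ref, it also bails out while `videoWidth`/`videoHeight` are 0, which is what they report before the video's metadata has loaded.

```typescript
// Minimal structural types; HTMLVideoElement / HTMLCanvasElement satisfy them in the browser.
interface VideoLike {
  readonly videoWidth: number;
  readonly videoHeight: number;
}

interface CanvasLike<V> {
  width: number;
  height: number;
  getContext(contextId: "2d"): {
    drawImage(image: V, dx: number, dy: number, dw: number, dh: number): void;
  } | null;
}

/**
 * Hypothetical helper: draws the current video frame into a new canvas,
 * or returns null when there is nothing to capture yet.
 */
function captureVideoFrame<V extends VideoLike, C extends CanvasLike<V>>(
  video: V | null | undefined,
  createCanvas: () => C,
): C | null {
  // Bail out if the ref is not attached yet...
  if (!video) return null;
  // ...or if no frame dimensions are known yet (metadata not loaded).
  if (video.videoWidth === 0 || video.videoHeight === 0) return null;

  const canvas = createCanvas();
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;

  const ctx = canvas.getContext("2d");
  if (!ctx) return null; // 2D context unavailable
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  return canvas;
}
```

At the call site this might read `const canvas = captureVideoFrame(videoElmRef?.current, () => document.createElement("canvas")); if (!canvas) return;`, optionally reporting the failure through the PR's `handleOcrError`.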

IDisposable · 2025-09-10T19:06:07Z

ui/src/components/popovers/OCRModal.tsx

+    canvas.height = video.videoHeight;
+
+    const ctx = canvas.getContext("2d");
+    ctx?.drawImage(video, 0, 0, canvas.width, canvas.height);


Would we want to pause the video before this drawImage call and resume it afterwards, to ensure we get a full frame instead of a (possibly partial) live view?
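If pausing turns out to be worthwhile, a small wrapper keeps the pause/resume pairing in one place and resumes playback even when the capture throws. `withPausedVideo` is a hypothetical name, and whether pausing changes what `drawImage` captures from a live WebRTC stream is not established in this thread; it would need testing in the target browsers.

```typescript
// Only the members the wrapper touches; HTMLVideoElement provides all three.
interface PausableVideo {
  readonly paused: boolean;
  pause(): void;
  play(): Promise<void>;
}

/**
 * Hypothetical helper: pauses `video` (if it was playing) around `capture`,
 * then resumes playback, even if `capture` throws.
 */
async function withPausedVideo<V extends PausableVideo, T>(
  video: V,
  capture: (video: V) => T | Promise<T>,
): Promise<T> {
  const wasPlaying = !video.paused;
  if (wasPlaying) video.pause();
  try {
    return await capture(video);
  } finally {
    if (wasPlaying) {
      // play() can reject (e.g. autoplay policy); don't let that mask the capture result.
      await video.play().catch(() => undefined);
    }
  }
}
```

A video that was already paused is left untouched, so the wrapper never starts playback the user had stopped.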

IDisposable · 2025-09-10T19:07:37Z

ui/src/components/popovers/OCRModal.tsx

+    const ctx = canvas.getContext("2d");
+    ctx?.drawImage(video, 0, 0, canvas.width, canvas.height);
+
+    const text = await ocrImage(["eng"], canvas, { logger: setOcrStatus, errorHandler: handleOcrError });


Future: do we want the user to select the "scanned for" language somehow?

ym · 2025-09-11

Yes, I also want it to be auto-detected based on the browser locale. Let's make the PR a draft, as there are too many improvements needed.

IDisposable · 2025-09-11

It actually looks pretty close.
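For the locale-based auto-detection mentioned above, a sketch under stated assumptions: `pickOcrLanguages` is hypothetical, its table covers only a few languages for illustration, and it maps BCP 47 locale tags (as found in `navigator.languages`) to Tesseract traineddata names such as `eng` or `chi_sim`. The PR's `ocrImage` already takes an array of languages, so the result could replace the hardcoded `["eng"]`.

```typescript
// Illustrative subset: BCP 47 primary language subtag -> Tesseract traineddata name.
const TESSERACT_LANGS = new Map<string, string>([
  ["en", "eng"], ["de", "deu"], ["fr", "fra"], ["es", "spa"], ["it", "ita"],
  ["pt", "por"], ["nl", "nld"], ["ru", "rus"], ["ja", "jpn"], ["ko", "kor"],
]);

// Script/region subtags that usually imply Traditional Chinese.
const TRADITIONAL_CHINESE = new Set(["hant", "tw", "hk", "mo"]);

/** Hypothetical helper: browser locales in preference order -> Tesseract language codes. */
function pickOcrLanguages(locales: readonly string[], fallback = "eng"): string[] {
  const picked: string[] = [];
  for (const locale of locales) {
    const [primary, ...subtags] = locale.toLowerCase().split("-");
    const code =
      primary === "zh"
        ? (subtags.some((s) => TRADITIONAL_CHINESE.has(s)) ? "chi_tra" : "chi_sim")
        : TESSERACT_LANGS.get(primary);
    // Keep the user's preference order, skipping unknown languages and duplicates.
    if (code !== undefined && !picked.includes(code)) picked.push(code);
  }
  return picked.length > 0 ? picked : [fallback];
}
```

Usage might be `ocrImage(pickOcrLanguages(navigator.languages), canvas, ...)`. Each extra language means another model download, so capping the list (or letting the user override it) may be worth considering.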

IDisposable · 2025-09-10T19:10:06Z

ui/src/hooks/useOCR.ts

+  const tesseract = await import('tesseract.js')
+  const createWorker = tesseract.createWorker || tesseract.default.createWorker
+
+  const worker = await createWorker(language, undefined, options)


I've used tesseract a bunch in dom-to-image-more's testing... I found that spinning up the worker is the slowest part, as it has to load the OCR models. Perhaps we could keep a lazily terminated worker around for a while in case the user wants to run OCR repeatedly?
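A sketch of the lazily terminated worker idea. `createLazyWorker` is hypothetical and generic over anything with an async `terminate()`, so it does not depend on a particular tesseract.js version: it creates the worker on first use, reuses it while requests keep arriving, and terminates it once nothing has used it for `idleMs`.

```typescript
// Anything with an async terminate(), e.g. a tesseract.js worker.
interface Terminable {
  terminate(): Promise<unknown>;
}

/**
 * Hypothetical helper: creates the worker on first use, reuses it across
 * calls, and terminates it once no call has been in flight for `idleMs`.
 */
function createLazyWorker<W extends Terminable>(create: () => Promise<W>, idleMs: number) {
  let current: Promise<W> | null = null;
  let idleTimer: ReturnType<typeof setTimeout> | null = null;
  let inFlight = 0;

  const cancelIdleTimer = (): void => {
    if (idleTimer !== null) clearTimeout(idleTimer);
    idleTimer = null;
  };

  const release = async (): Promise<void> => {
    const worker = current;
    current = null;
    // A worker that failed to start has nothing to terminate, so ignore errors.
    if (worker !== null) await worker.then((w) => w.terminate()).catch(() => undefined);
  };

  return {
    async use<T>(job: (worker: W) => Promise<T>): Promise<T> {
      cancelIdleTimer();
      inFlight++;
      try {
        if (current === null) {
          const started: Promise<W> = create().catch((err: unknown) => {
            // Forget a failed start so the next call can retry.
            if (current === started) current = null;
            throw err;
          });
          current = started;
        }
        const worker = await current;
        return await job(worker);
      } finally {
        inFlight--;
        // Start the idle countdown only when nothing else is using the worker.
        if (inFlight === 0) idleTimer = setTimeout(() => void release(), idleMs);
      }
    },
    // Terminate right away, e.g. when the OCR modal unmounts.
    async dispose(): Promise<void> {
      cancelIdleTimer();
      await release();
    },
  };
}
```

With tesseract.js this might be `createLazyWorker(() => createWorker(["eng"], undefined, options), 60_000)` followed by `lazy.use((w) => w.recognize(canvas))`; the 60-second timeout is an arbitrary placeholder. One consequence of caching: because the PR passes `logger` through `createWorker`'s options, a reused worker keeps reporting progress to whichever logger it was created with.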

IDisposable · 2025-09-10T19:16:06Z

ui/vite.config.ts

+        external: ["tesseract.js"],
+        output: {
+          paths: {
+            "tesseract.js": "https://cdn.jsdelivr.net/npm/[email protected]/dist/tesseract.esm.min.js",


Not loving the version hardcoding here... it's pretty buried. I wonder if adding the tesseract.js npm module to package.json and then extracting the version number or CDN link for use here would be more manageable?
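A sketch of deriving the CDN path from package.json instead of hardcoding it. `tesseractCdnUrl` is a hypothetical helper; the jsDelivr URL pattern and the `dist/tesseract.esm.min.js` path are taken from the PR's config, and only exact versions or simple `^`/`~` ranges are accepted.

```typescript
// Hypothetical helper for vite.config.ts. The URL pattern and file path are
// copied from the PR's config; only the version is now derived.
function tesseractCdnUrl(versionRange: string): string {
  // Accept an exact version or a simple ^/~ range such as "^6.0.1".
  // Anything looser (">=5", "6.x", "latest") cannot be pinned, so fail loudly.
  const match = /^[\^~]?(\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?)$/.exec(versionRange.trim());
  if (match === null) {
    throw new Error(`Cannot derive a CDN URL from tesseract.js version "${versionRange}"`);
  }
  return `https://cdn.jsdelivr.net/npm/tesseract.js@${match[1]}/dist/tesseract.esm.min.js`;
}
```

In `vite.config.ts` the `paths` entry could then read `tesseractCdnUrl(pkg.dependencies["tesseract.js"])`, assuming `package.json` is imported as `pkg` (type-checking a JSON import needs `resolveJsonModule`). Stripping `^`/`~` pins the CDN to the lower bound of the range, which can lag behind what the lockfile installs; declaring an exact version in package.json sidesteps that mismatch.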

ym force-pushed the feat/ocr branch 5 times, most recently from caaf06d to 7385c2b on September 10, 2025 16:07

ym added 2 commits September 10, 2025 18:08

feat: ocr (598fd96)

fix package-lock (89c259a)

ym force-pushed the feat/ocr branch 2 times, most recently from 0a5c907 to fc1b87a on September 10, 2025 16:20

add missing modal (2ff3dff)

ym force-pushed the feat/ocr branch from fc1b87a to 2ff3dff on September 10, 2025 16:20

ym requested review from adamshiervani, IDisposable and Copilot and removed request for adamshiervani September 10, 2025 16:21

Copilot AI reviewed Sep 10, 2025


IDisposable reviewed Sep 10, 2025


ym marked this pull request as draft September 11, 2025 00:00
