Skip to content

Conversation

ym
Copy link
Contributor

@ym ym commented Sep 10, 2025

Pure front-end implementation of OCR using tesseract.js

image

@ym ym force-pushed the feat/ocr branch 5 times, most recently from caaf06d to 7385c2b Compare September 10, 2025 16:07
@ym ym force-pushed the feat/ocr branch 2 times, most recently from 0a5c907 to fc1b87a Compare September 10, 2025 16:20
@ym ym requested review from adamshiervani, IDisposable and Copilot and removed request for adamshiervani September 10, 2025 16:21
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds OCR (Optical Character Recognition) functionality to the web application, allowing users to extract text from video content using Tesseract.js.

Key changes include:

  • Integration of Tesseract.js OCR library with external CDN configuration
  • New OCR hook and modal component for text extraction from video frames
  • Addition of OCR button to the action bar interface

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
ui/vite.config.ts Configures Tesseract.js as external dependency with CDN path
ui/src/hooks/useOCR.ts Creates OCR hook with image processing functionality
ui/src/components/popovers/OCRModal.tsx Implements OCR modal UI with progress tracking and error handling
ui/src/components/WebRTCVideo.tsx Passes video element reference to action bar
ui/src/components/ActionBar.tsx Adds OCR button and integrates OCR modal
ui/package.json Updates Node.js version constraint and adds Tesseract.js dependency

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

export type ImageLike = string | HTMLImageElement | HTMLCanvasElement | HTMLVideoElement
| CanvasRenderingContext2D | File | Blob | OffscreenCanvas;

// tesseract.js is h
Copy link
Preview

Copilot AI Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete comment. Should either be completed or removed.

Suggested change
// tesseract.js is h

Copilot uses AI. Check for mistakes.

import { Button } from "@components/Button";
import { GridCard } from "@components/Card";
import { TextAreaWithLabel } from "@components/TextArea";
import { SettingsPageHeader } from "@components/SettingsPageheader";
Copy link
Preview

Copilot AI Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import path has incorrect capitalization. Should be 'SettingsPageHeader' not 'SettingsPageheader'.

Suggested change
import { SettingsPageHeader } from "@components/SettingsPageheader";
import { SettingsPageHeader } from "@components/SettingsPageHeader";

Copilot uses AI. Check for mistakes.

text="OCR"
LeadingIcon={MdOutlineDocumentScanner}
onClick={() => {
setDisableVideoFocusTrap(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this change specifically, but I wonder if we should make this trap logic be fired by the open/close of the popover itself (e.g. related to the popover being onscreen/visible, vs. hoping we caught the click).


// create a canvas from the video element then capture the image from the canvas
const video = videoElmRef?.current;
const canvas = document.createElement("canvas");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to null-check the video element and bail out here?

canvas.height = video.videoHeight;

const ctx = canvas.getContext("2d");
ctx?.drawImage(video, 0, 0, canvas.width, canvas.height);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we want to "pause" the video before and "unpause" the video after this drawImage call to ensure we get a full frame instead of a (possibly partial) live view?

const ctx = canvas.getContext("2d");
ctx?.drawImage(video, 0, 0, canvas.width, canvas.height);

const text = await ocrImage(["eng"], canvas, { logger: setOcrStatus, errorHandler: handleOcrError });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future: do we want the user to select the "scanned for" language somehow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I also want it to be auto-detected based on the browser locale. Let's make the PR a draft, as there are too many improvements needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually looks pretty close.

const tesseract = await import('tesseract.js')
const createWorker = tesseract.createWorker || tesseract.default.createWorker

const worker = await createWorker(language, undefined, options)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used tesseract a bunch on dom-to-image-more's testing... I found that spinning up the worker is the slowest part as it has to load the OCR models. Perhaps we can have a lazily terminated one that stays around for a bit in case they want to do repeated OCRs?

external: ["tesseract.js"],
output: {
paths: {
"tesseract.js": "https://cdn.jsdelivr.net/npm/[email protected]/dist/tesseract.esm.min.js",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not loving the version hardcoding here... pretty buried. I wonder if actually adding the tesseract NPM module to package.json and the exracting the version number or CDN link for use here would be more manageable?

@ym ym marked this pull request as draft September 11, 2025 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants