-
Notifications
You must be signed in to change notification settings - Fork 208
OCR #790
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
caaf06d
to
7385c2b
Compare
0a5c907
to
fc1b87a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds OCR (Optical Character Recognition) functionality to the web application, allowing users to extract text from video content using Tesseract.js.
Key changes include:
- Integration of Tesseract.js OCR library with external CDN configuration
- New OCR hook and modal component for text extraction from video frames
- Addition of OCR button to the action bar interface
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
ui/vite.config.ts | Configures Tesseract.js as external dependency with CDN path |
ui/src/hooks/useOCR.ts | Creates OCR hook with image processing functionality |
ui/src/components/popovers/OCRModal.tsx | Implements OCR modal UI with progress tracking and error handling |
ui/src/components/WebRTCVideo.tsx | Passes video element reference to action bar |
ui/src/components/ActionBar.tsx | Adds OCR button and integrates OCR modal |
ui/package.json | Updates Node.js version constraint and adds Tesseract.js dependency |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
export type ImageLike = string | HTMLImageElement | HTMLCanvasElement | HTMLVideoElement | ||
| CanvasRenderingContext2D | File | Blob | OffscreenCanvas; | ||
|
||
// tesseract.js is h |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incomplete comment. Should either be completed or removed.
// tesseract.js is h |
Copilot uses AI. Check for mistakes.
import { Button } from "@components/Button"; | ||
import { GridCard } from "@components/Card"; | ||
import { TextAreaWithLabel } from "@components/TextArea"; | ||
import { SettingsPageHeader } from "@components/SettingsPageheader"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import path has incorrect capitalization. Should be 'SettingsPageHeader' not 'SettingsPageheader'.
import { SettingsPageHeader } from "@components/SettingsPageheader"; | |
import { SettingsPageHeader } from "@components/SettingsPageHeader"; |
Copilot uses AI. Check for mistakes.
text="OCR" | ||
LeadingIcon={MdOutlineDocumentScanner} | ||
onClick={() => { | ||
setDisableVideoFocusTrap(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this change specifically, but I wonder if we should make this trap logic be fired by the open/close of the popover itself (e.g. related to the popover being onscreen/visible, vs. hoping we caught the click).
|
||
// create a canvas from the video element then capture the image from the canvas | ||
const video = videoElmRef?.current; | ||
const canvas = document.createElement("canvas"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to null-check the video
element and bail out here?
canvas.height = video.videoHeight; | ||
|
||
const ctx = canvas.getContext("2d"); | ||
ctx?.drawImage(video, 0, 0, canvas.width, canvas.height); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we want to "pause" the video before and "unpause" the video after this drawImage
call to ensure we get a full frame instead of a (possibly partial) live view?
const ctx = canvas.getContext("2d"); | ||
ctx?.drawImage(video, 0, 0, canvas.width, canvas.height); | ||
|
||
const text = await ocrImage(["eng"], canvas, { logger: setOcrStatus, errorHandler: handleOcrError }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future: do we want the user to select the "scanned for" language somehow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I also want it to be auto-detected based on the browser locale. Let's make the PR a draft, as there are too many improvements needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It actually looks pretty close.
const tesseract = await import('tesseract.js') | ||
const createWorker = tesseract.createWorker || tesseract.default.createWorker | ||
|
||
const worker = await createWorker(language, undefined, options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used tesseract a bunch on dom-to-image-more's testing... I found that spinning up the worker is the slowest part as it has to load the OCR models. Perhaps we can have a lazily terminated one that stays around for a bit in case they want to do repeated OCRs?
external: ["tesseract.js"], | ||
output: { | ||
paths: { | ||
"tesseract.js": "https://cdn.jsdelivr.net/npm/[email protected]/dist/tesseract.esm.min.js", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not loving the version hardcoding here... pretty buried. I wonder if actually adding the tesseract NPM module to package.json and the exracting the version number or CDN link for use here would be more manageable?
Pure front-end implementation of OCR using tesseract.js