Memory (RSS) keeps growing when reusing a single worker with intermittent recognize() calls #1045

**Tesseract.js version**

tesseract.js v7.0.0 (latest npm release at time of writing)

**Describe the bug**

When using tesseract.js v7 in a long-running Node.js backend, calling worker.recognize() on image buffers causes continuous native memory growth (RSS / external / arrayBuffers), eventually exhausting system memory and swap.

This happens even though:

- A single worker is created once and reused

- OCR is not continuous: it is only used until a timestamp is successfully extracted, then not used for ~15 minutes, and only invoked again to refresh the offset

- The JavaScript heap remains stable

- The worker is not recreated

- OCR is skipped entirely most of the time once a valid timestamp offset is cached

Disabling tesseract.js completely eliminates the memory growth, which strongly suggests a native / WASM memory leak or unreleased buffers inside tesseract.js or its dependencies.

**To Reproduce**

- Create a Node.js application

- Create a single Tesseract worker once at startup

- Repeatedly call worker.recognize() on image buffers for a short period

- Stop calling OCR for several minutes

- Observe that memory does not return to baseline

- Resume OCR later and observe further memory growth

- Monitor process RSS / external memory over time

Simplified reproduction pattern:
```
import { createWorker } from 'tesseract.js';
import sharp from 'sharp';

const worker = await createWorker('eng');
await worker.setParameters({
  tessedit_char_whitelist: '0123456789:/- ',
});

async function runOcr(frame: Buffer) {
  const buffer = await sharp(frame)
    .extract({ left: 0, top: 0, width: 300, height: 80 })
    .grayscale()
    .normalize()
    .threshold(180)
    .toBuffer();

  await worker.recognize(buffer);
}

// OCR is called until timestamp is extracted,
// then skipped for ~15 minutes, then called again 
```
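
The burst/idle schedule from the steps above could be driven by something like the sketch below. It is illustrative only: `runBurstIdleCycles`, `CycleOptions` and `onIdleEnd` are hypothetical names, and the OCR call is injected as `recognize` so the loop does not depend on any particular tesseract.js signature.

```typescript
// Hypothetical driver for the burst/idle pattern. `recognize` stands in for
// the real OCR call and `nextFrame` for whatever supplies image buffers.
interface CycleOptions {
  cycles: number;        // number of burst + idle rounds
  callsPerBurst: number; // OCR calls issued back to back in each burst
  idleMs: number;        // pause after each burst (about 15 minutes in this report)
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runBurstIdleCycles<T>(
  recognize: (image: T) => Promise<unknown>,
  nextFrame: () => T,
  opts: CycleOptions,
  onIdleEnd: (cycle: number) => void = () => {},
): Promise<number> {
  let calls = 0;
  for (let cycle = 0; cycle < opts.cycles; cycle++) {
    // Burst: sequential OCR calls on the same worker.
    for (let i = 0; i < opts.callsPerBurst; i++) {
      await recognize(nextFrame());
      calls++;
    }
    // Idle: no OCR at all; memory would be expected to settle here.
    await sleep(opts.idleMs);
    onIdleEnd(cycle);
  }
  return calls; // total number of OCR calls issued
}
```

With the snippet above, `recognize` would be `(buf) => worker.recognize(buf)`, and `onIdleEnd` is a convenient place to log memory at the end of each idle window.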

Memory monitoring used to confirm the issue:
```
setInterval(() => {
  const usage = process.memoryUsage();
  console.log(
    `Memory Usage: RSS=${(usage.rss / 1024 / 1024).toFixed(2)}MB, ` +
    `HeapUsed=${(usage.heapUsed / 1024 / 1024).toFixed(2)}MB, ` +
    `arrayBuffersUsed=${(usage.arrayBuffers / 1024 / 1024).toFixed(2)}MB, ` +
    `externalUsed=${(usage.external / 1024 / 1024).toFixed(2)}MB`,
  );
}, 60000); 
```
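
To compare readings against a baseline instead of eyeballing absolute values, a small helper along these lines may be useful. It is a sketch: `MemorySnapshot`, `memoryDeltaMb` and `looksLikeOffHeapGrowth` are hypothetical names, and the 50 MB threshold is an arbitrary assumption, not a value measured for this report.

```typescript
// Field names mirror a subset of what Node's process.memoryUsage() returns.
interface MemorySnapshot {
  rss: number;
  heapUsed: number;
  external: number;
  arrayBuffers: number;
}

const MB = 1024 * 1024;

// Growth of each field, in MB (two decimals), relative to a baseline snapshot.
function memoryDeltaMb(
  baseline: MemorySnapshot,
  current: MemorySnapshot,
): MemorySnapshot {
  const delta = (key: keyof MemorySnapshot) =>
    Number(((current[key] - baseline[key]) / MB).toFixed(2));
  return {
    rss: delta('rss'),
    heapUsed: delta('heapUsed'),
    external: delta('external'),
    arrayBuffers: delta('arrayBuffers'),
  };
}

// Heuristic (an assumption, not a definitive test): treat growth as off-heap
// when RSS grew by more than thresholdMb while heapUsed grew by less.
function looksLikeOffHeapGrowth(
  deltaMb: MemorySnapshot,
  thresholdMb = 50,
): boolean {
  return deltaMb.rss > thresholdMb && deltaMb.heapUsed < thresholdMb;
}
```

A baseline can be captured once at startup with `process.memoryUsage()` and passed in together with each later reading.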


**Observed behavior**

- heapUsed stays relatively flat

- RSS, external, and arrayBuffers grow steadily

- Memory is not reclaimed during long periods where OCR is not used

- No specific image is required; the issue reproduces with small cropped grayscale buffers from camera frames.

**Expected behavior**

- Native memory usage should stabilize after repeated recognize() calls

- Memory should not grow when the worker is idle for long periods

- Reusing a single worker intermittently should not cause unbounded RSS growth

- Memory should be reused internally or released back to the OS
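
One way to make "stabilize" measurable is to fit a line through the logged RSS samples and look at its slope. The sketch below does that with an ordinary least-squares fit; `slope` and `hasStabilized` are hypothetical helpers, and the default limit of 1 MB per hour is an assumed tolerance, not a figure from this report.

```typescript
// Least-squares slope of ys over xs. Returns 0 when there are fewer than two
// samples or when all x values are identical.
function slope(xs: number[], ys: number[]): number {
  const n = Math.min(xs.length, ys.length);
  if (n < 2) return 0;
  let sumX = 0;
  let sumY = 0;
  for (let i = 0; i < n; i++) {
    sumX += xs[i];
    sumY += ys[i];
  }
  const meanX = sumX / n;
  const meanY = sumY / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) * (xs[i] - meanX);
  }
  return den === 0 ? 0 : num / den;
}

// True when the fitted RSS growth rate is at most maxMbPerHour.
// `minutes` are sample times in minutes, `rssMb` the matching RSS readings.
function hasStabilized(
  minutes: number[],
  rssMb: number[],
  maxMbPerHour = 1,
): boolean {
  return slope(minutes, rssMb) * 60 <= maxMbPerHour;
}
```

Applied to the one-minute samples from the monitoring interval, a process matching the expectations above would pass this check during idle windows.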

**Device Version**

OS: Debian 12 (Bookworm)

Node.js: v22.20.0

**Additional context**

- This runs in a long-lived backend service (NestJS)

- OCR is used only to extract a timestamp overlay from camera frames

- Once the timestamp offset is successfully extracted, OCR is disabled and the cached offset is reused

- Despite long idle periods, memory is not released

- Over time this leads to system swap exhaustion

- The worker is created once and only terminated on application shutdown

- Concurrency / usage pattern note

  In the real application, OCR is not awaited in the video frame pipeline in order to avoid blocking frame processing.
  Instead, OCR runs asynchronously to update a shared timestamp offset stored in application state. The frame pipeline always reads the latest known offset (initially set to the local machine time) and continues processing frames without waiting for OCR to complete.
  
  Even with this non-blocking usage pattern and very low OCR frequency, native memory continues to grow after worker.recognize() calls and is not released during long idle periods.
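
For reference, the non-blocking pattern described in the note above looks roughly like the sketch below. This is not the application code: `OffsetCache` and `extractOffsetMs` are hypothetical names, and the in-flight guard that prevents overlapping OCR calls is an assumption added for illustration.

```typescript
// Hypothetical sketch of a shared offset that the frame pipeline reads
// without ever awaiting OCR. F is the frame type (a Buffer in the real app).
class OffsetCache<F> {
  private offsetMs: number;
  private lastUpdated = -Infinity; // no successful extraction yet
  private inFlight = false;        // assumed guard: one OCR call at a time
  private extractOffsetMs: (frame: F) => Promise<number | null>;
  private refreshEveryMs: number;
  private now: () => number;

  constructor(
    extractOffsetMs: (frame: F) => Promise<number | null>,
    initialOffsetMs: number,
    refreshEveryMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.extractOffsetMs = extractOffsetMs;
    this.offsetMs = initialOffsetMs;
    this.refreshEveryMs = refreshEveryMs;
    this.now = now;
  }

  // Called once per frame. Returns the latest known offset immediately and,
  // if the cached value is stale, starts a background OCR refresh.
  current(frame: F): number {
    const stale = this.now() - this.lastUpdated >= this.refreshEveryMs;
    if (stale && !this.inFlight) {
      this.inFlight = true;
      this.extractOffsetMs(frame).then(
        (offset) => {
          if (offset !== null) {
            this.offsetMs = offset;
            this.lastUpdated = this.now();
          }
          this.inFlight = false;
        },
        () => {
          // OCR failed: keep the previous offset and allow a retry.
          this.inFlight = false;
        },
      );
    }
    return this.offsetMs;
  }
}
```

Here `extractOffsetMs` would wrap the crop and `worker.recognize()` call and return `null` when no timestamp could be parsed, so OCR keeps being retried until the first successful extraction and then only once per refresh interval.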

This makes it difficult to safely use tesseract.js in continuous or long-running server environments.
