Commit 4fdcd91

Add browser-specific entry and update worker config
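Introduces src/index.browser.ts as a browser-specific entry point that imports the minified PDF.js worker via Vite's ?url suffix and registers it with PDFParse.setWorker() on load, and points both browser Vite configs at this entry (the minified build now uses minify: 'terser'). The jsDelivr CDN fallback and the module-level PDFParse.setWorker() call in src/PDFParse.ts are commented out. Documentation is improved for browser usage: the browser worker configuration section moves from README.md to README.worker.md, and the CDN ES module example now uses the +esm endpoint and shows a complete getText() call. TypeScript configs now exclude the new browser entry from Node/TS builds. Also switches to the minified PDF.js worker in bin/worker/index.ts.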
1 parent e4db3f6 commit 4fdcd91

9 files changed (78 additions & 31 deletions)

README.md

Lines changed: 10 additions & 18 deletions
````diff
@@ -268,7 +268,8 @@ export default nextConfig;
 Custom builds, Electron/NW.js, or specific deployment environments—you may need to manually configure the worker source.
 
 ```js
-import {getWorkerPath, getWorkerSource} from "pdf-parse/worker"; // Import this before importing "pdf-parse"
+// Import this before importing "pdf-parse"
+import {getWorkerPath, getWorkerSource} from "pdf-parse/worker";
 import {PDFParse} from "pdf-parse";
 
 // CommonJS
@@ -306,7 +307,10 @@ try {
 ```html
 <!-- ES Module -->
 <script type="module">
-import { PDFParse } from 'https://cdn.jsdelivr.net/npm/pdf-parse@latest/dist/browser/pdf-parse.es.min.js';
+import {PDFParse} from 'https://cdn.jsdelivr.net/npm/pdf-parse@latest/+esm';
+const parser = new PDFParse({url:'https://mehmet-kozan.github.io/pdf-parse/pdf/bitcoin.pdf'});
+const result = await parser.getText()
+console.log(result.text)
 </script>
 ```
 
@@ -325,21 +329,7 @@ try {
 
 
 
-### Worker Configuration
 
-In browser environments, `pdf-parse` requires a separate worker file to process PDFs in a background thread. By default, `pdf-parse` automatically loads the worker from the jsDelivr CDN. However, you can configure a custom worker source if needed.
-
-**When to Configure Worker Source:**
-- Using a custom build of `pdf-parse`
-- Self-hosting worker files for security or offline requirements
-- Using a different CDN provider
-
-**Available Worker Files:**
-
-- `https://cdn.jsdelivr.net/npm/pdf-parse@latest/dist/browser/pdf.worker.mjs`
-- `https://cdn.jsdelivr.net/npm/pdf-parse@latest/dist/browser/pdf.worker.min.mjs`
-
-See [`example/basic.esm.worker.html`](example/basic.esm.worker.html) for a working example of browser usage with worker configuration.
 
 ## Similar Packages
 * [pdf2json](https://www.npmjs.com/package/pdf2json) — Buggy, memory leaks, uncatchable errors in some PDF files.
@@ -365,7 +355,8 @@ Requires additional setup — import and configure a compatible CanvasFactory or
 
 ESM
 ```js
-import { CustomCanvasFactory } from 'pdf-parse/canvas'; // Import this before importing "pdf-parse"
+// Import this before importing "pdf-parse"
+import { CustomCanvasFactory } from 'pdf-parse/canvas';
 import { PDFParse } from 'pdf-parse';
 
 const parser = new PDFParse({ data: buffer, CanvasFactory: CustomCanvasFactory });
@@ -374,7 +365,8 @@ const parser = new PDFParse({ data: buffer, CanvasFactory: CustomCanvasFactory }
 
 CommonJS
 ```js
-const { CustomCanvasFactory } = require('pdf-parse/canvas'); // Import this before importing "pdf-parse"
+// Import this before importing "pdf-parse"
+const { CustomCanvasFactory } = require('pdf-parse/canvas');
 const { PDFParse } = require('pdf-parse');
 
 const parser = new PDFParse({ data: buffer, CanvasFactory: CustomCanvasFactory });
````

README.worker.md

Lines changed: 19 additions & 1 deletion
```diff
@@ -56,4 +56,22 @@ PDFParse.setWorker(workerUrl);
 - Electron/NW.js packaging with non-standard import paths
 - Self-hosting worker files for offline or security requirements
 
-If you don't need to set a custom worker, you can ignore this file — `pdf-parse` will pick a sensible default.
+
+
+
+
+### Browser Worker Configuration
+
+In browser environments, `pdf-parse` requires a separate worker file to process PDFs in a background thread. By default, `pdf-parse` automatically loads the worker from the jsDelivr CDN. However, you can configure a custom worker source if needed.
+
+**When to Configure Worker Source:**
+- Using a custom build of `pdf-parse`
+- Self-hosting worker files for security or offline requirements
+- Using a different CDN provider
+
+**Available Worker Files:**
+
+- `https://cdn.jsdelivr.net/npm/pdf-parse@latest/dist/browser/pdf.worker.mjs`
+- `https://cdn.jsdelivr.net/npm/pdf-parse@latest/dist/browser/pdf.worker.min.mjs`
+
+See [`example/basic.esm.worker.html`](example/basic.esm.worker.html) for a working example of browser usage with worker configuration.
```
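
The worker files listed in README.worker.md point at `@latest`. pdf.js checks that the worker and the main library report the same version and throws an error when they differ, so a CDN or self-hosted worker URL is safer when pinned to the `pdf-parse` version you actually installed. A minimal sketch, assuming the `dist/browser/` layout shown in those URLs also holds for the pinned version; `pdfParseWorkerUrl` is a hypothetical helper, not part of `pdf-parse`:

```typescript
// Hypothetical helper (not part of pdf-parse): builds the jsDelivr URL of the
// browser worker for a pinned pdf-parse version instead of `@latest`.
// Assumes the dist/browser/ layout from the README URLs for that version.
function pdfParseWorkerUrl(version: string, minified = true): string {
  if (version.trim() === '') {
    throw new Error('version must be a non-empty pdf-parse version string');
  }
  // The two worker variants listed under "Available Worker Files".
  const file = minified ? 'pdf.worker.min.mjs' : 'pdf.worker.mjs';
  return `https://cdn.jsdelivr.net/npm/pdf-parse@${version}/dist/browser/${file}`;
}
```

The resulting string is what you would hand to `PDFParse.setWorker(...)`, as README.worker.md does with `workerUrl`, taking the version from your own `package.json` or lockfile rather than hard-coding it.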

bin/worker/index.ts

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,5 +1,5 @@
 /// <reference types="vite/client" />
-import * as WorkerUrl from 'pdfjs-dist/legacy/build/pdf.worker.mjs?url';
+import * as WorkerUrl from 'pdfjs-dist/legacy/build/pdf.worker.min.mjs?url';
 
 export function getWorkerSource() {
 return WorkerUrl.default;
```

src/PDFParse.ts

Lines changed: 6 additions & 6 deletions
```diff
@@ -74,11 +74,11 @@ export class PDFParse {
 return pdfjs.GlobalWorkerOptions.workerSrc;
 }
 
-if (!PDFParse.isNodeJS) {
-pdfjs.GlobalWorkerOptions.workerSrc =
-'https://cdn.jsdelivr.net/npm/pdf-parse@latest/dist/browser/pdf.worker.min.mjs';
-return pdfjs.GlobalWorkerOptions.workerSrc;
-}
+// if (!PDFParse.isNodeJS) {
+// pdfjs.GlobalWorkerOptions.workerSrc =
+// 'https://cdn.jsdelivr.net/npm/pdf-parse@latest/dist/browser/pdf.worker.min.mjs';
+// return pdfjs.GlobalWorkerOptions.workerSrc;
+// }
 
 return pdfjs.GlobalWorkerOptions.workerSrc;
 }
@@ -978,4 +978,4 @@ export class PDFParse {
 }
 }
 
-PDFParse.setWorker();
+//PDFParse.setWorker();
```
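
With the browser branch that assigned a jsDelivr URL commented out, the visible tail of `PDFParse.setWorker()` simply returns the current `pdfjs.GlobalWorkerOptions.workerSrc`, which is why the browser entry has to register a worker itself at import time. The sketch below is a simplified, hypothetical model of that ordering: `globalWorkerOptions` and `setWorkerModel` are stand-ins rather than library code, the URLs are placeholders, and the parts of `setWorker()` outside this hunk (including any Node-specific handling) are not modeled.

```typescript
// Hypothetical stand-in for pdfjs.GlobalWorkerOptions (not the real object).
const globalWorkerOptions: { workerSrc: string } = { workerSrc: '' };

// Simplified model of the browser path of PDFParse.setWorker() after this
// commit: an explicit URL is stored; without one there is no CDN fallback,
// so the current value is returned unchanged.
function setWorkerModel(workerSrc?: string): string {
  if (typeof workerSrc === 'string' && workerSrc !== '') {
    globalWorkerOptions.workerSrc = workerSrc;
  }
  return globalWorkerOptions.workerSrc;
}

// Ordering the browser entry relies on (placeholder URLs).
const afterEntry = setWorkerModel('bundled-worker-url'); // what index.browser.ts does on import
const afterNoArg = setWorkerModel(); // a no-argument call keeps the bundled URL
const afterOverride = setWorkerModel('https://example.com/pdf.worker.min.mjs'); // an app-level override replaces it
```

In this model the last explicit assignment wins, so an application that self-hosts the worker calls `setWorker(url)` after importing the browser bundle.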

src/index.browser.ts

Lines changed: 37 additions & 0 deletions
```diff
@@ -0,0 +1,37 @@
+/// <reference types="vite/client" />
+import * as WorkerUrl from 'pdfjs-dist/legacy/build/pdf.worker.min.mjs?url';
+
+import { PDFParse } from './PDFParse.js';
+
+PDFParse.setWorker(WorkerUrl.default);
+
+export { VerbosityLevel } from 'pdfjs-dist/legacy/build/pdf.mjs';
+export type {
+DocumentInitParameters,
+PDFDataRangeTransport,
+PDFWorker,
+TypedArray,
+} from 'pdfjs-dist/types/src/display/api.js';
+
+export { getHeader, type HeaderResult } from './HeaderResult.js';
+export type { EmbeddedImage, ImageKindKey, ImageKindValue, ImageResult, PageImages } from './ImageResult.js';
+export type { DateNode, InfoResult, Metadata, OutlineNode, PageLinkResult } from './InfoResult.js';
+export type { ParseParameters } from './ParseParameters.js';
+export type { Screenshot, ScreenshotResult } from './ScreenshotResult.js';
+export type { PageTableResult, TableResult } from './TableResult.js';
+export type { PageTextResult, TextResult } from './TextResult.js';
+
+/**
+* The URL of the PDF.
+* -
+* Binary PDF data.
+* Use TypedArrays (Uint8Array) to improve the memory usage. If PDF data is
+* BASE64-encoded, use `atob()` to convert it to a binary string first.
+* https://mozilla.github.io/pdf.js/examples/
+*
+* NOTE: If TypedArrays are used they will generally be transferred to the
+* worker-thread. This will help reduce main-thread memory usage, however
+* it will take ownership of the TypedArrays.
+*/
+
+export { PDFParse };
```
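
The doc comment in `src/index.browser.ts` recommends passing binary data as a `Uint8Array`, decoding base64 with `atob()` first, and warns that TypedArrays are generally transferred to the worker thread. A minimal sketch of both points, assuming a runtime with `atob` and `structuredClone` (a modern browser or Node 17+); `base64ToBytes` and `transferDemo` are hypothetical helpers, and `structuredClone` with a transfer list stands in for the actual hand-off to the PDF.js worker.

```typescript
// Hypothetical helper: base64 -> binary string (atob) -> Uint8Array.
function base64ToBytes(base64: string): Uint8Array {
  const binary = atob(base64); // each char carries one byte (0..255)
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Stand-in for handing data to a worker: transferring the underlying
// ArrayBuffer detaches it, so the sender's view reports zero bytes.
function transferDemo(base64: string) {
  const bytes = base64ToBytes(base64);
  const keep = bytes.slice(); // private copy that survives the transfer
  const received = structuredClone(bytes, { transfer: [bytes.buffer as ArrayBuffer] });
  return {
    senderLength: bytes.byteLength, // 0 once the buffer is detached
    copyLength: keep.byteLength,
    receivedLength: received.byteLength,
  };
}
```

If the caller still needs the bytes after parsing, passing a copy such as `bytes.slice()` as the `data` option (as in the README's `new PDFParse({ data: buffer })` examples) keeps the original usable, at the cost of holding the PDF in memory twice.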

tsconfig.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -19,5 +19,5 @@
 "verbatimModuleSyntax": true
 },
 "include": ["src/**/*"],
-"exclude": ["src/**/_*", "src/_**/*"]
+"exclude": ["src/**/_*", "src/_**/*", "src/index.browser.ts"]
 }
```

tsconfig.node.json

Lines changed: 1 addition & 1 deletion
```diff
@@ -19,5 +19,5 @@
 "verbatimModuleSyntax": false
 },
 "include": ["src/**/*"],
-"exclude": ["src/**/_*", "src/_**/*"]
+"exclude": ["src/**/_*", "src/_**/*", "src/index.browser.ts"]
 }
```

vite.config.browser.min.ts

Lines changed: 2 additions & 2 deletions
```diff
@@ -7,9 +7,9 @@ export default defineConfig({
 outDir: 'dist/browser',
 emptyOutDir: false,
 sourcemap: false,
-minify: true,
+minify: 'terser',
 lib: {
-entry: 'src/index.ts',
+entry: 'src/index.browser.ts',
 name: 'PdfParse',
 fileName: (format) => `pdf-parse.${format}.min.js`,
 formats: ['es', 'umd'],
```

vite.config.browser.ts

Lines changed: 1 addition & 1 deletion
```diff
@@ -7,7 +7,7 @@ export default defineConfig({
 sourcemap: true,
 minify: false,
 lib: {
-entry: 'src/index.ts',
+entry: 'src/index.browser.ts',
 name: 'PdfParse',
 fileName: (format) => `pdf-parse.${format}.js`,
 formats: ['es', 'umd'],
```
