n8n-nodes-pdf-utils

Custom n8n node for inspecting and splitting PDFs, built only on npm packages (no system-level or native dependencies).

Features

🔍 Inspect Operation

Analyzes PDF structure
Counts pages
Detects if PDF is vectorial (text-based) or rasterized (image-based)
Extracts text from the first page
Performance: Very fast (tens of milliseconds)

✂️ Split Operation

Splits multi-page PDFs into individual pages
Creates one output item per page
Preserves page content and quality (pages are copied, not re-rendered)

Installation

Option 1: Install from npm (when published)

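Install into the nodes folder of your n8n user directory, then restart n8n:
```
mkdir -p ~/.n8n/nodes
cd ~/.n8n/nodes
npm install n8n-nodes-pdf-utils
```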

Option 2: Install locally for development

Clone this repository
Install dependencies:
```
npm install
```
Build the node:
```
npm run build
```

Link to your n8n installation:

```
# From this repository's root
npm link
# Then, from your n8n nodes folder
cd ~/.n8n/nodes
npm link n8n-nodes-pdf-utils
```

Restart n8n

Option 3: Install in n8n using community nodes

Go to Settings > Community Nodes
Click Install
Enter the npm package name: n8n-nodes-pdf-utils
Confirm that you understand the risks of installing community nodes
Click Install

Usage

Inspect Operation

Input: Binary data containing a PDF file

Parameters:

Binary Property: Name of the binary property (default: "data")
Text Threshold: Minimum number of extracted text characters required to treat the PDF as vectorial (default: 50)

Output: Single item with analysis + original PDF binary

```json
{
  "json": {
    "pageCount": 5,
    "isMultiPage": true,
    "isVectorial": false,
    "textLength": 23,
    "firstPageText": "Preview of first 200 characters..."
  },
  "binary": {
    "data": "<original PDF>"
  }
}
```
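The analysis fields above can be derived from a page count and the first page's extracted text. The sketch below is a hypothetical reconstruction, not the node's source: buildInspectResult is an illustrative name, and it assumes textLength is the length of the trimmed first-page text, that the Text Threshold comparison is inclusive, and that the preview keeps the first 200 characters.

```typescript
// Hypothetical shape of the Inspect output's json field (mirrors the example above).
interface InspectResult {
  pageCount: number;
  isMultiPage: boolean;
  isVectorial: boolean;
  textLength: number;
  firstPageText: string;
}

// Builds the analysis from a page count and the first page's extracted text.
// Assumptions: surrounding whitespace is ignored, the threshold is inclusive,
// and the preview keeps the first 200 characters.
function buildInspectResult(
  pageCount: number,
  firstPageText: string,
  textThreshold = 50,
): InspectResult {
  const text = firstPageText.trim();
  return {
    pageCount,
    isMultiPage: pageCount > 1,
    isVectorial: text.length >= textThreshold,
    textLength: text.length,
    firstPageText: text.slice(0, 200),
  };
}

// A first page with little extractable text falls below the default threshold of 50.
const result = buildInspectResult(5, "Scanned invoice 0001");
console.log(result.isVectorial, result.textLength); // false 20
```

A scanned page usually yields little or no extractable text, which is why a simple length threshold is enough to separate the two cases in practice.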

Example workflow:

```
HTTP Request (download PDF)
  → PDF Utils (Inspect)
    → IF (isVectorial)
      → Route A (text processing with PDF)
      → Route B (OCR processing with PDF)
```

Inspect and Split Operation

Input: Binary data containing a PDF file

Parameters:

Binary Property: Name of the binary property (default: "data")
Text Threshold: Minimum number of extracted text characters required to treat the PDF as vectorial (default: 50)
Output Binary Property: Name for output binary property (default: "data")

Output:

If vectorial: Single item with analysis + original PDF (pass-through)
If not vectorial: Multiple items, one per page (split)
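The pass-through or split decision can be sketched as follows. This is a hypothetical illustration, not the node's implementation: Item is a simplified stand-in for an n8n item (real items store binary data as IBinaryData objects rather than raw bytes), the page splitting is injected as a split callback so the sketch stays dependency-free, and adding pageNumber to split items is an assumption borrowed from the Split operation.

```typescript
// Simplified, hypothetical stand-in for an n8n item. Real n8n items store
// binary data as IBinaryData objects (base64 plus metadata), not raw bytes.
interface Item {
  json: Record<string, unknown>;
  binary: Record<string, Uint8Array>;
}

// Hypothetical splitter: returns one single-page PDF per page, in page order.
type SplitFn = (pdf: Uint8Array) => Uint8Array[];

interface Analysis {
  pageCount: number;
  isVectorial: boolean;
}

function inspectAndSplit(
  analysis: Analysis,
  pdf: Uint8Array,
  split: SplitFn,
  outputProperty = "data",
): Item[] {
  if (analysis.isVectorial) {
    // Text-based PDF: a single item with the original bytes passed through.
    return [{ json: { ...analysis }, binary: { [outputProperty]: pdf } }];
  }
  // Scanned PDF: one item per page so each page can be OCR'd on its own.
  return split(pdf).map((page, index) => ({
    json: { ...analysis, pageNumber: index + 1 },
    binary: { [outputProperty]: page },
  }));
}
```

With pdf-lib, a real splitter would copy each page into a new PDFDocument (copyPages, then addPage) and save it; since save() is asynchronous, that version would return a Promise rather than a plain array.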

Example workflow:

```
HTTP Request (download PDF)
  → PDF Utils (Inspect and Split)
    → Vectorial PDFs pass through as-is
    → Scanned PDFs split into pages automatically
```

Use case: Automatically handle different PDF types without manual branching:

Text-based PDFs (vectorial) → process as whole document
Scanned PDFs (non-vectorial) → OCR each page individually

Split Operation

Input: Binary data containing a multi-page PDF

Parameters:

Binary Property: Name of the input binary property (default: "data")
Output Binary Property: Name for output binary property (default: "data")

Output: Multiple items, one per page

Each item contains binary data with a single-page PDF
JSON includes pageNumber and originalFileName

Example workflow:

```
HTTP Request (download PDF)
  → PDF Utils (Split)
    → Loop Over Items
      → Process each page individually
```
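After pages are processed individually, the per-page results often need to be combined back into one document. This sketch assumes each processed item still carries the pageNumber and originalFileName fields from the Split output, plus a hypothetical text field written by the processing step (for example OCR); reassemble is an illustrative helper, not part of this node.

```typescript
// Fields from the Split output plus a hypothetical `text` field written by
// whatever step processed the page (for example an OCR node).
interface PageResult {
  pageNumber: number;
  originalFileName: string;
  text: string;
}

// Groups page results by source file and joins their text in page order.
function reassemble(pages: PageResult[]): Record<string, string> {
  const byFile: Record<string, PageResult[]> = {};
  for (const page of pages) {
    if (!byFile[page.originalFileName]) byFile[page.originalFileName] = [];
    byFile[page.originalFileName].push(page);
  }
  const documents: Record<string, string> = {};
  for (const file of Object.keys(byFile)) {
    // Sort by pageNumber so the output does not depend on item order.
    const ordered = byFile[file].sort((a, b) => a.pageNumber - b.pageNumber);
    documents[file] = ordered.map((p) => p.text).join("\n");
  }
  return documents;
}

const docs = reassemble([
  { pageNumber: 2, originalFileName: "scan.pdf", text: "page two" },
  { pageNumber: 1, originalFileName: "scan.pdf", text: "page one" },
]);
console.log(docs["scan.pdf"]); // prints "page one" and "page two" on separate lines
```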

Technical Details

Dependencies

pdfjs-dist (v5.4.394): For PDF analysis and text extraction (uses legacy build for Node.js)
pdf-lib (v1.17.1): For PDF manipulation and splitting

Why These Libraries?

pdfjs-dist: Mozilla's PDF.js library, battle-tested as the PDF viewer built into Firefox. It runs headless here, so no canvas is needed. We use the legacy build (pdfjs-dist/legacy/build/pdf.mjs), which is the build PDF.js recommends for Node.js environments, where no DOM is available.
pdf-lib: Pure JavaScript with no native dependencies; well suited to PDF manipulation such as copying pages into new documents
100% npm packages: No system-level dependencies (like Poppler, Ghostscript) and no canvas/native modules!

Performance

Inspect: Very fast (~10-50ms for typical PDFs)
Split: Fast, scales linearly with page count (~50-200ms per page)
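The per-page figure gives a quick back-of-the-envelope estimate for splitting. This sketch only multiplies the quoted 50 to 200 ms range by the page count; estimateSplitMs is an illustrative name, and real timings depend on file size, page content and hardware.

```typescript
// Per-page bounds taken from the figures quoted above (rough, not a benchmark).
const SPLIT_MS_PER_PAGE = { low: 50, high: 200 } as const;

// Linear estimate: total time grows proportionally with the number of pages.
function estimateSplitMs(pageCount: number): { low: number; high: number } {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError("pageCount must be a non-negative integer");
  }
  return {
    low: pageCount * SPLIT_MS_PER_PAGE.low,
    high: pageCount * SPLIT_MS_PER_PAGE.high,
  };
}

console.log(estimateSplitMs(20)); // { low: 1000, high: 4000 }
```

For a 20-page scan this works out to roughly 1 to 4 seconds.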

Development

```
# Install dependencies
npm install

# Build
npm run build

# Watch mode for development
npm run dev

# Lint
npm run lint

# Format code
npm run format
```

Troubleshooting

n8n doesn't detect the node

Ensure n8n is restarted after installation
Check that the node is in ~/.n8n/nodes or installed globally
Verify that the n8n.nodes entry in package.json points to the compiled node file(s)
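One way to check the package.json point is to validate the parsed file from a small script. The sketch below is hypothetical and covers only basic n8n community-node conventions: a non-empty n8n.nodes list, a package name starting with n8n-nodes- (optionally scoped), and the n8n-community-node-package keyword. The .node.js suffix check is a heuristic based on the usual build layout, and the compiled path in the example (dist/nodes/PdfUtils/PdfUtils.node.js) is an assumption about this repository's build output. In practice you would pass it JSON.parse of your package.json contents.

```typescript
// Minimal shape of the package.json fields relevant to n8n node discovery.
interface N8nPackageJson {
  name?: string;
  keywords?: string[];
  n8n?: { n8nNodesApiVersion?: number; nodes?: string[]; credentials?: string[] };
}

// Returns a list of problems; an empty list means these basic checks passed.
function checkN8nConfig(pkg: N8nPackageJson): string[] {
  const problems: string[] = [];
  // Community node packages are named n8n-nodes-* or @scope/n8n-nodes-*.
  if (!/^(@[^/]+\/)?n8n-nodes-/.test(pkg.name ?? "")) {
    problems.push(`package name "${pkg.name ?? ""}" should start with n8n-nodes-`);
  }
  if (!pkg.n8n) {
    problems.push('missing "n8n" section');
    return problems;
  }
  const nodes = pkg.n8n.nodes ?? [];
  if (nodes.length === 0) problems.push('"n8n.nodes" is empty');
  // Heuristic: entries should reference compiled JavaScript, not TypeScript sources.
  for (const path of nodes) {
    if (!path.endsWith(".node.js")) problems.push(`"${path}" is not a compiled .node.js file`);
  }
  if (!(pkg.keywords ?? []).includes("n8n-community-node-package")) {
    problems.push('missing keyword "n8n-community-node-package"');
  }
  return problems;
}

const pkg: N8nPackageJson = {
  name: "n8n-nodes-pdf-utils",
  keywords: ["n8n-community-node-package"],
  // Hypothetical compiled path; check your actual build output.
  n8n: { n8nNodesApiVersion: 1, nodes: ["dist/nodes/PdfUtils/PdfUtils.node.js"] },
};
console.log(checkN8nConfig(pkg)); // []
```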

"pdfjs-dist" errors

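If you encounter issues with pdfjs-dist, check that your Node.js version meets the minimum declared in the engines field of the installed pdfjs-dist package. The 5.x releases used here require Node.js 20.16 or later (22.3 or later on the 22.x line):

```
node --version  # Should be v20.16.0 or higher
```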
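The same check can be scripted, for example in a setup or CI step. This is a minimal sketch with hypothetical helper names: the minimum is a parameter so it can match whatever the installed pdfjs-dist declares in its engines field, and only plain major.minor.patch strings are handled (no semver ranges or pre-release tags). In Node.js you would pass process.version as the current version.

```typescript
// Parses "v20.16.0" (the format of process.version) into numeric parts.
function parseVersion(v: string): [number, number, number] {
  const m = /^v?(\d+)\.(\d+)\.(\d+)/.exec(v.trim());
  if (!m) throw new Error(`Unrecognised version string: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// True when `current` is at least `minimum`, comparing major, minor, then patch.
function meetsMinimum(current: string, minimum: string): boolean {
  const a = parseVersion(current);
  const b = parseVersion(minimum);
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return true; // versions are equal
}

console.log(meetsMinimum("v20.16.0", "v18.0.0")); // true
```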

License

MIT

Author

Roberto Michelena - INFINITEK S.A.C.

Contributing

Contributions are welcome! Please open an issue or submit a pull request.
