|
1 | 1 | # simple-pdf
|
2 | 2 |
|
3 | 3 | [](https://www.npmjs.com/package/simple-pdf)
|
4 |
| -[!](https://github.com/scriptcoded/simple-pdf/actions?query=workflow%3ATests+branch%3Amaster) |
| 4 | +[](https://github.com/scriptcoded/simple-pdf/actions?query=workflow%3ATests+branch%3Amaster) |
5 | 5 | [](https://david-dm.org/scriptcoded/simple-pdf)
|
6 | 6 |
|
7 | 7 | `simple-pdf` aims to be a simple drop-in module for extracting text and images
|
@@ -30,24 +30,24 @@ from PDF files. It exposes a promise-based and an event-based API.
|
30 | 30 | Let's be real. This might not be the library for you. Here are a few reasons why.
|
31 | 31 |
|
32 | 32 | - **Slow with images** - Images can be embedded in a PDF in many different ways. To ensure that all types of images can be extracted, we render the whole PDF and then use [sharp](https://github.com/lovell/sharp) to extract the images from the rendered page. This adds extra processing time for pages that contain images (provided that you don't disable image extraction).
|
33 |
| -- **New to the game** - This library is brand new and haven't been battle tested yet. If you're looking for a reliable solution, this library might not be the best choice for you. |
| 33 | +- **New to the game** - This library is brand new and hasn't been battle tested yet. If you're looking for a reliable solution, this library might not be the best choice for you. |
34 | 34 | - **No automated testing** - Though I'm working on this 🙃
|
35 | 35 |
|
36 | 36 | ## Examples
|
37 | 37 |
|
38 | 38 | **Minimal example:**
|
39 | 39 |
|
40 | 40 | ```javascript
|
41 |
| -const fs = require('fs'); |
42 |
| -const { SimplePDFParser } = require('simple-pdf'); |
| 41 | +const fs = require('fs') |
| 42 | +const { SimplePDFParser } = require('simple-pdf') |
43 | 43 |
|
44 |
| -const fileBuffer = fs.readFileSync('somefile.pdf'); |
| 44 | +const fileBuffer = fs.readFileSync('somefile.pdf') |
45 | 45 |
|
46 |
| -const parser = new SimplePDFParser(fileBuffer); |
| 46 | +const parser = new SimplePDFParser(fileBuffer) |
47 | 47 |
|
48 | 48 | parser.parse().then((result) => {
|
49 | 49 | console.log(result)
|
50 |
| -}); |
| 50 | +}) |
51 | 51 | ```
|
52 | 52 |
|
53 | 53 | More examples can be found in the `examples` directory and can be run with the following commands:
|
@@ -126,15 +126,15 @@ const parser = new SimplePDFParser(fileBuffer)
|
126 | 126 |
|
127 | 127 | // Called with each page
|
128 | 128 | parser.on('page', (page) => {
|
129 |
| - console.log(`Page ${page.index}:`); |
130 |
| - console.log('Text elements: ', page.textElements); |
131 |
| - console.log('Image elements:', page.imageElements); |
132 |
| -}); |
| 129 | + console.log(`Page ${page.index}:`) |
| 130 | + console.log('Text elements: ', page.textElements) |
| 131 | + console.log('Image elements:', page.imageElements) |
| 132 | +}) |
133 | 133 |
|
134 | 134 | // Called when the parsing is finished
|
135 | 135 | parser.on('done', () => {
|
136 |
| - console.log('Parser done'); |
137 |
| -}); |
| 136 | + console.log('Parser done') |
| 137 | +}) |
138 | 138 |
|
139 | 139 | // This must be run even if you just use the events API, but then you may ignore the return value
|
140 | 140 | const result = await parser.parseRaw()
|
@@ -163,6 +163,19 @@ const result = await parser.parseRaw()
|
163 | 163 | }
|
164 | 164 | ```
|
165 | 165 |
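The event-based flow above can be hard to picture without a PDF at hand, so here is a minimal sketch of the same pattern. It does not use `simple-pdf` itself: `FakeParser` and `collectPages` are hypothetical names made up for illustration, built on Node's built-in `EventEmitter`, and the only things assumed from the README are the `page` and `done` events and the need to call `parseRaw()` to start parsing.

```javascript
const { EventEmitter } = require('events')

// Hypothetical stand-in for SimplePDFParser (NOT the library's implementation).
// It only mimics the documented behaviour: 'page' per page, then 'done'.
class FakeParser extends EventEmitter {
  constructor (pages) {
    super()
    this.pages = pages
  }

  // Emits one 'page' event per page, then 'done', and resolves with all pages.
  async parseRaw () {
    for (const page of this.pages) {
      this.emit('page', page)
    }
    this.emit('done')
    return this.pages
  }
}

// Hypothetical helper: gathers pages from events and resolves when 'done' fires.
function collectPages (parser) {
  return new Promise((resolve, reject) => {
    const pages = []
    parser.on('page', (page) => pages.push(page))
    parser.on('done', () => resolve(pages))
    // Nothing is emitted until parseRaw() is called; its return value is ignored here.
    parser.parseRaw().catch(reject)
  })
}

collectPages(new FakeParser([{ index: 0 }, { index: 1 }])).then((pages) => {
  console.log(`Collected ${pages.length} pages`)
})
```

Registering the listeners before calling `parseRaw()` matters in this sketch, since events emitted before a listener is attached are simply lost.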
|
| 166 | +## Roadmap |
| 167 | + |
| 168 | +More of a todo, but let's call it a roadmap |
| 169 | + |
| 170 | +- [ ] Tests |
| 171 | + - [ ] Better coverage |
| 172 | + - [ ] Windows - Something is wrong either with the library or the tests (https://github.com/scriptcoded/simple-pdf/runs/1048499489) |
| 173 | +- [ ] Make a logo (everyone likes a logo) |
| 174 | +- [ ] Rewrite codebase in TypeScript |
| 175 | +- [ ] Improve image extraction |
| 176 | +- [ ] Set up automatic CI/CD pipeline for NPM deployment |
| 177 | +- [ ] Simplify the API |
| 178 | + |
166 | 179 | ## Tests
|
167 | 180 |
|
168 | 181 | Tests can be run with the following commands:
|
|