Split single PDF into multiple ones using a delimiter page #459
-
You scan the same QR code page in between documents, right? Sounds useful to me. However, it's pretty hard to get that functionality into paperless. Let me quickly elaborate on how the consumption pipeline works:
The issue is as follows: The only place where we're sure we're dealing with PDF documents (and not text files / office documents) is inside the PDF parser. However, at that place, we're limited to producing exactly one document. Changing that requires many changes to how the consumption pipeline works, invalidates many test cases, etc. The key file is Adding that to the consumption folder watcher ( I've got a better idea:
The key here is that both containers would "communicate" through that internal consumption folder. I won't be implementing this myself, though, but I can give some directions and hints if someone wants to take a stab at it.
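To make the "communicate through the consumption folder" idea a bit more concrete: one pitfall with a shared folder is that a folder watcher may pick up a PDF while the splitting container is still writing it. A minimal sketch, assuming both containers share a volume, is to write each part to a staging directory on the same filesystem first and then move it into the consumption folder with `os.replace()`, which Python documents as atomic on POSIX when source and target are on one filesystem. The function name `hand_off` and both directory paths are hypothetical, not part of paperless.

```python
import os
import tempfile
from pathlib import Path


def hand_off(pdf_bytes: bytes, name: str, staging_dir: Path, consume_dir: Path) -> Path:
    """Place a finished PDF into a (hypothetical) consumption folder atomically.

    The file is fully written in staging_dir first and then moved with
    os.replace(), so a watcher on consume_dir never sees a half-written file.
    staging_dir and consume_dir must be on the same filesystem for the move
    to be atomic.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    consume_dir.mkdir(parents=True, exist_ok=True)
    # mkstemp picks a unique file name, so parallel splits can't collide.
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf", dir=staging_dir)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(pdf_bytes)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the data hits the disk before the move
        target = consume_dir / name
        os.replace(tmp_path, target)
    except BaseException:
        # Remove the temporary file if anything failed before the move.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return target
```

For example, `hand_off(part_bytes, "invoice.pdf", Path("/data/staging"), Path("/data/consume"))`, where both (made-up) paths live on a volume mounted into both containers.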
-
What a coincidence: I wrote some code to do just this today. However, my delimiter is simply a blank page. So far it works fine, but the code needs some improvement. I don't want to make this an ongoing project for myself, so once it's usable for me, I'll upload it and link the project for anyone to fork and improve upon.
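A minimal sketch of blank-page detection (not the code mentioned above): a scanned "blank" page is never perfectly white because of dust and sensor noise, so instead of requiring zero dark pixels it's safer to tolerate a small fraction of them. This assumes the page has already been rendered to 8-bit grayscale, one byte per pixel with 0 = black; `is_blank_page` is a hypothetical helper, and both thresholds are guesses that would need tuning for a given scanner.

```python
def is_blank_page(
    pixels: bytes,
    dark_threshold: int = 128,
    max_dark_fraction: float = 0.005,
) -> bool:
    """Return True if a grayscale page looks blank.

    pixels holds one byte per pixel (0 = black, 255 = white). A pixel
    darker than dark_threshold counts as "ink"; the page is considered
    blank if at most max_dark_fraction of all pixels are ink.
    """
    if not pixels:
        return True
    # Iterating over bytes yields ints in the range 0..255.
    dark = sum(1 for p in pixels if p < dark_threshold)
    return dark / len(pixels) <= max_dark_fraction
```

With Pillow, a scanned page image in exactly this layout can be obtained via `img.convert("L").tobytes()`.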
-
Why not have different QR codes instruct Paperless-ng on what to do with the following pages? If an option isn't understood, the default could apply. We would need a web page from which to download/print the QR codes understood by that Paperless-ng version.
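One way to sketch the "different QR codes" idea: give every paperless QR code a common prefix plus a command, and fall back to a default when the command isn't known. The payload format `PAPERLESS:<COMMAND>[:<ARG>]`, the command names and `parse_qr_payload` are all hypothetical; they only illustrate the default-fallback behaviour. Decoding the QR code from the page image is a separate step (e.g. with a library such as pyzbar) and is not shown.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical payload format: "PAPERLESS:<COMMAND>" or "PAPERLESS:<COMMAND>:<ARG>".
# Neither the prefix nor the command names are an existing paperless feature.
PREFIX = "PAPERLESS"
KNOWN_COMMANDS = {"SPLIT", "TAG"}
DEFAULT_COMMAND = "SPLIT"


@dataclass(frozen=True)
class QrCommand:
    name: str
    arg: Optional[str] = None


def parse_qr_payload(payload: str) -> Optional[QrCommand]:
    """Turn a decoded QR payload into a command.

    Returns None for QR codes that are not meant for paperless at all
    (e.g. a payment QR code printed on an invoice), so they are not
    mistaken for separator pages. Payloads with our prefix but an unknown
    command fall back to the default command.
    """
    parts = payload.strip().split(":", 2)
    if parts[0] != PREFIX:
        return None
    name = parts[1].upper() if len(parts) > 1 else ""
    if name not in KNOWN_COMMANDS:
        return QrCommand(DEFAULT_COMMAND)
    arg = parts[2] if len(parts) > 2 else None
    return QrCommand(name, arg)
```

Returning `None` instead of the default for foreign QR codes is a deliberate choice: many real documents carry QR codes of their own, and those should stay ordinary pages.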
-
May I hook into this discussion? I'm not a developer, but might it be an idea to do this:
-
I'd also like to add to the discussion that attaching something to pages (stickers, etc.) will most likely trigger the fault detection of ADF scanners. Thus, the only option I saw was to print out separate sheets with a QR code.
-
I was in the middle of implementing a tool for archiving and auto-uploading my documents when I stumbled across paperless, which is much more powerful and mature than my little pet project.
With that tool, I can feed a stack of paper consisting of multiple documents (up to my device's 50-page limit) to my scanner, which uploads it to the inbox folder of my server as a single file (of course). My server application then searches through the pages of that file for separator pages containing a certain QR code. Their positions are used to cut the large file back into separate documents.
Example: (numbers are page numbers)
Will be cut into:
Support for this in paperless would be cool! My code is here:
https://github.com/denvercoder21/split-pdf/blob/main/split.py
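The cutting step itself can be sketched independently of how the separator pages are detected (QR code, blank page, ...). This is not the code from split.py, just a stdlib-only illustration; `split_page_ranges` is a hypothetical name, and dropping the separator page from the output is an assumption.

```python
from typing import Iterable, List


def split_page_ranges(page_count: int, delimiter_pages: Iterable[int]) -> List[List[int]]:
    """Group 0-based page indices into documents, dropping delimiter pages.

    A delimiter page ends the current document; the delimiter itself is
    not part of any output. Empty groups (two delimiters in a row, or a
    delimiter as the first or last page) are skipped.
    """
    delimiters = set(delimiter_pages)
    documents: List[List[int]] = []
    current: List[int] = []
    for page in range(page_count):
        if page in delimiters:
            if current:
                documents.append(current)
            current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents
```

For example, a 7-page scan with separators on pages 2 and 5 (0-based) yields `[[0, 1], [3, 4], [6]]`. Each group could then be written out as its own PDF, e.g. with pypdf's `PdfWriter.add_page(reader.pages[i])`.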
I haven't taken the time yet to look through paperless' code, so I can't tell whether I'd be confident creating a PR myself.
Let me know what you think!