Post-Hook - How to set ASN #460

andbez · 2021-01-03T01:41:59Z

Every scanned document whose paper original is archived gets a number (always 6 digits; the first 2 digits will stay 0 for a long time ;-)). After Paperless has consumed and OCRed the document, this number is part of the file content. Now I would like to extract the number with the regex 00\d{4} and set it as the ASN of the consumed document. I did not find any way to do this with the manage.py script.
Any idea how to handle this? Is it possible to use a self-written SQL statement?
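A minimal sketch of the matching step in Python, assuming the number appears in the OCR text as a standalone 6-digit token starting with `00`. The `\b` word boundaries are an addition to the regex from the question, so that longer digit runs such as phone numbers are not matched partially; `find_asn` is a hypothetical helper name.

```python
import re

# Assumption: the archive number is a standalone 6-digit token that
# currently always starts with "00" (pattern 00\d{4} from the question).
# \b prevents matching inside a longer digit run such as a phone number.
ASN_PATTERN = re.compile(r"\b00\d{4}\b")


def find_asn(text):
    """Return the first candidate ASN in text as an int, or None."""
    match = ASN_PATTERN.search(text or "")
    return int(match.group()) if match else None


if __name__ == "__main__":
    sample = "Invoice 2021-01-03\nArchive no. 001234\nPhone 0049301234567"
    print(find_asn(sample))  # 1234
```

`int()` drops the leading zeros, which should be fine if the ASN is stored as an integer (an assumption about the paperless-ng schema).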

Thx.

jonaswinkler (Maintainer) · 2021-01-03T20:07:51Z

You could set up a post-consume script: https://paperless-ng.readthedocs.io/en/latest/advanced_usage.html#post-consumption-script

This receives the ID of the newly created document. That's all you need to connect to your database, get the document's content, do some regex matching on the content, and update the archive_serial_number field.
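The steps described above could look roughly like this. It is a sketch, not paperless code: the table name `documents_document` and the columns `id`, `content` and `archive_serial_number` are assumptions based on Django's default table naming for paperless-ng's `Document` model, and the demo uses an in-memory SQLite database so it runs anywhere. In a real post-consume script, `document_id` would come from `sys.argv[1]` and `conn` from `psycopg2.connect(...)`, with `%s` instead of `?` placeholders.

```python
import re
import sqlite3

# Assumed schema (Django default naming for paperless-ng's Document model):
# table "documents_document" with columns id, content, archive_serial_number.
# sqlite3 uses "?" placeholders; with psycopg2 (PostgreSQL) use "%s" instead.
PH = "?"
ASN_PATTERN = re.compile(r"\b00\d{4}\b")


def set_asn_from_content(conn, document_id):
    """Read the document's OCR text, extract an ASN and store it.

    Returns the ASN that was written, or None if nothing was written.
    """
    cur = conn.cursor()
    cur.execute(
        f"SELECT content FROM documents_document WHERE id = {PH}", (document_id,)
    )
    row = cur.fetchone()
    if row is None:
        # Document not found, e.g. because its transaction is not committed yet.
        return None
    match = ASN_PATTERN.search(row[0] or "")
    if match is None:
        return None
    asn = int(match.group())
    cur.execute(
        f"UPDATE documents_document SET archive_serial_number = {PH} WHERE id = {PH}",
        (asn, document_id),
    )
    conn.commit()
    return asn


if __name__ == "__main__":
    # Demo against an in-memory SQLite database with the assumed schema.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents_document "
        "(id INTEGER PRIMARY KEY, content TEXT, archive_serial_number INTEGER UNIQUE)"
    )
    conn.execute(
        "INSERT INTO documents_document (id, content) VALUES (42, 'Archive no. 001234')"
    )
    print(set_asn_from_content(conn, 42))  # 1234
```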

However, you'd have to wait for the next version for that. Currently, the script is called during the database transaction, not after it, so the document isn't in the database yet when the script runs. (It has been like that in paperless for years; I wonder why no one noticed.)

andbez (Author) · 2021-01-03T21:58:17Z

Thank you (again ;-)). The link you posted is what I meant by post-hook. But I have to handle the DB connection myself, and there is no support for this in paperless-ng? I have basic knowledge of Python (I prefer Python for this kind of solution), but is there no paperless Django module I can use for this? I only need to send one SQL statement. Any hint is appreciated.

jonaswinkler (Maintainer) · 2021-01-03T23:12:27Z

There's no support in paperless for this. If you want to modify the source directly, I can give you some directions where to make these changes.

andbez (Author) · 2021-01-11T15:55:26Z

Can you point me to where I can find the database settings (URL, database, user, password) in the Docker container? If I connect to the database, I think it is better to use this information instead of hardcoding it.

jonaswinkler (Maintainer) · 2021-01-11T16:11:05Z

These settings should be available as environment variables, as specified in the docker-compose.env file / in the environment section of the docker-compose.yml file.

andbez (Author) · 2021-01-11T16:14:28Z

That's what I hoped for, but unfortunately not (only PAPERLESS_DBHOST is set):
root@3b15426806a8:/usr/src/paperless/src/paperless# export
declare -x HOME="/root"
declare -x HOSTNAME="3b15426806a8"
declare -x LANG="C.UTF-8"
declare -x OLDPWD="/usr/src/paperless/src/paperless/static"
declare -x PAPERLESS_DBHOST="db"
declare -x PAPERLESS_OCR_LANGUAGE="deu"
declare -x PAPERLESS_REDIS="redis://192.168.4.20:6379"
declare -x PAPERLESS_TIKA_ENABLED="1"
declare -x PAPERLESS_TIKA_ENDPOINT="http://192.168.4.24:9998"
declare -x PAPERLESS_TIKA_GOTENBERG_ENDPOINT="http://192.168.4.25:3000"
declare -x PAPERLESS_TIME_ZONE="Europe/Berlin"
declare -x PATH="/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
declare -x PWD="/usr/src/paperless/src/paperless"
declare -x PYTHON_GET_PIP_SHA256="6e0bb0a2c2533361d7f297ed547237caf1b7507f197835974c0dd7eba998c53c"
declare -x PYTHON_GET_PIP_URL="https://github.com/pypa/get-pip/raw/fa7dc83944936bf09a0e4cb5d5ec852c0d256599/get-pip.py"
declare -x PYTHON_PIP_VERSION="20.2.4"
declare -x PYTHON_VERSION="3.7.9"
declare -x SHLVL="1"
declare -x TERM="xterm"
declare -x USERMAP_GID="1001"
declare -x USERMAP_UID="1007"

jonaswinkler (Maintainer) · 2021-01-11T16:18:31Z

Oh, right.

PAPERLESS_DBHOST is set, and is the hostname of the database server.

Database name, username and password all default to paperless if not specified otherwise as per

https://github.com/jonaswinkler/paperless-ng/blob/master/src/paperless/settings.py#L253
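Following that, a post-consume script can build its connection parameters from the same environment variables instead of hardcoding them. A minimal sketch; the variable names `PAPERLESS_DBPORT`, `PAPERLESS_DBNAME`, `PAPERLESS_DBUSER` and `PAPERLESS_DBPASS`, the port fallback of 5432 and the SQLite fallback when `PAPERLESS_DBHOST` is unset are assumptions to check against the linked settings.py; `paperless_db_settings` is a hypothetical helper.

```python
import os


def paperless_db_settings(environ=None):
    """Return psycopg2.connect() keyword arguments built from the environment.

    Assumption: variable names and "paperless" defaults mirror paperless-ng's
    settings.py; PAPERLESS_DBHOST has no default because without it
    paperless-ng is assumed to use SQLite instead of PostgreSQL.
    """
    env = os.environ if environ is None else environ
    if "PAPERLESS_DBHOST" not in env:
        raise RuntimeError("PAPERLESS_DBHOST is not set; paperless probably uses SQLite")
    return {
        "host": env["PAPERLESS_DBHOST"],
        # Assumed fallback: the standard PostgreSQL port.
        "port": int(env.get("PAPERLESS_DBPORT", "5432")),
        "dbname": env.get("PAPERLESS_DBNAME", "paperless"),
        "user": env.get("PAPERLESS_DBUSER", "paperless"),
        "password": env.get("PAPERLESS_DBPASS", "paperless"),
    }
```

With psycopg2 installed, `psycopg2.connect(**paperless_db_settings())` would then open the connection, since the returned keys match psycopg2's keyword arguments.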

andbez (Author) · 2021-01-11T16:21:08Z

So, when the environment variable is not set, the default applies; otherwise I will find the value in the environment variable.

jonaswinkler (Maintainer) · 2021-01-11T16:24:34Z

Yes.

andbez (Author) · 2021-01-12T14:11:34Z

I have finished the post-hook script. Unfortunately, either the script is not called or there are permission problems.
Setup:
Post-hook script in a subdirectory of consume -> /usr/src/paperless/consume/post-hook

- I can run the script inside the Docker container as both root and the paperless user -> works
- I tried to call it via the environment variable PAPERLESS_POST_CONSUME_SCRIPT as python3 /usr/src/paperless/consume/post-hook -> not called (nothing about the script call in the log files)
(In the Docker env -> PAPERLESS_POST_CONSUME_SCRIPT=/usr/local/bin/python3 /usr/src/paperless/consume/post-hook/asn.py)
- I put the call into a bash script, ran chmod a+x on it and gave it to the paperless user in Docker. Executing the bash script in Docker stops with:

paperless@bc121151aab5:~/consume/post-hook$ ls -la
total 
drwxr-sr-x 2 paperless paperless  4096 Jan 12 13:57 .
drwxrwsrwx 3 paperless paperless 24576 Jan 12 13:54 ..
-rw-r--r-- 1 paperless paperless  3838 Jan 12 13:53 asn.py
-rwxr-xr-x 1 paperless paperless   326 Jan 12 13:57 asn.sh
paperless@bc121151aab5:~/consume/post-hook$ ./asn.sh
bash: ./asn.sh: Permission denied
paperless@bc121151aab5:~/consume/post-hook$ python3 asn.py 2423 2 3 4 5 6 7 8
ASN nach Content : -1 
-1 
Connection closed!

Questions:

1. Is there a log entry when the post-consume script is executed?
2. What about permissions?
3. Can I use the first env call (python3 asn.py), with the arguments supplied by paperless?
4. Can the script always expect to get 8 arguments?

andbez (Author) · 2021-01-12T16:07:15Z

OK, worked it out. The filesystem where the script resides was mounted with noexec! That is why the bash script gets "Permission denied". I bind-mounted a new folder from a filesystem mounted with exec into the Docker container, and now the script is executed. The only information about this issue was in the Docker logs. So only my questions 3 and 4 remain.
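Since the only hint was in the Docker logs, a small check like the following could help spot a `noexec` mount from inside the container. A sketch assuming Linux, where `os.statvfs()` reports the mount flags and `os.ST_NOEXEC` is defined; `is_noexec` is a hypothetical helper name. Reading `/proc/mounts` inside the container shows the same mount options.

```python
import os


def is_noexec(path):
    """Return True if the filesystem containing path is mounted with noexec.

    os.ST_NOEXEC is only defined on Linux; elsewhere this returns False,
    meaning "unknown" rather than "definitely executable".
    """
    flag = getattr(os, "ST_NOEXEC", None)
    if flag is None:
        return False
    return bool(os.statvfs(path).f_flag & flag)
```

For example, `is_noexec("/usr/src/paperless/consume/post-hook")` would return `True` for the original setup and `False` for the newly bound folder.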

jonaswinkler (Maintainer) · 2021-01-12T16:19:53Z

3. No, the post-consume script is expected to be a single file. However, you can:
   - chmod +x the Python script,
   - add #!/usr/bin/python3 as the first line,
   - provide paperless with an absolute path, or a relative one starting with ./, to the Python file.

4. Yes.
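Putting answers 3 and 4 together, the top of such a script could look like this. A sketch: the shebang uses `/usr/local/bin/python3` because that is the interpreter path shown earlier in this thread for the container (`#!/usr/bin/python3` works only if that path exists in the image), and the names of the 8 arguments are an assumption based on the order listed in the paperless-ng post-consumption docs, so verify them against your version. `PostConsumeArgs` and `parse_args` are hypothetical names.

```python
#!/usr/local/bin/python3
# Shebang: interpreter path as shown for the paperless-ng container earlier
# in this thread; adjust it if python3 lives elsewhere in your image.
from collections import namedtuple

# Assumed argument order, based on the paperless-ng post-consumption docs.
PostConsumeArgs = namedtuple(
    "PostConsumeArgs",
    [
        "document_id",
        "filename",
        "source_path",
        "thumbnail_path",
        "download_url",
        "thumbnail_url",
        "correspondent",
        "tags",
    ],
)


def parse_args(argv):
    """Map the 8 positional arguments (sys.argv[1:]) to named fields."""
    if len(argv) != len(PostConsumeArgs._fields):
        raise ValueError(f"expected 8 arguments, got {len(argv)}")
    args = PostConsumeArgs(*argv)
    # The document ID is what the script needs to find the document in the DB.
    return args._replace(document_id=int(args.document_id))
```

`parse_args(sys.argv[1:])` would be the first call in the script; the manual test from the shell session above, `python3 asn.py 2423 2 3 4 5 6 7 8`, would yield `document_id == 2423`.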

You can also bind mount single files into the container if you do not wish to put that into the consumption directory.

andbez (Author) · 2021-01-12T21:28:02Z

Finished. Everything works. My first post-hook script: it tries to extract a 6-digit number from the title and content of the document. If it finds a matching number that is not already used by another document, it writes it to the ASN field in the database (currently only PostgreSQL is supported). If somebody is interested, I can provide the file. Maybe a repository could be set up to collect such hook scripts. Thx for your support.
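The uniqueness check described here could be sketched as follows. Assumptions: the same `documents_document` table with `title`, `content` and `archive_serial_number` columns, SQLite-style `?` placeholders (psycopg2 uses `%s`), and the choice to try each 6-digit candidate in turn until a free one is found; the actual asn.py may behave differently. Checking first also avoids an integrity error if the ASN column carries a unique constraint.

```python
import re
import sqlite3

# Assumed schema: table "documents_document" with columns id, title, content
# and archive_serial_number; "?" placeholders as in sqlite3 (psycopg2: "%s").
CANDIDATE = re.compile(r"\b\d{6}\b")  # any standalone 6-digit number


def choose_asn(conn, document_id):
    """Return the first 6-digit number from the title, then the content,
    that no other document already uses as its ASN, or None."""
    row = conn.execute(
        "SELECT title, content FROM documents_document WHERE id = ?",
        (document_id,),
    ).fetchone()
    if row is None:
        return None
    for text in row:  # title first, then content
        for match in CANDIDATE.finditer(text or ""):
            asn = int(match.group())
            taken = conn.execute(
                "SELECT 1 FROM documents_document "
                "WHERE archive_serial_number = ? AND id <> ?",
                (asn, document_id),
            ).fetchone()
            if taken is None:
                return asn
    return None


def apply_asn(conn, document_id):
    """Write the chosen ASN (if any) and return it."""
    asn = choose_asn(conn, document_id)
    if asn is not None:
        conn.execute(
            "UPDATE documents_document SET archive_serial_number = ? WHERE id = ?",
            (asn, document_id),
        )
        conn.commit()
    return asn


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents_document (id INTEGER PRIMARY KEY, title TEXT, "
        "content TEXT, archive_serial_number INTEGER UNIQUE)"
    )
    conn.execute("INSERT INTO documents_document VALUES (1, 'old scan', '', 1234)")
    conn.execute(
        "INSERT INTO documents_document VALUES (2, 'Scan 001234', 'ASN 001235', NULL)"
    )
    print(apply_asn(conn, 2))  # 1235: 001234 is already taken by document 1
```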

1 reply

e-patrick May 28, 2021

Hi andbez,
Looks like you have already achieved what I was still looking for 👍
Would you be willing to share your hook script?
At first, I had the idea to print a couple of sheets of small labels, each with a QR code containing a unique ASN, so that I could just stick one of those labels on each document. But then I would need to detect and decode the QR code in the scanned document, which would be a lot more complicated than your approach.
Now I'll probably just get a numbering stamp to easily stamp the number on the documents and rely on OCR to find it.
Regards,
Patrick

andbez (Author) · 2021-05-28T20:39:10Z

asn.py.zip
Hi e-patrick,

Sure, no problem. Please adjust the database name, database user and database password for your environment. The script still works for me; there are just a few false positives when the OCR is not 100% accurate. But that happens rarely.

Best,
andbez

1 reply

e-patrick May 28, 2021

Thanks a lot! I'm pretty sure it is going to work for me as well!
THX,
Patrick

jhqv · 2021-09-11T15:15:31Z

Hi, has anyone in this thread (@e-patrick, @andbez) experienced issues with Paperless not being able to run the post consume script in a Docker container?

I've been struggling for a long time with a simple script that calls Home Assistant (using curl) after consumption, but I keep getting the message "Configured post-consume script "/usr/src/paperless/post_consume.sh" does not exist.", which is odd since I can ls, cat and execute that particular script inside the container. The script is bind-mounted, and its absolute path is passed to Paperless via an environment variable.

Paperless log:
[2021-09-11 12:09:35,656] [ERROR] [paperless.consumer] Configured post-consume script "/usr/src/paperless/post_consume.sh" does not exist.

Docker log:

[2021-09-11 12:09:35,656] [ERROR] [paperless.consumer] Configured post-consume script "/usr/src/paperless/post_consume.sh" does not exist.
12:09:35 [Q] INFO Process-1:2 stopped doing work
12:09:35 [Q] ERROR Failed [2021-09-11.jpg] - 2021-09-11.jpg: Configured post-consume script "/usr/src/paperless/post_consume.sh" does not exist. : Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/django_q/cluster.py", line 432, in worker
    res = f(*task["args"], **task["kwargs"])
  File "/usr/src/paperless/src/documents/tasks.py", line 74, in consume_file
    document = Consumer().try_consume_file(
  File "/usr/src/paperless/src/documents/consumer.py", line 355, in try_consume_file
    self.run_post_consume_script(document)
  File "/usr/src/paperless/src/documents/consumer.py", line 135, in run_post_consume_script
    self._fail(
  File "/usr/src/paperless/src/documents/consumer.py", line 70, in _fail
    raise ConsumerError(f"{self.filename}: {log_message or message}")
documents.consumer.ConsumerError: 2021-09-11.jpg: Configured post-consume script "/usr/src/paperless/post_consume.sh" does not exist.

Test using bash inside container:

$ docker exec -it paperless-ng bash
root@2d9e4a07e122:/usr/src/paperless/src# cd ..
root@2d9e4a07e122:/usr/src/paperless# ./post_consume.sh
{}root@2d9e4a07e122:/usr/src/paperless# ls -l post_consume.sh
-rwxrwxr-x+ 1 paperless paperless 295 Aug 22 10:43 post_consume.sh

Can anyone understand why the script can't be executed by Paperless?

2 replies

e-patrick Sep 12, 2021

Hi @jhqv,
I didn't face this issue, so I can't tell what's wrong, but maybe there's a hint in a message further up the thread (#460 (comment))? Maybe it's related to a missing or wrong hashbang?
If anything else comes to my mind in the next couple of days I'll get back with it...

andbez Sep 13, 2021
Author

Hi @jhqv,

I did not face the issue either. But my setup is different from yours. I bind-mount a separate folder

/opt/containerdata/paperless-ng/post-hook /post-hook

into the container and put the script into this folder (please be aware of the "exec" issue in my earlier comments). Then set the path to the correct location in your paperless setup:

PAPERLESS_POST_CONSUME_SCRIPT: /post-hook/asn.py

jhqv · 2021-09-13T17:07:50Z

@e-patrick and @andbez - thanks for helping out! I solved the problem - it wasn't the hashbang, permissions or noexec... It turned out that the quotes in the value of the ENV variable were the culprit.

This didn't work:
PAPERLESS_POST_CONSUME_SCRIPT="/post_hook/post_consume.sh"

But this worked:
PAPERLESS_POST_CONSUME_SCRIPT=/post_hook/post_consume.sh

(Fun fact: I realized this after testing PAPERLESS_POST_CONSUME_SCRIPT="/bin/ls" which didn't work either.)

The pre-consumption hook example in the docs uses quotes in the environment variable, and I assumed the .env file was supposed to have the same syntax. I'll try to submit a change to the docs to make this part clearer.
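The traceback shows paperless failing with "does not exist" before running anything, which is consistent with a file-existence check on the configured path: with the quotes kept in the value, the path includes the quote characters, so no such file exists. A small diagnostic sketch; `check_post_consume_script` is a hypothetical helper, not part of paperless.

```python
import os


def check_post_consume_script(value):
    """Return None if value names an executable file, else a reason why not.

    Hypothetical helper for debugging PAPERLESS_POST_CONSUME_SCRIPT.
    """
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        # The quotes were passed through literally and are part of the path.
        return f"value {value} contains literal quotes; remove them"
    if not os.path.isfile(value):
        return f"{value} does not exist (or is not a regular file)"
    if not os.access(value, os.X_OK):
        return f"{value} is not executable for the current user"
    return None


if __name__ == "__main__":
    value = os.environ.get("PAPERLESS_POST_CONSUME_SCRIPT", "")
    print(check_post_consume_script(value) or "looks fine")
```

For example, `check_post_consume_script('"/post_hook/post_consume.sh"')` reports the literal quotes, matching the `"/bin/ls"` experiment above.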

@jonaswinkler Is there any particular reason why the document type isn't available as an argument to the post-consumption script?
