Consumer to ignore certain filename patterns #1037

skorvek · 2021-05-17T16:33:07Z

skorvek
May 17, 2021

I'm new to this application and I'm in love with it already. Thank you for this- excited to retire 5 file cabinets worth of "stuff" and the many many electronic file locations scattered across the network.

I have one problem I haven't been able to address however. My daily driver workstation is an iMac which has the scanner attached to it. Rather than replacing the USB scanner with a networked one (may be an eventual strategy) and also thinking of the use case I have where documents physically arrive at multiple physical locations (home and 2 commercial locations) I have set up a SAMBA share that I can drop old files into and point the scanner at as a save location.

Problem is the annoying hidden files that the mac adds to keep track of its own metadata and ACLs. The .DS_STORE file created in every directory and the ._XXXXX file created for each and every file. I am aware of the setting to tell finder to stop doing that, but of course, that flag hasn't actually worked in several versions. There are 3rd party tools to delete the file, but I can't get it to not create it.

So paperless sees the filesystem change and creates a job for it. I delete it (or don't) and I have a failed job. I've got literally thousands of failed jobs to clean up and a cluttered log. Haven't looked at the database yet to see what mess is in there. Worse, I can't rely on the failed jobs to detect a real problem such as a mangled file because there's so much junk to wade through.

I assume this is something I'd be able to fix with a well-written pre-procesor script, but I have no idea where to start with that nor am I sure that it runs at the right time to prevent the job from being created. I can administer linux systems well but I'm not a developer- bash scripts aren't my thing.

jonaswinkler · 2021-05-17T16:48:42Z

jonaswinkler
May 17, 2021
Maintainer

Since this affects all mac users (I'd assume quite a few), I'd be okay with hard-coding exceptions for that into the directory monitoring component. I've already got some exceptions in place there for various scanner quirks.

Please provide detailed examples of what files and patterns to exclude, so that I can build fitting regular expressions.

.DS_STORE - ignore files named exactly like this.
._XXXXX - is this the actual file name? How do these actually look like?

I vaguely remember that mac OS also creates various hidden folders... Is this correct?

0 replies

skorvek · 2021-05-18T12:03:32Z

skorvek
May 18, 2021
Author

So what I'm seeing (despite the explicit directive to not create these files on network drives) is for most every file created to have a small shadow file created in the same directory with the exact same filename/case as the actual file with "._" prepended.

Example: I add a file called Document1.pdf to the drive, in the same folder I will have a hidden file called ._Document1.pdf as well.

In addition to this, every folder has a "._.DS_Store" file placed in it. I haven't seen any hidden folders on network drives. That sort of stuff is contained to system folders (Library, iOS caches, etc) as far as I know.

Thanks for considering this. I'd suggest you could even go so far as to just delete the files in addition to ignoring them from the monitoring. Not sure if there's any other implications there, but I don't think so. MacOS provides a .dot_clean CLI command but when I tried it everything just ground to a halt (I had dumped a few thousand things into the consume folder). I ended up deleting them manually and nothing bad happened and MacOS didn't try to recreate them. Only effect was the failed job in the paperless-ng job queue.

I will note that opening any file from the network drive (which I wouldn't likely do since its entire purpose is to serve as a consume folder) will recreate the metadata file. Not sure the purpose of that since if it contained any metadata, I had just deleted it. Guess it's writing out new default info.

3 replies

jonaswinkler May 18, 2021
Maintainer

I'll ignore all files that start with "._" then. Seems reasonably safe.

Regarding deletion: I'm not 100% sure what these files are used for, so I'd rather not do that.

jonaswinkler May 18, 2021
Maintainer

Actually, regarding deletion: When deleting "document.pdf" after successful consumption, I'll check for "._document.pdf" and delete that as well. Sounds good?

jonaswinkler May 20, 2021
Maintainer

Added in 1.4.4!

skorvek · 2021-06-02T11:38:00Z

skorvek
Jun 2, 2021
Author

Perfect thanks! I’ll check it out soon.

0 replies

skorvek · 2021-06-04T16:01:26Z

skorvek
Jun 4, 2021
Author

Works! The shadow files disappear from the share after import and there aren't failed jobs for them. Thanks!

However I'm getting failed jobs during the copy in- seems paperless is trying to grab the file for import before they're fully written to disk. Maybe polling is still required as well. Looking back they were there as well.

0 replies

AlD · 2021-08-08T20:10:56Z

AlD
Aug 8, 2021

I'm using Syncthing to upload documents from my phone, laptop etc. to the consumer. Therefore I'd like to ignore the Syncthing metadata directory .stfolder.

How about making this configurable? #1221

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Consumer to ignore certain filename patterns #1037

Uh oh!

{{title}}

Uh oh!

Replies: 5 comments 3 replies

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Consumer to ignore certain filename patterns #1037

Uh oh!

skorvek May 17, 2021

Replies: 5 comments · 3 replies

Uh oh!

Uh oh!

jonaswinkler May 17, 2021 Maintainer

Uh oh!

skorvek May 18, 2021 Author

Uh oh!

jonaswinkler May 18, 2021 Maintainer

Uh oh!

jonaswinkler May 18, 2021 Maintainer

Uh oh!

jonaswinkler May 20, 2021 Maintainer

Uh oh!

skorvek Jun 2, 2021 Author

Uh oh!

skorvek Jun 4, 2021 Author

Uh oh!

AlD Aug 8, 2021

skorvek
May 17, 2021

Replies: 5 comments 3 replies

jonaswinkler
May 17, 2021
Maintainer

skorvek
May 18, 2021
Author

jonaswinkler May 18, 2021
Maintainer

jonaswinkler May 18, 2021
Maintainer

jonaswinkler May 20, 2021
Maintainer

skorvek
Jun 2, 2021
Author

skorvek
Jun 4, 2021
Author

AlD
Aug 8, 2021