Bug Summary
The AI resets the creation date to 1990-01-01, but not always.
Description
I have noticed that the AI resets "Date created" even after being explicitly told not to touch the dates. This makes the app unusable for me, because the dates on my documents must not change in any way.
Paperless-NGX v2.20.6 runs as an LXC installation on Proxmox, and Paperless-AI v3.0.9 runs as a Docker installation in another LXC (Proxmox node).
Paperless-NGX picks the correct "Date created" even when a document (e.g. an invoice) contains many dates.
However, when the AI does its job of assigning tags, correspondent, title, and document type, it also resets the date.
I use Ollama with the model llama3.1:8b and have also tried PHI4:latest, with the same result. I have spent hours, in vain, instructing and tweaking the AI setup not to touch the dates.
I have also tested the example prompt, to no avail.
The issue is that the AI sometimes(!) does not recognise the dates coming from Paperless-NGX. I don't see what else I can do in Paperless-NGX other than setting the display date format to "ISO 8601 - 2026-02-11 (medium)".
Why is Paperless-AI trying to change the creation date at all?
Suggestions?
Thanks
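The warning Paperless-AI logs (`Invalid date format: 01/04/2025, using fallback date: 01.01.1990`) suggests that a date string not in YYYY-MM-DD form is replaced by a fixed fallback. The Python sketch below is only a guess at that behaviour, not Paperless-AI's actual code; `parse_created_date` and `FALLBACK_DATE` are hypothetical names. It shows why an ISO 8601 date survives while `01/04/2025` ends up as 1990-01-01.

```python
from datetime import date

# Hypothetical fallback, taken from the log text "using fallback date: 01.01.1990".
FALLBACK_DATE = date(1990, 1, 1)


def parse_created_date(raw: str) -> date:
    """Accept an ISO 8601 date; anything unparseable falls back to 1990-01-01.

    Illustrative guess at the logged behaviour, not Paperless-AI's implementation.
    """
    try:
        return date.fromisoformat(raw)
    except ValueError:
        print(f"[WARN] Invalid date format: {raw}, using fallback date: 01.01.1990")
        return FALLBACK_DATE


print(parse_created_date("2025-04-01"))  # 2025-04-01
print(parse_created_date("01/04/2025"))  # prints the warning, then 1990-01-01
```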
Steps to Reproduce
1. Run Paperless-NGX and Paperless-AI as described above.
2. Set the AI instructions to one of the following two examples:
2.1 Example
```
You are a personalized document analyzer. Your task is to analyze documents and extract relevant information.
Analyze the document content and extract the following information into a valid JSON object using exactly these keys:
- "title": (String) A concise, meaningful title. No addresses. Include invoice/order numbers if present. Use the document's language.
- "correspondent": (String) The shortest version of the sender/institution name (e.g., "Amazon", not "Amazon EU SARL").
- "tags": (Array of Strings) Select up to 4 thematic tags. Use existing categories if possible. IMPORTANT: Return this as a list, e.g., ["Tax", "Invoice"].
- "document_type": (String) A single classification (e.g., "Invoice", "Contract", "Letter").
- "language": (String) The 2-letter ISO code (e.g., "de", "en", "fr", "sv"). Use "und" if unknown.
Ensure the output is ONLY the raw JSON object so it can be parsed programmatically.
DATE EXTRACTION RULES
- You MUST extract the date in EXACTLY YYYY-MM-DD format (e.g., 2026-02-11).
- If you see a date like "1er janvier 2024", convert it to "2024-01-01".
- If you see "26/06/2020", convert it to "2020-06-26".
- NEVER use slashes, dots, or words in the date.
- If you cannot find a valid date, do not provide a date at all.
Important rules for the analysis:
For tags:
- FIRST check the existing tags before suggesting new ones
- Use only relevant categories
- Maximum 4 tags per document, less if sufficient (at least 1)
- Avoid generic or too specific tags
- Use only the most important information for tag creation
- The output language is the one used in the document! IMPORTANT!
For the title:
- Short and concise, NO ADDRESSES
- Contains the most important identification features
- For invoices/orders, mention invoice/order number if available
- The output language is the one used in the document! IMPORTANT!
For the correspondent:
- Identify the sender or institution
When generating the correspondent, always create the shortest possible form of the company name (e.g. "Amazon" instead of "Amazon EU SARL, German branch")
For the language:
- Carefully analyze the text to determine the document language.
- Look for specific keywords (e.g., "Facture" for French, "Invoice" for English, "Faktura" for Swedish).
- If the language is not clear, use "und" as a placeholder.
- Use ISO codes: "fr" for French, "en" for English, "sv" for Swedish, "de" for German.
- CRITICAL: If the text is in French, use "fr". Do not confuse it with Spanish.
```
2.2 Example
```
### ROLE
You are a professional document indexer for Paperless-ngx.
### INSTRUCTIONS
Extract information into a JSON object.
### JSON SCHEMA
{
  "title": "Short title + invoice number",
  "correspondent": "Company name",
  "tags": ["Tag1", "Tag2"],
  "document_type": "Invoice/Letter/etc",
  "language": "fr/en/sv",
  "metadata_date_info": "Place any dates found here"
}
### CRITICAL RULES
- Do NOT use the key "document_date".
- If the document is French, language MUST be "fr".
- Return ONLY the raw JSON. No conversational text.
```
3. Check the logs:
`docker logs paperless-ai | grep -i scheduler`
Result:
`[WARN] Invalid date format: 01/04/2025, using fallback date: 01.01.1990`
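The DATE EXTRACTION RULES in example 2.1 ask the model to convert dates itself. The same conversion can be done deterministically in code. This Python sketch handles only the formats those rules mention (ISO dates, day-first dd/mm/yyyy, and French long-form dates such as "1er janvier 2024") and returns `None` when no valid date is found, mirroring "do not provide a date at all". `normalize_date` and the month table are hypothetical, and reading numeric dates day-first is an assumption taken from the "26/06/2020" example.

```python
import re
from datetime import date
from typing import Optional

# French month names (assumption: only what the "1er janvier 2024" example needs, plus the rest of the year).
FRENCH_MONTHS = {
    "janvier": 1, "février": 2, "fevrier": 2, "mars": 3, "avril": 4, "mai": 5,
    "juin": 6, "juillet": 7, "août": 8, "aout": 8, "septembre": 9,
    "octobre": 10, "novembre": 11, "décembre": 12, "decembre": 12,
}


def normalize_date(text: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no valid date is found."""
    text = text.strip().lower()
    m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", text)  # already ISO: 2026-02-11
    if m:
        y, mo, d = map(int, m.groups())
    else:
        # Day-first numeric dates: 26/06/2020 or 26.06.2020
        m = re.fullmatch(r"(\d{1,2})[/.](\d{1,2})[/.](\d{4})", text)
        if m:
            d, mo, y = map(int, m.groups())
        else:
            # French long form: "1er janvier 2024"
            m = re.fullmatch(r"(\d{1,2})(?:er)?\s+([a-zéû]+)\s+(\d{4})", text)
            if not m or m.group(2) not in FRENCH_MONTHS:
                return None
            d, mo, y = int(m.group(1)), FRENCH_MONTHS[m.group(2)], int(m.group(3))
    try:
        return date(y, mo, d).isoformat()
    except ValueError:  # impossible dates such as 31/02/2020
        return None


print(normalize_date("1er janvier 2024"))  # 2024-01-01
print(normalize_date("26/06/2020"))        # 2020-06-26
```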
Expected Behavior
Dates are working fine in Paperless-NGX, and I don't want Paperless-AI to change them.
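The expected behaviour can be stated as a rule: keep the date Paperless-NGX already has, unless date updates are explicitly allowed and the AI returns a valid ISO date. A minimal Python sketch of that rule follows; `choose_created_date` and `allow_ai_dates` are illustrative names, not existing Paperless-AI functions or settings.

```python
from datetime import date
from typing import Optional


def choose_created_date(
    existing: date, ai_value: Optional[str], allow_ai_dates: bool = False
) -> date:
    """Return the created date to store for a document (hypothetical helper)."""
    # Default: never touch the date Paperless-NGX already set.
    if not allow_ai_dates or not ai_value:
        return existing
    try:
        return date.fromisoformat(ai_value)
    except ValueError:
        # An unparseable AI date keeps the existing value instead of
        # substituting a fallback such as 1990-01-01.
        return existing


current = date(2025, 4, 1)  # date already set by Paperless-NGX
print(choose_created_date(current, "01/04/2025", allow_ai_dates=True))  # 2025-04-01
print(choose_created_date(current, "2024-01-01"))  # 2025-04-01 (updates disabled)
```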
Actual Behavior
Paperless-AI sometimes does not understand the dates and resets them to 1990-01-01.
Paperless-AI Version
docker:latest, i.e. v3.0.9
Docker Logs
`[WARN] Invalid date format: 01/04/2025, using fallback date: 01.01.1990`
Paperless-ngx Logs
Screenshots of your settings page
No response
Desktop Environment
Linux
OS Version
Linux Mint
Browser
Other
Browser Version
No response
Mobile Browser
No response
Additional Information
- I have checked existing issues and this is not a duplicate
- I have tried debugging this issue on my own
- I can provide a fix and submit a PR
- I am sure that this problem is affecting everyone, not only me
- I have provided all required information above
Extra Notes
If Paperless-AI has extra requirements for fully understanding the Paperless-NGX dates, it would be good to know. Does the Docker host need anything special, such as a locale setting? As I said, I have spent hours trying to resolve this with the help of different AIs, to no avail. I am not a programmer.
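For context on the locale question: a numeric date such as `01/04/2025` is ambiguous unless the day/month order is known. The Python sketch below parses the same string with two explicit formats using the standard `datetime.strptime`; it only illustrates the ambiguity and does not establish whether Paperless-AI's own parsing depends on the container locale.

```python
from datetime import datetime

raw = "01/04/2025"  # the value from the Paperless-AI warning

# The same string yields different dates depending on the convention assumed.
day_first = datetime.strptime(raw, "%d/%m/%Y").date()    # European style
month_first = datetime.strptime(raw, "%m/%d/%Y").date()  # US style

print(day_first.isoformat())    # 2025-04-01
print(month_first.isoformat())  # 2025-01-04
```

Converting to ISO 8601 with an explicit format string, as above, removes the guesswork, because the numeric `%d` and `%m` directives do not vary with locale.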