
[Bug]: reset of create date to 1990-01-01 #855

@hygroscopiccarpaccio

πŸ” Bug Summary

Paperless-AI sometimes, but not always, resets the created date to 1990-01-01.

📖 Description

I have noticed that the AI resets "Date created" even after I explicitly told it not to touch the dates. This makes the app unusable for me, because the dates on my documents must not be changed in any way.

Paperless-ngx v2.20.6 is an LXC installation (in Proxmox), and Paperless-AI v3.0.9 is a Docker installation in another LXC container (on a Proxmox node).

Paperless-ngx picks the correct "Date created" even when a document contains many dates (e.g. an invoice).
However, when the AI does its job of assigning tags, correspondent, title and document type, it also resets the date.

I use Ollama with the llama3.1:8b model and have also tried PHI4:latest, with the same result. I have spent hours, in vain, instructing and tweaking the AI setup not to touch the dates.

I have also tested the example prompt, to no avail.

The issue is that the AI sometimes(!) does not recognise the dates coming from Paperless-ngx. I don't see what else I can do in Paperless-ngx other than setting the date display to ISO 8601 (2026-02-11, medium format).

Why is Paperless-AI even trying to change the created date?
Suggestions?

Thanks

🔄 Steps to Reproduce

  1. Run Paperless-ngx and Paperless-AI as described above.
  2. Set the AI instructions to one of these two examples:
    2.1 Example
    You are a personalized document analyzer. Your task is to analyze documents and extract relevant information.

Analyze the document content and extract the following information into a valid JSON object using exactly these keys:

  1. "title": (String) A concise, meaningful title. No addresses. Include invoice/order numbers if present. Use the document's language.
  2. "correspondent": (String) The shortest version of the sender/institution name (e.g., "Amazon", not "Amazon EU SARL").
  3. "tags": (Array of Strings) Select up to 4 thematic tags. Use existing categories if possible. IMPORTANT: Return this as a list, e.g., ["Tax", "Invoice"].
  4. "document_type": (String) A single classification (e.g., "Invoice", "Contract", "Letter").
  5. "language": (String) The 2-letter ISO code (e.g., "de", "en", "fr", "sv"). Use "und" if unknown.

Ensure the output is ONLY the raw JSON object so it can be parsed programmatically.

DATE EXTRACTION RULES

  • You MUST extract the date in EXACTLY YYYY-MM-DD format (e.g., 2026-02-11).
  • If you see a date like "1er janvier 2024", convert it to "2024-01-01".
  • If you see "26/06/2020", convert it to "2020-06-26".
  • NEVER use slashes, dots, or words in the date.
  • If you cannot find a valid date, do not provide a date at all.

Important rules for the analysis:

For tags:

  • FIRST check the existing tags before suggesting new ones
  • Use only relevant categories
  • Maximum 4 tags per document, less if sufficient (at least 1)
  • Avoid generic or too specific tags
  • Use only the most important information for tag creation
  • The output language is the one used in the document! IMPORTANT!

For the title:

  • Short and concise, NO ADDRESSES
  • Contains the most important identification features
  • For invoices/orders, mention invoice/order number if available
  • The output language is the one used in the document! IMPORTANT!

For the correspondent:

  • Identify the sender or institution
    When generating the correspondent, always create the shortest possible form of the company name (e.g. "Amazon" instead of "Amazon EU SARL, German branch")

For the language:

  • Carefully analyze the text to determine the document language.

  • Look for specific keywords (e.g., "Facture" for French, "Invoice" for English, "Faktura" for Swedish).

  • If the language is not clear, use "und" as a placeholder.

  • Use ISO codes: "fr" for French, "en" for English, "sv" for Swedish, "de" for German.

  • CRITICAL: If the text is in French, use "fr". Do not confuse it with Spanish.

    2.2 Example
    ROLE
    You are a professional document indexer for Paperless-ngx.

INSTRUCTIONS

Extract information into a JSON object.

JSON SCHEMA

{
"title": "Short title + invoice number",
"correspondent": "Company name",
"tags": ["Tag1", "Tag2"],
"document_type": "Invoice/Letter/etc",
"language": "fr/en/sv",
"metadata_date_info": "Place any dates found here"
}

CRITICAL RULES

  • Do NOT use the key "document_date".
  • If the document is French, language MUST be "fr".
  • Return ONLY the raw JSON. No conversational text.

  3. Check the logs:
    `docker logs paperless-ai | grep -i scheduler`

    Result:
[WARN] Invalid date format: 01/04/2025, using fallback date: 01.01.1990
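The warning suggests the model returned `01/04/2025` instead of the YYYY-MM-DD format the prompt asks for, and that Paperless-AI then substituted 1990-01-01. A minimal sketch of safer handling, assuming a hypothetical `parse_model_date` helper (illustrative only, not Paperless-AI's actual code): accept only a strict YYYY-MM-DD string and return `None` for anything else, so the caller can keep the existing Paperless-ngx date instead of writing a fallback date.

```python
import re
from datetime import date
from typing import Optional

# Strict YYYY-MM-DD, the format the prompt asks the model to return.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_model_date(value: Optional[str]) -> Optional[date]:
    """Return a date for a strict YYYY-MM-DD string, else None (never a fallback date)."""
    if not value:
        return None
    value = value.strip()
    if not ISO_DATE.fullmatch(value):
        # e.g. "01/04/2025": wrong shape, so refuse to guess.
        return None
    try:
        return date.fromisoformat(value)
    except ValueError:
        # Right shape but impossible date, e.g. "2025-02-30".
        return None
```

With this approach, `parse_model_date("2025-04-01")` gives `date(2025, 4, 1)`, while `"01/04/2025"` gives `None` and the caller simply leaves the created date alone.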

✅ Expected Behavior

Dates work fine in Paperless-ngx, and I don't want Paperless-AI to change them.
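The expected behavior can be sketched as: copy the AI's title, correspondent, tags and document type into the update, but leave the created date out unless a valid date exists and date updates are explicitly allowed. This is a minimal sketch under stated assumptions: `build_update_payload` and `allow_date_updates` are hypothetical names, the keys follow the JSON keys in the prompts above, and `created` is assumed to be the target field. Real Paperless-ngx updates reference tags, correspondents and document types by numeric ID; names are kept here for readability.

```python
from datetime import date
from typing import Any, Optional


def build_update_payload(
    analysis: dict[str, Any],
    parsed_date: Optional[date],
    allow_date_updates: bool = False,
) -> dict[str, Any]:
    """Copy the AI's metadata; include 'created' only when explicitly allowed."""
    # Hypothetical: only these model-output keys are forwarded.
    payload = {
        key: analysis[key]
        for key in ("title", "correspondent", "tags", "document_type")
        if key in analysis
    }
    # No valid date, or date updates disabled: omit the field entirely so the
    # existing Paperless-ngx date is left as it is (no 1990-01-01 fallback).
    if allow_date_updates and parsed_date is not None:
        payload["created"] = parsed_date.isoformat()
    return payload
```

Omitting the field, rather than sending a placeholder, is the design choice that matters here: an update that never mentions the date cannot overwrite it.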

❌ Actual Behavior

Paperless-AI sometimes fails to parse the dates and resets them to 1990-01-01.

🏷️ Paperless-AI Version

docker:latest, i.e. v3.0.9

📜 Docker Logs

[WARN] Invalid date format: 01/04/2025, using fallback date: 01.01.1990

📜 Paperless-ngx Logs

πŸ–ΌοΈ Screenshots of your settings page

No response

🖥️ Desktop Environment

Linux

💻 OS Version

Linux Mint

🌐 Browser

Other

🔒 Browser Version

No response

🌐 Mobile Browser

No response

πŸ“ Additional Information

  • I have checked existing issues and this is not a duplicate
  • I have tried debugging this issue on my own
  • I can provide a fix and submit a PR
  • I am sure that this problem is affecting everyone, not only me
  • I have provided all required information above

📌 Extra Notes

If Paperless-AI has extra requirements for correctly reading Paperless-ngx dates, it would be good to know. Does the Docker host need anything special, such as a locale setting? As mentioned, I have spent hours trying to resolve this with the help of different AI tools, to no avail. I am not a programmer.
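On the locale question: a value such as `01/04/2025` from the warning in the logs is ambiguous on its own. Read day-first it is 1 April 2025; read month-first it is 4 January 2025. The short sketch below is illustrative only (not code from Paperless-AI or Paperless-ngx) and shows that the same string yields two different dates depending on the assumed convention. It does not show how Paperless-AI itself picks a convention; it only illustrates why a strict YYYY-MM-DD output from the model is easier to handle reliably than slash-separated dates.

```python
from datetime import datetime

raw = "01/04/2025"  # the value from the Paperless-AI warning

# Day-first reading (DD/MM/YYYY), as used in France and much of Europe.
day_first = datetime.strptime(raw, "%d/%m/%Y").date()
# Month-first reading (MM/DD/YYYY), as used in the United States.
month_first = datetime.strptime(raw, "%m/%d/%Y").date()

print(day_first.isoformat())    # 2025-04-01
print(month_first.isoformat())  # 2025-01-04
```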

Metadata

Assignees

No one assigned

    Labels

    bug: Something isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests