
InvoiceReferencedDocument parsing incomplete - missing UBL support and flawed multi-document logic #1032

@burak-inan

Description

The library's current parsing of invoice referenced documents has several limitations that prevent correct handling of both CII and UBL invoices:

Issues

1. Missing UBL support in XPath

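The XPath query that collects the referenced-document nodes only matches the CII element InvoiceReferencedDocument. It does not match UBL's BillingReference/InvoiceDocumentReference or BillingReference/CreditNoteDocumentReference structures. (The separate query for the first reference ID does include BillingReference/InvoiceDocumentReference/ID, but not CreditNoteDocumentReference.)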
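For illustration, the union query proposed under Expected Behavior can be checked in isolation with the JDK's built-in javax.xml.xpath API. This is a minimal sketch, not Mustang code: the class ReferenceNodeQuery and its methods are hypothetical, and the parser is made namespace-aware so that local-name() matches prefixed elements such as cac:BillingReference or ram:InvoiceReferencedDocument.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReferenceNodeQuery {
    // Union of the CII element and both UBL BillingReference variants listed in this issue
    static final String REFERENCE_XPATH =
            "//*[local-name()='InvoiceReferencedDocument']"
            + "|//*[local-name()='BillingReference']/*[local-name()='InvoiceDocumentReference']"
            + "|//*[local-name()='BillingReference']/*[local-name()='CreditNoteDocumentReference']";

    // Parses the XML and returns all referenced-document nodes, regardless of syntax
    public static NodeList findReferences(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // so local-name() yields the name without the prefix
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(REFERENCE_XPATH, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate reference query", e);
        }
    }

    public static int countReferences(String xml) {
        return findReferences(xml).getLength();
    }
}
```

Against a UBL credit note containing one BillingReference/InvoiceDocumentReference and one BillingReference/CreditNoteDocumentReference, this query matches both nodes, whereas the CII-only query matches none.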

2. Incomplete date parsing in ReferencedDocument.fromNode()

The fromNode() method attempts to extract FormattedIssueDateTime directly as a string, but in CII format this is a nested structure:

<FormattedIssueDateTime>
    <DateTimeString format="102">20240115</DateTimeString>
</FormattedIssueDateTime>

The method calls nodes.getAsStringOrNull(new String[]{"FormattedIssueDateTime"}) which won't correctly extract the nested DateTimeString value.
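A minimal sketch of reading the nested CII date with plain DOM traversal, assuming the element passed in is an InvoiceReferencedDocument and that only format code 102 (yyyyMMdd) needs to be supported. CiiIssueDate and its methods are hypothetical helpers, not part of Mustang; elements are matched by local name so the ram:/qdt: prefixes do not matter.

```java
import java.io.StringReader;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CiiIssueDate {
    // First direct child element with the given local name (prefix ignored), or null
    static Element firstChild(Node parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Reads FormattedIssueDateTime/DateTimeString; returns null if absent or not format 102
    public static LocalDate issueDate(Element referencedDocument) {
        Element formatted = firstChild(referencedDocument, "FormattedIssueDateTime");
        Element dateTimeString = formatted == null ? null : firstChild(formatted, "DateTimeString");
        if (dateTimeString == null || !"102".equals(dateTimeString.getAttribute("format"))) {
            return null;
        }
        // Format 102 is yyyyMMdd, which DateTimeFormatter.BASIC_ISO_DATE parses
        return LocalDate.parse(dateTimeString.getTextContent().trim(), DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Namespace-aware parse so that getLocalName() is populated
    public static Element parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }
}
```

Because firstChild skips non-element nodes and the value is trimmed, indented XML like the snippet above is handled the same way as compact XML.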

3. Flawed multi-document handling logic

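When multiple invoice references exist, the ID of the first one is stored in the invoiceReferencedDocumentID field, and subsequent references are added to the collection. However, the condition for adding a document skips it whenever both its ID and its issue date equal the stored invoiceReferencedDocumentID and invoiceReferencedIssueDate, which excludes the first reference: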

if (doc != null && (!Objects.equals(zpp.getInvoiceReferencedDocumentID(), doc.getIssuerAssignedID()) 
    || !Objects.equals(zpp.getInvoiceReferencedIssueDate(), doc.getFormattedIssueDateTime()))) {
    zpp.addInvoiceReferencedDocument(doc);
}

This means the first referenced document's complete details (TypeCode, ReferenceTypeCode, IssueDate) are lost: only its ID is captured in invoiceReferencedDocumentID. The collection contains full details only for the second and subsequent references.
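One way to avoid losing the first reference is to stop filtering against invoiceReferencedDocumentID altogether: store every parsed document, and derive the legacy single ID from the first entry. The sketch below uses hypothetical classes (ReferenceCollection and its nested Ref), not Mustang's own types, and keeps only the fields discussed in this issue.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReferenceCollection {
    // Hypothetical, reduced stand-in for a parsed referenced document
    public static final class Ref {
        public final String issuerAssignedId;
        public final String typeCode;
        public final String referenceTypeCode;
        public final LocalDate issueDate;

        public Ref(String issuerAssignedId, String typeCode, String referenceTypeCode, LocalDate issueDate) {
            this.issuerAssignedId = issuerAssignedId;
            this.typeCode = typeCode;
            this.referenceTypeCode = referenceTypeCode;
            this.issueDate = issueDate;
        }
    }

    private final List<Ref> references = new ArrayList<>();

    // Every parsed reference is kept with full details; nothing is compared against a separate ID field
    public void add(Ref ref) {
        if (ref != null) {
            references.add(ref);
        }
    }

    public List<Ref> all() {
        return Collections.unmodifiableList(references);
    }

    // Backward-compatible single-ID view, derived from the collection rather than stored separately
    public String firstReferencedId() {
        return references.isEmpty() ? null : references.get(0).issuerAssignedId;
    }
}
```

Deriving the single ID from the collection keeps the two views consistent, so the first document's TypeCode, ReferenceTypeCode and issue date remain available alongside its ID.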

Current Code

The importer uses:

String invoiceReferencedDocumentID = this.extractString("//*[local-name()=\"InvoiceReferencedDocument\"]/*[local-name()=\"IssuerAssignedID\"]|//*[local-name()=\"BillingReference\"]/*[local-name()=\"InvoiceDocumentReference\"]/*[local-name()=\"ID\"]");
if (!invoiceReferencedDocumentID.isEmpty()) {
    zpp.setInvoiceReferencedDocumentID(invoiceReferencedDocumentID);  // First reference ID only
}

xpr = xpath.compile("//*[local-name()=\"InvoiceReferencedDocument\"]");  // Only CII!
NodeList nodes = (NodeList)xpr.evaluate(this.getDocument(), XPathConstants.NODESET);
if (nodes.getLength() != 0) {
    for(int i = 0; i < nodes.getLength(); ++i) {
        Node currentItemNode = nodes.item(i);
        ReferencedDocument doc = ReferencedDocument.fromNode(currentItemNode);
        // This condition SKIPS the first document since it matches invoiceReferencedDocumentID
        if (doc != null && (!Objects.equals(zpp.getInvoiceReferencedDocumentID(), doc.getIssuerAssignedID()) 
            || !Objects.equals(zpp.getInvoiceReferencedIssueDate(), doc.getFormattedIssueDateTime()))) {
            zpp.addInvoiceReferencedDocument(doc);
        }
    }
}

And ReferencedDocument.fromNode() uses:

ReferencedDocument rd = new ReferencedDocument(
    nodes.getAsStringOrNull(new String[]{"IssuerAssignedID", "ID"}),
    nodes.getAsStringOrNull(new String[]{"TypeCode", "DocumentTypeCode"}),
    nodes.getAsStringOrNull(new String[]{"ReferenceTypeCode"}),
    XMLTools.tryDate(nodes.getAsStringOrNull(new String[]{"FormattedIssueDateTime"}))  // Won't get nested DateTimeString
);

Expected Behavior

The library should:

  • Support both CII and UBL formats in the XPath query:
  //*[local-name()='InvoiceReferencedDocument']
  |//*[local-name()='BillingReference']/*[local-name()='InvoiceDocumentReference']
  |//*[local-name()='BillingReference']/*[local-name()='CreditNoteDocumentReference']
  • Fix ReferencedDocument.fromNode() to properly extract nested FormattedIssueDateTime/DateTimeString for CII format
  • Support UBL's IssueDate element (direct text content, not nested)
  • Include ALL referenced documents in the collection with complete details, not skip the first one
  • Either deprecate invoiceReferencedDocumentID in favor of using the collection exclusively, or ensure the first document's full details are available somewhere
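The two fromNode() fixes listed above can be sketched together as one extraction routine that accepts either a CII InvoiceReferencedDocument or a UBL InvoiceDocumentReference/CreditNoteDocumentReference element. ReferenceFields is a hypothetical helper, not Mustang's ReferencedDocument. It assumes UBL dates use the ISO yyyy-MM-dd form and CII dates use format code 102; any other CII format yields a null date.

```java
import java.io.StringReader;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ReferenceFields {
    public final String id;
    public final String typeCode;
    public final String referenceTypeCode;
    public final LocalDate issueDate;

    private ReferenceFields(String id, String typeCode, String referenceTypeCode, LocalDate issueDate) {
        this.id = id;
        this.typeCode = typeCode;
        this.referenceTypeCode = referenceTypeCode;
        this.issueDate = issueDate;
    }

    // First direct child matching one of the local names, tried in the given order
    static Element child(Node parent, String... localNames) {
        for (String name : localNames) {
            for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getLocalName())) {
                    return (Element) n;
                }
            }
        }
        return null;
    }

    static String text(Element e) {
        return e == null ? null : e.getTextContent().trim();
    }

    // Accepts a CII InvoiceReferencedDocument or a UBL InvoiceDocumentReference/CreditNoteDocumentReference
    public static ReferenceFields fromElement(Element ref) {
        LocalDate date = null;
        Element ublDate = child(ref, "IssueDate");              // UBL: direct text content, yyyy-MM-dd
        Element ciiDate = child(ref, "FormattedIssueDateTime"); // CII: nested DateTimeString
        if (ublDate != null) {
            date = LocalDate.parse(text(ublDate));
        } else if (ciiDate != null) {
            Element dts = child(ciiDate, "DateTimeString");
            if (dts != null && "102".equals(dts.getAttribute("format"))) {
                date = LocalDate.parse(text(dts), DateTimeFormatter.BASIC_ISO_DATE);
            }
        }
        return new ReferenceFields(
                text(child(ref, "IssuerAssignedID", "ID")),       // CII name first, then UBL
                text(child(ref, "TypeCode", "DocumentTypeCode")),
                text(child(ref, "ReferenceTypeCode")),
                date);
    }

    // Namespace-aware parse so that getLocalName() is populated
    public static Element parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }
}
```

Looking only at direct children, rather than searching all descendants, avoids picking up an ID element nested deeper inside the reference.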

Workaround

I've implemented a custom method to handle this properly with explicit child node traversal for both formats and proper collection handling.

Environment

  • Mustang version: 2.21.0
  • Invoice formats affected: ZUGFeRD/Factur-X (CII), UBL
