
Modify PDFExtractor to support skippedItems (Part 1) #5533

Open
gamue wants to merge 8 commits into portfolio-performance:master from gamue:skipped-items


gamue (Contributor) commented Feb 28, 2026

Closes #5517

Especially in the account statement methods that work with very generic regexes such as "^[\\d]{2} [\\p{L}]{3,4}([\\.]{1})?.*$", it's not possible to replace the return null; otherwise there would be lots of SkippedItems. I think it's best to keep it as it is. WDYT?
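To illustrate why a generic pattern makes the return null hard to replace, here is a small standalone sketch (not code from the PR) that runs the quoted regex against a few made-up statement lines. Every line the pattern matches that turns out not to be a booking would become a SkippedItem if the return null were replaced. The class name and the sample lines are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

final class GenericRegexDemo {
    // The pattern quoted in the PR description, used verbatim as a Java string literal.
    static final Pattern DATE_LINE = Pattern.compile("^[\\d]{2} [\\p{L}]{3,4}([\\.]{1})?.*$");

    // Counts how many lines would open a block with this pattern.
    static long countMatches(List<String> lines) {
        return lines.stream().filter(line -> DATE_LINE.matcher(line).matches()).count();
    }

    public static void main(String[] args) {
        // Hypothetical statement lines: only the first two are real bookings.
        List<String> lines = List.of(
                "02 Jan. Ueberweisung 100,00",
                "15 Feb Lastschrift 20,00",
                "31 Dez Kontostand 1.234,56", // balance line, not a transaction
                "12 Monate Zinsbindung",      // free text that happens to match
                "Seite 1 von 3");             // does not match
        // Four matches, but only two bookings: the other two would be reported as skipped.
        System.out.println(countMatches(lines));
    }
}
```

With only two genuine bookings among four matches, half of the skipped items in this toy example would be noise, which is the concern raised above.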

  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/ComdirectPDFExtractor.java:685
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/CreditSuisseAGPDFExtractor.java:350
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/MLPBankingAGPDFExtractor.java:501
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/N26BankAGPDFExtractor.java:68
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/NIBCBankPDFExtractor.java:231
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/PostfinancePDFExtractor.java:985
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/QuirinBankAGPDFExtractor.java:721
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/RaiffeisenBankgruppePDFExtractor.java:904
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/SBrokerPDFExtractor.java:1166
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/ScalableCapitalPDFExtractor.java:767
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/SwissquotePDFExtractor.java:644
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/TradeRepublicPDFExtractor.java:987
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/WealthsimpleInvestmentsIncPDFExtractor.java:169

The following ones will be handled in a follow-up PR:

  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/DABPDFExtractor.java:1093
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/DADATBankenhausPDFExtractor.java:745
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/DZBankGruppePDFExtractor.java:653
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/DegiroPDFExtractor.java:361
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/DekaBankPDFExtractor.java:438
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/DeutscheBankPDFExtractor.java:692
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/Direkt1822BankPDFExtractor.java:380
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/DkbPDFExtractor.java:839
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/EasyBankAGPDFExtractor.java:835
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/ErsteBankPDFExtractor.java:1009
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/FinTechGroupBankPDFExtractor.java:841
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/LimeTradingCorpPDFExtractor.java:115
  • name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/TigerBrokersPteLtdPDFExtractor.java:194

buchen (Member) commented Feb 28, 2026

Hi @gamue, I propose not adding more banks to this pull request. Instead, let's start a new pull request for the other banks. This change is not a simple "do a refactoring without semantic change". It needs more review, and that is easier to do on smaller diffs.

From your point of view, are the banks in this pull request done? Let me know when I should take a look at the pull request.

gamue (Contributor Author) commented Feb 28, 2026

> I propose not adding more banks to this pull request. Instead, let's start a new pull request for the other banks. This change is not a simple "do a refactoring without semantic change". It needs more review, and that is easier to do on smaller diffs.

Agreed, I actually had the same idea :) I was already reviewing it more in depth on my own. I think the current state should be fine for review; just a few notes/thoughts:

  • Comdirect is one where we might want to keep the return null, but I couldn't work out which item would be skipped in the tests I checked.
  • It might be worth defining how to mark a return null that was checked and is fine, to distinguish it from unchecked ones. I left a comment at a few. The TradeRepublic one is especially hard because it uses a very generic regex and would result in many SkippedItems.
  • EDIT: While rechecking something, I noticed that in some cases that currently result in a failure, it might be better to use a SkippedItem instead. As an example, I've adjusted the Vorabpauschale import at Scalable for the case of 0 EUR. So return null might not be the only pattern to check.

gamue changed the title from "Modify PDFExtractor to support skippedItems" to "Modify PDFExtractor to support skippedItems (Part 1)" on Mar 1, 2026
gamue marked this pull request as ready for review on March 1, 2026, 21:26
    return null;
    return item;

    return new SkippedItem(item, Messages.MsgErrorTransactionTypeNotSupportedOrRequired);
Member

@gamue I am reviewing the changes to the PDF importers (starting with Scalable, one step at a time).

I am not sure about the side effects of this change.
The block pattern is a simple "^Erstattete Steuern[\\s]*$", followed by only one matching, but optional, element. No test hits the skipped item.

You introduced this TaxLostAdjustmentTransaction. What was the reason to mark it optional() in the first place?

If we convert this to a SkippedItem, then, by definition, PP stops trying other blocks. My gut feeling is that we should return a skipped item only if we reliably know that we parsed a transaction. But maybe you have example cases?
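The concern can be modelled with a deliberately simplified dispatch loop. This is a hypothetical sketch, assuming only what the comment above states: blocks are tried in order and the first non-null result wins, so a SkippedItem ends the search just like a real transaction. The Item, Transaction and SkippedItem types here are stand-ins, not Portfolio Performance's real classes, and real blocks work on line ranges rather than on the whole document.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class BlockDispatchSketch {
    // Hypothetical stand-ins; not Portfolio Performance's real classes.
    interface Item {
        String describe();
    }

    record Transaction(String label) implements Item {
        public String describe() {
            return "transaction: " + label;
        }
    }

    record SkippedItem(Item item, String reason) implements Item {
        public String describe() {
            return "skipped (" + reason + "): " + item.describe();
        }
    }

    /** Tries each block in order; the first non-null result wins. */
    static Optional<Item> dispatch(String text, List<Function<String, Item>> blocks) {
        for (Function<String, Item> block : blocks) {
            Item result = block.apply(text);
            if (result != null)
                return Optional.of(result); // a SkippedItem also ends the search
        }
        return Optional.empty(); // every block returned null
    }

    public static void main(String[] args) {
        Function<String, Item> taxRefund = text -> text.startsWith("Erstattete Steuern")
                ? new SkippedItem(new Transaction("tax refund"), "not supported")
                : null;
        Function<String, Item> dividend = text -> text.contains("Dividende")
                ? new Transaction("dividend")
                : null;

        String doc = "Erstattete Steuern\nDividende 12,34 EUR";
        // The tax block claims the document, so the dividend block never runs.
        System.out.println(dispatch(doc, List.of(taxRefund, dividend))
                .map(Item::describe)
                .orElse("nothing"));
    }
}
```

If the tax-refund block returned null instead, the loop would fall through and the dividend block would produce its transaction; that fallback is what a SkippedItem gives up.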

Contributor Author

I dug a little deeper because I couldn't remember that change, and it was actually Nirus who made it (bde411b). Back then there were multiple Scalable change requests, and they were bundled into that commit. This particular one was #5397.


      if (t.getCurrencyCode() != null && t.getAmount() == 0)
    -     ctx.markAsFailure(Messages.MsgErrorTransactionTypeNotSupportedOrRequired);
    +     return new SkippedItem(item, Messages.MsgErrorTransactionTypeNotSupportedOrRequired);
Member

This is exactly the kind of transaction we want to mark as skipped.
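A minimal sketch of the rule in the diff above, assuming a parsed transaction exposes a currency code and an amount: a recognised transaction with a currency but a zero amount is reported as skipped instead of failing. The Outcome, Imported and Skipped types and the message constant are hypothetical placeholders, not the real Messages entry or PP's item classes.

```java
final class ZeroAmountSketch {
    // Hypothetical result types for post-processing one parsed transaction.
    interface Outcome {}

    record Imported(String currencyCode, long amount) implements Outcome {}

    record Skipped(String reason) implements Outcome {}

    // Placeholder text; the real one comes from Messages.MsgErrorTransactionTypeNotSupportedOrRequired.
    static final String NOT_SUPPORTED = "transaction type not supported or required";

    static Outcome classify(String currencyCode, long amount) {
        // Same condition as in the diff: the currency was parsed, but there is nothing to book.
        if (currencyCode != null && amount == 0)
            return new Skipped(NOT_SUPPORTED);
        // Everything else is passed on unchanged.
        return new Imported(currencyCode, amount);
    }

    public static void main(String[] args) {
        System.out.println(classify("EUR", 0));    // reported as skipped
        System.out.println(classify("EUR", 1234)); // imported as usual
    }
}
```

Reporting the zero-amount case as skipped keeps it visible to the user without marking the whole block as failed.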



Development

Successfully merging this pull request may close these issues.

Skipped Items - PDF import: Tracking new usage of Skipped Item

2 participants