Updated FinTechGroupBank PDF Extractor to support parsing of dividend…#5502
buchen wants to merge 1 commit into portfolio-performance:master
Pull request overview
Updates the FinTechGroupBank PDF dividend extractor to populate the new dividend ex-date field (Issue #5439), and extends the existing PDF extractor test suite to assert correct ex-date extraction across multiple dividend document variants.
Changes:
- Add optional parsing for dividend ex-date from both `Extag` and `Ex-Datum` PDF line formats.
- Extend FinTechGroupBank PDF extractor tests to validate `exDate` on extracted dividend transactions (and `null` where not applicable).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/FinTechGroupBankPDFExtractor.java | Adds optional parsing logic to set AccountTransaction.exDate for dividend imports. |
| name.abuchen.portfolio.tests/src/name/abuchen/portfolio/datatransfer/pdf/fintechgroupbank/FinTechGroupBankPDFExtractorTest.java | Adds assertions/matchers to verify ex-date extraction across many dividend test cases. |
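
The "optional" behaviour summarised above (an ex-date when the document carries one, `null` otherwise) can be sketched in plain Java. This is only an illustration under stated assumptions, not the extractor's actual code: the class `OptionalExDateSketch`, the method `findExDate`, the sample lines, and the `d. MMMM yyyy` German date layout are all hypothetical, and `java.time` stands in for the project's own date helper.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalExDateSketch {
    // Hypothetical line layout, modelled on the Ex-Datum pattern shown in this PR.
    private static final Pattern EX_DATUM = Pattern.compile(
            "^Ex-Datum: (?<exDate>\\d{1,2}\\. \\p{L}+ \\d{4}) CUSIP:.*$");

    // Assumption: dates look like "14. Mai 2024". The real helper may accept other formats.
    private static final DateTimeFormatter GERMAN_LONG =
            DateTimeFormatter.ofPattern("d. MMMM yyyy", Locale.GERMAN);

    /** Returns the ex-date if some line carries one, otherwise null. */
    static LocalDate findExDate(List<String> lines) {
        for (String line : lines) {
            Matcher m = EX_DATUM.matcher(line);
            if (m.matches()) {
                return LocalDate.parse(m.group("exDate"), GERMAN_LONG);
            }
        }
        // Optional section: a document without the line is not an error.
        return null;
    }

    public static void main(String[] args) {
        // Invented sample documents, one with and one without an ex-date line.
        List<String> withExDate = List.of(
                "Dividendengutschrift",
                "Ex-Datum: 14. Mai 2024 CUSIP: 123456789");
        List<String> withoutExDate = List.of("Dividendengutschrift");

        System.out.println(findExDate(withExDate));
        System.out.println(findExDate(withoutExDate));
    }
}
```

A test in the same spirit would assert the parsed date for the first document and `null` for the second, which mirrors what the extended test suite is described as checking.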
    .assign((t, v) -> t.setExDate(asDate(v.get("exDate")))), //
    section -> section //
    .attributes("exDate") //
    .match("^Ex-Datum: (?<exDate>[\\d]{1,2}\\. .* [\\d]{4}) CUSIP:.*$") //
    .assign((t, v) -> t.setExDate(asDate(v.get("exDate")))))
The Ex-Datum regex is very spacing-sensitive ("Ex-Datum: " and " CUSIP:" are hard-coded) and the captured group uses a greedy .*, which can accidentally include extra whitespace. Since ExtractorUtils.asDate(...) does not trim input, any leading/trailing whitespace will cause date parsing to fail and abort extraction. Consider making the pattern whitespace-tolerant (e.g., [:\\s]+ / \\s+) and/or ensuring the captured exDate value is trimmed before calling asDate.
    - .assign((t, v) -> t.setExDate(asDate(v.get("exDate")))), //
    - section -> section //
    - .attributes("exDate") //
    - .match("^Ex-Datum: (?<exDate>[\\d]{1,2}\\. .* [\\d]{4}) CUSIP:.*$") //
    - .assign((t, v) -> t.setExDate(asDate(v.get("exDate")))))
    + .assign((t, v) -> t.setExDate(asDate(trim(v.get("exDate"))))), //
    + section -> section //
    + .attributes("exDate") //
    + .match("^Ex-Datum[:\\s]+(?<exDate>[\\d]{1,2}\\.\\s+\\p{L}+\\s+[\\d]{4})\\s+CUSIP[:\\s]*.*$") //
    + .assign((t, v) -> t.setExDate(asDate(trim(v.get("exDate"))))))
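
To make the difference between the two patterns concrete, here is a small standalone sketch that runs both regexes from the review comment against a few input lines. The class name, the `capture` helper, and the sample lines (including the CUSIP value) are invented for illustration; only the two pattern strings are taken from the diff above. It uses plain `java.util.regex`, not the extractor framework.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExDatumRegexSketch {
    // Pattern from the PR: single spaces are hard-coded around the date.
    static final Pattern STRICT = Pattern.compile(
            "^Ex-Datum: (?<exDate>[\\d]{1,2}\\. .* [\\d]{4}) CUSIP:.*$");

    // Pattern from the review suggestion: whitespace runs are tolerated.
    static final Pattern TOLERANT = Pattern.compile(
            "^Ex-Datum[:\\s]+(?<exDate>[\\d]{1,2}\\.\\s+\\p{L}+\\s+[\\d]{4})\\s+CUSIP[:\\s]*.*$");

    /** Returns the captured exDate group, or null if the whole line does not match. */
    static String capture(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.matches() ? m.group("exDate") : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; real documents may look different.
        String oneSpace = "Ex-Datum: 14. Mai 2024 CUSIP: 123456789";
        String wideOutside = "Ex-Datum:  14. Mai 2024  CUSIP: 123456789";
        String wideInside = "Ex-Datum: 14. Mai  2024 CUSIP: 123456789";

        // Both patterns capture "14. Mai 2024" from the single-spaced line.
        System.out.println(capture(STRICT, oneSpace));
        System.out.println(capture(TOLERANT, oneSpace));

        // Extra spaces around the date: STRICT gives null, TOLERANT gives "14. Mai 2024".
        System.out.println(capture(STRICT, wideOutside));
        System.out.println(capture(TOLERANT, wideOutside));

        // Extra space inside the date: both capture "14. Mai  2024" (two spaces kept).
        System.out.println(capture(STRICT, wideInside));
        System.out.println(capture(TOLERANT, wideInside));
    }
}
```

In this sketch the capture group always begins and ends with a digit, so extra whitespace can only end up inside the captured value, not at its edges. Whether that matters in practice depends on how strictly the date helper treats internal spacing, which this example does not try to model.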
No, I don't want that.

Please let me know what you want to change. All 9 pull requests apply ex-date parsing to existing banks whose documents actually contain the date. That should complete the support for importing the ex-date from bank documents. Each one creates a new section; or should the parsing be wrapped into the section that already parses the date? Should the order of the JUnit matcher expressions change?
2509a2f to 3cb038a
… ex-date Issue: portfolio-performance#5439 Issue: portfolio-performance#5502 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3cb038a to 0cc3ff4
0cc3ff4 to 8c5dd97
Merged. The development around ex-date is independent of the development work on SkippedItems; the latter does not block the former. PP is developed incrementally and continuously. There were no comments on the change itself. For future improvements, let's open new pull requests.
… ex-date
Issue: #5439