
Commit 69e5e30

gamue authored and buchen committed
Add Claude Rules for PDF extractors
Issue: portfolio-performance#5560

Signed-off-by: XY <xy>
[updated path to include portfolio; small changes]
Signed-off-by: Andreas Buchen <andreas.buchen@gmail.com>
1 parent e9095c3 commit 69e5e30


1 file changed: +67 / -0 lines changed


.claude/rules/pdfextractors.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
---
paths:
- "name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/*"
- "name.abuchen.portfolio.tests/src/name/abuchen/portfolio/datatransfer/pdf/**"
---

PDF importers extract transactions from bank and broker PDF statements.

- Each bank/broker has its own extractor class in `name.abuchen.portfolio/src/name/abuchen/portfolio/datatransfer/pdf/`.
- Tests are located in `name.abuchen.portfolio.tests/src/name/abuchen/portfolio/datatransfer/pdf/`, with one folder per bank/broker containing all related test exports (e.g. `Kauf01.txt`).

# Implementation

- `AbstractPDFExtractor` is the base class for every PDF extractor and provides utility methods.
- `ExtractorUtils` offers additional utility methods when the ones in the base class are not enough.
- Use `TextUtil` for text manipulation.
- When implementing a new extractor or adding a new parsing method, use `BaaderBankPDFExtractor` as the reference and follow the same pattern.
- Split the logic into one method per transaction type, e.g. `addBuySellTransaction()`, `addDividendTransaction()` or `addInterestTransaction()`.
- When extracting information from PDF documents, split the extraction into multiple `section` blocks.
- Consistency in coding style across all extractors is more important than elegant code.
- Before each `section` block, add a comment block wrapped in `@formatter:off` and `@formatter:on` that shows the document format handled there.
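
The section pattern described above can be illustrated with a small, self-contained Java sketch (Java 16+, since it uses a record). It deliberately does not use Portfolio Performance's parser API: `SectionSketch`, `Section`, `ParsedBuy` and the `Security:` / `Shares:` / `Total:` statement lines are all hypothetical. It only mimics the structure the rules ask for: one method per transaction type, several small section blocks that each match one line with named regex groups, and a `@formatter:off` / `@formatter:on` comment before each block showing the format it handles. For real extractors, follow `BaaderBankPDFExtractor`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical mini-parser illustrating the "section" idea.
// This is NOT the Portfolio Performance API; all names here are made up.
public class SectionSketch
{
    // Hypothetical holder for the values of one buy transaction.
    static class ParsedBuy
    {
        String isin;
        String name;
        String shares;
        String amount;
        String currency;
    }

    // One "section": a regex with named groups plus a callback that copies them.
    record Section(Pattern pattern, BiConsumer<ParsedBuy, Matcher> assign)
    {
        // Applies the callback to the first matching line; returns false if no line matches.
        boolean apply(List<String> lines, ParsedBuy target)
        {
            for (String line : lines)
            {
                Matcher m = pattern.matcher(line);
                if (m.matches())
                {
                    assign.accept(target, m);
                    return true;
                }
            }
            return false;
        }
    }

    // One method per transaction type, assembled from several small sections.
    static ParsedBuy addBuySellTransaction(List<String> lines)
    {
        List<Section> sections = new ArrayList<>();

        // @formatter:off
        // Security: Vngrd Fds-ESG Dv.As-Pc Al ETF ISIN: IE0008T6IUX0
        // @formatter:on
        sections.add(new Section(Pattern.compile("^Security: (?<name>.*) ISIN: (?<isin>[A-Z]{2}[A-Z0-9]{10})$"), //
                        (t, m) -> {
                            t.name = m.group("name");
                            t.isin = m.group("isin");
                        }));

        // @formatter:off
        // Shares: 3.00
        // @formatter:on
        sections.add(new Section(Pattern.compile("^Shares: (?<shares>[\\d.]+)$"), //
                        (t, m) -> t.shares = m.group("shares")));

        // @formatter:off
        // Total: 19.49 EUR
        // @formatter:on
        sections.add(new Section(Pattern.compile("^Total: (?<amount>[\\d.]+) (?<currency>[A-Z]{3})$"), //
                        (t, m) -> {
                            t.amount = m.group("amount");
                            t.currency = m.group("currency");
                        }));

        ParsedBuy result = new ParsedBuy();
        for (Section section : sections)
        {
            // In this sketch every section is mandatory.
            if (!section.apply(lines, result))
                return null;
        }
        return result;
    }

    public static void main(String[] args)
    {
        ParsedBuy t = addBuySellTransaction(List.of( //
                        "Security: Vngrd Fds-ESG Dv.As-Pc Al ETF ISIN: IE0008T6IUX0", //
                        "Shares: 3.00", //
                        "Total: 19.49 EUR"));
        System.out.println(t.isin + " " + t.shares + " " + t.amount + " " + t.currency);
    }
}
```

Running `main` prints `IE0008T6IUX0 3.00 19.49 EUR`. If a required line is missing, this sketch's `addBuySellTransaction` returns `null`.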

# Testing

Test methods should follow this structure:

```java
@Test
public void testWertpapierKauf01()
{
    var extractor = new ScalableCapitalPDFExtractor(new Client());

    List<Exception> errors = new ArrayList<>();

    var results = extractor.extract(PDFInputFile.loadTestCase(getClass(), "Kauf01.txt"), errors);

    assertThat(errors, empty());
    assertThat(countSecurities(results), is(1L));
    assertThat(countBuySell(results), is(1L));
    assertThat(countAccountTransactions(results), is(0L));
    assertThat(countAccountTransfers(results), is(0L));
    assertThat(countItemsWithFailureMessage(results), is(0L));
    assertThat(countSkippedItems(results), is(0L));
    assertThat(results.size(), is(2));
    new AssertImportActions().check(results, "EUR");

    // check security
    assertThat(results, hasItem(security( //
                    hasIsin("IE0008T6IUX0"), hasWkn(null), hasTicker(null), //
                    hasName("Vngrd Fds-ESG Dv.As-Pc Al ETF"), //
                    hasCurrencyCode("EUR"))));

    // check purchase transaction
    assertThat(results, hasItem(purchase( //
                    hasDate("2024-12-12T13:12:51"), hasShares(3.00), //
                    hasSource("Kauf01.txt"), //
                    hasNote("Ord.-Nr.: SCALsin78vS5CYz"), //
                    hasAmount("EUR", 19.49), hasGrossValue("EUR", 18.50), //
                    hasTaxes("EUR", 0.00), hasFees("EUR", 0.99))));
}
```

- Each method starts with a block of `assertThat` calls checking the counts; all 8 assertions must be present.
- End lines with `//` to force line breaks (so the code formatter does not join them) when checking securities and transactions.
- Include the time (hours, minutes, seconds) in `hasDate` when the source document provides it.
- The `ExtractorMatchers` class contains the test assertion helpers.
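
To make the `hasDate` rule concrete: the expected value is an ISO-8601 local date-time such as `2024-12-12T13:12:51`. The following standalone sketch converts a statement date and an optional time into that string, assuming the statement prints them as `dd.MM.yyyy` and `HH:mm:ss` (an assumption; formats differ between banks) and falling back to midnight when no time is printed. `DateSketch.toExpectedDate` is a hypothetical helper, not part of `ExtractorMatchers`.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: maps a statement's date (and optional time) to the
// ISO-8601 string style used in hasDate(...). Not part of ExtractorMatchers.
public class DateSketch
{
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("dd.MM.yyyy");
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("HH:mm:ss");

    // time may be null when the document only prints a date; midnight is assumed then.
    static String toExpectedDate(String date, String time)
    {
        LocalDate d = LocalDate.parse(date, DATE);
        LocalTime t = time == null ? LocalTime.MIDNIGHT : LocalTime.parse(time, TIME);
        // LocalDateTime.toString() yields ISO-8601 and omits seconds when they are zero.
        return LocalDateTime.of(d, t).toString();
    }

    public static void main(String[] args)
    {
        System.out.println(toExpectedDate("12.12.2024", "13:12:51"));
        System.out.println(toExpectedDate("12.12.2024", null));
    }
}
```

`main` prints `2024-12-12T13:12:51` and then `2024-12-12T00:00`; the second line shows that `LocalDateTime.toString()` drops zero seconds for a date-only value.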
