This repository was archived by the owner on Apr 11, 2025. It is now read-only.

How can I read a table that starts on page 1 and extends over multiple pages? #192

@dejanmarkovic

pypdf_table_extraction/camelot does not recognize the table on pages after page 1 with the lattice flavor.

With the stream flavor, I get garbled output like this:

   0            1            2                                  3                       4         5
0                                                                      2059001013453712313
1                               289 Transakcije po nalogu građana                    PBO:
2                                                                        MARY MILAN
3  5  12.05.2024.  12.05.2024.     n 9001013454849 III rata   maj                    PBZ:  1.600,00
4                                                                  KNEZ MILET 456 4 11
5                                                   Instant nalog            FT241123YJFB4
6                                                         Belgrade

This is the lattice output from page one, which looks great:

0  REDNI\nBROJ  DATUM\nPRIJEMA  DATUM\nIZVRŠENJA  ...  REFERENCA KLIJENTA\nREFERENCA PARTNERA\nREFERE...  NA TERET  U KORIST
1            1     11.05.2024.       12.05.2024.  ...                           PBO:\nPBZ:\nFT201661TXR4            4.200,00
2            2     12.05.2024.       12.05.2024.  ...                           PBO:\nPBZ:\nFT20122CK6Y6            5.600,00
3            3     12.05.2024.       12.05.2024.  ...                           PBO:\nPBZ:\nFT20134Y5NWL            5.600,00
4            4     12.05.2024.       12.05.2024.  ...                           PBO:\nPBZ:\nFT20124QY6JZ            5.600,00

The document is a PDF bank statement.
NOTE: I have randomized the numbers in the output for privacy and security purposes.
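
One thing that may be worth checking for a case like this: `camelot.read_pdf` only parses page 1 unless a `pages` argument is given (for example `"all"` or `"1-end"`). The sketch below is not taken from this report. It assumes the continuation pages are ruled the same way as page one and that each page may repeat the header row. `merge_page_tables` and `read_multipage` are hypothetical helper names, and the import name `camelot` is assumed to also apply to the pypdf_table_extraction package.

```python
from typing import List, Optional, Sequence

Row = List[str]


def merge_page_tables(tables: Sequence[Sequence[Row]]) -> List[Row]:
    """Stack per-page tables into one, keeping the header row only once.

    Assumption: the first row of the first table is the header, and later
    pages may or may not repeat it.
    """
    merged: List[Row] = []
    header: Optional[Row] = None
    for rows in tables:
        for row in rows:
            cells = [cell.strip() for cell in row]
            if header is None:
                header = cells
                merged.append(cells)
            elif cells == header:
                continue  # header repeated on a continuation page
            else:
                merged.append(cells)
    return merged


def read_multipage(path: str, flavor: str = "lattice") -> List[Row]:
    """Hypothetical wrapper: parse every page, then merge the results."""
    import camelot  # imported lazily so merge_page_tables works without it

    # pages defaults to "1"; "all" asks the parser to look at every page.
    tables = camelot.read_pdf(path, pages="all", flavor=flavor)
    # Each table exposes its cells as a pandas DataFrame of strings in .df.
    return merge_page_tables([t.df.values.tolist() for t in tables])
```

If lattice still finds nothing on later pages even with `pages="all"`, the ruling lines on those pages are probably drawn differently, and the stream flavor with explicit `table_areas` or `columns` may be the next thing to try.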
