Memory usage grows out of control on large PDFs due to saving images in lattice parser #620

See #28. On a large PDF with a lot of images, the fact that Camelot is doing [this](https://github.com/camelot-dev/camelot/blob/master/camelot/parsers/lattice.py#L213):

```python
        # for plotting
        table._image = self.pdf_image  # Reuse the image used for calc
```

leads to ever-increasing memory consumption, which is usually fatal when processing in parallel. For instance, on this document of 1100+ pages (with 928 tables detected by Camelot), it ends up using some 20GB of memory: https://www.laval.ca/wp-content/uploads/2025/02/cdu-1-reglement.pdf. If I remove that line, memory usage stays constant at around 250MB per worker process.
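To illustrate the mechanism, here is a minimal sketch that does not use Camelot at all. `Table`, `parse_pages`, `retained_bytes` and the 1 MB `PAGE_IMAGE_BYTES` are hypothetical stand-ins, and a `bytearray` plays the role of the rendered page image. It only shows that a reference stored on each returned table keeps every page image alive, so retained memory scales with the number of pages:

```python
import tracemalloc

# Assumed size of one rendered page image; real images are often much larger.
PAGE_IMAGE_BYTES = 1_000_000


class Table:
    """Hypothetical stand-in for a detected table (not Camelot's class)."""

    def __init__(self, page):
        self.page = page
        self._image = None


def parse_pages(n_pages, keep_image):
    """Simulate parsing n_pages pages with one table detected per page."""
    tables = []
    for page in range(1, n_pages + 1):
        page_image = bytearray(PAGE_IMAGE_BYTES)  # "render" the page
        table = Table(page)
        if keep_image:
            # This reference keeps the image alive as long as the table lives.
            table._image = page_image
        tables.append(table)
    return tables


def retained_bytes(n_pages, keep_image):
    """Return the traced memory still allocated once parsing has finished."""
    tracemalloc.start()
    tables = parse_pages(n_pages, keep_image)  # results stay alive here
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current


if __name__ == "__main__":
    print("keeping images: ", retained_bytes(50, keep_image=True))
    print("dropping images:", retained_bytes(50, keep_image=False))
```

With the reference kept, the retained size is at least `n_pages * PAGE_IMAGE_BYTES`; without it, each image can be freed as soon as its page has been processed.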

Reusing the image like this is also unnecessary: if the user wants to do some plotting, the page image [seems to get regenerated anyway](https://github.com/camelot-dev/camelot/blob/master/camelot/core.py#L615) if it doesn't already exist.

EXCEPT! The one-page-at-a-time assumption pervasive in Camelot strikes again: the code mentioned above won't render the correct page if the table isn't on page 1, so plotting is currently broken for pages other than 1 if you don't use the lattice parser. So in fact the fixes in #589 to allow processing specific pages in the backend are also necessary to solve this problem. I've taken the liberty of fixing this in that pull request ;-)
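For what it's worth, the general idea (render on demand, and for the right page) can be sketched as follows. This is not the code in #589 or Camelot's API: `LazyImageTable`, `get_image`, `fake_render` and the `render_page(filename, page)` callable are all hypothetical names, and the renderer here just returns a string so the example runs without any PDF backend.

```python
class LazyImageTable:
    """Hypothetical table that remembers where it came from, not the pixels."""

    def __init__(self, filename, page, render_page):
        self.filename = filename
        self.page = page  # 1-based number of the page the table was found on
        self._render_page = render_page  # callable(filename, page) -> image

    def get_image(self):
        # Render only when plotting asks for it, for this table's own page,
        # and do not cache the result, so nothing accumulates across tables.
        return self._render_page(self.filename, self.page)


def fake_render(filename, page):
    """Stand-in renderer; a real one would rasterize the requested page."""
    return f"{filename}#page={page}"


if __name__ == "__main__":
    table = LazyImageTable("cdu-1-reglement.pdf", 7, fake_render)
    print(table.get_image())
```

The point of the design is that each table carries only a filename and a page number, which cost a few bytes, and the expensive image exists only while a plot is being drawn.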
