'PdfDocument' object has no attribute 'render' and pdfium library destroyed (despite transformers 4.38.2) -- what next? #263

@MarHai

Hello there,

I'm trying to get nougat up and running within a Docker environment. My Dockerfile looks as follows:

```dockerfile
FROM nvidia/cuda:13.0.2-cudnn-devel-ubuntu24.04

WORKDIR /usr/src/app

RUN apt-get update && apt-get install -y git python3-full
RUN python3 -m venv nougatpy/

RUN nougatpy/bin/pip install transformers==4.38.2 requests git+https://github.com/facebookresearch/nougat

COPY parse_pdfs.sh nougatpy/
RUN chmod +x nougatpy/parse_pdfs.sh

EXPOSE 7950

CMD ["bash", "nougatpy/parse_pdfs.sh"]
```

The parse_pdfs.sh file checks for files in a directory and then runs them through nougat using:

```shell
nougat "$FILE" -o "$OUTPUT_DIR" -m 0.1.0-base --full-precision --recompute --batchsize 8
```
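A minimal sketch of what such a wrapper loop might look like, for readers who want to reproduce the setup. Only the `nougat` invocation is taken from the command above; the function name `process_pdfs`, its two arguments, and the `*.pdf` glob are hypothetical and not the actual contents of parse_pdfs.sh.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a wrapper loop (not the real parse_pdfs.sh).
# Argument 1: directory to scan for PDFs (assumed layout).
# Argument 2: directory that nougat writes its output to.
process_pdfs() {
    local input_dir="$1"
    local output_dir="$2"
    local file
    for file in "$input_dir"/*.pdf; do
        # If the glob matched nothing, the literal pattern is left over; skip it.
        [ -e "$file" ] || continue
        # Same invocation as in the issue text.
        nougat "$file" -o "$output_dir" -m 0.1.0-base --full-precision --recompute --batchsize 8
    done
}
```

Calling `process_pdfs /data/in /data/out` at the end of the script would run every PDF in `/data/in` through nougat; both paths are placeholders.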

However, the result always looks as follows:

```
/usr/src/app/nougatpy/lib/python3.12/site-packages/torch/functional.py:505: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /pytorch/aten/src/ATen/native/TensorShape.cpp:4317.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
  0%|          | 0/3 [00:00<?, ?it/s]ERROR:root:'PdfDocument' object has no attribute 'render'
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
WARNING:root:Image not found
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
WARNING:root:Image not found
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
 67%|██████▋   | 2/3 [00:00<00:00, 307.34it/s]
Traceback (most recent call last):
  File "/usr/src/app/nougatpy/bin/nougat", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/src/app/nougatpy/lib/python3.12/site-packages/predict.py", line 166, in main
    for i, (sample, is_last_page) in enumerate(tqdm(dataloader)):
  File "/usr/src/app/nougatpy/lib/python3.12/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/usr/src/app/nougatpy/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 732, in __next__
    data = self._next_data()
           ^^^^^^^^^^^^^^^^^
  File "/usr/src/app/nougatpy/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 788, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/nougatpy/lib/python3.12/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
    return self.collate_fn(data)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/nougatpy/lib/python3.12/site-packages/nougat/utils/dataset.py", line 114, in ignore_none_collate
    _batch[-1] = (_batch[-1][0], name)
                  ~~~~~~^^^^
IndexError: list index out of range
-> Cannot close object; pdfium library is destroyed. This may cause a memory leak.
```

I've seen the discussion about downgrading transformers and have done so (see the pinned version in the Dockerfile), but it makes no difference.
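Since pinning transformers did not change the outcome, it may help to record which versions pip actually resolved inside the venv. The sketch below is only a diagnostic aid: the helper name `report_versions` is hypothetical, and the package selection (transformers, torch, nougat-ocr, and pypdfium2, the library that emits the "pdfium library is destroyed" message) is an assumption about what is relevant, not a confirmed diagnosis.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the installed versions of packages that may be involved.
# Argument 1: path to the venv's pip (defaults to the path used in the Dockerfile).
report_versions() {
    local pip_bin="${1:-nougatpy/bin/pip}"
    # --format=freeze prints one "name==version" line per installed package.
    "$pip_bin" list --format=freeze 2>/dev/null \
        | grep -i -E '^(transformers|torch|pypdfium2|nougat[-_]ocr)=='
}
```

Running `report_versions` from `/usr/src/app` inside the container prints one `name==version` line per matching package, which can be pasted into the report as-is.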

Any ideas on how to proceed?

Thanks
Mario
