Hello there,
I'm trying to get nougat up and running within a Docker environment. My Dockerfile looks as follows:
FROM nvidia/cuda:13.0.2-cudnn-devel-ubuntu24.04
WORKDIR /usr/src/app
RUN apt-get update && apt-get install -y git python3-full
RUN python3 -m venv nougatpy/
RUN nougatpy/bin/pip install transformers==4.38.2 requests git+https://github.com/facebookresearch/nougat
COPY parse_pdfs.sh nougatpy/
RUN chmod +x nougatpy/parse_pdfs.sh
EXPOSE 7950
CMD ["bash", "nougatpy/parse_pdfs.sh"]
The parse_pdfs.sh file checks for files in a directory and then runs them through nougat using:
nougat "$FILE" -o "$OUTPUT_DIR" -m 0.1.0-base --full-precision --recompute --batchsize 8
However, the result always looks as follows:
/usr/src/app/nougatpy/lib/python3.12/site-packages/torch/functional.py:505: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /pytorch/aten/src/ATen/native/TensorShape.cpp:4317.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
0%| | 0/3 [00:00<?, ?it/s]ERROR:root:'PdfDocument' object has no attribute 'render'
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
WARNING:root:Image not found
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
WARNING:root:Image not found
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
ERROR:root:list index out of range
67%|██████▋ | 2/3 [00:00<00:00, 307.34it/s]
Traceback (most recent call last):
File "/usr/src/app/nougatpy/bin/nougat", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/src/app/nougatpy/lib/python3.12/site-packages/predict.py", line 166, in main
for i, (sample, is_last_page) in enumerate(tqdm(dataloader)):
File "/usr/src/app/nougatpy/lib/python3.12/site-packages/tqdm/std.py", line 1181, in __iter__
for obj in iterable:
File "/usr/src/app/nougatpy/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 732, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "/usr/src/app/nougatpy/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 788, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/app/nougatpy/lib/python3.12/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
return self.collate_fn(data)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/app/nougatpy/lib/python3.12/site-packages/nougat/utils/dataset.py", line 114, in ignore_none_collate
_batch[-1] = (_batch[-1][0], name)
~~~~~~^^^^
IndexError: list index out of range
-> Cannot close object; pdfium library is destroyed. This may cause a memory leak.
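My reading of the traceback, as a simplified sketch rather than the library's actual code: if every page fails to render, the collate step has nothing to attach the file name to, and indexing the empty list raises the IndexError. The function name `collate_sketch` and the assumption that samples are `(image, name)` pairs with `image` set to `None` on a failed render are mine, inferred from the failing line shown above.

```python
from typing import List, Optional, Tuple

# Assumed sample shape: (image, name). image is None when a page failed to render.
Sample = Tuple[Optional[object], Optional[str]]


def collate_sketch(batch: List[Sample]) -> List[Tuple[object, Optional[str]]]:
    """Hypothetical stand-in for the collate step named in the traceback."""
    kept: List[Tuple[object, Optional[str]]] = []
    for image, name in batch:
        if image is not None:
            kept.append((image, name))
        elif name:
            # Mirrors the failing line: attach the name to the last kept item.
            # If nothing was kept so far, kept[-1] raises IndexError.
            kept[-1] = (kept[-1][0], name)
    return kept


if __name__ == "__main__":
    try:
        # Every page failed to render, so there is no previous item to update.
        collate_sketch([(None, ""), (None, "paper.pdf")])
    except IndexError as exc:
        print("IndexError:", exc)
```

If this reading is right, the IndexError is only a follow-on symptom, and the first error about `PdfDocument` is the one that matters.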
I've seen the discussion about downgrading transformers and have done so (see the Dockerfile), but it does not help at all.
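Since the first error mentions `PdfDocument` and the last line mentions pdfium, I suspect the relevant package is pypdfium2 rather than transformers. A small check like the following could show which versions actually ended up in the venv and whether `PdfDocument` still has a `render` attribute there. This is only a diagnostic sketch: the helper names are hypothetical, and the distribution name `nougat-ocr` is my assumption about how the git install registers itself.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def pdfdocument_has_render() -> Optional[bool]:
    """True/False if pypdfium2 can be imported, None if it cannot."""
    try:
        import pypdfium2
    except ImportError:
        return None
    return hasattr(pypdfium2.PdfDocument, "render")


if __name__ == "__main__":
    # "nougat-ocr" is an assumed distribution name for the git install.
    names = ["pypdfium2", "transformers", "torch", "nougat-ocr"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
    print("PdfDocument.render present:", pdfdocument_has_render())
```

Saved under a name such as `check_env.py` (hypothetical), it could be run inside the container with `nougatpy/bin/python check_env.py`.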
Any ideas on how to proceed?
Thanks
Mario