Description
I have been working on setting up a stable environment for nougat-ocr (v0.1.17) and encountered several breaking issues due to dependency drift and unpinned versions. Below is a summary of the critical errors and the fixes required to get the package running in 2025.
I have compiled these fixes into a standalone setup script/repository (linked under Solution below) for anyone facing similar issues. It creates a new conda environment that is, as far as I can tell, the minimum needed for a working installation. Several choices are not fully generic (in particular, I use CUDA 11.8) and reflect the resources available to me, but I believe the issues below are fairly general.
1. pypdfium2 Compatibility (Critical)
The Issue:
Newer versions of pypdfium2 (v4.0+) strictly require file paths to be strings, but nougat passes pathlib.Path objects.
Symptom:
Immediate crash with TypeError or AttributeError when attempting to load a PDF.
The Fix:
Cast the filepath to a string before passing it to the document loader.
# In nougat/dataset/rasterize.py
# Change:
doc = pdfium.PdfDocument(filepath)
# To:
doc = pdfium.PdfDocument(str(filepath))
2. API Parallel Rendering Crash
The Issue:
When running nougat_api, the rasterize_paper function attempts to use multiprocessing to render pages. This fails when the input is passed as bytes or a temporary file handle, or simply due to context issues within the API server.
Symptom:
The API returns a 500 error with ValueError: Can only render in parallel with file path input.
The Fix:
Disable the parallel execution path in rasterize.py and force the serial rendering loop.
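For anyone who wants to see roughly what a serial path looks like, here is a minimal sketch. It is not the actual patch applied to nougat; the function names render_pages_serially and open_and_render are hypothetical, and the default of 96 DPI is an assumption. It relies only on the per-page pypdfium2 calls (indexing the document, page.render(scale=...), bitmap.to_pil()) and avoids the document-level parallel render entirely.

```python
import os
from typing import Iterable, List, Optional


def render_pages_serially(doc, dpi: int = 96,
                          pages: Optional[Iterable[int]] = None) -> List:
    """Render pages one at a time, with no multiprocessing involved.

    `doc` is expected to behave like a pypdfium2 PdfDocument: len(doc) gives
    the page count, doc[i] returns a page exposing .render(scale=...), and
    the returned bitmap exposes .to_pil().
    """
    indices = list(pages) if pages is not None else list(range(len(doc)))
    scale = dpi / 72  # PDF user space is 72 units per inch
    images = []
    for i in indices:
        page = doc[i]
        bitmap = page.render(scale=scale)
        images.append(bitmap.to_pil())
    return images


def open_and_render(pdf_source, dpi: int = 96) -> List:
    """Hypothetical wrapper: open a path or raw bytes, then render serially."""
    import pypdfium2 as pdfium  # imported lazily so the helper above stays testable

    # Path-like inputs are converted to str, matching the fix in section 1.
    if isinstance(pdf_source, os.PathLike):
        pdf_source = os.fspath(pdf_source)
    doc = pdfium.PdfDocument(pdf_source)
    return render_pages_serially(doc, dpi=dpi)
```

Because nothing here spawns worker processes, the same loop works whether the API hands over a file path or the uploaded bytes. The trade-off is that large PDFs render more slowly than they would in parallel.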
3. Transformers Segmentation Faults
The Issue:
Recent versions of transformers (likely 4.38+) introduced changes that cause nougat to segfault during the inference generation loop on CUDA.
Symptom:
The process dies silently or segfaults while processing a page.
The Fix:
Pin transformers==4.37.2.
4. Environment Stability
To ensure reproducibility, I found it necessary to strictly pin the following dependencies:
torch==2.5.1
torchvision==0.20.1
torchaudio==2.5.1
pypdfium2==4.0.0
fastapi==0.123.10
uvicorn==0.38.0
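To confirm that an existing environment actually matches these pins (plus the transformers pin from section 3), something like the following could be run inside it. This is only a sketch: PINS and find_mismatches are hypothetical names, and the check ignores local version suffixes such as +cu118 on the assumption that CUDA builds of torch report them.

```python
from importlib import metadata
from typing import Callable, Dict

# Pins taken from this issue; adjust if your setup differs.
PINS = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "torchaudio": "2.5.1",
    "pypdfium2": "4.0.0",
    "transformers": "4.37.2",
    "fastapi": "0.123.10",
    "uvicorn": "0.38.0",
}


def find_mismatches(pins: Dict[str, str],
                    get_version: Callable[[str], str] = metadata.version
                    ) -> Dict[str, str]:
    """Return {package: what was found} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            mismatches[name] = "not installed"
            continue
        # Drop a local version suffix (e.g. "2.5.1+cu118") before comparing.
        if installed.split("+", 1)[0] != wanted:
            mismatches[name] = installed
    return mismatches
```

Calling find_mismatches(PINS) returns an empty dict when everything lines up; otherwise each entry names a package and the version that was found instead.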
Solution
I have created a repository that automates the creation of a working conda environment with these specific versions and applies the necessary code patches post-install.
https://github.com/KylePoe/nougat-setup-script
Hopefully, this helps others trying to use this excellent tool.
DISCLAIMER: I relied heavily on LLMs to produce these changes. They may include suboptimal fixes (in particular, I am skeptical of the parallel rendering fix). I am only sharing this because, as noted, I could not otherwise get nougat to work, and I thought others might find it useful.