🐛 Describe the bug
I am serving the olmOCR model on a remote server and running the OCR pipeline locally.
My PDF is 492 pages long, and the OCR process got stuck in this state for about 10 minutes:
2026-02-01 17:08:31,807 - __main__ - INFO -
Worker ID | finished | started
----------+----------+--------
0 | 491 | 492
So I removed the last page of my PDF and tried again. This time it still got stuck on the last page:
2026-02-01 17:14:59,176 - __main__ - INFO -
Worker ID | finished | started
----------+----------+--------
0 | 487 | 488
I don't quite see how the last pages are different from the rest, so I think this is a bug.
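In both runs the stall shows up as a worker row whose finished count stays below its started count. Here is a minimal sketch for flagging such rows in a captured log, assuming the status table keeps the `Worker ID | finished | started` layout shown above. The `pending_pages` helper is hypothetical and is not part of olmOCR:

```python
import re

# Matches a data row of the status table, e.g. "0 | 491 | 492".
# Assumption: rows always have the three integer columns shown in the logs above.
ROW = re.compile(r"^\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*$")


def pending_pages(log_text):
    """Return {worker_id: started - finished} for workers with unfinished pages.

    If a worker appears in several tables, the latest row wins.
    """
    pending = {}
    for line in log_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, separator, or unrelated log line
        worker, finished, started = map(int, match.groups())
        if started > finished:
            pending[worker] = started - finished
        else:
            pending.pop(worker, None)
    return pending


sample = """Worker ID | finished | started
----------+----------+--------
0 | 491 | 492"""
print(pending_pages(sample))
```

Running this over successive status tables would show whether the same worker keeps a nonzero pending count across the whole 10 minutes.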
《高等代数学.第四版》 (谢启鸿 姚慕生 吴泉水 编著) (Z-Library) [480-492].pdf
I used this command to run olmOCR:
python -m olmocr.pipeline \
./localworkspace \
--server http://[SERVER_IP_ADDRESS]/v1 \
--model olmOCR/allenai/olmOCR-2-7B-1025-FP8 \
--markdown --pdfs "《高等代数学.第四版》 (谢启鸿 姚慕生 吴泉水 编著) (Z-Library).pdf"
Versions
Python 3.13.11
annotated-types==0.7.0
anthropic==0.76.0
anyio==4.12.1
attrs==25.4.0
beaker-py==2.5.4
beautifulsoup4==4.14.3
bleach==6.3.0
blinker==1.9.0
boto3==1.42.32
botocore==1.42.32
cached_path==1.8.1
certifi==2026.1.4
cffi==2.0.0
charset-normalizer==3.4.4
click==8.3.1
cryptography==46.0.3
cuda-bindings==12.9.4
cuda-pathfinder==1.3.3
defusedxml==0.7.1
distro==1.9.0
docstring_parser==0.17.0
eval_type_backport==0.3.1
fastjsonschema==2.21.2
filelock==3.20.3
Flask==3.1.2
fsspec==2026.1.0
ftfy==6.3.1
fuzzysearch==0.8.1
google-api-core==2.29.0
google-auth==2.47.0
google-cloud-core==2.5.0
google-cloud-storage==3.8.0
google-crc32c==1.8.0
google-genai==1.60.0
google-resumable-media==2.8.0
googleapis-common-protos==1.72.0
greenlet==3.3.0
grpcio==1.76.0
h11==0.16.0
hf-xet==1.2.0
httpcore==1.0.9
httpx==0.28.1
huggingface-hub==0.36.0
idna==3.11
importlib_metadata==8.7.1
invoke==2.2.1
itsdangerous==2.2.0
Jinja2==3.1.6
jiter==0.12.0
jmespath==1.0.1
jsonschema==4.26.0
jsonschema-specifications==2025.9.1
jupyter_client==8.8.0
jupyter_core==5.9.1
jupyterlab_pygments==0.3.0
lingua-language-detector==2.1.1
lxml==6.0.2
markdown-it-py==4.0.0
markdown2==2.5.4
markdownify==1.2.2
MarkupSafe==3.0.3
mdurl==0.1.2
mistralai==1.10.1
mistune==3.2.0
mpmath==1.3.0
nbclient==0.10.4
nbconvert==7.16.6
nbformat==5.10.4
networkx==3.6.1
numpy==2.4.1
nvidia-cublas-cu12==12.8.4.1
nvidia-cuda-cupti-cu12==12.8.90
nvidia-cuda-nvrtc-cu12==12.8.93
nvidia-cuda-runtime-cu12==12.8.90
nvidia-cudnn-cu12==9.10.2.21
nvidia-cufft-cu12==11.3.3.83
nvidia-cufile-cu12==1.13.1.3
nvidia-curand-cu12==10.3.9.90
nvidia-cusolver-cu12==11.7.3.90
nvidia-cusparse-cu12==12.5.8.93
nvidia-cusparselt-cu12==0.7.1
nvidia-nccl-cu12==2.27.5
nvidia-nvjitlink-cu12==12.8.93
nvidia-nvshmem-cu12==3.4.5
nvidia-nvtx-cu12==12.8.90
olmocr==0.4.20
openai==2.15.0
opentelemetry-api==1.38.0
opentelemetry-exporter-otlp-proto-common==1.38.0
opentelemetry-exporter-otlp-proto-http==1.38.0
opentelemetry-proto==1.38.0
opentelemetry-sdk==1.38.0
opentelemetry-semantic-conventions==0.59b0
orjson==3.11.5
packaging==26.0
pandocfilters==1.5.1
pillow==12.1.0
platformdirs==4.5.1
playwright==1.57.0
proto-plus==1.27.0
protobuf==6.33.4
pyasn1==0.6.2
pyasn1_modules==0.4.2
pycparser==3.0
pydantic==2.12.5
pydantic_core==2.41.5
pyee==13.0.0
Pygments==2.19.2
pypdf==6.6.0
pypdfium2==5.3.0
python-dateutil==2.9.0.post0
python-magic==0.4.27
PyYAML==6.0.3
pyzmq==27.1.0
RapidFuzz==3.14.3
referencing==0.37.0
regex==2026.1.15
requests==2.32.5
rich==13.9.4
rpds-py==0.30.0
rsa==4.9.1
s3transfer==0.16.0
safetensors==0.7.0
sequence_align==0.3.0
setuptools==80.9.0
six==1.17.0
smart_open==7.5.0
sniffio==1.3.1
soupsieve==2.8.3
sympy==1.14.0
syntok==1.4.4
tenacity==9.1.2
tinycss2==1.4.0
tinyhost==0.4.18
tokenizers==0.22.2
torch==2.10.0
tornado==6.5.4
tqdm==4.67.1
traitlets==5.14.3
transformers==4.57.3
triton==3.6.0
typing-inspection==0.4.2
typing_extensions==4.15.0
urllib3==2.6.3
wcwidth==0.3.0
webencodings==0.5.1
websockets==15.0.1
Werkzeug==3.1.5
wheel==0.45.1
wrapt==2.0.1
zipp==3.23.0
zstandard==0.25.0