Example PDF archive #302
EliotJones started this conversation in General
Replies: 2 comments
-
In #532, @BobLd provided an additional PDF repository: https://github.com/pdf-association/pdf-corpora#safedocs-issue-tracker-corpus
-
The attached zip contains the URLs of an additional 30,000+ PDFs (~50 GB). To download them on Unix-like systems or Windows Subsystem for Linux (WSL), run: `cut -d, -f1 URLsofExamplePDFs.txt | wget -i -`
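A slightly more defensive take on that one-liner, offered as a minimal sketch rather than a tested recipe. It assumes the list file (extracted from the attached zip) is comma-separated with the URL in the first field, as the `cut -d, -f1` command implies. The helper names `extract_urls` and `download_pdfs` and the default directory `example-pdfs` are hypothetical. It also strips Windows carriage returns and blank lines before handing the list to `wget`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the default directory are hypothetical.
# Assumes the list file is comma-separated with the URL in the first field.

# Print the URL column, dropping Windows CR characters and blank lines.
extract_urls() {
  cut -d, -f1 "$1" | tr -d '\r' | sed '/^[[:space:]]*$/d'
}

# Fetch every listed PDF into a destination directory.
#   -nc        skip files that already exist locally, so a re-run does not
#              fetch them again (delete any partial file before re-running)
#   --tries=3  attempt each URL at most three times
#   -P "$dest" save files under the destination directory
#   -i -       read the URL list from standard input
download_pdfs() {
  local list="$1"
  local dest="${2:-example-pdfs}"
  mkdir -p "$dest"
  extract_urls "$list" | wget -nc --tries=3 -P "$dest" -i -
}
```

Usage, after sourcing the script: `download_pdfs URLsofExamplePDFs.txt example-pdfs`. With ~50 GB to fetch, `-nc` makes re-runs cheap, but it also skips any URL whose file name matches a file already downloaded, so two different URLs ending in the same name will only yield one file.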
-
Since it is useful to have many PDF documents when carrying out document layout analysis, developing new features and doing performance work, I'm sharing the test archive I put together during the initial development of the library.
https://drive.google.com/file/d/1C6bD4BVIc4pxT4oDEmUT1jLV_oi9NYUh/view?usp=sharing
These PDFs are useful for testing bug fixes and for profiling, since they represent a broad cross-section of producers and document types.