PDFBox add harnesses and seeds #13873
tballison is a new contributor to projects/pdfbox. The PR must be approved by known contributors before it can be merged. The past contributors are: henryrneh
I'm leaving this as draft until someone else from the PDFBox project is able to review it: https://issues.apache.org/jira/browse/PDFBOX-6055
At least one check is failing (https://github.com/google/oss-fuzz/actions/runs/17247847734/job/48942080785?pr=13873) because #13860 hasn't propagated to the images yet(?).
Yeah, the images build once a day so give it 24h or so and we should be good.
K. I think we're good here. Let me know what you think.
This adds harnesses for ExtractText and the font parsers. It also adds seeds from an arbitrary zip of 1k PDFs from https://digitalcorpora.org/corpora/file-corpora/cc-main-2021-31-pdf-untruncated/.
Along the way, this also updates Maven and makes a small modification to allow local builds (which did not work before). Finally, this adds log4j2 as the logging implementation and turns logging off to avoid corrupting the console.
I went with `fuzzerTestOneInput()` rather than the `@FuzzTest` annotation because I couldn't get `reproduce` to work with the `@FuzzTest` annotation. This could be an unrelated issue or user error.
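For readers unfamiliar with the `fuzzerTestOneInput()` style, here is a minimal sketch of the general shape, not the harness code from this PR. The class name `ExtractTextFuzzerSketch` and the `extractText` helper are hypothetical stand-ins so that the example compiles with only the JDK. A real harness would call into PDFBox at that point instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Sketch of a static-method fuzz harness. The byte[] overload of
// fuzzerTestOneInput is used so no fuzzing library is needed to compile.
// In a real harness the class would typically be declared public in its own file.
final class ExtractTextFuzzerSketch {

    // Hypothetical stand-in for the parser under test (NOT a PDFBox API):
    // rejects input without a "%PDF-" header, otherwise returns the remaining bytes as text.
    static String extractText(InputStream in) throws IOException {
        byte[] header = in.readNBytes(5);
        if (!"%PDF-".equals(new String(header, StandardCharsets.ISO_8859_1))) {
            throw new IOException("missing %PDF- header");
        }
        return new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
    }

    // Entry point invoked once per generated input.
    public static void fuzzerTestOneInput(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            extractText(in);
        } catch (IOException e) {
            // Malformed input is expected while fuzzing; a documented
            // checked exception is not treated as a finding here.
        }
        // Any other exception propagates to the fuzzer and is reported as a crash.
    }
}
```

The design choice in this sketch is to swallow only the exception type the parser documents for bad input, so that unexpected runtime exceptions still surface as findings.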