tballison
Contributor

@tballison tballison commented Aug 26, 2025

This adds harnesses for ExtractText and the font parsers. It also adds seeds from an arbitrary zip of 1k PDFs from https://digitalcorpora.org/corpora/file-corpora/cc-main-2021-31-pdf-untruncated/.

Along the way, this also updates Maven and makes a small modification to allow local builds (which did not work before). Finally, this adds log4j2 as the logging implementation and turns logging off to avoid corrupting the console.

I went with fuzzerTestOneInput() rather than the @FuzzTest annotation because I couldn't get reproduce to work with the annotation. This could be an unrelated issue or user error.
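For anyone unfamiliar with the static-method style, here is a minimal sketch of what a `fuzzerTestOneInput(byte[])` harness looks like. It is not the code in this PR: the class name `ExtractTextFuzzerSketch` and the `extractText` helper are hypothetical stand-ins so the example runs with the JDK alone, and a real harness would call into PDFBox at that point instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the harness from this PR. In a real harness the
// class would normally be public and live in a file of the same name.
final class ExtractTextFuzzerSketch {

    // Static entry point: the fuzzer calls this once per generated input.
    public static void fuzzerTestOneInput(byte[] data) {
        try {
            extractText(data);
        } catch (IOException e) {
            // Malformed input is expected; rejecting it cleanly is not a finding.
        }
        // Any other exception propagates so the fuzzer can report it.
    }

    // Stand-in for the library call under test (an assumption for illustration,
    // NOT PDFBox's API): accept only input that starts with "%PDF-".
    static String extractText(byte[] data) throws IOException {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            throw new IOException("input too short");
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                throw new IOException("missing %PDF- header");
            }
        }
        // Return everything after the header as text.
        return new String(data, magic.length, data.length - magic.length,
                StandardCharsets.ISO_8859_1);
    }
}
```

The point of the shape is that expected parse failures are swallowed inside the entry point, while anything unexpected escapes and is recorded as a crash that can later be replayed from the saved input.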

tballison is a new contributor to projects/pdfbox. The PR must be approved by known contributors before it can be merged. The past contributors are: henryrneh

@tballison
Contributor Author

I'm leaving this as draft until someone else from the PDFBox project is able to review it: https://issues.apache.org/jira/browse/PDFBOX-6055

@tballison
Contributor Author

tballison commented Aug 26, 2025

At least one check is failing (https://github.com/google/oss-fuzz/actions/runs/17247847734/job/48942080785?pr=13873) because #13860 hasn't propagated to the images yet(?).

@DavidKorczynski
Collaborator

Yeah, the images build once a day so give it 24h or so and we should be good.

@tballison tballison marked this pull request as ready for review August 27, 2025 19:26
@tballison
Contributor Author

K. I think we're good here. Let me know what you think.
