PDFBox add harnesses and seeds #13873

tballison · 2025-08-26T13:20:05Z

This adds harnesses for ExtractText and the font parsers. It also adds seeds taken from an arbitrary zip of 1,000 PDFs from https://digitalcorpora.org/corpora/file-corpora/cc-main-2021-31-pdf-untruncated/.
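For context on the seeds: OSS-Fuzz picks up a seed corpus for a fuzz target from a zip named <fuzz_target>_seed_corpus.zip placed next to the target in $OUT. The sketch below is a hypothetical helper, not the build script from this PR. It assumes the seed PDFs have already been extracted into one directory. The name PDFExtractTextFuzzer comes from the commit list, but the paths in the usage comment are illustrative.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class SeedCorpusSketch {

    // Packs every regular *.pdf file directly under srcDir into zipFile,
    // in sorted order, and returns the entry names that were written.
    public static List<String> buildSeedZip(Path srcDir, Path zipFile) throws IOException {
        List<Path> pdfs;
        try (Stream<Path> files = Files.list(srcDir)) {
            pdfs = files.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> names = new ArrayList<>();
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path pdf : pdfs) {
                String name = pdf.getFileName().toString();
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(pdf, zip); // stream the file's bytes into the current entry
                zip.closeEntry();
                names.add(name);
            }
        }
        return names;
    }

    // Usage (hypothetical paths):
    //   java SeedCorpusSketch pdfs/ out/PDFExtractTextFuzzer_seed_corpus.zip
    public static void main(String[] args) throws IOException {
        List<String> written = buildSeedZip(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(written.size() + " seed files written to " + args[1]);
    }
}
```

Sorting the input files keeps the entry order stable from one build to the next, which makes it easier to compare corpora between runs.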

Along the way, this also updates Maven and makes a small modification so that local builds work (they did not before). Finally, this adds log4j2 as the logging implementation and turns logging off so that log output does not corrupt the console.

I went with fuzzerTestOneInput() rather than the @FuzzTest annotation because I couldn't get reproduce to work with @FuzzTest. This could be an unrelated issue or user error on my part.
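For readers unfamiliar with the non-annotation style: Jazzer looks for a class with a public static fuzzerTestOneInput(byte[]) method (an overload taking a FuzzedDataProvider also exists). Below is a minimal sketch of that shape, not the harness from this PR. It assumes a hypothetical parseUntrusted() stand-in in place of the real PDFBox calls so that it runs with only the JDK. The main() method is an illustrative way to replay one saved input outside the fuzzer; it is not OSS-Fuzz's reproduce command.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ExtractTextFuzzerSketch {

    // Hypothetical stand-in for the real target (for example, loading the bytes
    // as a PDF and extracting its text). An IOException models "malformed input".
    static String parseUntrusted(byte[] data) throws IOException {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            throw new IOException("input too short");
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                throw new IOException("missing %PDF- header");
            }
        }
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Entry point Jazzer discovers by name: public static void fuzzerTestOneInput(byte[]).
    public static void fuzzerTestOneInput(byte[] data) {
        try {
            parseUntrusted(data);
        } catch (IOException e) {
            // Expected for malformed input. Swallowing it means only unexpected
            // exceptions (NullPointerException, ClassCastException, ...) escape
            // and get reported as findings.
        }
    }

    // Replays a single saved input (e.g. a crash file) without the fuzzer.
    public static void main(String[] args) throws IOException {
        fuzzerTestOneInput(Files.readAllBytes(Paths.get(args[0])));
    }
}
```

Catching only the checked exception the parser documents for bad input keeps the harness from flagging every malformed file, while runtime exceptions still surface as findings.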


github-actions · 2025-08-26T13:21:13Z

tballison is a new contributor to projects/pdfbox. The PR must be approved by known contributors before it can be merged. The past contributors are: henryrneh

tballison · 2025-08-26T13:28:48Z

I'm leaving this as draft until someone else from the PDFBox project is able to review it: https://issues.apache.org/jira/browse/PDFBOX-6055

tballison · 2025-08-26T20:06:27Z

At least one check is failing (https://github.com/google/oss-fuzz/actions/runs/17247847734/job/48942080785?pr=13873), presumably because #13860 hasn't propagated to the images yet.

DavidKorczynski · 2025-08-26T20:45:11Z

Yeah, the images build once a day, so give it 24h or so and we should be good.

tballison · 2025-08-27T19:26:52Z

K. I think we're good here. Let me know what you think.


tballison added 3 commits August 26, 2025 09:14

Update maven, enable building from local repo, first steps towards font harnesses.

24ac1a6

Add parser fuzzers and PDFExtractTextFuzzer with seeds

b747d3c

update extract-fonts.sh

77a77bb

tballison added 2 commits August 26, 2025 10:20

actually add log4j2.xml config to turn off logging

2b333c2

some cleanups

8ac7f46

tballison added 4 commits August 27, 2025 09:49

fix typo found by Tilman

87c4deb

better align seeds and font parsers

aab9646

update licenses

11e3fb9

Merge branch 'master' into pdfbox-add-harnesses

13e7d0f

tballison marked this pull request as ready for review August 27, 2025 19:26

tballison added 2 commits August 28, 2025 13:56

Clean up dependencies and remove AFMParserFuzzer, which only parses trusted data.

5cecf9b

Merge branch 'master' into pdfbox-add-harnesses

5578b62
