|
| 1 | +# Utilizing OSS-Fuzz-Gen to Improve Fuzz Testing for OpenPrinting Projects |
| 2 | + |
| 3 | +- **Year:** 2025 |
| 4 | +- **Contributor:** Zixuan Liu |
| 5 | + |
| 6 | +- **Organization**: OpenPrinting, The Linux Foundation |
| 7 | + |
| 8 | +- **Mentors**: Till Kamppeter, Jiongchi Yu, George-Andrei Iosif |
| 9 | + |
| 10 | +- **Useful Links**: |
| 11 | + - [Project Page](https://summerofcode.withgoogle.com/programs/2025/projects/gRbcUkWB) |
| 12 | + - [Source Code for Fuzz Harnesses](https://github.com/OpenPrinting/fuzzing) |
| 13 | + - OSS-Fuzz Projects |
| 14 | + - [cups](https://github.com/google/oss-fuzz/tree/master/projects/cups) |
| 15 | + - [libcups](https://github.com/google/oss-fuzz/tree/master/projects/libcups) |
| 16 | + - [cups-filters](https://github.com/google/oss-fuzz/tree/master/projects/cups-filters) |
| 17 | + - [OSS-Fuzz-Gen](https://github.com/google/oss-fuzz-gen/tree/main) and [Light-Fuzz-Gen](https://github.com/pushinl/light-fuzz-gen) |
| 18 | + |
| 19 | +## Project Detail |
| 20 | + |
| 21 | +This project aims to improve fuzz testing for OpenPrinting’s C/C++ codebases by leveraging OSS-Fuzz-Gen, a new framework that uses Large Language Models (LLMs) to assist fuzz testing. While some OpenPrinting projects are already integrated into Google’s OSS-Fuzz, current fuzzing efforts achieve limited runtime coverage (e.g., only 11.84% for cups), leaving many functions untested. To address this, the project will (1) refine existing fuzzers, (2) improve corpus and dictionary quality using LLMs, and (3) generate additional fuzz harnesses with OSS-Fuzz-Gen to improve the coverage. This will enhance test depth, uncover hidden vulnerabilities, and strengthen the security of OpenPrinting projects. |
| 22 | + |
| 23 | +In practice, however, I identified some major limitations of OSS-Fuzz-Gen. The biggest one is that when a YAML file defines multiple functions, OSS-Fuzz-Gen creates a separate fuzzer for each function. This makes it hard to capture the call relationships between functions, which limits coverage. In addition, OSS-Fuzz-Gen is heavyweight and highly encapsulated: even a minor issue such as network instability can cause the entire pipeline to fail. To address these issues, I proposed Light-Fuzz-Gen (https://github.com/pushinl/light-fuzz-gen), with the aim of eventually integrating it back into OSS-Fuzz-Gen. This approach allowed me to generate a large number of harnesses and seeds, which significantly improved the code coverage of the OpenPrinting projects. |
| 24 | + |
| 25 | +## Achievement |
| 26 | + |
| 27 | +- **Light-Fuzz-Gen.** I proposed Light-Fuzz-Gen, a lightweight tool that uses large language models such as GPT-4o and Gemini-2.5-Pro to automatically generate fuzz harness code. The tool consists of two modules: a **code analysis and YAML generation module** and a **harness code generation module**. Given target function signatures, parameter types, and return types, the harness code generation module generates harness code suitable for fuzz testing. Harness generation in OSS-Fuzz-Gen is likewise driven by YAML configuration files that describe the entry functions to be tested, and writing these files requires a solid understanding of the target project, which hinders the rapid integration of fuzz testing for new projects. I therefore added the code analysis and YAML generation module, a workflow that analyzes the source code of the target project and proposes candidate YAML configuration files in several steps: parsing function symbols, sending them to the LLM in batches for analysis, summarizing the output, and post-processing. At least 20 YAML configuration files have been generated through this module, and 4 of them were actually used to generate harness code. Although designed for Light-Fuzz-Gen, the generated files can be used with OSS-Fuzz-Gen after post-processing. |
| 28 | + |
| 29 | +- **Tested over 10 harnesses and finally added 6 that achieved good results.** This increased the coverage of cups from 11% to 30% and of libcups from 14% to 17%, and led to the discovery of **15 new issues** through OSS-Fuzz. The details are as follows: |
| 30 | + - Fixed the `Makefile` to support OSS-Fuzz-Gen. |
| 31 | + - Added fuzzers: `fuzz_ipp_gen`, `fuzz_ppd_gen`, `fuzz_ppd_gen_cache`, `fuzz_ppd_gen_conflicts`, `fuzz_http_core`, and so on. |
| 32 | + - Each fuzzer comes with its own seeds and corpus. |
| 33 | + - Fixed most of the memory leaks in existing fuzzers. |
| 34 | + |
| 35 | +- **Created 8 PRs:** |
| 36 | + |
| 37 | + [Support oss-fuzz-gen harnesses by avoiding nested directories by pushinl · Pull Request #9 · OpenPrinting/fuzzing](https://github.com/OpenPrinting/fuzzing/pull/9) |
| 38 | + |
| 39 | + [fix most of the memory leaks in fuzz_ppd && add fuzz_ipp_gen by pushinl · Pull Request #19 · OpenPrinting/fuzzing](https://github.com/OpenPrinting/fuzzing/pull/19) |
| 40 | + |
| 41 | + [add fuzz ppd gen 1 by pushinl · Pull Request #20 · OpenPrinting/fuzzing](https://github.com/OpenPrinting/fuzzing/pull/20) |
| 42 | + |
| 43 | + [add some new ppd fuzzers by pushinl · Pull Request #21 · OpenPrinting/fuzzing](https://github.com/OpenPrinting/fuzzing/pull/21) |
| 44 | + |
| 45 | + [fix fuzz_ppd_cache, del fuzz_ppd_options by pushinl · Pull Request #23 · OpenPrinting/fuzzing](https://github.com/OpenPrinting/fuzzing/pull/23) |
| 46 | + |
| 47 | + [add http related fuzzer and corpus by pushinl · Pull Request #24 · OpenPrinting/fuzzing](https://github.com/OpenPrinting/fuzzing/pull/24) |
| 48 | + |
| 49 | + [add libcups fuzzer && Makefile by pushinl · Pull Request #25 · OpenPrinting/fuzzing](https://github.com/OpenPrinting/fuzzing/pull/25) |
| 50 | + |
| 51 | + [add seeds for cups-filters and update Makefile by pushinl · Pull Request #39 · OpenPrinting/fuzzing](https://github.com/OpenPrinting/fuzzing/pull/39) |
| 52 | + |
| 53 | +- **Coverage details (example):** |
| 54 | + |
| 55 | + before: |
| 56 | + |
| 57 | +  |
| 58 | + |
| 59 | + after: |
| 60 | + |
| 61 | +  |
| 62 | + |
| 63 | +  |
| 64 | + |
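The code analysis and YAML generation workflow described under Light-Fuzz-Gen above (parse function symbols, batch them for the LLM, summarize, post-process) can be illustrated with a minimal Python sketch. This is not the actual Light-Fuzz-Gen code: the regular expression, the `ask_llm` callable, and the shape of the returned configuration dictionaries are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, Iterator, List

Symbol = Dict[str, str]

# Naive matcher for simple C prototypes such as "int ppdConflicts(ppd_file_t *ppd);".
# A real tool would more likely rely on ctags or libclang (assumption).
PROTOTYPE = re.compile(
    r"^\s*(?P<ret>[A-Za-z_][\w\s\*]*?)\s*\b(?P<name>[A-Za-z_]\w*)"
    r"\s*\((?P<params>[^()]*)\)\s*;",
    re.MULTILINE,
)


def parse_symbols(header_text: str) -> List[Symbol]:
    """Step 1: extract name, return type, and parameters of each prototype."""
    return [
        {
            "name": m.group("name"),
            "return_type": " ".join(m.group("ret").split()),
            "params": " ".join(m.group("params").split()),
        }
        for m in PROTOTYPE.finditer(header_text)
    ]


def batches(items: List[Symbol], size: int) -> Iterator[List[Symbol]]:
    """Step 2: split the symbols into batches small enough for one LLM call."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def suggest_configs(
    header_text: str,
    ask_llm: Callable[[str], List[List[str]]],  # hypothetical: prompt -> groups of names
    batch_size: int = 2,
) -> List[Dict[str, List[Symbol]]]:
    """Steps 3-4: query the LLM per batch, then merge and post-process."""
    symbols = parse_symbols(header_text)
    known = {s["name"]: s for s in symbols}
    seen = set()
    configs = []
    for batch in batches(symbols, batch_size):
        prompt = "\n".join(
            f'{s["return_type"]} {s["name"]}({s["params"]})' for s in batch
        )
        for group in ask_llm(prompt):
            # Post-processing: drop names the LLM made up and duplicate groups.
            names = tuple(n for n in group if n in known)
            if names and names not in seen:
                seen.add(names)
                configs.append({"functions": [known[n] for n in names]})
    return configs
```

Each returned dictionary stands for one candidate configuration that a human can review before it is turned into a YAML file.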
| 65 | +## About OSS-Fuzz-Gen and LLM-driven harness generation |
| 66 | + |
| 67 | +While using OSS-Fuzz-Gen to generate harnesses for cups, I found that it has several limitations: |
| 68 | + |
| 69 | +- **One fuzzer per function, no call relationships** |
| 70 | + When a YAML file defines multiple functions, OSS-Fuzz-Gen creates a separate fuzzer for each function. This makes it hard to capture the call relationships between functions, which limits coverage. The functions of a module under test often take structured parameters that are produced by a few more important functions. For example, `ppdOpenFile` is an important function that needs to be called, but with the old method no information about it is included in the LLM prompt when related entry functions are fuzzed. |
| 71 | +- **Heavy and highly encapsulated** |
| 72 | + OSS-Fuzz-Gen is heavyweight and highly encapsulated. Even a minor issue such as network instability can cause the entire pipeline to fail. |
| 73 | + |
| 74 | +Coverage of harnesses generated by OSS-Fuzz-Gen vs. human-written harnesses: |
| 75 | + |
| 76 | + |
| 77 | + |
| 78 | +So I started developing Light-Fuzz-Gen, based on OSS-Fuzz-Gen, to generate multi-function harnesses. I reworked the YAML configuration file and designed a system that analyzes the multiple functions in a configuration file, understands the relationships between them, and generates harness code that calls these functions in a reasonable way. This is my initial optimization idea for LLM-driven harness generation. |
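To make the difference concrete, here is a toy Python contrast between prompting once per function and prompting once for a whole group of related functions. It is only a sketch under assumptions: the configuration layout (a dictionary with a `functions` list of signatures) and the prompt wording are hypothetical and are not taken from OSS-Fuzz-Gen or Light-Fuzz-Gen.

```python
from typing import Dict, List

Config = Dict[str, List[str]]


def per_function_prompts(config: Config) -> List[str]:
    """One prompt per function: each prompt contains a single signature."""
    return [
        f"Write a libFuzzer harness for:\n{signature}"
        for signature in config["functions"]
    ]


def grouped_prompt(config: Config) -> str:
    """One prompt for the whole group, so all signatures are visible at once."""
    lines = [
        "Write ONE libFuzzer harness that calls these functions together,",
        "passing results of earlier calls to later ones where the types match:",
    ]
    lines += [f"- {signature}" for signature in config["functions"]]
    return "\n".join(lines)


# Hypothetical configuration grouping three related CUPS PPD functions.
ppd_config: Config = {
    "functions": [
        "ppd_file_t *ppdOpenFile(const char *filename)",
        "int ppdConflicts(ppd_file_t *ppd)",
        "void ppdClose(ppd_file_t *ppd)",
    ]
}
```

With `per_function_prompts(ppd_config)`, the prompt for `ppdConflicts` never mentions `ppdOpenFile`, so the model has no hint about how to obtain a valid `ppd_file_t`. With `grouped_prompt(ppd_config)`, all three signatures appear in the same prompt.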
| 79 | + |
| 80 | +> However, there is currently no relevant issue that mentions this point. I will probably summarize my viewpoint and raise one. Note also that Light-Fuzz-Gen cannot replace OSS-Fuzz-Gen: it is a lightweight tool that does not fully automate the process of harness generation, compilation, and testing, and it requires a human in the loop. |
| 81 | + |
| 82 | +The initial architectural diagram of Light-Fuzz-Gen: |
| 83 | + |
| 84 | + |
| 85 | + |
| 86 | +## Challenge |
| 87 | + |
| 88 | +OSS-Fuzz-Gen can attempt to fix compilation issues in generated harnesses automatically by feeding the compilation errors, the harness, and further information back into the LLM. However, in my experience with OSS-Fuzz-Gen, this was not very effective. Light-Fuzz-Gen has no integrated compilation process, so this feature cannot be added to it. Compiling C/C++ programs is relatively complex, and a human in the loop is perhaps still indispensable. |
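The general shape of such a compile-and-fix feedback loop can be sketched in a few lines of Python. This is an illustration of the idea only, not OSS-Fuzz-Gen's implementation: `compile_fn` and `llm_fix_fn` are hypothetical callables that the caller has to supply, and the round limit is an arbitrary choice.

```python
from typing import Callable, Optional, Tuple


def repair_loop(
    harness: str,
    compile_fn: Callable[[str], str],       # hypothetical: source -> error text ("" if it builds)
    llm_fix_fn: Callable[[str, str], str],  # hypothetical: (source, errors) -> revised source
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Return (harness, fixes_applied) once it builds, or (None, max_rounds)."""
    for fixes_applied in range(max_rounds):
        errors = compile_fn(harness)
        if not errors:
            return harness, fixes_applied
        # Feed both the failing harness and the compiler output back to the LLM.
        harness = llm_fix_fn(harness, errors)
    # One last build check after the final fix attempt.
    if not compile_fn(harness):
        return harness, max_rounds
    return None, max_rounds
```

When the loop returns `None`, the harness still does not build after the allowed number of rounds, which is the point where a human would have to take over.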
| 89 | + |
| 90 | +Light-Fuzz-Gen also has several limitations that still need to be resolved, such as: |
| 91 | + |
| 92 | +- The context compression method for understanding source code and workflow may not be very good |
| 93 | +- A lack of more specialized and diverse prompt combinations |
| 94 | +- Harness compilation issues |
| 95 | + |
| 96 | +Suggestions for OpenPrinting projects that want to integrate with OSS-Fuzz-Gen: |
| 97 | + |
| 98 | +- More integration is needed for projects such as cups-filters and cups-browsed. **The foundation of integrating with OSS-Fuzz-Gen is the integration with OSS-Fuzz**, as OSS-Fuzz-Gen builds the basic environment from the target project's configuration files in OSS-Fuzz (shell scripts, Makefiles, etc.). If the original configuration does not support compiling newly added files, the pipeline may fail, which is also the reason for the first PR I proposed. |
| 99 | +- The YAML configuration file determines which fuzzers the LLM generates. To use OSS-Fuzz-Gen, you need to choose the target entry functions and write the configuration file yourself, whereas Light-Fuzz-Gen supports automatic analysis of the target project and generation of YAML files, which can also serve as suggestions. |
| 100 | + |
| 101 | +## Future Development |
| 102 | + |
| 103 | +In practice, we found that there is still no truly comprehensive method or system for large-model-based fuzz testing. Both OSS-Fuzz-Gen and Light-Fuzz-Gen still require a human-in-the-loop and struggle to support more complex testing projects and environments. We hope that, building on these existing open-source projects, we can continue to explore ways to maximize the capabilities of LLMs and design more effective workflows to support the development of intelligent fuzzing systems. |
| 104 | + |
| 105 | +In the short term, we can: |
| 106 | + |
| 107 | +- Further improve and integrate OSS-Fuzz-Gen and Light-Fuzz-Gen |
| 108 | +- Resolve the difficulties in fuzzer building and integration for projects such as cups-filters and libcupsfilters |
| 109 | +- Do more issue triage and analysis |
| 110 | + |
| 111 | +## Acknowledgment |
| 112 | + |
| 113 | +I would like to express my sincere gratitude to everyone who supported and collaborated on this project. In particular, **Till Kamppeter** provided important guidance on the overall direction and priorities of the GSoC project, and provided me with many suggestions regarding the OpenPrinting project. I am also truly grateful to **Jiongchi Yu**, who patiently answered all of my detailed questions, offering clear explanations and thoughtful suggestions that helped me overcome many challenges. Special thanks go to **George-Andrei Iosif**, whose deep insights into fuzzing and detailed guidance on OSS-Fuzz have been of great help to me. This work would not have been possible without the help from all of you. |