mono: Is Your "Clean" Vulnerability Dataset Really Solvable? Exposing and Trapping Undecidable Patches and Beyond
This document describes the artifacts accompanying our paper: "mono: Is Your "Clean" Vulnerability Dataset Really Solvable? Exposing and Trapping Undecidable Patches and Beyond". 'mono' stands for Multi-agent Operated Noise Outfilter. The artifacts are organized in the following directories:
During the submission and review of this work, Large Language Models (LLMs) and agentic frameworks advanced rapidly. Reflecting on these developments, we recognized that the limitations encountered by the Context Agent and Analyzed Agent in our earlier paper were not strictly due to the inherent complexity of the CVEs, but rather to the capability bottlenecks of the agents available at the time.
In our preliminary follow-up experiments, we observed that many CVEs previously categorized as Undecidable Patches, which human experts also considered exceptionally difficult to analyze purely from source code, could be handled effectively by a more powerful model such as GPT-5.4 running in Codex. Owing to its broad understanding of modern codebases and the open-source nature of these projects, the advanced agent was markedly better at pinpointing relevant details within the source code.
However, this is not the end of the story. Our preliminary experiments with Codex revealed the following:
- Codex with GPT-5.4 still cannot understand every case without external information; it proactively classified a significant portion of the cases as Undecidable Patches.
- The vulnerabilities successfully identified by GPT-5.4 often lie far beyond what an average developer could comprehend, a finding we corroborated through manual review.
- Codex can retrieve the information needed to understand the CVEs from the internet, which makes analyzing them purely at the source-code level a moot point.
Therefore, we provide the raw dataset and the results generated by our previous model as-is, as a historical baseline and a reference point for future research. Please use this dataset with caution, keeping the evolving capabilities of AI agents in mind.
Files ending in _error.json contain the CVEs that our agent failed to resolve.
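Because failed CVEs are marked only by the `_error.json` suffix, a consumer of the results typically needs to separate them from the successfully analyzed ones. The following is a minimal sketch, assuming the result files are plain `*.json` files sitting in one directory; the function name `split_results` is hypothetical and not part of our code.

```python
from pathlib import Path


def split_results(result_dir):
    """Partition the JSON files in result_dir into (resolved, failed).

    Assumption: every file ending in "_error.json" lists CVEs the agent
    failed to resolve; every other *.json file holds regular results.
    """
    resolved, failed = [], []
    for path in sorted(Path(result_dir).glob("*.json")):
        if path.name.endswith("_error.json"):
            failed.append(path)
        else:
            resolved.append(path)
    return resolved, failed
```

For example, `resolved, failed = split_results("path/to/results")`, where the path is a placeholder for whichever results directory you are inspecting.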
This directory contains the source code of our project.
This subfolder contains the final dataset, MonoLens, generated and analyzed by our framework.
The subfolders within MonoLens are organized as follows:
This directory provides a sample of 8 data entries in a CSV file, together with overall statistics for these samples. Each entry includes the original CVE metadata, the root cause analysis performed by our agent, and other relevant information. Each entry also references a corresponding folder within the other_context folder, which holds the complete analysis results and the step-by-step process undertaken by the agent.
This directory contains the subset of CVEs for which our agent's final confidence score in its analysis was greater than 0.9. The other_context subfolder is omitted due to the large size of the data.
This directory includes the results for all CVEs that our agent was able to process and analyze. The other_context subfolder is omitted due to the large size of the data.
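The high-confidence subset is, in effect, a filter over the full results at the 0.9 threshold stated above. As a sketch of how one might reproduce or adjust that filter, assume the results are exported as CSV like the sample directory and carry a per-entry score in a column named `confidence`; that column name and the function `high_confidence_rows` are hypothetical, so check the actual CSV header before use.

```python
import csv

# Hypothetical column name; verify against the real CSV header.
CONFIDENCE_COLUMN = "confidence"


def high_confidence_rows(csv_path, threshold=0.9, column=CONFIDENCE_COLUMN):
    """Return rows whose confidence score is strictly greater than threshold."""
    selected = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            value = (row.get(column) or "").strip()
            # Skip rows with a missing score instead of failing on float("").
            if value and float(value) > threshold:
                selected.append(row)
    return selected
```

The comparison is strict (`>`), matching "greater than 0.9"; pass a different `threshold` to explore other cut-offs.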
This directory showcases the complete analysis process of our mono framework for four specific cases, each with a ReadMe.md. It details the entire pipeline:
- Stage1. Patch Pre-filtering and Classification: filtering of security-related patches.
- Stage2. Data Acquisition and Preprocessing: preprocessing using Joern to generate Code Property Graphs (CPGs). The binary files (cpg.bin) and the full repositories are excluded due to their large size.
- Stage3. Iterative Contextual Analysis, including:
  - The agent's analysis of the CVEs.
  - The contextual information gathered to understand the root cause of the CVE.
  - The context as understood and summarized by the agent.
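Since the cpg.bin files are not shipped, reproducing Stage2 means regenerating the CPGs locally. A minimal sketch follows, assuming Joern's `joern-parse` CLI is installed and on `PATH`, and assuming a plain invocation with `--output` matches the preprocessing used here; the exact options mono used are not stated, so check `joern-parse --help` for your Joern version. The helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def joern_parse_cmd(repo_dir, output="cpg.bin"):
    """Build a joern-parse command that writes the CPG of repo_dir to output."""
    return ["joern-parse", str(repo_dir), "--output", str(output)]


def regenerate_cpg(repo_dir, output="cpg.bin"):
    """Run joern-parse on repo_dir; raise if Joern is not installed."""
    if shutil.which("joern-parse") is None:
        raise FileNotFoundError("joern-parse not found on PATH; install Joern first")
    # check=True raises CalledProcessError if joern-parse exits non-zero.
    subprocess.run(joern_parse_cmd(repo_dir, output), check=True)
    return Path(output)
```

The repository must first be checked out locally (at whichever revision the case requires) before calling `regenerate_cpg`.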
This directory is dedicated to the research questions (RQs) addressed in our paper. Each RQ has its own subfolder, which contains:
- The specific code used for that RQ.
- The data relevant to that RQ.
- The final results obtained for that RQ.
Each RQ subfolder also includes its own ReadMe.md file providing more detailed information specific to that research question.
If this work is helpful for your research, please consider citing the following BibTeX entry.
@misc{gao2025monocleanvulnerabilitydataset,
title={mono: Is Your "Clean" Vulnerability Dataset Really Solvable? Exposing and Trapping Undecidable Patches and Beyond},
author={Zeyu Gao and Junlin Zhou and Bolun Zhang and Yi He and Chao Zhang and Yuxin Cui and Hao Wang},
year={2025},
eprint={2506.03651},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2506.03651},
}