This repository contains the artifacts for our paper accepted at ICSE'26.
Specifically, it includes the source code of the FORGE framework, the dataset constructed by FORGE, and the evaluation results.
- [2026-02-11] 🔥 We have released FORGE-Curated: A curated smart contract vulnerability dataset with moderate size, higher quality, and enhanced feasibility!
- [2025-06-20] 🎉 Our paper has been directly accepted by ICSE'26 Round 1 (Direct Acceptance Rate: 9.2%, 60/646). Can't wait to see you in Rio de Janeiro!
- [2025-03-12] ✨ We have released the largest smart contract vulnerability dataset FORGE-Dataset, together with the source code of FORGE, the first automated vulnerability dataset construction framework.
Important
Following our commitment to responsible maintenance outlined in the paper, we have collected new high-quality audit reports released between December 2024 and February 2026, based on feedback and suggestions from early adopters. Through continuous manual verification, we have created FORGE-Curated, a curated moderate-scale EVM smart contract vulnerability dataset that is particularly suitable for tasks such as LLM benchmark evaluation. We strongly recommend that users with such needs consider FORGE-Curated.
FORGE is an automated framework that constructs comprehensive smart contract vulnerability datasets from real-world audit reports. By leveraging large language models (LLMs) and the Common Weakness Enumeration (CWE) standard, FORGE addresses key challenges in existing vulnerability datasets: labor-intensive, error-prone manual construction; inconsistent classification standards; and limited scalability. The FORGE framework consists of four main modules:
- Semantic Chunker: Segments audit reports into meaningful, self-contained chunks
- MapReduce Extractor: Extracts and aggregates vulnerability information from report chunks
- Hierarchical Classifier: Classifies vulnerabilities into the CWE hierarchy using an LLM with tree-of-thoughts reasoning
- Code Fetcher: Retrieves and integrates corresponding smart contract project source files
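At a high level, the four modules form a linear pipeline: chunk a report, extract findings, classify them, then fetch code. The toy sketch below illustrates that flow only; all function names, fields, and the keyword-based "classifier" are illustrative stand-ins, not FORGE's actual API (the Code Fetcher stage is omitted):

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    description: str
    cwe_id: str = ""

@dataclass
class Report:
    text: str
    findings: list = field(default_factory=list)

def semantic_chunk(report: Report, max_len: int = 80) -> list:
    # Split the report into self-contained chunks (here: a naive fixed-size split).
    return [report.text[i:i + max_len] for i in range(0, len(report.text), max_len)]

def extract_findings(chunks: list) -> list:
    # MapReduce-style extraction: map over chunks, then aggregate the findings.
    return [Finding(description=c) for c in chunks if "overflow" in c]

def classify(findings: list) -> list:
    # Hierarchical CWE classification (stub: keyword lookup instead of an LLM).
    for f in findings:
        if "overflow" in f.description:
            f.cwe_id = "CWE-190"
    return findings

report = Report(text="The mint function has an integer overflow when supply is large.")
findings = classify(extract_findings(semantic_chunk(report)))
print(findings[0].cwe_id)  # CWE-190
```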
We recommend using the uv package manager for installing and configuring FORGE:
```shell
# Clone the repository
git clone https://github.com/FOGRE-security/FORGE-Artifact.git
cd FORGE-Artifact/src

# Install dependencies with uv
uv sync

# Configure model settings
vim config.yaml  # configure the LLM and provider API base URL

# Set up API keys
cp .env-example .env
vim .env  # configure the API key
```

Alternatively, you can use pip:
```shell
# Install dependencies with pip
pip install -r requirements.txt
```

Run FORGE to extract, classify, and fetch source code for a sample document:
```shell
# Using uv (recommended)
uv run main.py forge -t sample/sample.pdf -o sample

# Or using python directly
python main.py forge -t sample/sample.pdf -o sample
```

FORGE also offers commands to run individual parts of the pipeline:
```shell
# Extract vulnerability and project metadata from security documents
uv run main.py extract -t path/to/documents -o output/directory
# Or: python main.py extract -t path/to/documents -o output/directory

# Classify extracted vulnerabilities into CWE categories
uv run main.py classify -t path/to/extracted/json
# Or: python main.py classify -t path/to/extracted/json

# Fetch source code based on project metadata from Github, Etherscan, Bscscan, Polygonscan, and Basescan
uv run main.py fetch -t path/to/project/json
# Or: python main.py fetch -t path/to/project/json
```

A potential use case for FORGE is constructing small-scale benchmark datasets for specific vulnerabilities from security artifacts by editing the prompts in `src/core/invoker.py`.
All commands support the following options:

- `--log, -l`: Path to the log directory (default: `logs`)
- `--config, -c`: Path to the config file (default: `config.yaml`)
Run `python main.py COMMAND --help` for command-specific help.
We have made our dataset available in the following ways:

- Vulnerability Information: Available in the `dataset/results` directory of this repository.
- Solidity Code Files: Available in the `dataset/contracts` directory of this repository.
- Audit Reports: Due to GitHub storage limitations, audit reports are available through two download options:
  - Option 1 - Cloudflare R2: Download via API tokens using any method you prefer. See `dataset/access_reports.ipynb` for a usage example.
  - Option 2 - Google Drive: Direct download from https://drive.google.com/file/d/10u9DrWvtzw8Bo-7jig2KWmua2bS8NPq9.
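Once downloaded, the per-report JSON files under `dataset/results` can be loaded and filtered in a few lines. A minimal sketch; the record fields (`project`, `findings`, `cwe`) and the demo data are assumptions for illustration, so check the actual schema in the repository:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def load_findings(results_dir: Path) -> list:
    # Collect every finding from every report JSON in the directory.
    findings = []
    for path in sorted(results_dir.glob("*.json")):
        report = json.loads(path.read_text())
        findings.extend(report.get("findings", []))
    return findings

# Demo with a throwaway directory standing in for dataset/results.
with TemporaryDirectory() as tmp:
    sample = {"project": "DemoDApp",  # hypothetical record, not real dataset content
              "findings": [{"title": "Reentrancy in withdraw()", "cwe": "CWE-841"},
                           {"title": "Unchecked call return value", "cwe": "CWE-252"}]}
    (Path(tmp) / "demo.json").write_text(json.dumps(sample))

    findings = load_findings(Path(tmp))
    reentrancy = [f for f in findings if f["cwe"] == "CWE-841"]
    print(len(findings), len(reentrancy))  # 2 1
```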
The dataset constructed by FORGE represents the most comprehensive collection of smart contract vulnerabilities to date, derived from real-world audit reports. Below is an overview of the dataset statistics:
| Statistics | Numbers |
|---|---|
| Total audit reports | 6,454 |
| Total DApp projects | 6,579 |
| Total Solidity files | 81,390 |
| Average Solidity files per project | 12 |
| Average lines of code per project | 2,575 |
| Compiler Versions | ---- |
| Compiler Version 0.4+ | 270 |
| Compiler Version 0.5+ | 478 |
| Compiler Version 0.6+ | 1,524 |
| Compiler Version 0.7+ | 360 |
| Compiler Version 0.8+ | 3,791 |
| Other Compiler Version | 31 |
| Vulnerabilities | ---- |
| Total vulnerability findings | 27,497 |
The dataset contains 81,390 Solidity files and 27,497 vulnerabilities across 296 CWE categories. 59.0% of projects use the latest Solidity compiler line (0.8+), and projects average 2,575 lines of code, reflecting real-world complexity.
You can use RQ1/statistic.ipynb to analyze and summarize the relevant data within our dataset.
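The kind of aggregation the notebook performs can be reproduced in a few lines. A sketch over hypothetical project records (the `solc` and `loc` field names and the sample values are illustrative, not taken from the dataset):

```python
from collections import Counter

# Hypothetical per-project records; in the notebook these come from the dataset.
projects = [
    {"name": "A", "solc": "0.8.19", "loc": 3100},
    {"name": "B", "solc": "0.6.12", "loc": 900},
    {"name": "C", "solc": "0.8.4", "loc": 4200},
]

# Bucket projects by compiler minor version, as in the statistics table.
buckets = Counter("0." + p["solc"].split(".")[1] + "+" for p in projects)
avg_loc = sum(p["loc"] for p in projects) / len(projects)
print(buckets["0.8+"], round(avg_loc))  # 2 2733
```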
Note
This dataset is dynamically maintained through a community-driven issue system and may differ from the current records.
The `evaluation/RQ2/` directory contains our manual annotations for evaluating information extraction capabilities. You can calculate the precision, recall, and F1-score by running `python calculate_metrics.py results.json`.
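The metrics themselves are standard: given true-positive, false-positive, and false-negative counts from comparing extracted findings against the manual annotations, precision is TP/(TP+FP), recall is TP/(TP+FN), and F1 is their harmonic mean. A minimal sketch (the example counts are made up, and this is not the script's actual I/O format):

```python
def prf1(tp: int, fp: int, fn: int):
    # Precision: fraction of extracted findings that are correct.
    precision = tp / (tp + fp)
    # Recall: fraction of annotated findings that were extracted.
    recall = tp / (tp + fn)
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example: 90 correct extractions, 10 spurious, 30 missed.
p, r, f = prf1(tp=90, fp=10, fn=30)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.9 0.75 0.82
```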
The `evaluation/RQ3/results.csv` file contains randomly sampled vulnerability findings, along with the CWE categories independently annotated by two human annotators and those labeled by the LLM. The `evaluation/RQ3/k-alpha.csv` file is a formatted CSV template exported to meet the requirements of the Krippendorff's Alpha Calculator, which can be used to calculate the inter-rater agreement among the three annotators.
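For reference, Krippendorff's alpha for nominal labels can also be computed directly from the coincidence matrix. A compact sketch for complete data with no missing ratings (this is an independent implementation of the standard formula, not the calculator used in the paper, and the example labels are made up):

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data with no missing values.
    `units` is a list of label tuples, one tuple per rated item."""
    o = Counter()  # coincidence matrix over ordered label pairs
    for labels in units:
        m = len(labels)
        for a, b in permutations(range(m), 2):
            o[(labels[a], labels[b])] += 1 / (m - 1)
    n = sum(o.values())  # total number of pairable values
    marg = Counter()     # marginal label frequencies
    for (c, _), v in o.items():
        marg[c] += v
    d_obs = sum(v for (c, k), v in o.items() if c != k)
    d_exp = sum(marg[c] * marg[k] for c in marg for k in marg if c != k) / (n - 1)
    return 1 - d_obs / d_exp

# Three annotators labeling four findings (perfect agreement except item 2):
units = [("A", "A", "A"), ("A", "A", "B"), ("B", "B", "B"), ("C", "C", "C")]
print(round(krippendorff_alpha_nominal(units), 3))  # 0.766
```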
Additionally, detailed information about the CWE categories involved is provided in `evaluation/RQ3/CWE_s.json`.
The CWE classifications of the vulnerability types covered by 13 detection tools, as annotated by the authors, are stored in `evaluation/RQ4/tool_classifications.csv`. The `evaluation/RQ4/results.csv` file presents the outcomes of running the analysis tools on the dataset using the SmartBugs framework. Additionally, the `evaluation/RQ4/details` directory contains detailed metrics for each tool under each CWE category.
For more information about the dataset and research findings, please refer to our paper accepted by ICSE 2026:
```bibtex
@misc{chen2025forgellmdrivenframeworklargescale,
  title={FORGE: An LLM-driven Framework for Large-Scale Smart Contract Vulnerability Dataset Construction},
  author={Jiachi Chen and Yiming Shen and Jiashuo Zhang and Zihao Li and John Grundy and Zhenzhe Shao and Yanlin Wang and Jiashui Wang and Ting Chen and Zibin Zheng},
  year={2025},
  eprint={2506.18795},
  archivePrefix={arXiv},
  primaryClass={cs.CR},
  url={https://arxiv.org/abs/2506.18795},
}
```

If you find any issues with the dataset or have questions, please contact shenym7@mail2.sysu.edu.cn or open an issue describing the problem. We will respond promptly and work to resolve it. You can also contribute improvements to our code by opening a pull request.


