Skip to content

FORGE Curated: A Curated EVM Smart Contracts Vulnerability Dataset

License

Notifications You must be signed in to change notification settings

shenyimings/FORGE-Curated

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FORGE Curated: A Curated EVM Smart Contracts Vulnerability Dataset

Paper Dataset Visitors

FORGE Curated is a high-quality subset of the FORGE dataset, specifically designed to support advanced research in smart contract security, including AI-based auditing, vulnerability analysis, etc.

Building upon feedback from users of the original FORGE dataset and fulfilling our commitment to responsible maintenance outlined in our ICSE'26 paper, we have compiled a new collection of audit reports. This dataset includes reports published between December 2024 and February 2026 by 11 top-tier audit teams.

Using the FORGE framework, we extracted and classified vulnerability data from these reports, and then organized them into the dataset-curated directory. We are conducting manual verification process to ensure accurate mapping between vulnerability findings and specific code locations.

We plan to continuously maintain and update this directory to support the community's research and development efforts.

Structure

The repository is organized as follows:

FORGE-Curated/
├── dataset-curated/            # Core curated dataset
│   ├── contracts/              # Source code with only *.sol files
│   ├── contracts-raw/          # Raw project source code with .git information
│   ├── findings/               # Extracted vulnerability findings (JSON)
│   ├── findings-without-source/# Findings where source code could not be resolved
│   └── reports/                # Original PDF audit reports
├── flatten/                    # Flattened datasets with findings and source code in single .JSON files
│   ├── vfp/                    # Vulnerability-File Pairs (All)
│   └── vfp-vuln/               # Vulnerability-File Pairs (Higher severity only)
├── LICENSE
├── models/                     # Data models and definitions
│   ├── cwe_dict.json           # CWE dictionary
│   └── schemas.py              # Pydantic data schemas
├── README.md
└── scripts/                    # Utility scripts
    ├── process.ipynb           # Data processing
    └── statistic.ipynb         # Statistical analysis

Examples

Vulnerability Finding (JSON)

{
    "path": "dataset-curated/reports/TrailofBits/2024-12-balancer-v3-securityreview.pdf",
    "project_info": {
        "url": ["https://github.com/balancer/balancer-v3-monorepo"],
        "commit_id": ["a24ebf0141e9350a42639d8593c1436241deae59"],
        "audit_date": "2024-08-05",
        "chain": "ethereum"
    },
    "findings": [
        {
            "id": 0,
            "category": {"1": ["CWE-284"], "2": ["CWE-285"], "3": ["CWE-862"]},
            "title": "Lack of approval reset on buffer allows anyone to drain the Vault",
            "description": "The lack of approval reset after the call to deposit allows a malicious wrapper contract to steal the Vault's funds...",
            "severity": "High",
            "location": [
                "Vault.sol::deposit#1169-1172",
                "Vault.sol::erc4626BufferWrapOrUnwrap"
            ],
            "files": [
                "balancer-v3-monorepo/pkg/vault/contracts/Vault.sol"
            ]
        },
        ......
    ]
}

Vulnerability-File Pair (VFP)

{
    "vfp_id": "vfp_00016",
    "project_name": "cantina_uniswap_april2025.pdf",
    "findings": [
        {
            "id": 0,
            "category": {"1": ["CWE-284"], "2": ["CWE-285"], "3": ["CWE-863"]},
            "title": "Execute calls can be front-run",
            "description": "The `execute` function in the MinimalDelegation contract is publicly callable...",
            "severity": "High",
            "location": ["MinimalDelegation.sol::execute#66"],
            "files": ["minimal-delegation/src/MinimalDelegation.sol"]
        },
        ......
    ],
    "affected_files": {
        "MinimalDelegation.sol": "// SPDX-License-Identifier: UNLICENSED\npragma solidity ^0.8.29;\n\nimport {EnumerableSetLib}...",
        ......
    }
}

Statistics

General Overview

Metric Value
Total Audit Reports Processed 323
Reports with Accessible Source Code 208
Total Projects 252
Total Findings 2,469
Total Solidity Files 29,221
Total Lines of Code (LoC) 4,762,386
Avg. LoC per Project ~18,898
Avg. Files per Project ~116
Solidity Version Distribution ---
^0.8 198
^0.7 2
^0.6 5
^0.5 1
^0.4 2

Findings by Severity

Severity Level Count
Critical 67
High 244
Medium 430
Low 772
Informational 879
N/A 77

Dataset Composition

Metric Value
Total Vulnerability-File Pairs (VFPs) 627
High-Impact VFPs (Medium/High/Critical) 304

Note

Many to-many relationship between findings and files, so the number of VFPs is less than total findings.

Data Schema

The data follows a strict schema defined in models/schemas.py. Below is the standard definition for the core objects.

Finding from Audit Reports

JSON files in dataset-curated/findings and dataset-curated/findings-without-source directories follow this structure:

Field Type Description
path str Path to the original audit report PDF.
project_info ProjectInfo Metadata regarding the audited project.
findings List[Finding] List of vulnerability findings in the report.

ProjectInfo: Metadata regarding the audited project.

Field Type Description
url Union[str, List] URL(s) to the project repository.
commit_id Union[str, List] The specific commit hash audited.
chain str The blockchain network (e.g., Ethereum).
audit_date str Date of the audit report.
project_path Dict Mapping of project names to local storage paths.

Finding: Represents a single vulnerability found in an audit report.

Field Type Description
id Union[str, int] Unique identifier for the finding within the report.
category Dict Mapping of the vulnerability to CWE categories following a tree structure (e.g., {"1": ["CWE-284"]}).
title str The title of the finding as stated in the report.
description str Detailed description of the vulnerability.
severity Union[str, List] Severity level (e.g., High, Medium, Low, Critical).
location Union[str, List] Precise location in the code extracted by LLM, usually following a format like filename.sol::function#StartLine-EndLine.
files List[str] List of files affected by this finding.

Vulnerability-File Pair (VFP)

JSON files in the flatten/vfp and flatten/vfp-vuln directories follow this structure:

Field Type Description
vfp_id str Unique ID for the pair (e.g., vfp_00016).
project_name str Name of the source audit report/project.
findings List[Finding] List of findings contained in this VFP.
affected_files Dict[str, str] Dictionary where Key is filename and Value is the full source code string.
View Pydantic Data Model Code
@dataclass
class ProjectInfo:
    url: Union[str, int, List, None] = "n/a"
    commit_id: Union[str, int, List, None] = "n/a"
    address: Union[str, List, None] = field(default_factory=lambda: "n/a")
    chain: Union[str, int, List, None] = "n/a"
    compiler_version: Union[str, List, None] = "n/a"
    audit_date: str = "n/a"
    project_path: Union[str, Dict, None] = "n/a"

@dataclass
class Finding:
    id: Union[str, int] = 0
    category: Dict = field(default_factory=dict)
    title: str = ""
    description: str = ""
    severity: Union[str, List, None] = field(default_factory=lambda: "")
    location: Union[str, List, None] = field(default_factory=lambda: "")
    files: Union[str, List, None] = field(default_factory=list)

class Report(BaseModel):
    path: str = ""
    project_info: ProjectInfo = field(default_factory=ProjectInfo)
    findings: List[Finding] = field(default_factory=list)

class VulnerabilityFilePair(BaseModel):
    vfp_id: str = "" 
    project_name: str = ""
    findings: List[Finding] = Field(default_factory=list)
    affected_files: Dict[str, str] = Field(default_factory=dict)

Important Notes

  • Commit Checkout: The submodules in this repository are not automatically checked out to the audited commit. To work with the specific version of the code that was audited, you must manually (or use the Git python module) checkout the commit_id provided in the project's metadata.
  • For some projects, the commit ID referenced in the audit report is no longer part of the main repository tree. While these commits are still accessible on GitHub, they have been manually downloaded for this dataset and therefore do not contain .git metadata.
  • In rare cases where the exact commit ID from the audit was deleted or unavailable, we have selected the nearest available commit preceding it.

Important

  • Disclaimer: All data is collected from public sources. The inclusion of an audit team in this list is based on preliminary collection and does not constitute a ranking of audit quality. Furthermore, it does not guarantee that the projects are free of bugs. We plan to gradually include more audit teams and encourage community-driven contributions.

FAQ

Q: How does FORGE Curated differ from the original FORGE dataset?

A: FORGE Curated focuses on new, high-quality audit reports published between Dec 2024 and Feb 2026 from 11 premium audit teams. We are applying manual verification (ongoing) to match vulnerabilities with their exact file locations more accurately. Additionally, to facilitate LLM training and evaluation, we provide a flattened VFP dataset. This constructs "Vulnerability-File Pairs" where the .sol source code and the vulnerability description coexist in a single .json file, simplifying data loading.

Q: Why use CWE for vulnerability classification?

A: The Web3 ecosystem lacks a comprehensive, scientifically hierarchical vulnerability taxonomy. Existing standards like SWC, DASP10, or OWASP SCWE are often outdated, not comprehensive enough, or lack widespread adoption. We utilize CWE (Common Weakness Enumeration) because it is a globally recognized software vulnerability classification system. It provides a unified standard for description and classification, facilitating comparison between different tools and research. Using CWE enhances the dataset's usability and universality.

Q: How can I use the FORGE Curated dataset?

A: The dataset is suitable for various scenarios:

  • Benchmarking & Training: For AI-based smart contract auditing systems.
  • Tool Evaluation: Assessing SAST/DAST vulnerability analysis tools.
  • Education: Learning and practicing Web3 security.
  • Ecosystem Analysis: Analyzing smart contract security trends from late 2024 to early 2026.

Refer to the scripts/ directory for examples on how to load and process the data.

Tip

Previous evaluation datasets may suffer from data leakage issues, meaning LLMs have already been trained on similar datasets. Therefore, these results could be distorted and cannot accurately reflect LLM's true capabilities. In contrast, our FORGE Curated dataset is sourced from new audit reports from the past year, making data leakage issues insignificant for LLMs released before 2026.

Q: How should I evaluate using CWE types?

A: Due to the large scale and complexity of the CWE hierarchy, we suggest different evaluation methods based on your goal:

  • For SAST/DAST Evaluation: Map the tool's output to CWE root causes manually (or via official guides). If the detected CWE exists in the Ground Truth CWE tree, count it as a True Positive (TP).
  • For LLM Evaluation:
    • Binary/Multi-class: Ask the LLM to identify if a specific CWE exists in the affected files or list all possible CWEs.
    • LLM-as-a-Judge: Given the rich context (Title/Description/Location) in our dataset, you can ignore strict CWE matching and use another LLM to judge if your system "caught the point."
  • Tools: We plan to develop automated evaluation tools. In the meantime, you can look at third-party alternatives like EVMbench by OpenAI, scabench, or auditagent-scoring-algo.

Q: How can I focus only on exploitable vulnerabilities (ignoring code quality/Gas optimization)?

A: You can filter by the Severity field or specific CWE types (e.g., CWE-710).

  • We provide a filtered example in the flatten/vfp-vuln directory, which retains only vulnerabilities with Medium severity and above(Medium, High, and Critical).

Note

While pillar categories like CWE-710 (Improper Adherence to Coding Standards) often contain non-exploitable issues (e.g., CWE-1041, CWE-1164), some sub-types like CWE-657 (Violation of Secure Design Principles) can still be high-severity. Always cross-reference with the finding title and description.

Q: Are there similar datasets for other ecosystems?

A: Yes, stay tuned!

Q: I want to contribute to FORGE Curated. How?

A: We welcome community contributions via Issues and PRs.

  • Fixes: If you find errors in classification or location, open an Issue/PR indicating the Finding ID and the correct information. We will verify and merge these fixes regularly.
  • New Reports: To submit new audit or bug bounty reports, please ensure the project code is open-source. Submit a PR adding the PDF to dataset-curated/reports/<AuditorName>/.
  • New Auditors: If you want us to track a specific audit team, please open an Issue with their name, website, and a link to their public reports.

About

FORGE Curated: A Curated EVM Smart Contracts Vulnerability Dataset

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published