Add two papers in arXiv2025 #16

Open: wants to merge 1 commit into base: main
4 changes: 2 additions & 2 deletions README.md
@@ -78,7 +78,7 @@ This category focuses on typical tasks in Software Engineering (SE) and Programm
- [Code Generation](data/papers/labels/code_generation.md) (270)
- [Program Synthesis](data/papers/labels/program_synthesis.md) (106)
- [Code Completion](data/papers/labels/code_completion.md) (25)
- [Program Repair](data/papers/labels/program_repair.md) (67)
- [Program Repair](data/papers/labels/program_repair.md) (68)
- [Program Transformation](data/papers/labels/program_transformation.md) (42)
- [Program Testing](data/papers/labels/program_testing.md) (97)
- [General Testing](data/papers/labels/general_testing.md) (7)
@@ -93,7 +93,7 @@ This category focuses on typical tasks in Software Engineering (SE) and Programm
- [Differential Testing](data/papers/labels/differential_testing.md) (6)
- [Debugging](data/papers/labels/debugging.md) (16)
- [Bug Reproduction](data/papers/labels/bug_reproduction.md) (6)
- [Vulnerability Exploitation](data/papers/labels/vulnerability_exploitation.md) (11)
- [Vulnerability Exploitation](data/papers/labels/vulnerability_exploitation.md) (12)
- [Static Analysis](data/papers/labels/static_analysis.md) (204)
- [Syntactic Analysis](data/papers/labels/syntactic_analysis.md) (1)
- [Pointer Analysis](data/papers/labels/pointer_analysis.md) (3)
6 changes: 6 additions & 0 deletions data/papers/labels/code_generation.md
@@ -1020,6 +1020,12 @@
- **Labels**: [code generation](code_generation.md), [program repair](program_repair.md)


- [LLM-Based Repair of Static Nullability Errors](../venues/arXiv2025/paper_26.md), ([arXiv2025](../venues/arXiv2025/README.md))

- **Abstract**: Modern Java projects increasingly adopt static analysis tools that prevent null-pointer exceptions by treating nullness as a type property. However, integrating such tools into large, existing codebases remains a significant challenge. While annotation inference can eliminate many errors automatically, a subset of residual errors -- typically a mix of real bugs and false positives -- often persist and can only be resolved via code changes. Manually addressing these errors is tedious and error-pr...
- **Labels**: [program repair](program_repair.md)


- [LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks](../venues/S&P2024/paper_1.md), ([S&P2024](../venues/S&P2024/README.md))

- **Abstract**: Large Language Models (LLMs) have been suggested for use in automated vulnerability repair, but benchmarks showing they can consistently identify security-related bugs are lacking. We thus develop SecLLMHolmes, a fully automated evaluation framework that performs the most detailed investigation to date on whether LLMs can reliably identify and reason about security-related bugs. We construct a set of 228 code scenarios and analyze eight of the most capable LLMs across eight different investigati...
6 changes: 6 additions & 0 deletions data/papers/labels/program_repair.md
@@ -228,6 +228,12 @@
- **Labels**: [code generation](code_generation.md), [program repair](program_repair.md)


- [LLM-Based Repair of Static Nullability Errors](../venues/arXiv2025/paper_26.md), ([arXiv2025](../venues/arXiv2025/README.md))

- **Abstract**: Modern Java projects increasingly adopt static analysis tools that prevent null-pointer exceptions by treating nullness as a type property. However, integrating such tools into large, existing codebases remains a significant challenge. While annotation inference can eliminate many errors automatically, a subset of residual errors -- typically a mix of real bugs and false positives -- often persist and can only be resolved via code changes. Manually addressing these errors is tedious and error-pr...
- **Labels**: [program repair](program_repair.md)


- [LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks](../venues/S&P2024/paper_1.md), ([S&P2024](../venues/S&P2024/README.md))

- **Abstract**: Large Language Models (LLMs) have been suggested for use in automated vulnerability repair, but benchmarks showing they can consistently identify security-related bugs are lacking. We thus develop SecLLMHolmes, a fully automated evaluation framework that performs the most detailed investigation to date on whether LLMs can reliably identify and reason about security-related bugs. We construct a set of 228 code scenarios and analyze eight of the most capable LLMs across eight different investigati...
6 changes: 6 additions & 0 deletions data/papers/labels/program_testing.md
@@ -620,6 +620,12 @@
- **Labels**: [program testing](program_testing.md), [vulnerability exploitation](vulnerability_exploitation.md), [benchmark](benchmark.md)


- [PoCGen: Generating Proof-of-Concept Exploits for Vulnerabilities in Npm Packages](../venues/arXiv2025/paper_27.md), ([arXiv2025](../venues/arXiv2025/README.md))

- **Abstract**: Security vulnerabilities in software packages are a significant concern for developers and users alike. Patching these vulnerabilities in a timely manner is crucial to restoring the integrity and security of software systems. However, previous work has shown that vulnerability reports often lack proof-of-concept (PoC) exploits, which are essential for fixing the vulnerability, testing patches, and avoiding regressions. Creating a PoC exploit is challenging because vulnerability reports are infor...
- **Labels**: [vulnerability exploitation](vulnerability_exploitation.md)


- [Teams of LLM Agents can Exploit Zero-Day Vulnerabilities](../venues/arXiv2024/paper_30.md), ([arXiv2024](../venues/arXiv2024/README.md))

- **Abstract**: LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability and toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities). In this work, we show that teams of LLM agents can exploit real-world, zero-day vulnerabilities. Prior ...
6 changes: 6 additions & 0 deletions data/papers/labels/vulnerability_exploitation.md
@@ -48,6 +48,12 @@
- **Labels**: [program testing](program_testing.md), [vulnerability exploitation](vulnerability_exploitation.md), [benchmark](benchmark.md)


- [PoCGen: Generating Proof-of-Concept Exploits for Vulnerabilities in Npm Packages](../venues/arXiv2025/paper_27.md), ([arXiv2025](../venues/arXiv2025/README.md))

- **Abstract**: Security vulnerabilities in software packages are a significant concern for developers and users alike. Patching these vulnerabilities in a timely manner is crucial to restoring the integrity and security of software systems. However, previous work has shown that vulnerability reports often lack proof-of-concept (PoC) exploits, which are essential for fixing the vulnerability, testing patches, and avoiding regressions. Creating a PoC exploit is challenging because vulnerability reports are infor...
- **Labels**: [vulnerability exploitation](vulnerability_exploitation.md)


- [Teams of LLM Agents can Exploit Zero-Day Vulnerabilities](../venues/arXiv2024/paper_30.md), ([arXiv2024](../venues/arXiv2024/README.md))

- **Abstract**: LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability and toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities). In this work, we show that teams of LLM agents can exploit real-world, zero-day vulnerabilities. Prior ...
16 changes: 15 additions & 1 deletion data/papers/venues/arXiv2025/README.md
@@ -1,6 +1,6 @@
# arXiv2025

Number of papers: 25
Number of papers: 27

## [AI Software Engineer: Programming with Trust](paper_21.md)
- **Authors**: Abhik Roychoudhury, Corina Pasareanu, Michael Pradel, Baishakhi Ray
@@ -128,6 +128,13 @@ Number of papers: 25
- **Labels**: [static analysis](../../labels/static_analysis.md), [bug detection](../../labels/bug_detection.md)


## [LLM-Based Repair of Static Nullability Errors](paper_26.md)
- **Authors**: Karimipour, Nima and Pradel, Michael and Kellogg, Martin and Sridharan, Manu
- **Abstract**: Modern Java projects increasingly adopt static analysis tools that prevent null-pointer exceptions by treating nullness as a type property. However, integrating such tools into large, existing codebases remains a significant challenge. While annotation inference can eliminate many errors automatically, a subset of residual errors -- typically a mix of real bugs and false positives -- often persist and can only be resolved via code changes. Manually addressing these errors is tedious and error-pr...
- **Link**: [Read Paper](https://arxiv.org/pdf/2507.20674)
- **Labels**: [program repair](../../labels/program_repair.md)


## [Language Models for Code Optimization: Survey, Challenges and Future Directions](paper_5.md)
- **Authors**: Jingzhi Gong, Vardan Voskanyan, Paul Brookes, Fan Wu, Wei Jie, Jie Xu, Rafail Giavrimis, Mike Basios, Leslie Kanthan, Zheng Wang
- **Abstract**: Language models (LMs) built upon deep neural networks (DNNs) have recently demonstrated breakthrough effectiveness in software engineering tasks such as code generation, completion, and repair. This has paved the way for the emergence of LM-based code optimization techniques, which are crucial for enhancing the performance of existing programs, such as accelerating program execution time. However, a comprehensive survey dedicated to this specific application has been lacking. To fill this gap, w...
@@ -149,6 +156,13 @@ Number of papers: 25
- **Labels**: [static analysis](../../labels/static_analysis.md), [bug detection](../../labels/bug_detection.md)


## [PoCGen: Generating Proof-of-Concept Exploits for Vulnerabilities in Npm Packages](paper_27.md)
- **Authors**: Simsek, Deniz and Eghbali, Aryaz and Pradel, Michael
- **Abstract**: Security vulnerabilities in software packages are a significant concern for developers and users alike. Patching these vulnerabilities in a timely manner is crucial to restoring the integrity and security of software systems. However, previous work has shown that vulnerability reports often lack proof-of-concept (PoC) exploits, which are essential for fixing the vulnerability, testing patches, and avoiding regressions. Creating a PoC exploit is challenging because vulnerability reports are infor...
- **Link**: [Read Paper](https://arxiv.org/pdf/2506.04962)
- **Labels**: [vulnerability exploitation](../../labels/vulnerability_exploitation.md)


## [Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study](paper_1.md)
- **Authors**: Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen
- **Abstract**: Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities. However, existing benchmarks predominantly focused on simplified or isolated aspects of coding, such as single-file code generation or repository issue debugging, falling short of measuring the full spectrum of challenges raised by real-world programming activities. In this case study, we explore the performance of LLMs across the entire software development lifecycle with DevEval, encompas...
11 changes: 11 additions & 0 deletions data/papers/venues/arXiv2025/paper_26.md
@@ -0,0 +1,11 @@
# LLM-Based Repair of Static Nullability Errors

**Authors**: Karimipour, Nima and Pradel, Michael and Kellogg, Martin and Sridharan, Manu

**Abstract**:

Modern Java projects increasingly adopt static analysis tools that prevent null-pointer exceptions by treating nullness as a type property. However, integrating such tools into large, existing codebases remains a significant challenge. While annotation inference can eliminate many errors automatically, a subset of residual errors -- typically a mix of real bugs and false positives -- often persist and can only be resolved via code changes. Manually addressing these errors is tedious and error-prone. Large language models (LLMs) offer a promising path toward automating these repairs, but naively-prompted LLMs often generate incorrect, contextually-inappropriate edits. Resolving a nullability error demands a deep understanding of how a symbol is used across the codebase, often spanning methods, classes, and packages. We present NullRepair, a system that integrates LLMs into a structured workflow for resolving the errors from a nullability checker. NullRepair's decision process follows a flowchart derived from manual analysis of 200 real-world errors. It leverages static analysis to identify safe and unsafe usage regions of symbols, using error-free usage examples to contextualize model prompts. Patches are generated through an iterative interaction with the LLM that incorporates project-wide context and decision logic. Our evaluation on 12 real-world Java projects shows that NullRepair resolves an average of 72% of the errors that remain after applying a state-of-the-art annotation inference technique. Unlike a naively-prompted LLM, NullRepair also largely preserves program semantics, with all unit tests passing in 10/12 projects after applying every edit proposed by NullRepair, and 98% or more tests passing in the remaining two projects.
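The iterative propose/validate/refine interaction described in the abstract can be sketched roughly as follows. This is a minimal sketch, not NullRepair's actual implementation: the class and function names (`NullabilityError`, `propose_patch`, `checker_accepts`, `repair`) are invented, and the LLM call and nullability checker are replaced by trivial stubs; the real system drives a flowchart-based decision process with project-wide context.

```python
from dataclasses import dataclass


@dataclass
class NullabilityError:
    symbol: str
    location: str


def propose_patch(error, safe_usages, attempt):
    # Stand-in for the LLM call. NullRepair contextualizes its prompts with
    # error-free usage examples of the symbol; here the first attempt is
    # deliberately naive and a later attempt returns a safer edit.
    if attempt == 0:
        return f"{error.symbol} = null;"
    return f"{error.symbol} = Objects.requireNonNullElse({error.symbol}, DEFAULT);"


def checker_accepts(patch):
    # Stand-in for re-running the nullability checker on the patched code.
    return "null;" not in patch


def repair(error, safe_usages, max_rounds=3):
    """Iteratively propose a patch, validate it, and refine on failure."""
    for attempt in range(max_rounds):
        patch = propose_patch(error, safe_usages, attempt)
        if checker_accepts(patch):
            return patch
    return None  # residual error left for manual review


patch = repair(NullabilityError("userCache", "Cache.java:42"),
               safe_usages=["if (userCache != null) userCache.get(k);"])
```

The point of the loop is that validation is cheap and automatic (re-run the checker and, in the paper, the unit tests), so the LLM can be retried until an edit both satisfies the checker and preserves semantics.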

**Link**: [Read Paper](https://arxiv.org/pdf/2507.20674)

**Labels**: [program repair](../../labels/program_repair.md)
11 changes: 11 additions & 0 deletions data/papers/venues/arXiv2025/paper_27.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
# PoCGen: Generating Proof-of-Concept Exploits for Vulnerabilities in Npm Packages

**Authors**: Simsek, Deniz and Eghbali, Aryaz and Pradel, Michael

**Abstract**:

Security vulnerabilities in software packages are a significant concern for developers and users alike. Patching these vulnerabilities in a timely manner is crucial to restoring the integrity and security of software systems. However, previous work has shown that vulnerability reports often lack proof-of-concept (PoC) exploits, which are essential for fixing the vulnerability, testing patches, and avoiding regressions. Creating a PoC exploit is challenging because vulnerability reports are informal and often incomplete, and because it requires a detailed understanding of how inputs passed to potentially vulnerable APIs may reach security-relevant sinks. In this paper, we present PoCGen, a novel approach to autonomously generate and validate PoC exploits for vulnerabilities in npm packages. This is the first fully autonomous approach to use large language models (LLMs) in tandem with static and dynamic analysis techniques for PoC exploit generation. PoCGen leverages an LLM for understanding vulnerability reports, for generating candidate PoC exploits, and for validating and refining them. Our approach successfully generates exploits for 77% of the vulnerabilities in the SecBench.js dataset and 39% in a new, more challenging dataset of 794 recent vulnerabilities. This success rate significantly outperforms a recent baseline (by 45 absolute percentage points), while imposing an average cost of $0.02 per generated exploit.
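The generate/validate/refine loop the abstract describes can be sketched as below. All names here are hypothetical (`generate_candidate`, `sink_reached`, `pocgen`, the package `pkg`), and the LLM and sandboxed dynamic analysis are replaced by trivial stubs; the real system also uses static analysis to trace how inputs reach security-relevant sinks.

```python
def generate_candidate(report, prior_failures):
    # Stand-in for the LLM step: turn an informal vulnerability report
    # (plus feedback from failed candidates) into a candidate PoC snippet.
    key = "__proto__" if prior_failures else "constructor.prototype"
    return f'require("pkg").merge({{}}, JSON.parse(\'{{"{key}": {{"polluted": 1}}}}\'))'


def sink_reached(poc):
    # Stand-in for dynamic validation: PoCGen executes the candidate and
    # checks whether the security-relevant sink actually fires. Here we
    # just pattern-match instead of running anything.
    return "__proto__" in poc


def pocgen(report, max_attempts=5):
    """Generate-validate-refine loop for a single vulnerability report."""
    failures = []
    for _ in range(max_attempts):
        candidate = generate_candidate(report, failures)
        if sink_reached(candidate):
            return candidate  # validated PoC exploit
        failures.append(candidate)  # fed back to refine the next attempt
    return None


poc = pocgen("Prototype pollution in pkg's merge() before 1.2.3")
```

Because every candidate is validated by actually exercising the vulnerable API, the loop only ever reports exploits that demonstrably reach the sink, which is what allows the approach to be fully autonomous.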

**Link**: [Read Paper](https://arxiv.org/pdf/2506.04962)

**Labels**: [vulnerability exploitation](../../labels/vulnerability_exploitation.md)