Commit 01a2037

Add two papers in arXiv2025:
"LLM-Based Repair of Static Nullability Errors" and "PoCGen: Generating Proof-of-Concept Exploits for Vulnerabilities in Npm Packages"
1 parent 1f22514 commit 01a2037

File tree

8 files changed: 63 additions & 3 deletions

- README.md
- data/papers/labels/code_generation.md
- data/papers/labels/program_repair.md
- data/papers/labels/program_testing.md
- data/papers/labels/vulnerability_exploitation.md
- data/papers/venues/arXiv2025/README.md
- data/papers/venues/arXiv2025/paper_26.md
- data/papers/venues/arXiv2025/paper_27.md


README.md

Lines changed: 2 additions & 2 deletions

@@ -78,7 +78,7 @@ This category focuses on typical tasks in Software Engineering (SE) and Programm
  - [Code Generation](data/papers/labels/code_generation.md) (270)
  - [Program Synthesis](data/papers/labels/program_synthesis.md) (106)
  - [Code Completion](data/papers/labels/code_completion.md) (25)
- - [Program Repair](data/papers/labels/program_repair.md) (67)
+ - [Program Repair](data/papers/labels/program_repair.md) (68)
  - [Program Transformation](data/papers/labels/program_transformation.md) (42)
  - [Program Testing](data/papers/labels/program_testing.md) (97)
  - [General Testing](data/papers/labels/general_testing.md) (7)
@@ -93,7 +93,7 @@ This category focuses on typical tasks in Software Engineering (SE) and Programm
  - [Differential Testing](data/papers/labels/differential_testing.md) (6)
  - [Debugging](data/papers/labels/debugging.md) (16)
  - [Bug Reproduction](data/papers/labels/bug_reproduction.md) (6)
- - [Vulnerability Exploitation](data/papers/labels/vulnerability_exploitation.md) (11)
+ - [Vulnerability Exploitation](data/papers/labels/vulnerability_exploitation.md) (12)
  - [Static Analysis](data/papers/labels/static_analysis.md) (204)
  - [Syntactic Analysis](data/papers/labels/syntactic_analysis.md) (1)
  - [Pointer Analysis](data/papers/labels/pointer_analysis.md) (3)

data/papers/labels/code_generation.md

Lines changed: 6 additions & 0 deletions

@@ -1020,6 +1020,12 @@
  - **Labels**: [code generation](code_generation.md), [program repair](program_repair.md)


+ - [LLM-Based Repair of Static Nullability Errors](../venues/arXiv2025/paper_26.md), ([arXiv2025](../venues/arXiv2025/README.md))
+
+ - **Abstract**: Modern Java projects increasingly adopt static analysis tools that prevent null-pointer exceptions by treating nullness as a type property. However, integrating such tools into large, existing codebases remains a significant challenge. While annotation inference can eliminate many errors automatically, a subset of residual errors -- typically a mix of real bugs and false positives -- often persist and can only be resolved via code changes. Manually addressing these errors is tedious and error-pr...
+ - **Labels**: [program repair](program_repair.md)
+
+
  - [LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks](../venues/S&P2024/paper_1.md), ([S&P2024](../venues/S&P2024/README.md))

  - **Abstract**: Large Language Models (LLMs) have been suggested for use in automated vulnerability repair, but benchmarks showing they can consistently identify security-related bugs are lacking. We thus develop SecLLMHolmes, a fully automated evaluation framework that performs the most detailed investigation to date on whether LLMs can reliably identify and reason about security-related bugs. We construct a set of 228 code scenarios and analyze eight of the most capable LLMs across eight different investigati...

data/papers/labels/program_repair.md

Lines changed: 6 additions & 0 deletions

@@ -228,6 +228,12 @@
  - **Labels**: [code generation](code_generation.md), [program repair](program_repair.md)


+ - [LLM-Based Repair of Static Nullability Errors](../venues/arXiv2025/paper_26.md), ([arXiv2025](../venues/arXiv2025/README.md))
+
+ - **Abstract**: Modern Java projects increasingly adopt static analysis tools that prevent null-pointer exceptions by treating nullness as a type property. However, integrating such tools into large, existing codebases remains a significant challenge. While annotation inference can eliminate many errors automatically, a subset of residual errors -- typically a mix of real bugs and false positives -- often persist and can only be resolved via code changes. Manually addressing these errors is tedious and error-pr...
+ - **Labels**: [program repair](program_repair.md)
+
+
  - [LLMs Cannot Reliably Identify and Reason About Security Vulnerabilities (Yet?): A Comprehensive Evaluation, Framework, and Benchmarks](../venues/S&P2024/paper_1.md), ([S&P2024](../venues/S&P2024/README.md))

  - **Abstract**: Large Language Models (LLMs) have been suggested for use in automated vulnerability repair, but benchmarks showing they can consistently identify security-related bugs are lacking. We thus develop SecLLMHolmes, a fully automated evaluation framework that performs the most detailed investigation to date on whether LLMs can reliably identify and reason about security-related bugs. We construct a set of 228 code scenarios and analyze eight of the most capable LLMs across eight different investigati...

data/papers/labels/program_testing.md

Lines changed: 6 additions & 0 deletions

@@ -620,6 +620,12 @@
  - **Labels**: [program testing](program_testing.md), [vulnerability exploitation](vulnerability_exploitation.md), [benchmark](benchmark.md)


+ - [PoCGen: Generating Proof-of-Concept Exploits for Vulnerabilities in Npm Packages](../venues/arXiv2025/paper_27.md), ([arXiv2025](../venues/arXiv2025/README.md))
+
+ - **Abstract**: Security vulnerabilities in software packages are a significant concern for developers and users alike. Patching these vulnerabilities in a timely manner is crucial to restoring the integrity and security of software systems. However, previous work has shown that vulnerability reports often lack proof-of-concept (PoC) exploits, which are essential for fixing the vulnerability, testing patches, and avoiding regressions. Creating a PoC exploit is challenging because vulnerability reports are infor...
+ - **Labels**: [vulnerability exploitation](vulnerability_exploitation.md)
+
+
  - [Teams of LLM Agents can Exploit Zero-Day Vulnerabilities](../venues/arXiv2024/paper_30.md), ([arXiv2024](../venues/arXiv2024/README.md))

  - **Abstract**: LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability and toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities). In this work, we show that teams of LLM agents can exploit real-world, zero-day vulnerabilities. Prior ...

data/papers/labels/vulnerability_exploitation.md

Lines changed: 6 additions & 0 deletions

@@ -48,6 +48,12 @@
  - **Labels**: [program testing](program_testing.md), [vulnerability exploitation](vulnerability_exploitation.md), [benchmark](benchmark.md)


+ - [PoCGen: Generating Proof-of-Concept Exploits for Vulnerabilities in Npm Packages](../venues/arXiv2025/paper_27.md), ([arXiv2025](../venues/arXiv2025/README.md))
+
+ - **Abstract**: Security vulnerabilities in software packages are a significant concern for developers and users alike. Patching these vulnerabilities in a timely manner is crucial to restoring the integrity and security of software systems. However, previous work has shown that vulnerability reports often lack proof-of-concept (PoC) exploits, which are essential for fixing the vulnerability, testing patches, and avoiding regressions. Creating a PoC exploit is challenging because vulnerability reports are infor...
+ - **Labels**: [vulnerability exploitation](vulnerability_exploitation.md)
+
+
  - [Teams of LLM Agents can Exploit Zero-Day Vulnerabilities](../venues/arXiv2024/paper_30.md), ([arXiv2024](../venues/arXiv2024/README.md))

  - **Abstract**: LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability and toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities). In this work, we show that teams of LLM agents can exploit real-world, zero-day vulnerabilities. Prior ...

data/papers/venues/arXiv2025/README.md

Lines changed: 15 additions & 1 deletion

@@ -1,6 +1,6 @@
  # arXiv2025

- Number of papers: 25
+ Number of papers: 27

  ## [AI Software Engineer: Programming with Trust](paper_21.md)
  - **Authors**: Abhik Roychoudhury, Corina Pasareanu, Michael Pradel, Baishakhi Ray
@@ -128,6 +128,13 @@ Number of papers: 25
  - **Labels**: [static analysis](../../labels/static_analysis.md), [bug detection](../../labels/bug_detection.md)


+ ## [LLM-Based Repair of Static Nullability Errors](paper_26.md)
+ - **Authors**: Karimipour, Nima and Pradel, Michael and Kellogg, Martin and Sridharan, Manu
+ - **Abstract**: Modern Java projects increasingly adopt static analysis tools that prevent null-pointer exceptions by treating nullness as a type property. However, integrating such tools into large, existing codebases remains a significant challenge. While annotation inference can eliminate many errors automatically, a subset of residual errors -- typically a mix of real bugs and false positives -- often persist and can only be resolved via code changes. Manually addressing these errors is tedious and error-pr...
+ - **Link**: [Read Paper](https://arxiv.org/pdf/2507.20674)
+ - **Labels**: [program repair](../../labels/program_repair.md)
+
+
  ## [Language Models for Code Optimization: Survey, Challenges and Future Directions](paper_5.md)
  - **Authors**: Jingzhi Gong, Vardan Voskanyan, Paul Brookes, Fan Wu, Wei Jie, Jie Xu, Rafail Giavrimis, Mike Basios, Leslie Kanthan, Zheng Wang
  - **Abstract**: Language models (LMs) built upon deep neural networks (DNNs) have recently demonstrated breakthrough effectiveness in software engineering tasks such as code generation, completion, and repair. This has paved the way for the emergence of LM-based code optimization techniques, which are crucial for enhancing the performance of existing programs, such as accelerating program execution time. However, a comprehensive survey dedicated to this specific application has been lacking. To fill this gap, w...
@@ -149,6 +156,13 @@ Number of papers: 25
  - **Labels**: [static analysis](../../labels/static_analysis.md), [bug detection](../../labels/bug_detection.md)


+ ## [PoCGen: Generating Proof-of-Concept Exploits for Vulnerabilities in Npm Packages](paper_27.md)
+ - **Authors**: Simsek, Deniz and Eghbali, Aryaz and Pradel, Michael
+ - **Abstract**: Security vulnerabilities in software packages are a significant concern for developers and users alike. Patching these vulnerabilities in a timely manner is crucial to restoring the integrity and security of software systems. However, previous work has shown that vulnerability reports often lack proof-of-concept (PoC) exploits, which are essential for fixing the vulnerability, testing patches, and avoiding regressions. Creating a PoC exploit is challenging because vulnerability reports are infor...
+ - **Link**: [Read Paper](https://arxiv.org/pdf/2506.04962)
+ - **Labels**: [vulnerability exploitation](../../labels/vulnerability_exploitation.md)
+
+
  ## [Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study](paper_1.md)
  - **Authors**: Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen
  - **Abstract**: Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities. However, existing benchmarks predominantly focused on simplified or isolated aspects of coding, such as single-file code generation or repository issue debugging, falling short of measuring the full spectrum of challenges raised by real-world programming activities. In this case study, we explore the performance of LLMs across the entire software development lifecycle with DevEval, encompas...
data/papers/venues/arXiv2025/paper_26.md

Lines changed: 11 additions & 0 deletions

@@ -0,0 +1,11 @@
+ # LLM-Based Repair of Static Nullability Errors
+
+ **Authors**: Karimipour, Nima and Pradel, Michael and Kellogg, Martin and Sridharan, Manu
+
+ **Abstract**:
+
+ Modern Java projects increasingly adopt static analysis tools that prevent null-pointer exceptions by treating nullness as a type property. However, integrating such tools into large, existing codebases remains a significant challenge. While annotation inference can eliminate many errors automatically, a subset of residual errors -- typically a mix of real bugs and false positives -- often persist and can only be resolved via code changes. Manually addressing these errors is tedious and error-prone. Large language models (LLMs) offer a promising path toward automating these repairs, but naively-prompted LLMs often generate incorrect, contextually-inappropriate edits. Resolving a nullability error demands a deep understanding of how a symbol is used across the codebase, often spanning methods, classes, and packages. We present NullRepair, a system that integrates LLMs into a structured workflow for resolving the errors from a nullability checker. NullRepair's decision process follows a flowchart derived from manual analysis of 200 real-world errors. It leverages static analysis to identify safe and unsafe usage regions of symbols, using error-free usage examples to contextualize model prompts. Patches are generated through an iterative interaction with the LLM that incorporates project-wide context and decision logic. Our evaluation on 12 real-world Java projects shows that NullRepair resolves an average of 72% of the errors that remain after applying a state-of-the-art annotation inference technique. Unlike a naively-prompted LLM, NullRepair also largely preserves program semantics, with all unit tests passing in 10/12 projects after applying every edit proposed by NullRepair, and 98% or more tests passing in the remaining two projects.
+
+ **Link**: [Read Paper](https://arxiv.org/pdf/2507.20674)
+
+ **Labels**: [program repair](../../labels/program_repair.md)
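
The paper targets Java nullness checkers, but the class of error it repairs also exists in other languages whose type systems track nullness. As a minimal, self-contained illustration of what the abstract means by errors that "can only be resolved via code changes", here is a TypeScript analogue under strict null checks; the Config type and timeout function are invented for this sketch and do not come from the paper.

```typescript
// Under strictNullChecks, Map.get returns Config | undefined -- the analogue
// of a @Nullable value in a Java nullness type system.
interface Config { timeoutMs: number }
const configs = new Map<string, Config>();

// Before the repair, this fails to type-check, and no annotation alone can
// fix it, because configs.get(name) really may return undefined:
//   function timeout(name: string): number {
//     return configs.get(name).timeoutMs; // error: object is possibly 'undefined'
//   }

// The repair is a small code change that handles the missing-value case
// explicitly -- the kind of edit the paper automates with an LLM.
function timeout(name: string): number {
  const config = configs.get(name);
  if (config === undefined) {
    throw new Error(`unknown config: ${name}`);
  }
  return config.timeoutMs;
}

configs.set("db", { timeoutMs: 500 });
console.log(timeout("db")); // prints 500
```

The sketch omits the hard part the paper addresses: choosing an edit that is contextually appropriate, which NullRepair grounds in error-free usages of the same symbol elsewhere in the project.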
data/papers/venues/arXiv2025/paper_27.md

Lines changed: 11 additions & 0 deletions

@@ -0,0 +1,11 @@
+ # PoCGen: Generating Proof-of-Concept Exploits for Vulnerabilities in Npm Packages
+
+ **Authors**: Simsek, Deniz and Eghbali, Aryaz and Pradel, Michael
+
+ **Abstract**:
+
+ Security vulnerabilities in software packages are a significant concern for developers and users alike. Patching these vulnerabilities in a timely manner is crucial to restoring the integrity and security of software systems. However, previous work has shown that vulnerability reports often lack proof-of-concept (PoC) exploits, which are essential for fixing the vulnerability, testing patches, and avoiding regressions. Creating a PoC exploit is challenging because vulnerability reports are informal and often incomplete, and because it requires a detailed understanding of how inputs passed to potentially vulnerable APIs may reach security-relevant sinks. In this paper, we present PoCGen, a novel approach to autonomously generate and validate PoC exploits for vulnerabilities in npm packages. This is the first fully autonomous approach to use large language models (LLMs) in tandem with static and dynamic analysis techniques for PoC exploit generation. PoCGen leverages an LLM for understanding vulnerability reports, for generating candidate PoC exploits, and for validating and refining them. Our approach successfully generates exploits for 77% of the vulnerabilities in the SecBench.js dataset and 39% in a new, more challenging dataset of 794 recent vulnerabilities. This success rate significantly outperforms a recent baseline (by 45 absolute percentage points), while imposing an average cost of $0.02 per generated exploit.
+
+ **Link**: [Read Paper](https://arxiv.org/pdf/2506.04962)
+
+ **Labels**: [vulnerability exploitation](../../labels/vulnerability_exploitation.md)
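
To make the loop described in the abstract concrete, here is a minimal TypeScript sketch of a generate-validate-refine cycle. Every name in it (VulnReport, proposePoc, triggersVulnerability, generatePoc) is an invented stand-in: PoCGen's actual pipeline combines an LLM with static and dynamic analysis, which this sketch reduces to stubs.

```typescript
// Shape of the informal input: a vulnerability advisory for an npm package.
interface VulnReport {
  pkg: string;          // affected npm package
  summary: string;      // informal description from the advisory
  suspectedApi: string; // entry point that may reach a security-relevant sink
}

// Stand-in for the LLM step: turn the report, plus feedback from earlier
// failed attempts, into a candidate proof-of-concept exploit (a JS snippet).
async function proposePoc(report: VulnReport, feedback: string[]): Promise<string> {
  const note = feedback.length ? ` // retry after: ${feedback[feedback.length - 1]}` : "";
  return `require("${report.pkg}").${report.suspectedApi}("payload")${note}`;
}

// Stand-in for sandboxed dynamic validation: execute the candidate and
// observe whether a security-relevant sink (e.g. command execution) is reached.
async function triggersVulnerability(candidate: string): Promise<boolean> {
  return false; // a real validator would instrument and run the package
}

// The refinement loop: propose, validate, and feed failures back into the
// next prompt until a validated PoC is found or the attempt budget runs out.
async function generatePoc(report: VulnReport, maxAttempts = 5): Promise<string | null> {
  const feedback: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = await proposePoc(report, feedback);
    if (await triggersVulnerability(candidate)) {
      return candidate; // validated exploit
    }
    feedback.push(`attempt ${attempt}: candidate did not reach the sink`);
  }
  return null; // no validated PoC within budget
}

generatePoc({ pkg: "some-pkg", summary: "command injection", suspectedApi: "run" })
  .then((poc) => console.log(poc ?? "no validated PoC"));
```

The validation step is what separates this loop from naive prompting: only candidates that demonstrably reach a sink count as exploits, which is what makes the paper's reported success rates (e.g. 77% on SecBench.js) meaningful.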
