README.md: 12 lines changed (12 additions, 0 deletions)
@@ -103,7 +103,9 @@ Based on a systematic review of **196 papers and online resources**, this survey

*Benchmarks for evaluating issue resolution systems*

- `(2026-03)` **BeyondSWE**: BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing? [arXiv](https://arxiv.org/abs/2603.03194) [Project](https://aweai-team.github.io/BeyondSWE/) [GitHub](https://github.com/AweAI-Team/BeyondSWE) [Hugging Face](https://huggingface.co/datasets/AweAI-Team/BeyondSWE)
- `(2026-02)` **SWE Context Bench**: SWE Context Bench: A Benchmark for Context Learning in Coding [arXiv](https://arxiv.org/pdf/2602.08316)
- `(2025-12)` **SWE-InfraBench**: SWE-InfraBench: Evaluating Language Models on Cloud Infrastructure Code [OpenReview](https://openreview.net/forum?id=XX0ciUwfXa)
- `(2025-11)` **SWE-Sharp-Bench**: SWE-Sharp-Bench: A Reproducible Benchmark for C# Software Engineering Tasks [arXiv](https://arxiv.org/abs/2511.02352)

@@ -130,6 +132,9 @@

*Datasets for training issue resolution agents*

- `(2026-02)` **SWE-Universe**: SWE-Universe: Scale Real-World Verifiable Environments to Millions [arXiv](https://www.arxiv.org/abs/2602.02361)
- `(2026-02)` **SWE-rebench V2**: SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale [arXiv](https://arxiv.org/abs/2602.23866)
- `(2026-02)` **Scale-SWE**: Immersion in the GitHub Universe: Scaling Coding Agents to Mastery [arXiv](https://arxiv.org/abs/2602.09892) [GitHub](https://github.com/AweAI-Team/ScaleSWE) [Hugging Face](https://huggingface.co/collections/AweAI-Team/scale-swe)
- `(2026-01)` **daVinci-Dev**: daVinci-Dev: Agent-native Mid-training for Software Engineering [arXiv](https://arxiv.org/abs/2601.18418) [GitHub](https://github.com/GAIR-NLP/daVinci-Dev) [Hugging Face](https://huggingface.co/datasets/GAIR/daVinci-Dev)
- `(2025-06)` **Skywork-SWE**: Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs [arXiv](https://arxiv.org/abs/2506.19290)
- `(2025-05)` **SWELoc**: SweRank: Software Issue Localization with Code Ranking [arXiv](https://arxiv.org/abs/2505.07849)
- `(2025-04)` **Multi-SWE-RL**: Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving [arXiv](https://arxiv.org/abs/2504.02605v1) [OpenReview](https://openreview.net/forum?id=MhBZzkz4h9)

@@ -157,6 +162,7 @@

*Collaborative multi-agent frameworks*

- `(2026-03)` **SWE-Adept**: SWE-Adept: An LLM-Based Agentic Framework for Deep Codebase Analysis and Structured Issue Resolution [arXiv](https://arxiv.org/abs/2603.01327)
- `(2025-08)` **Meta-RAG**: Meta-RAG on Large Codebases Using Code Summarization [arXiv](https://arxiv.org/abs/2508.02611)
- `(2025-07)` **SWE-Debate**: SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution [arXiv](https://arxiv.org/abs/2507.23348v1)

@@ -187,6 +193,7 @@

*Methods leveraging external tools*

- `(2026-03)` **SWE-Adept**: SWE-Adept: An LLM-Based Agentic Framework for Deep Codebase Analysis and Structured Issue Resolution [arXiv](https://arxiv.org/abs/2603.01327)
- `(2026-02)` **Closing the Loop**: Closing the Loop: Universal Repository Representation with RPG-Encoder [arXiv](https://arxiv.org/abs/2602.02084) [Project](https://ayanami2003.github.io/RPG-Encoder/) [GitHub](https://github.com/microsoft/RPG-ZeroRepo)
- `(2026-01)` **SWE-Tester**: SWE-Tester: Training Open-Source LLMs for Issue Reproduction in Real-World Repositories [arXiv](https://arxiv.org/abs/2601.13713)
- `(2025-12)` **GraphLocator**: GraphLocator: Graph-guided Causal Reasoning for Issue Localization [arXiv](https://arxiv.org/abs/2512.22469)

@@ -235,6 +242,7 @@

*Models trained via supervised learning*

- `(2026-02)` **Scale-SWE**: Immersion in the GitHub Universe: Scaling Coding Agents to Mastery [arXiv](https://arxiv.org/abs/2602.09892) [GitHub](https://github.com/AweAI-Team/ScaleSWE) [Hugging Face](https://huggingface.co/collections/AweAI-Team/scale-swe)
- `(2026-01)` **SWE-Lego**: SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving [arXiv](https://arxiv.org/abs/2601.01426)
- `(2026-01)` **SWE-Replay**: SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents [arXiv](https://arxiv.org/abs/2601.22129)
- `(2025-12)` **SWE-Compressor**: Context as a Tool: Context Management for Long-Horizon SWE-Agents [arXiv](https://arxiv.org/abs/2512.22087)

@@ -258,6 +266,7 @@

- `(2026-02)` **SWE-Protégé**: SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents [arXiv](https://arxiv.org/abs/2602.22124)
- `(2026-02)` **SWE-MiniSandbox**: SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents [arXiv](https://arxiv.org/abs/2602.11210v1) [GitHub](http://github.com/lblankl/SWE-MiniSandbox)
- `(2026-01)` **SWE-Manager**: SWE-Manager: Selecting and Synthesizing Golden Proposals Before Coding [arXiv](https://arxiv.org/abs/2601.22956) [GitHub](https://github.com/shuaijiumei/SWE-Manager)
- `(2025-12)` **Self-play SWE-RL**: Toward Training Superintelligent Software Agents through Self-Play SWE-RL [arXiv](https://arxiv.org/abs/2512.18552)
- `(2025-12)` **SWE-Playground**: Training Versatile Coding Agents in Synthetic Environments [arXiv](https://arxiv.org/abs/2512.12216)
- `(2025-12)` **SWE-RM**: SWE-RM: Execution-free Feedback For Software Engineering Agents [arXiv](https://arxiv.org/abs/2512.21919)

@@ -308,6 +317,8 @@

*Techniques for collecting training data*

- `(2026-02)` **DockSmith**: DockSmith: Scaling Reliable Coding Environments via an Agentic Docker Builder [arXiv](https://arxiv.org/abs/2602.00592) [Hugging Face](https://huggingface.co/collections/8sj7df9k8m5x8/docksmith)
- `(2026-02)` **SWE-rebench V2**: SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale [arXiv](https://arxiv.org/abs/2602.23866)
- `(2026-02)` **Scale-SWE**: Immersion in the GitHub Universe: Scaling Coding Agents to Mastery [arXiv](https://arxiv.org/abs/2602.09892) [GitHub](https://github.com/AweAI-Team/ScaleSWE) [Hugging Face](https://huggingface.co/collections/AweAI-Team/scale-swe)
- `(2026-01)` **MEnvAgent**: MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering [arXiv](https://arxiv.org/abs/2601.22859) [GitHub](https://github.com/ernie-research/MEnvAgent)
- `(2025-12)` **Multi-Docker-Eval**: Multi-Docker-Eval: A 'Shovel of the Gold Rush' Benchmark on Automatic Environment Building for Software Engineering [arXiv](https://arxiv.org/abs/2512.06915)
- `(2025-08)` **RepoForge**: RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data Curation Pipeline Synergizing SFT and RL at Scale [arXiv](https://arxiv.org/abs/2508.01550)

@@ -321,6 +332,7 @@

*Approaches for synthetic data generation*

- `(2026-02)` **SWE-World**: SWE-World: Building Software Engineering Agents in Docker-Free Environments [arXiv](https://arxiv.org/abs/2602.03419) [GitHub](https://github.com/RUCAIBox/SWE-World)
- `(2026-02)` **SWE-Hub**: SWE-Hub: A Unified Production System for Scalable, Executable Software Engineering Tasks [arXiv](https://arxiv.org/abs/2603.00575)
- `(2025-09)` **SWE-Mirror**: SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories [arXiv](https://arxiv.org/abs/2509.08724)
- `(2025-06)` **SWE-Flow**: Synthesizing Software Engineering Data in a Test-Driven Manner [arXiv](https://arxiv.org/abs/2506.09003v2) [OpenReview](https://openreview.net/forum?id=P9DQ2IExgS)
- `(2025-04)` **R2E-Gym**: R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents [arXiv](https://arxiv.org/abs/2504.07164) [OpenReview](https://openreview.net/forum?id=7evvwwdo3z)