DeepSoftwareAnalytics
diff --git a/README.md b/README.md
Lines changed: 12 additions & 0 deletions
diff --git a/data/papers_data_analysis.yaml b/data/papers_data_analysis.yaml
Lines changed: 37 additions & 37 deletions
@@ -103,7 +103,9 @@ Based on a systematic review of **196 papers and online resources**, this survey
 
 *Benchmarks for evaluating issue resolution systems*
 
+- `(2026-03)` **BeyondSWE**: BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing? [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2603.03194) [![Website](https://img.shields.io/badge/Website-paper-5B9BD5?logo=googlechrome&logoColor=white)](https://aweai-team.github.io/BeyondSWE/) [![GitHub](https://img.shields.io/badge/GitHub-repo-24292F?logo=github&logoColor=white)](https://github.com/AweAI-Team/BeyondSWE) [![HuggingFace](https://img.shields.io/badge/HuggingFace-dataset-ff7e21?logo=huggingface&logoColor=white)](https://huggingface.co/datasets/AweAI-Team/BeyondSWE)
 - `(2026-02)` **SWE Context Bench**: SWE Context Bench: A Benchmark for Context Learning in Coding [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/pdf/2602.08316)
+- `(2026-02)` **SWE-ABS**: SWE-ABS: Adversarial Benchmark Strengthening Exposes Inflated Success Rates on Test-based Benchmark [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2603.00520)
 - `(2025-12)` **SWE-InfraBench**: SWE-InfraBench: Evaluating Language Models on Cloud Infrastructure Code [![OpenReview](https://img.shields.io/badge/OpenReview-paper-8C1B13?logo=openreview&logoColor=white)](https://openreview.net/forum?id=XX0ciUwfXa)
 - `(2025-12)` **SWE-EVO**: SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.18470)
 - `(2025-11)` **SWE-Sharp-Bench**: SWE-Sharp-Bench: A Reproducible Benchmark for C# Software Engineering Tasks [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2511.02352)
@@ -130,6 +132,9 @@ Based on a systematic review of **196 papers and online resources**, this survey
 *Datasets for training issue resolution agents*
 
 - `(2026-02)` **SWE-Universe**: SWE-Universe: Scale Real-World Verifiable Environments to Millions [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://www.arxiv.org/abs/2602.02361)
+- `(2026-02)` **SWE-rebench V2**: SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.23866)
+- `(2026-02)` **Scale-SWE**: Immersion in the GitHub Universe: Scaling Coding Agents to Mastery [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.09892) [![GitHub](https://img.shields.io/badge/GitHub-repo-24292F?logo=github&logoColor=white)](https://github.com/AweAI-Team/ScaleSWE) [![HuggingFace](https://img.shields.io/badge/HuggingFace-dataset-ff7e21?logo=huggingface&logoColor=white)](https://huggingface.co/collections/AweAI-Team/scale-swe)
+- `(2026-01)` **daVinci-Dev**: daVinci-Dev: Agent-native Mid-training for Software Engineering [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2601.18418) [![GitHub](https://img.shields.io/badge/GitHub-repo-24292F?logo=github&logoColor=white)](https://github.com/GAIR-NLP/daVinci-Dev) [![HuggingFace](https://img.shields.io/badge/HuggingFace-dataset-ff7e21?logo=huggingface&logoColor=white)](https://huggingface.co/datasets/GAIR/daVinci-Dev)
 - `(2025-06)` **Skywork-SWE**: Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2506.19290)
 - `(2025-05)` **SWELoc**: SweRank: Software Issue Localization with Code Ranking [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2505.07849)
 - `(2025-04)` **Multi-SWE-RL**: Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2504.02605v1) [![OpenReview](https://img.shields.io/badge/OpenReview-paper-8C1B13?logo=openreview&logoColor=white)](https://openreview.net/forum?id=MhBZzkz4h9)
@@ -157,6 +162,7 @@ Based on a systematic review of **196 papers and online resources**, this survey
 
 *Collaborative multi-agent frameworks*
 
+- `(2026-03)` **SWE-Adept**: SWE-Adept: An LLM-Based Agentic Framework for Deep Codebase Analysis and Structured Issue Resolution [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2603.01327)
 - `(2025-08)` **Meta-RAG**: Meta-RAG on Large Codebases Using Code Summarization [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2508.02611)
 - `(2025-07)` **SWE-Debate**: SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2507.23348v1)
 - `(2025-06)` **AgentScope**: SWE-Bench - AgentScope [![Website](https://img.shields.io/badge/Website-paper-5B9BD5?logo=googlechrome&logoColor=white)](https://doc.agentscope.io/v0/en/tutorial/swe.html)
@@ -187,6 +193,7 @@ Based on a systematic review of **196 papers and online resources**, this survey
 
 *Methods leveraging external tools*
 
+- `(2026-03)` **SWE-Adept**: SWE-Adept: An LLM-Based Agentic Framework for Deep Codebase Analysis and Structured Issue Resolution [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2603.01327)
 - `(2026-02)` **Closing the Loop**: Closing the Loop: Universal Repository Representation with RPG-Encoder [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.02084) [![Website](https://img.shields.io/badge/Website-paper-5B9BD5?logo=googlechrome&logoColor=white)](https://ayanami2003.github.io/RPG-Encoder/) [![GitHub](https://img.shields.io/badge/GitHub-repo-24292F?logo=github&logoColor=white)](https://github.com/microsoft/RPG-ZeroRepo)
 - `(2026-01)` **SWE-Tester**: SWE-Tester: Training Open-Source LLMs for Issue Reproduction in Real-World Repositories [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2601.13713)
 - `(2025-12)` **GraphLocator**: GraphLocator: Graph-guided Causal Reasoning for Issue Localization [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.22469)
@@ -235,6 +242,7 @@ Based on a systematic review of **196 papers and online resources**, this survey
 
 *Models trained via supervised learning*
 
+- `(2026-02)` **Scale-SWE**: Immersion in the GitHub Universe: Scaling Coding Agents to Mastery [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.09892) [![GitHub](https://img.shields.io/badge/GitHub-repo-24292F?logo=github&logoColor=white)](https://github.com/AweAI-Team/ScaleSWE) [![HuggingFace](https://img.shields.io/badge/HuggingFace-dataset-ff7e21?logo=huggingface&logoColor=white)](https://huggingface.co/collections/AweAI-Team/scale-swe)
 - `(2026-01)` **SWE-Lego**: SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2601.01426)
 - `(2026-01)` **SWE-Replay**: SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2601.22129)
 - `(2025-12)` **SWE-Compressor**: Context as a Tool: Context Management for Long-Horizon SWE-Agents [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.22087)
@@ -258,6 +266,7 @@ Based on a systematic review of **196 papers and online resources**, this survey
 - `(2026-02)` **SWE-Protégé**: SWE-Protégé: Learning to Selectively Collaborate With an Expert Unlocks Small Language Models as Software Engineering Agents [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.22124)
 - `(2026-02)` **SWE-MiniSandbox**: SWE-MiniSandbox: Container-Free Reinforcement Learning for Building Software Engineering Agents [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.11210v1) [![GitHub](https://img.shields.io/badge/GitHub-repo-24292F?logo=github&logoColor=white)](http://github.com/lblankl/SWE-MiniSandbox)
 - `(2026-01)` **MiMo-V2-Flash**: MiMo-V2-Flash Technical Report [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2601.02780)
+- `(2026-01)` **SWE-Manager**: SWE-Manager: Selecting and Synthesizing Golden Proposals Before Coding [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2601.22956) [![GitHub](https://img.shields.io/badge/GitHub-repo-24292F?logo=github&logoColor=white)](https://github.com/shuaijiumei/SWE-Manager)
 - `(2025-12)` **Self-play SWE-RL**: Toward Training Superintelligent Software Agents through Self-Play SWE-RL [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.18552)
 - `(2025-12)` **SWE-Playground**: Training Versatile Coding Agents in Synthetic Environments [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.12216)
 - `(2025-12)` **SWE-RM**: SWE-RM: Execution-free Feedback For Software Engineering Agents [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.21919)
@@ -308,6 +317,8 @@ Based on a systematic review of **196 papers and online resources**, this survey
 *Techniques for collecting training data*
 
 - `(2026-02)` **DockSmith**: DockSmith: Scaling Reliable Coding Environments via an Agentic Docker Builder [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.00592) [![HuggingFace](https://img.shields.io/badge/HuggingFace-dataset-ff7e21?logo=huggingface&logoColor=white)](https://huggingface.co/collections/8sj7df9k8m5x8/docksmith)
+- `(2026-02)` **SWE-rebench V2**: SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.23866)
+- `(2026-02)` **Scale-SWE**: Immersion in the GitHub Universe: Scaling Coding Agents to Mastery [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.09892) [![GitHub](https://img.shields.io/badge/GitHub-repo-24292F?logo=github&logoColor=white)](https://github.com/AweAI-Team/ScaleSWE) [![HuggingFace](https://img.shields.io/badge/HuggingFace-dataset-ff7e21?logo=huggingface&logoColor=white)](https://huggingface.co/collections/AweAI-Team/scale-swe)
 - `(2026-01)` **MEnvAgent**: MEnvAgent: Scalable Polyglot Environment Construction for Verifiable Software Engineering [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2601.22859) [![GitHub](https://img.shields.io/badge/GitHub-repo-24292F?logo=github&logoColor=white)](https://github.com/ernie-research/MEnvAgent)
 - `(2025-12)` **Multi-Docker-Eval**: Multi-Docker-Eval: A `Shovel of the Gold Rush' Benchmark on Automatic Environment Building for Software Engineering [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2512.06915)
 - `(2025-08)` **RepoForge**: RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data Curation Pipeline Synergizing SFT and RL at Scale [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2508.01550)
@@ -321,6 +332,7 @@ Based on a systematic review of **196 papers and online resources**, this survey
 *Approaches for synthetic data generation*
 
 - `(2026-02)` **SWE-World**: SWE-World: Building Software Engineering Agents in Docker-Free Environments [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2602.03419) [![GitHub](https://img.shields.io/badge/GitHub-repo-24292F?logo=github&logoColor=white)](https://github.com/RUCAIBox/SWE-World)
+- `(2026-02)` **SWE-Hub**: SWE-Hub: A Unified Production System for Scalable, Executable Software Engineering Tasks [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2603.00575)
 - `(2025-09)` **SWE-Mirror**: SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2509.08724)
 - `(2025-06)` **SWE-Flow**: Synthesizing Software Engineering Data in a Test-Driven Manner [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2506.09003v2) [![OpenReview](https://img.shields.io/badge/OpenReview-paper-8C1B13?logo=openreview&logoColor=white)](https://openreview.net/forum?id=P9DQ2IExgS)
 - `(2025-04)` **R2E-Gym**: R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents [![arXiv](https://img.shields.io/badge/arXiv-paper-B31B1B?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2504.07164) [![OpenReview](https://img.shields.io/badge/OpenReview-paper-8C1B13?logo=openreview&logoColor=white)](https://openreview.net/forum?id=7evvwwdo3z)
 
@@ -1,19 +1,30 @@
-- short_name: SWE-bench Verified
-  title: Introducing SWE-bench Verified | OpenAI
-  authors: OpenAI
-  year: '2024'
-  venue: '-'
-  month: 2024-08
+- short_name: Data contamination
+  title: Does SWE-Bench-Verified Test Agent Ability or Model Memory?
+  authors: Thanosan Prathifkumar, Noble Saji Mathews, Meiyappan Nagappan
+  year: '2025'
+  venue: arXiv preprint arXiv:2512.10218
+  month: 2025-12
   links:
-    website: https://openai.com/index/introducing-swe-bench-verified/
-- short_name: Patch Correctness
-  title: Are "Solved Issues" in SWE-bench Really Solved Correctly? An Empirical Study
-  authors: You Wang, Michael Pradel, Zhongxin Liu
+    arxiv: https://arxiv.org/abs/2512.10218
+- short_name: Rigorous agentic benchmarks
+  title: Establishing Best Practices for Building Rigorous Agentic Benchmarks
+  authors: Yuxuan Zhu, Tengjun Jin, Yada Pruksachatkun, Andy Zhang, Shu Liu, Sasha
+    Cui, Sayash Kapoor et al.
   year: '2025'
-  venue: arXiv preprint arXiv:2503.15223
-  month: 2025-03
+  venue: arXiv preprint arXiv:2507.02825
+  month: 2025-07
   links:
-    arxiv: https://arxiv.org/abs/2503.15223
+    arxiv: https://arxiv.org/abs/2507.02825
+- short_name: SPICE
+  title: "SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity,\n   \
+    \            Test Coverage, and Effort Estimation"
+  authors: Gustavo A. Oliva, Gopi Krishnan Rajbahadur, Aaditya Bhatia, Haoxiang Zhang,
+    Yihao Chen, Zhilong Chen, Arthur Leung et al.
+  year: '2025'
+  venue: ASE 2025
+  month: 2025-07
+  links:
+    arxiv: https://arxiv.org/abs/2507.09108v5
 - short_name: UTBoost
   title: 'UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench'
   authors: Boxi Yu, Yuxuan Zhu, Pinjia He, Daniel Kang
@@ -30,15 +41,6 @@
   month: 2025-06
   links:
     arxiv: https://arxiv.org/abs/2506.17812
-- short_name: Rigorous agentic benchmarks
-  title: Establishing Best Practices for Building Rigorous Agentic Benchmarks
-  authors: Yuxuan Zhu, Tengjun Jin, Yada Pruksachatkun, Andy Zhang, Shu Liu, Sasha
-    Cui, Sayash Kapoor et al.
-  year: '2025'
-  venue: arXiv preprint arXiv:2507.02825
-  month: 2025-07
-  links:
-    arxiv: https://arxiv.org/abs/2507.02825
 - short_name: The SWE-Bench Illusion
   title: 'The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason'
   authors: Shanchao Liang, Spandan Garg, Roshanak Zilouchian Moghaddam
@@ -57,21 +59,19 @@
   month: 2025-04
   links:
     doi: http://dx.doi.org/10.1109/ICSE-Companion66252.2025.00075
-- short_name: SPICE
-  title: "SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity,\n   \
-    \            Test Coverage, and Effort Estimation"
-  authors: Gustavo A. Oliva, Gopi Krishnan Rajbahadur, Aaditya Bhatia, Haoxiang Zhang,
-    Yihao Chen, Zhilong Chen, Arthur Leung et al.
+- short_name: Patch Correctness
+  title: Are "Solved Issues" in SWE-bench Really Solved Correctly? An Empirical Study
+  authors: You Wang, Michael Pradel, Zhongxin Liu
   year: '2025'
-  venue: ASE 2025
-  month: 2025-07
+  venue: arXiv preprint arXiv:2503.15223
+  month: 2025-03
   links:
-    arxiv: https://arxiv.org/abs/2507.09108v5
-- short_name: Data contamination
-  title: Does SWE-Bench-Verified Test Agent Ability or Model Memory?
-  authors: Thanosan Prathifkumar, Noble Saji Mathews, Meiyappan Nagappan
-  year: '2025'
-  venue: arXiv preprint arXiv:2512.10218
-  month: 2025-12
+    arxiv: https://arxiv.org/abs/2503.15223
+- short_name: SWE-bench Verified
+  title: Introducing SWE-bench Verified | OpenAI
+  authors: OpenAI
+  year: '2024'
+  venue: '-'
+  month: 2024-08
   links:
-    arxiv: https://arxiv.org/abs/2512.10218
+    website: https://openai.com/index/introducing-swe-bench-verified/