Open
Labels
enhancement, p3 (Priority 3 - Low), status:in-progress (Currently being worked on), type:feature (New feature or request)
Description
Summary
Integrate SWE-bench evaluation to measure RPG-Encoder performance on real-world software engineering tasks.
Paper: RPG-Encoder §5 (arXiv:2602.02084)
Tasks
- Set up SWE-bench evaluation harness
- Integrate RPG tools (SearchNode, FetchNode, ExploreRPG) as agent tools
- Run evaluations and compare against paper results
- Generate a report of the evaluation results
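
As a starting point for the tool-integration task, here is a minimal sketch of how the three RPG tools might be exposed to an agent loop through one registry. The `AgentTool` interface, `ToolRegistry` class, and `stubTool` helper are hypothetical names for illustration only. The actual exports of src/tools/search.ts, src/tools/fetch.ts, and src/tools/explore.ts may look different, and the stubs below only echo their arguments.

```typescript
// Hypothetical tool shape; the real exports in src/tools/*.ts may differ.
interface AgentTool {
  name: string;
  description: string;
  run(args: Record<string, unknown>): Promise<string>;
}

// Maps tool names to implementations so the agent loop can dispatch by name.
class ToolRegistry {
  private tools = new Map<string, AgentTool>();

  register(tool: AgentTool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool already registered: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  // Tool names in registration order.
  names(): string[] {
    return Array.from(this.tools.keys());
  }

  // Unknown tools return an error string rather than throwing, so the
  // message can be fed back to the model as an observation.
  async dispatch(name: string, args: Record<string, unknown>): Promise<string> {
    const tool = this.tools.get(name);
    if (!tool) {
      return `Unknown tool: ${name}`;
    }
    return tool.run(args);
  }
}

// Placeholder that echoes its arguments; swap in the real tool modules here.
function stubTool(name: string, description: string): AgentTool {
  return {
    name,
    description,
    run: async (args) => `${name} called with ${JSON.stringify(args)}`,
  };
}

const registry = new ToolRegistry();
registry.register(stubTool("SearchNode", "Placeholder for src/tools/search.ts"));
registry.register(stubTool("FetchNode", "Placeholder for src/tools/fetch.ts"));
registry.register(stubTool("ExploreRPG", "Placeholder for src/tools/explore.ts"));
```

With this shape, the agent loop would call `registry.dispatch(toolName, args)` for each tool call the model emits and append the returned string to the conversation.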
Acceptance Criteria
- End-to-end SWE-bench evaluation pipeline works
- Results are comparable with paper baselines (78% with Gemini Flash)
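
One way to make "comparable" checkable is to compute the fraction of successful instances and compare it to the paper figure within a tolerance. The sketch below assumes the 78% figure is a per-instance success fraction. The `InstanceResult` shape, the function names, and the 5-point tolerance are illustrative assumptions, not part of the SWE-bench harness or the paper.

```typescript
// Hypothetical per-instance record; the harness's real output format may differ.
interface InstanceResult {
  instanceId: string;
  success: boolean;
}

// Fraction of instances marked successful, in [0, 1]. Empty input yields 0.
function successRate(results: InstanceResult[]): number {
  if (results.length === 0) {
    return 0;
  }
  const passed = results.filter((r) => r.success).length;
  return passed / results.length;
}

interface BaselineComparison {
  delta: number; // measured minus baseline
  withinTolerance: boolean;
}

// Compares a measured rate to a baseline; tolerance is an assumed threshold.
function compareToBaseline(
  measured: number,
  baseline: number,
  tolerance: number
): BaselineComparison {
  const delta = measured - baseline;
  return { delta, withinTolerance: Math.abs(delta) <= tolerance };
}

// Baseline from the acceptance criterion above (78% with Gemini Flash).
const PAPER_BASELINE = 0.78;

// Made-up example data, for illustration only.
const sampleResults: InstanceResult[] = [
  { instanceId: "example-1", success: true },
  { instanceId: "example-2", success: true },
  { instanceId: "example-3", success: false },
  { instanceId: "example-4", success: true },
];

const comparison = compareToBaseline(successRate(sampleResults), PAPER_BASELINE, 0.05);
```

The tolerance should be agreed on before running the evaluation, so the acceptance criterion is not adjusted to fit the result.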
Related Files
src/tools/search.ts, src/tools/fetch.ts, src/tools/explore.ts