Integrate SWE-bench evaluation for RPG-Encoder #25

@amondnet

Summary

Integrate SWE-bench evaluation to measure RPG-Encoder performance on real-world software engineering tasks.

Paper: RPG-Encoder §5 (arXiv:2602.02084)

Tasks

  • Set up SWE-bench evaluation harness
  • Integrate RPG tools (SearchNode, FetchNode, ExploreRPG) as agent tools
  • Run evaluations and compare against paper results
  • Generate a report of the evaluation results
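
One way the tool-integration task could look is sketched below. This is only an illustration: the `AgentTool` interface, the `stubTool` helper, the `dispatchToolCall` function, and the placeholder descriptions are all assumptions made for this example, and the real exports of src/tools/search.ts, src/tools/fetch.ts, and src/tools/explore.ts may have different shapes.

```typescript
// Hypothetical tool contract; the real signatures in src/tools/*.ts may differ.
interface AgentTool {
  name: string;
  description: string;
  run(args: Record<string, unknown>): Promise<string>;
}

// Stub factory standing in for the real RPG tool implementations.
// It only echoes its arguments so the wiring can be exercised in isolation.
const stubTool = (name: string, description: string): AgentTool => ({
  name,
  description,
  run: async (args) => `${name} called with ${JSON.stringify(args)}`,
});

// The three tool names come from the task list; the descriptions are placeholders.
const rpgTools: AgentTool[] = [
  stubTool("SearchNode", "placeholder: search for nodes in the RPG"),
  stubTool("FetchNode", "placeholder: fetch details for a node"),
  stubTool("ExploreRPG", "placeholder: explore the RPG around a node"),
];

// Name-based registry so an agent loop can route tool calls.
const registry = new Map<string, AgentTool>(
  rpgTools.map((t): [string, AgentTool] => [t.name, t]),
);

// Route a tool call from the agent; unknown names return an error string
// instead of throwing, so the agent can recover within the same episode.
async function dispatchToolCall(
  name: string,
  args: Record<string, unknown>,
): Promise<string> {
  const tool = registry.get(name);
  if (!tool) return `Unknown tool: ${name}`;
  return tool.run(args);
}
```

Keeping the registry keyed by tool name means the evaluation harness only needs `dispatchToolCall`, and the stubs can be swapped for the real tools without touching the agent loop.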

Acceptance Criteria

  • End-to-end SWE-bench evaluation pipeline works
  • Results are comparable to the paper's baseline (78% with Gemini Flash)
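
The "comparable" criterion could be made checkable along the following lines. This is a sketch under stated assumptions: it treats the 78% figure as the percentage of instances that meet the paper's success criterion (the issue does not spell out the metric), and the `InstanceResult` shape, the function names, and the 5-point tolerance are hypothetical choices for illustration.

```typescript
// Hypothetical per-instance outcome; the metric behind `success` is an assumption.
interface InstanceResult {
  instanceId: string;
  success: boolean;
}

// Percentage of instances marked successful (0 for an empty run).
function successRate(results: InstanceResult[]): number {
  if (results.length === 0) return 0;
  const passed = results.filter((r) => r.success).length;
  return (passed / results.length) * 100;
}

// "Comparable" is interpreted here as within an absolute tolerance, in percentage points.
function isComparable(
  measured: number,
  baseline: number,
  tolerance: number,
): boolean {
  return Math.abs(measured - baseline) <= tolerance;
}

const PAPER_BASELINE = 78; // from the acceptance criteria above
const TOLERANCE = 5; // hypothetical threshold, not taken from the paper

// Made-up example data, purely to show the calculation.
const exampleResults: InstanceResult[] = [
  { instanceId: "example-1", success: true },
  { instanceId: "example-2", success: true },
  { instanceId: "example-3", success: true },
  { instanceId: "example-4", success: false },
];

const measured = successRate(exampleResults);
console.log(
  `measured=${measured.toFixed(1)}% baseline=${PAPER_BASELINE}% ` +
    `comparable=${isComparable(measured, PAPER_BASELINE, TOLERANCE)}`,
);
```

Agreeing on the tolerance and the exact metric up front would make this acceptance criterion testable rather than a judgment call.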

Related Files

  • src/tools/search.ts, src/tools/fetch.ts, src/tools/explore.ts
