Update benchmark results to Search-R1 v0.1 #417

JiahangXu · 2025-12-16T09:58:37Z

No description provided.

Copilot

Pull request overview

This PR updates the Search-R1 documentation by replacing the placeholder "Evaluation" section with comprehensive benchmark results. The changes add concrete performance metrics comparing the original Search-R1 implementation against the Agent-Lightning version across multiple models and benchmarks.

Key Changes

- Renamed section from "Evaluation" to "Benchmark Results"
- Added description of seven diverse question-answering benchmarks (NQ, TriviaQA, PopQA, HotpotQA, 2WikiMultiHopQA, Musique, and Bamboogle)
- Introduced performance comparison table showing results for Llama-3.2-3B, Qwen2.5-3B-Instruct, and Qwen2.5-7B-Instruct models

contrib/recipes/search_r1/README.md

JiahangXu added 2 commits December 16, 2025 09:57

update benchmark results

344ba50

update qwen results

7a90005

JiahangXu marked this pull request as ready for review December 16, 2025 13:54

Copilot AI review requested due to automatic review settings December 16, 2025 13:54

Copilot started reviewing on behalf of JiahangXu December 16, 2025 14:03

add highlight

ada8141

Copilot AI reviewed Dec 16, 2025

contrib/recipes/search_r1/README.md (outdated)

contrib/recipes/search_r1/README.md

contrib/recipes/search_r1/README.md (outdated)

fix typo

88cb67f

ultmaster merged commit 52090e9 into main Dec 16, 2025
35 checks passed

JiahangXu deleted the dev/search_r1_benchmark branch December 17, 2025 06:28

3 participants
