feat(evaluation): add majority and selector by ntudy · Pull Request #10 · MiroMindAI/MiroFlow

ntudy · 2025-08-14T08:15:54Z

Describe this PR

Checklist for PR

Must Do

Write a good PR title and description, e.g. feat(agent): add pdf tool via mcp, perf: make llm client async, and fix(utils): load custom config via importlib. The CI job check-pr-title enforces the Angular commit message format on PR titles.
Run make precommit locally. The CI job lint enforces ruff's default format/lint rules on all new code.
Run make pytest. Check the test summary (report.html) and the coverage report (htmlcov/index.html) for the new code.
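
For illustration, the title rule above can be approximated with a regular expression. This is a minimal sketch, not the check-pr-title job itself: the list of accepted types, the characters allowed in a scope, and the function name is_valid_pr_title are assumptions made for this example.

```python
import re

# Assumed subset of Angular commit types; the real CI job may accept a different set.
ALLOWED_TYPES = ("build", "ci", "docs", "feat", "fix", "perf", "refactor", "style", "test")

# type, optional "(scope)", then ": " and a non-empty subject.
TITLE_PATTERN = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"
    r"(\((?P<scope>[\w\-./]+)\))?"
    r": (?P<subject>\S.*)$"
)


def is_valid_pr_title(title: str) -> bool:
    """Return True if the title looks like 'type(scope): subject' (hypothetical helper)."""
    return TITLE_PATTERN.match(title) is not None


if __name__ == "__main__":
    for title in (
        "feat(agent): add pdf tool via mcp",
        "perf: make llm client async",
        "add majority vote",
    ):
        print(title, "->", is_valid_pr_title(title))
```

The three example titles from the checklist pass this pattern, while a bare commit-style message such as add majority vote does not.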

Nice To Have

(Optional) Write/update tests under /tests for feat and test PR.
(Optional) Write/update docs under /docs for docs and ci PR.

Copilot

Pull Request Overview

This PR adds evaluation functionality for analyzing multiple agent runs, including majority voting and LLM-based solution selection. The changes introduce two new analysis methods to help select the best solutions from multiple agent attempts.

Adds majority vote calculation for determining task success based on multiple runs
Implements LLM-powered solution selector to choose best answers from multiple attempts
Updates evaluation scripts to include the new analysis tools
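
As a rough illustration of the majority-vote idea summarized above, the sketch below picks the most frequent final answer per task across runs and scores it against a reference answer. It is not the code in calculate_score_from_log.py: the normalization rule, the data layout (a dict of task id to a list of per-run answers), and all function names are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Optional


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different answers match (assumed rule)."""
    return " ".join(answer.strip().lower().split())


def majority_answer(answers: List[Optional[str]]) -> Optional[str]:
    """Return the most common normalized answer, or None if every run was empty.

    Ties go to the answer seen first, which is how Counter.most_common orders equal counts.
    """
    counts = Counter(normalize(a) for a in answers if a and a.strip())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def majority_vote_accuracy(
    runs_by_task: Dict[str, List[Optional[str]]],
    ground_truth: Dict[str, str],
) -> float:
    """Fraction of tasks whose majority answer equals the normalized reference answer."""
    if not runs_by_task:
        return 0.0
    correct = sum(
        1
        for task_id, answers in runs_by_task.items()
        if majority_answer(answers) == normalize(ground_truth[task_id])
    )
    return correct / len(runs_by_task)


if __name__ == "__main__":
    runs = {"t1": ["Paris", "paris ", "London"], "t2": ["4", "5", "5"]}
    truth = {"t1": "Paris", "t2": "4"}
    print(majority_vote_accuracy(runs, truth))
```

Exact string matching after normalization is the simplest possible comparison; an evaluation that uses an LLM or a benchmark-specific scorer to judge correctness would replace the equality check.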

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
apps/run-agent/scripts/claude-sonnet-3.7/run_evaluate_multiple_runs_gaia-validation.sh	Adds new evaluation steps for score calculation and solution selection
apps/run-agent/main.py	Registers the new llm-solution-selector command
apps/run-agent/llm_solution_selector.py	New module implementing LLM-based solution selection with async processing
apps/run-agent/calculate_score_from_log.py	Adds majority vote calculation and updates log file pattern matching
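
To make "LLM-based solution selection with async processing" more concrete, here is a hedged sketch of the general pattern: a judge coroutine is asked to pick one candidate per task, and tasks are processed concurrently under a semaphore. It does not reproduce llm_solution_selector.py. The Judge signature, the fallback to the first candidate, the concurrency limit, and the stand-in longest_answer_judge (used in place of a real LLM call) are all assumptions for this example.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

# Hypothetical judge interface: given a question and candidate answers,
# return the index of the preferred candidate.
Judge = Callable[[str, List[str]], Awaitable[int]]


async def select_solution(question: str, candidates: List[str], judge: Judge) -> str:
    """Ask the judge for an index; fall back to the first candidate if it is out of range."""
    index = await judge(question, candidates)
    if not 0 <= index < len(candidates):
        index = 0
    return candidates[index]


async def select_all(
    tasks: Dict[str, Tuple[str, List[str]]],
    judge: Judge,
    max_concurrency: int = 4,
) -> Dict[str, str]:
    """Select one answer per task, running at most max_concurrency judge calls at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(task_id: str, question: str, candidates: List[str]) -> Tuple[str, str]:
        async with semaphore:
            return task_id, await select_solution(question, candidates, judge)

    results = await asyncio.gather(
        *(worker(task_id, q, c) for task_id, (q, c) in tasks.items())
    )
    return dict(results)


async def longest_answer_judge(question: str, candidates: List[str]) -> int:
    """Stand-in for a real LLM call: prefers the longest candidate."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return max(range(len(candidates)), key=lambda i: len(candidates[i]))


if __name__ == "__main__":
    demo = {"t1": ("capital of France?", ["Paris", "Paris, France"])}
    print(asyncio.run(select_all(demo, longest_answer_judge)))
```

Passing the judge in as a parameter keeps the concurrency logic testable without network access; a real selector would supply a coroutine that prompts a model and parses its reply into an index.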


apps/run-agent/llm_solution_selector.py

apps/run-agent/calculate_score_from_log.py

feat(tool-blacklist): add tool blacklist function

Yue Deng added 3 commits August 14, 2025 14:05

add majority vote

50e4247

add select

9e737f2

add verbose

d0713c5

ntudy requested review from Copilot, gali-leilei and hellen-wong August 14, 2025 08:15

Copilot AI reviewed Aug 14, 2025


Yue Deng added 3 commits August 14, 2025 16:17

lint code

43c8745

update majority vote

8544f1a

add llm majority vote

2fdad47

ntudy closed this Aug 19, 2025

ntudy deleted the dev-vote-select branch August 27, 2025 05:55

BinWang28 pushed a commit that referenced this pull request Mar 11, 2026

Merge pull request #10 from MiroMindAI/blacklist-tool

e75d487

feat(tool-blacklist): add tool blacklist function

ntudy wants to merge 6 commits into main from dev-vote-select


2 participants
