feat(benchmark): add support for evaluation on futurex #40

Merged

ntudy merged 4 commits into MiroMindAI:miroflow-v0.3 on Sep 18, 2025
Conversation
BinWang28 reviewed on Sep 18, 2025
docs/mkdocs/docs/futurex.md (Outdated)
# For Linux sandbox (code execution environment)
E2B_API_KEY="xxx"

# We use Claude-3.5-Sonnet with OpenRouter backend to initialize the LLM
Member
This is a typo from the old days; it should be Claude 3.7.
BinWang28 reviewed on Sep 18, 2025
@@ -0,0 +1,258 @@
# Futurex-Online
Member
Mention in the documentation that this is a quick start for running the Futurex benchmark and preparing results, not for fully reproducing the submitted results.
ntudy reviewed on Sep 18, 2025

docs/mkdocs/docs/futurex.md (Outdated)

After evaluation completion, extract the results using the provided utility:

```bash title="Extract Results"
uv run utils/extract_futurex_results.py --log_dir logs/futurex/$(date +"%Y%m%d_%H%M")
```
Contributor
error: unrecognized arguments: --log_dir
Contributor (Author)

> error: unrecognized arguments: --log_dir

My bad, it should be in the following format:

uv run utils/extract_futurex_results.py logs/futurex/$(date +"%Y%m%d_%H%M")
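So the utility takes the log directory as a positional argument rather than a `--log_dir` flag. Per the PR description, `utils/extract_futurex_results.py` aggregates answers across multiple runs by majority voting (adapted from MiroThinker). A minimal sketch of that aggregation idea follows; `majority_vote` and its tie-breaking behavior are illustrative, not the utility's actual code:

```python
from collections import Counter

def majority_vote(answers):
    """Pick the most common answer across runs.

    Empty answers (failed runs) are dropped first. On a tie,
    Counter.most_common returns the answer encountered first,
    since Counter preserves insertion order.
    """
    filtered = [a for a in answers if a]
    if not filtered:
        return None
    return Counter(filtered).most_common(1)[0][0]

# Example: three runs answered "A", one answered "B"
print(majority_vote(["A", "B", "A", "A"]))  # -> A
```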
Contributor (Author)

I have pushed a commit that should resolve the issues regarding arguments and documentation.
ntudy approved these changes on Sep 18, 2025
Zhudongsheng75 pushed a commit to open-compass/MiroFlow that referenced this pull request on Dec 27, 2025:

* upd: add futurex evaluation support.
* upd: support multiple eval for futurex and add relavent doc.
* upd: fix bugs with doc for futurex.
* debug: fix wrong calling path.
Describe this PR
Overview
Integrates Futurex-Online prediction dataset into MiroFlow's benchmark system with majority voting for improved prediction accuracy (adapted from MiroThinker).
Key Changes
🆕 New Files
- `utils/prepare_benchmark/gen_futurex.py` - Dataset generator
- `config/benchmark/futurex.yaml` - Benchmark configuration
- `scripts/run_evaluate_multiple_runs_futurex.sh` - Multi-run evaluation script
- `docs/mkdocs/docs/futurex.md` - Complete documentation
- `docs/mkdocs/mkdocs.yml` - Add documentation link for futurex
- `utils/extract_futurex_results.py` - Extract results from logging and implement majority voting from MiroThinker for multiple runs

🔧 Modified Files

- `utils/prepare_benchmark/main.py` - Added Futurex support

Features
Usage
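The usage snippet did not survive the page scrape. Based on the file list above, a plausible flow is sketched below; the script paths come from the PR, but their exact invocation and arguments are assumptions, and the timestamped log-directory pattern is taken from the documentation excerpt discussed in review:

```shell
# Log directories are timestamped; compute the path once and reuse it so the
# evaluation and the extraction step agree even if the clock crosses a minute.
RUN_DIR="logs/futurex/$(date +"%Y%m%d_%H%M")"

# Hypothetical flow -- the script names come from the PR file list, but their
# arguments are not shown on this page:
#   bash scripts/run_evaluate_multiple_runs_futurex.sh
#   uv run utils/extract_futurex_results.py "$RUN_DIR"

echo "$RUN_DIR"
```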
Checklist for PR
Must Do
- Use the Angular commit message format for the PR title, e.g. `feat(agent): add pdf tool via mcp`, `perf: make llm client async`, and `fix(utils): load custom config via importlib`. The CI job `check-pr-title` enforces this.
- Run `make precommit` locally. The CI job `lint` enforces ruff default format/lint rules on all new code.
- Run `make pytest`. Check the test summary (located at `report.html`) and the coverage report (located at `htmlcov/index.html`) for new code.

Nice To Have

- Update `/tests` for `feat` and `test` PRs.
- Update `/docs` for `docs` and `ci` PRs.