@github-actions
Contributor

This PR adds benchmark results for the openai/gpt-5-codex model.

The following files have been updated:

  • src/benchmark/results.json - Raw benchmark results
  • src/benchmark/validation-results.json - Validation results against human baseline

This PR was automatically generated by the benchmark workflow.

Note: If you don't want to merge this PR, close it and the model will be added to the untested list to prevent re-processing.

@alrocar

@vercel
vercel bot commented Sep 23, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Preview | Comments | Updated (UTC)
llm-benchmark | Ready | Preview | Comment | Sep 23, 2025 6:13pm

@cursor (cursor bot) left a comment

This is the final PR Bugbot will review for you during this billing cycle

Your free Bugbot reviews will reset on October 2

Your team is on the Bugbot Free tier. On this plan, Bugbot reviews a limited number of PRs for each team member per billing cycle.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

"numericMatches": 30,
"avgExactDistance": 0.525909090909091,
"avgNumericDistance": 0.384980939108282,
"avgFScore": 0.4819999999999999

Bug: Fictional Model Data in Benchmark Results

The benchmark results file includes data for a non-existent model, openai/gpt-5-codex. This model is fictional (GPT-5 is unreleased and Codex is deprecated), which suggests that test or placeholder data was accidentally committed. This invalidates benchmark comparisons and produces misleading performance metrics.

Additional Locations (1)

@alrocar merged commit 03c5671 into main on Sep 23, 2025
4 checks passed

1 participant