Add GPT-5 evals and "minimal" to reasoning dropdown #2671

pamelafox · 2025-08-11T20:21:01Z

Purpose

This PR adds multiple evaluations of the gpt-5 models (summarized in this post: https://blog.pamelafox.org/2025/08/gpt-5-will-it-rag.html) and adds a "minimal" option to the "Reasoning effort" dropdown.

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[X] No

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial
which includes deployment, settings and usage instructions. If text or screenshots need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[X] No

Type of change

[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[X] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

The current tests all pass (python -m pytest).
I added tests that prove my fix is effective or that my feature works
I ran python -m pytest --cov to verify 100% coverage of added lines
I ran python -m mypy to check for type errors
I either used the pre-commit hooks or ran ruff and black manually on my code.

Copilot

Pull Request Overview

This PR adds evaluation results for GPT-5 models and introduces a "minimal" reasoning effort option to the UI. The PR includes performance evaluation data for three GPT-5 variants (gpt-5, gpt-5-mini, and gpt-5-chat) and updates the frontend to support a new "minimal" reasoning effort setting that is specifically designed for GPT-5 models.

Adds comprehensive evaluation results for GPT-5 model variants with performance metrics
Introduces "minimal" reasoning effort option with localization support across 9 languages
Updates evaluation configuration and backend code to use minimal reasoning effort by default

Reviewed Changes

Copilot reviewed 22 out of 26 changed files in this pull request and generated no comments.

File	Description
evals/results/*/summary.json	Performance evaluation summaries for GPT-5 model variants
evals/results/*/config.json	Evaluation configuration files for GPT-5 tests
evals/results/*/evaluate_parameters.json	Test parameters used for GPT-5 evaluations
evals/requirements.txt	Updates evaluator dependency to official Azure repository
evals/evaluate_config.json	Updates default evaluation configuration with new parameters
app/frontend/src/locales/*/translation.json	Adds "minimal" reasoning effort translations across 9 languages
app/frontend/src/components/Settings/Settings.tsx	Adds minimal option to reasoning effort dropdown
app/backend/approaches/chatreadretrieveread.py	Changes default reasoning effort from "low" to "minimal"
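
As a rough illustration of the backend change summarized in the last row (default reasoning effort moving from "low" to "minimal"), here is a minimal sketch. It is not the code in chatreadretrieveread.py: the helper name `build_chat_params`, the constant names, and the assumed list of accepted values are hypothetical and exist only to show how a default plus an override from the dropdown could be resolved.

```python
from typing import Optional

# Assumption: the set of values the dropdown can send. Only "minimal" is
# confirmed by this PR as newly added; the others are assumed for illustration.
ALLOWED_REASONING_EFFORTS = ("minimal", "low", "medium", "high")

# Per the review summary, the default changes from "low" to "minimal".
DEFAULT_REASONING_EFFORT = "minimal"


def build_chat_params(
    model: str,
    messages: list[dict],
    reasoning_effort: Optional[str] = None,
) -> dict:
    """Hypothetical helper: resolve the reasoning effort and build request kwargs.

    Falls back to the default when the caller passes no override, and rejects
    values outside the assumed allowed set.
    """
    effort = reasoning_effort or DEFAULT_REASONING_EFFORT
    if effort not in ALLOWED_REASONING_EFFORTS:
        raise ValueError(
            f"Unsupported reasoning effort {effort!r}; "
            f"expected one of {ALLOWED_REASONING_EFFORTS}"
        )
    return {"model": model, "messages": messages, "reasoning_effort": effort}
```

A caller that passes no override would get `"minimal"`, while a value chosen in the "Reasoning effort" dropdown would be passed through as `reasoning_effort` and validated before the request is built.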

pamelafox added 2 commits August 10, 2025 21:36

GPT-5 evals

c1995cb

Add evals for GPT-5

7c9df41

pamelafox requested a review from Copilot August 11, 2025 20:21

Copilot AI reviewed Aug 11, 2025

pamelafox added 2 commits August 11, 2025 13:42

Upgrade openAI SDK

492d8bc

Change snapshots to reasoning_effort of minimal

f19e548

pamelafox merged commit 3da5ea8 into Azure-Samples:main Aug 11, 2025
22 checks passed
