pamelafox
Collaborator

Purpose

This PR adds multiple evaluations of the gpt-5 models (summarized in this post: https://blog.pamelafox.org/2025/08/gpt-5-will-it-rag.html) and adds a "minimal" option to the "Reasoning effort" dropdown.

[Screenshot 2025-08-11 at 1 20 36 PM]

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[X] No

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial, which includes deployment, settings, and usage instructions.
If text or screenshots need to change in the tutorial, check the box below and notify the tutorial author.
A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[X] No

Type of change

[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[X] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

  • The current tests all pass (python -m pytest).
  • I added tests that prove my fix is effective or that my feature works
  • I ran python -m pytest --cov to verify 100% coverage of added lines
  • I ran python -m mypy to check for type errors
  • I either used the pre-commit hooks or ran ruff and black manually on my code.

pamelafox requested a review from Copilot on August 11, 2025 20:21
Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds evaluation results for three GPT-5 variants (gpt-5, gpt-5-mini, and gpt-5-chat) and updates the frontend to support a new "minimal" reasoning effort setting intended for GPT-5 models.

  • Adds comprehensive evaluation results for GPT-5 model variants with performance metrics
  • Introduces "minimal" reasoning effort option with localization support across 9 languages
  • Updates evaluation configuration and backend code to use minimal reasoning effort by default
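
To make the default change concrete, here is a minimal Python sketch of how a backend could resolve the reasoning effort for a request. It is not the code from this PR: `ALLOWED_EFFORTS`, `DEFAULT_EFFORT`, `resolve_reasoning_effort`, the `reasoning_effort` override key, and the `supports_minimal` flag are all hypothetical names. The "medium" and "high" values and the fallback to "low" for models that do not accept "minimal" are assumptions made for illustration.

```python
from typing import Any

# Hypothetical constants; the PR's actual identifiers are not shown in this conversation.
ALLOWED_EFFORTS = ("minimal", "low", "medium", "high")
DEFAULT_EFFORT = "minimal"  # per the PR, the default moves from "low" to "minimal"


def resolve_reasoning_effort(overrides: dict[str, Any], supports_minimal: bool = True) -> str:
    """Pick the reasoning effort for one request.

    Uses the caller's override when present, otherwise the default.
    Assumption: models that do not accept "minimal" fall back to "low".
    """
    effort = overrides.get("reasoning_effort") or DEFAULT_EFFORT
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"Unsupported reasoning effort: {effort!r}")
    if effort == "minimal" and not supports_minimal:
        return "low"
    return effort
```

With no override, the function returns the new default; an explicit value selected in the "Reasoning effort" dropdown would take precedence over it.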

Reviewed Changes

Copilot reviewed 22 out of 26 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| evals/results/*/summary.json | Performance evaluation summaries for GPT-5 model variants |
| evals/results/*/config.json | Evaluation configuration files for GPT-5 tests |
| evals/results/*/evaluate_parameters.json | Test parameters used for GPT-5 evaluations |
| evals/requirements.txt | Updates evaluator dependency to official Azure repository |
| evals/evaluate_config.json | Updates default evaluation configuration with new parameters |
| app/frontend/src/locales/*/translation.json | Adds "minimal" reasoning effort translations across 9 languages |
| app/frontend/src/components/Settings/Settings.tsx | Adds minimal option to reasoning effort dropdown |
| app/backend/approaches/chatreadretrieveread.py | Changes default reasoning effort from "low" to "minimal" |
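
Since each evaluation run stores its own `summary.json` under `evals/results/*/`, the per-run summaries can be gathered for side-by-side comparison. The following is a rough sketch only: `load_summaries` is a hypothetical helper that is not part of the repository, and because the schema of `summary.json` is not shown in this PR, the code makes no assumptions about the keys inside each file.

```python
import json
from pathlib import Path
from typing import Any


def load_summaries(results_dir: Path) -> dict[str, Any]:
    """Map each run directory name to the parsed contents of its summary.json.

    Run directories without a summary.json are skipped.
    """
    summaries: dict[str, Any] = {}
    for summary_path in sorted(results_dir.glob("*/summary.json")):
        with summary_path.open(encoding="utf-8") as f:
            summaries[summary_path.parent.name] = json.load(f)
    return summaries
```

Calling `load_summaries(Path("evals/results"))` from the repository root would return one entry per run directory, which can then be inspected or printed to compare the GPT-5 variants.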

pamelafox merged commit 3da5ea8 into Azure-Samples:main on Aug 11, 2025
22 checks passed