Switch to gpt-41-mini as default chat model #2557
Conversation
Check Broken URLs: We have automatically detected broken URLs in your files. Check the file paths and the associated broken URLs inside them, and fix the paths to resolve this issue.
Pull Request Overview
This PR changes the default chat model from "gpt-4o-mini" to "gpt-4.1-mini" across tests, infrastructure configuration, evaluation files, and documentation. Key changes include updating environment variables and deployment parameters, revising snapshot tests and config files, and modifying documentation to align with the new model.
Reviewed Changes
Copilot reviewed 93 out of 93 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| `tests/snapshots/test_app/test_ask_prompt_template/client0/result.json` | Updated model field to "gpt-4.1-mini". |
| `tests/e2e.py` | Switched environment variable AZURE_OPENAI_CHATGPT_MODEL to "gpt-4.1-mini". |
| `tests/conftest.py` | Updated model assignment in mock functions to reflect the new default. |
| `infra/main.bicep` | Revised chat model deployment parameters and version to the new default. |
| `infra/core/host/container-apps.bicep` | Adjusted resource group resolution syntax for clarity. |
| `evals/results/*` | Updated various evaluation JSON files to reflect version changes; baseline summary now shows different latency stats. |
| `docs/*` | Updated documentation examples and instructions to reference "gpt-4.1-mini". |
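The test-side switch can be illustrated with a minimal sketch. The environment variable name comes from the table above, but the helper below is hypothetical, not the repo's actual fixture code:

```python
import os

# Hypothetical sketch of pinning the new default chat model, mirroring the
# AZURE_OPENai_CHATGPT_MODEL update in tests/e2e.py -- the repo's actual
# test code differs.
os.environ["AZURE_OPENAI_CHATGPT_MODEL"] = "gpt-4.1-mini"

def default_chat_model() -> str:
    """Resolve the chat model, falling back to the new repo default."""
    return os.environ.get("AZURE_OPENAI_CHATGPT_MODEL", "gpt-4.1-mini")

print(default_chat_model())  # gpt-4.1-mini
```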
Comments suppressed due to low confidence (1)
evals/results/baseline/summary.json:20
- The minimum latency value is negative (-1.0), which likely indicates an error in measurement or calculation. Please verify the logic for computing latency to ensure all recorded values are non-negative.
"min": -1.0
docs/agentic_retrieval.md
```diff
 azd env set AZURE_OPENAI_SEARCHAGENT_DEPLOYMENT searchagent
-azd env set AZURE_OPENAI_SEARCHAGENT_MODEL gpt-4o
+azd env set AZURE_OPENAI_SEARCHAGENT_MODEL gpt-4.1-mini
 azd env set AZURE_OPENAI_SEARCHAGENT_MODEL_VERSION 2024-11-20
```
Should we use a different version here? Is 2025-04-14 a valid one?
Good catch, fixed!
How to set env for gpt-50?
The gpt-5 model family cannot yet be used for agentic retrieval, since those models use new parameters that the AI Search service does not yet support. The Search team is working on it; please try again in a few weeks and file a new issue if it still does not work.
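For models that are supported, the same `azd env set` pattern from the diff above overrides the search agent model per environment. The values below are examples only, not recommendations:

```shell
# Sketch: overriding the search agent model for the current azd environment.
# Variable names come from the diff above; the values are examples only.
azd env set AZURE_OPENAI_SEARCHAGENT_MODEL gpt-4.1-mini
azd env set AZURE_OPENAI_SEARCHAGENT_MODEL_VERSION 2025-04-14
```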
* Switch to gpt-41-mini with evaluations
* Update model used in tests
* Change search agent to 4.1-mini as well
* Update model version
Purpose
As suggested by @mattgotteiner, gpt-4.1-mini is a better model than gpt-4o-mini, especially for the tasks in this project. It is slightly more expensive ($0.40 vs $0.15 per million input tokens) but nowhere near as expensive as reasoning models, so the increased cost seems worth the quality upgrade.
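As a back-of-the-envelope check, assuming the quoted prices are per million input tokens (an assumption here; verify against current Azure OpenAI pricing), the monthly cost delta stays small at modest volumes:

```python
# Hypothetical cost delta; prices come from the PR description, and the
# per-million-input-token unit and token volume are assumptions.
price_gpt4o_mini = 0.15   # $ per 1M input tokens (assumed unit)
price_gpt41_mini = 0.40   # $ per 1M input tokens (assumed unit)
million_tokens = 10       # example monthly volume

delta = (price_gpt41_mini - price_gpt4o_mini) * million_tokens
print(f"extra cost for {million_tokens}M input tokens: ${delta:.2f}")
# extra cost for 10M input tokens: $2.50
```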
I re-ran evaluations, and gpt-4.1-mini got significantly higher groundedness and relevance scores, with somewhat shorter answers. The shorter answers are an improvement, as gpt-4o-mini was a bit overly verbose at times.
Evaluation results:
Here's an example question where gpt-4.1-mini was scored better:
Generally the answers aren't vastly different across the two models, based on my skim through our dataset.
Does this introduce a breaking change?
When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.
Does this require changes to learn.microsoft.com docs?
This repository is referenced by this tutorial, which includes deployment, settings, and usage instructions. If text or screenshots need to change in the tutorial, check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.
Type of change
Code quality checklist
See CONTRIBUTING.md for more details.
* I ran `python -m pytest` (all tests pass).
* I ran `python -m pytest --cov` to verify 100% coverage of added lines.
* I ran `python -m mypy` to check for type errors.
* I ran `ruff` and `black` manually on my code.