Switch to gpt-41-mini as default chat model #2557

pamelafox · 2025-06-03T07:20:14Z

Purpose

As suggested by @mattgotteiner, gpt-4.1-mini is a better model than gpt-4o-mini, especially for the tasks in this project. It is slightly more expensive (0.40 vs 0.15) but nowhere near as expensive as reasoning models, so the increased cost seems worth the quality upgrade.
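To put the price difference in concrete terms, here is a minimal sketch of the arithmetic. The two prices come from the description above; the unit (USD per 1M input tokens) and the 10M-token volume are assumptions made for illustration only.

```python
# Prices quoted in the PR description. The unit is an assumption:
# USD per 1M input tokens.
PRICE_GPT_41_MINI = 0.40
PRICE_GPT_4O_MINI = 0.15


def cost_ratio(new_price: float, old_price: float) -> float:
    """Return how many times more expensive the new model is."""
    return new_price / old_price


def extra_cost(tokens: int, new_price: float, old_price: float,
               per: int = 1_000_000) -> float:
    """Return the additional spend for `tokens` tokens at the two prices."""
    return (new_price - old_price) * tokens / per


ratio = cost_ratio(PRICE_GPT_41_MINI, PRICE_GPT_4O_MINI)
print(f"gpt-4.1-mini costs {ratio:.2f}x as much per token")

# Hypothetical volume of 10M input tokens.
print(f"Extra spend: {extra_cost(10_000_000, PRICE_GPT_41_MINI, PRICE_GPT_4O_MINI):.2f}")
```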

I re-ran evaluations, and gpt-4.1-mini got significantly higher groundedness and relevance, with somewhat shorter answers. The shorter answers are good, as gpt-4o-mini was a bit overly verbose at times.

Evaluation results:

metric	stat	baseline	gpt35turbo-ada002	gpt4omini-ada002	gpt4omini-emb3l	gpt4omini-emb3l-2	o3mini-ada002
gpt_groundedness	mean_rating	4.76	4.62	4.62	4.5	4.54	4.8
gpt_groundedness	pass_rate	0.94	0.88	0.88	0.86	0.88	0.96
gpt_relevance	mean_rating	4.42	4.14	4.12	4.22	4.2	4.0
gpt_relevance	pass_rate	0.94	0.88	0.84	0.84	0.88	0.9
answer_length	mean	829.06	631.88	922.42	919.26	906.34	499.22
latency	mean	2.89	2.24	3.14	4.46	3.71	19.38
citations_matched	rate	0.52	0.46	0.5	0.49	0.49	0.51
any_citation	rate	0.98	1.0	1.0	1.0	1.0	0.98
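The gap between the two runs can be read off the table with a small script. This is only a sketch: it assumes the `baseline` column is the new gpt-4.1-mini run and that `gpt4omini-ada002` is the closest gpt-4o-mini comparison, neither of which the table states explicitly. Each `pass_rate` row is read as belonging to the metric listed directly above it.

```python
# (metric, stat) -> (baseline, gpt4omini-ada002), copied from the table above.
# Assumption: "baseline" is the gpt-4.1-mini run.
RESULTS = {
    ("gpt_groundedness", "mean_rating"): (4.76, 4.62),
    ("gpt_groundedness", "pass_rate"): (0.94, 0.88),
    ("gpt_relevance", "mean_rating"): (4.42, 4.12),
    ("gpt_relevance", "pass_rate"): (0.94, 0.84),
    ("answer_length", "mean"): (829.06, 922.42),
    ("latency", "mean"): (2.89, 3.14),
    ("citations_matched", "rate"): (0.52, 0.5),
    ("any_citation", "rate"): (0.98, 1.0),
}


def deltas(results: dict) -> dict:
    """Return baseline minus comparison per (metric, stat), rounded to 2 places."""
    return {key: round(base - other, 2) for key, (base, other) in results.items()}


for (metric, stat), delta in deltas(RESULTS).items():
    print(f"{metric:18} {stat:12} {delta:+.2f}")
```

A positive delta means the baseline column scored higher. For `answer_length` and `latency`, a negative delta means shorter answers and faster responses.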

Here's an example question where gpt-4.1-mini was scored better:

Generally the answers aren't vastly different across the two models, based off my skim through our dataset.

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[X] No

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial
which includes deployment, settings and usage instructions. If text or screenshot need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[X] No

Type of change

[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[X] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

The current tests all pass (python -m pytest).
I added tests that prove my fix is effective or that my feature works
I ran python -m pytest --cov to verify 100% coverage of added lines
I ran python -m mypy to check for type errors
I either used the pre-commit hooks or ran ruff and black manually on my code.

github-actions · 2025-06-03T07:21:58Z

Check Broken URLs

We have automatically detected the following broken URLs in your files. Review and fix the paths to resolve this issue.

Check the file paths and associated broken URLs inside them.
For more details, check our Contributing Guide.

File Full Path Issues

data/Contoso_Electronics_Company_Overview.md

#	Link	Line Number
1	`http://www.contoso.com`	`46`
2	`http://www.contoso.com`	`48`

Copilot

Pull Request Overview

This PR changes the default chat model from "gpt-4o-mini" to "gpt-4.1-mini" across tests, infrastructure configuration, evaluation files, and documentation. Key changes include updating environment variables and deployment parameters, revising snapshot tests and config files, and modifying documentation to align with the new model.

Reviewed Changes

Copilot reviewed 93 out of 93 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
tests/snapshots/test_app/test_ask_prompt_template/client0/result.json	Updated model field to "gpt-4.1-mini".
tests/e2e.py	Switched environment variable AZURE_OPENAI_CHATGPT_MODEL to "gpt-4.1-mini".
tests/conftest.py	Updated model assignment in mock functions to reflect the new default.
infra/main.bicep	Revised chat model deployment parameters and version to the new default.
infra/core/host/container-apps.bicep	Adjusted resource group resolution syntax for clarity.
evals/results/*	Updated various evaluation JSON files to reflect version changes; baseline summary now shows different latency stats.
docs/*	Updated documentation examples and instructions to reference "gpt-4.1-mini".

Comments suppressed due to low confidence (1)

evals/results/baseline/summary.json:20

The minimum latency value is negative (-1.0), which likely indicates an error in measurement or calculation. Please verify the logic for computing latency to ensure all recorded values are non-negative.

"min": -1.0
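A check along these lines could flag such values before a summary is committed. This is a hedged sketch, not the project's evaluation code: the JSON layout and the `max` value are hypothetical, the `mean` is borrowed from the results table, and only `"min": -1.0` comes from the comment above. The helper name `negative_stats` is also invented for illustration.

```python
import json

# Hypothetical excerpt shaped like a summary file. Only "min": -1.0 is taken
# from the review comment; the layout and the "max" value are made up.
SUMMARY_JSON = '{"latency": {"mean": 2.89, "max": 9.5, "min": -1.0}}'


def negative_stats(summary: dict, metric: str = "latency") -> dict:
    """Return the stats recorded for `metric` whose values are below zero."""
    stats = summary.get(metric, {})
    return {
        name: value
        for name, value in stats.items()
        if isinstance(value, (int, float)) and value < 0
    }


summary = json.loads(SUMMARY_JSON)
bad = negative_stats(summary)
if bad:
    print(f"Suspicious latency stats: {bad}")
```

A value of exactly -1.0 may be a sentinel for a failed or unmeasured request rather than a real timing, in which case filtering those rows before aggregating would be the fix. The thread does not confirm this.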

mattgotteiner · 2025-06-03T17:04:21Z

docs/agentic_retrieval.md

   azd env set AZURE_OPENAI_SEARCHAGENT_DEPLOYMENT searchagent
-   azd env set AZURE_OPENAI_SEARCHAGENT_MODEL gpt-4o
+   azd env set AZURE_OPENAI_SEARCHAGENT_MODEL gpt-4.1-mini
   azd env set AZURE_OPENAI_SEARCHAGENT_MODEL_VERSION 2024-11-20


Should we use a different version here? Is 2025-04-14 a valid one?

pamelafox · 2025-06-03

Good catch, fixed!
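The mismatch caught here (a new model name left paired with the old model's version) is easier to avoid when the two values are kept together. A minimal sketch, assuming the version strings discussed in this thread; the mapping and the helper `searchagent_env_commands` are hypothetical and not part of the repository:

```python
# Model/version pairs mentioned in this thread. Treat them as examples only;
# check the Azure OpenAI model documentation for the authoritative list.
KNOWN_VERSIONS = {
    "gpt-4o": "2024-11-20",
    "gpt-4.1-mini": "2025-04-14",
}


def searchagent_env_commands(model: str, deployment: str = "searchagent") -> list[str]:
    """Build the three `azd env set` commands with a matching model version."""
    version = KNOWN_VERSIONS.get(model)
    if version is None:
        raise ValueError(f"No known version for model {model!r}")
    return [
        f"azd env set AZURE_OPENAI_SEARCHAGENT_DEPLOYMENT {deployment}",
        f"azd env set AZURE_OPENAI_SEARCHAGENT_MODEL {model}",
        f"azd env set AZURE_OPENAI_SEARCHAGENT_MODEL_VERSION {version}",
    ]


for command in searchagent_env_commands("gpt-4.1-mini"):
    print(command)
```

Raising on an unknown model, rather than falling back to a default version, makes a stale pairing fail loudly instead of at deployment time.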

dorteedannesboe · 2025-08-27

How to set env for gpt-50?

pamelafox · 2025-08-27

The gpt-5 model family cannot yet be used for agentic retrieval, because these models take new parameters that the AI Search service does not yet support. The Search team is working on it; please try again in a few weeks and file a new issue if it does not work.

Switch to gpt-41-mini with evaluations

76b98cf

pamelafox added 2 commits June 3, 2025 09:17

Update model used in tests

3631d11

Change search agent to 4.1-mini as well

eb34e71

pamelafox requested review from mattgotteiner and Copilot June 3, 2025 16:36

Copilot AI reviewed Jun 3, 2025

mattgotteiner approved these changes Jun 3, 2025

Update model version

c7d7247

pamelafox merged commit 10904b6 into Azure-Samples:main Jun 3, 2025
28 checks passed

palamangelus mentioned this pull request Jun 9, 2025

Bicep compilation issue #2562

Closed
