2025-04-02: Support for reasoning models and token usage display
You can now optionally use a reasoning model (o1 or o3-mini) for all chat completion requests, following the reasoning guide.
When using a reasoning model, you can select the reasoning effort (low/medium/high).
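As an illustration of what such a request looks like at the API level, here is a minimal sketch using the `reasoning_effort` parameter of the OpenAI Python SDK (the model name, prompt, and client setup are assumptions for the example, not the app's actual configuration, which targets Azure OpenAI):

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment; the deployed app talks to
# Azure OpenAI instead, so treat this purely as an illustration.
client = OpenAI()

response = client.chat.completions.create(
    model="o3-mini",          # or "o1"
    reasoning_effort="low",   # "low", "medium", or "high"
    messages=[
        {"role": "user", "content": "Summarize the retrieved sources in two sentences."}
    ],
)
print(response.choices[0].message.content)
```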
For all models, you can now see token usage in the "Thought process" tab.
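Token counts are reported on the completion response's `usage` field, and reasoning models additionally break out the tokens spent on the hidden thinking step. A small helper like the following could read them (a sketch against the OpenAI Python SDK types; the function name is made up for illustration):

```python
from openai.types.chat import ChatCompletion

def print_token_usage(response: ChatCompletion) -> None:
    """Print the token counts reported on a chat completion response."""
    usage = response.usage
    if usage is None:
        return
    print(f"Prompt tokens:     {usage.prompt_tokens}")
    print(f"Completion tokens: {usage.completion_tokens}")
    print(f"Total tokens:      {usage.total_tokens}")
    # Reasoning models report how many completion tokens went to thinking.
    details = usage.completion_tokens_details
    if details is not None and details.reasoning_tokens is not None:
        print(f"Reasoning tokens:  {details.reasoning_tokens}")
```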
Reasoning models add latency due to the thinking process, so they are an option worth trying, but not necessarily the right choice for most RAG domains.
This release also includes several fixes for performance, Windows support, and deployment.
What's Changed
- Add quotes to azd env set by @mattgotteiner in #2413
- Upgrade ms graph SDK packages to remove pendulum dependency by @pamelafox in #2454
- Reduce list to only the available ones for gpt-4o-mini/Standard by @pamelafox in #2459
- Add support for reasoning models and token usage display by @mattgotteiner in #2448
- Upgrade prompty by @pamelafox in #2475
Full Changelog: 2025-03-26...2025-04-02