
2025-04-02: Support for reasoning models and token usage display

@pamelafox released this 03 Apr 02:40
56294c9

You can now optionally use a reasoning model (o1 or o3-mini) for all chat completion requests, following the reasoning guide.

When using a reasoning model, you can select the reasoning effort (low/medium/high):

Screenshot of developer settings with reasoning model
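As an illustration only (not the app's actual code), here is a minimal sketch of how a reasoning model and reasoning effort setting can be passed to the Chat Completions API with the openai Python SDK; the model name, prompt, and environment setup are assumptions for the example:

```python
# Minimal sketch: requesting a chat completion from a reasoning model
# with an explicit reasoning effort, using the openai Python SDK.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="o3-mini",          # reasoning model; o1 is also supported per these notes
    reasoning_effort="low",   # one of "low", "medium", "high"
    messages=[
        {"role": "user", "content": "Summarize the key points of the retrieved sources."},
    ],
)
print(response.choices[0].message.content)
```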

For all models, you can now see token usage in the "Thought process" tab:

Display of token usage counts
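For reference, a hedged sketch of how token usage can be read from a chat completion response with the openai Python SDK; it assumes a `response` object like the one in the sketch above:

```python
# Sketch: inspecting token usage on a chat completion response.
usage = response.usage
print("Prompt tokens:", usage.prompt_tokens)
print("Completion tokens:", usage.completion_tokens)
print("Total tokens:", usage.total_tokens)

# Reasoning models may report reasoning tokens separately.
details = getattr(usage, "completion_tokens_details", None)
if details is not None:
    print("Reasoning tokens:", details.reasoning_tokens)
```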

Reasoning models incur additional latency due to their thinking process, so they are an option for developers to try, but not necessarily the right choice for most RAG domains.

This release also includes several fixes for performance, Windows support, and deployment.

What's Changed

Full Changelog: 2025-03-26...2025-04-02