
Releases: Azure-Samples/azure-search-openai-demo

2025-09-11: Upgrade to latest AI Search Agentic Retrieval API

11 Sep 17:38
9e74970

The Azure AI Search team recently announced improvements to the agentic retrieval API. We've upgraded the repo to use the latest API, so the retrieved data now includes all searchable fields, improving the compatibility of the agentic retrieval feature with other repo features (multimodal, ACLs). Our repo does not currently use multi-index query planning or answer synthesis, but you may consider those in your forks.

This release also includes a bug fix for user login on Container Apps and fixes for the recent multimodal feature.

What's Changed

  • feat: Add extra search index fields to Knowledge Agent response by @taylorn-ai in #2696
  • Bump tenacity from 9.0.0 to 9.1.2 by @dependabot[bot] in #2700
  • Bump regex from 2024.11.6 to 2025.7.34 by @dependabot[bot] in #2701
  • Bump click from 8.1.7 to 8.1.8 by @dependabot[bot] in #2698
  • Bump azure-core from 1.30.2 to 1.35.0 by @dependabot[bot] in #2697
  • Bump h2 from 4.1.0 to 4.3.0 in /app/backend by @dependabot[bot] in #2705
  • Add comparison with microsoft/azurechat to other samples documentation by @Copilot in #2703
  • Update chat/ask prompts for improved consistency, run multimodal evals by @pamelafox in #2709
  • Adjust defaults for multimodal-related parameters to work for default deployments by @pamelafox in #2717
  • Fix ingestion for case when no images field exists by @pamelafox in #2719
  • Remove pipeline section from azure.yaml by @pamelafox in #2720
  • Pin devcontainer to bookworm by @pamelafox in #2722
  • Upgrade minimum node version on pipelines by @pamelafox in #2721
  • Bump the github-actions group with 3 updates by @dependabot[bot] in #2716
  • Bump rich from 13.9.4 to 14.1.0 by @dependabot[bot] in #2714
  • Upgrade to latest version of azure-search-documents and agentic retrieval API by @pamelafox in #2723
  • Add missing RBAC role for token storage container when using container apps by @pamelafox in #2724

Full Changelog: 2025-08-29...2025-09-11

2025-08-29: New approach to multimodal RAG

29 Aug 19:29
f2007b2

This release introduces a major change: a new approach to multimodal RAG, based on lessons learned from our original vision feature.

The new multimodal approach affects both data ingestion and the RAG flow.

During data ingestion, the prepdocs script will:

  • extract images (using Azure Document Intelligence) and store them separately in Azure Blob Storage
  • compute embeddings of the extracted images using Azure AI Vision
  • use an LLM to describe each image, and embed that description inside the text chunk
  • associate each text chunk in the Azure AI Search index with any nearby images
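The shape of the resulting chunk can be sketched roughly as follows. This is an illustrative assumption, not the repo's actual prepdocs schema: the field names (`content`, `images`) and the `build_chunk` helper are hypothetical.

```python
# Sketch: associate a text chunk with nearby extracted images and embed
# LLM-generated image descriptions into the chunk text, so that text-only
# search can still surface image content. All names here are illustrative.

def build_chunk(text: str, image_descriptions: list[str], image_urls: list[str]) -> dict:
    # Append each image description to the chunk text before embedding.
    described = text + "".join(f"\n[Image: {d}]" for d in image_descriptions)
    return {
        "content": described,
        "images": [
            {"url": u, "description": d}
            for u, d in zip(image_urls, image_descriptions)
        ],
    }

chunk = build_chunk(
    "Quarterly revenue grew 12%.",
    ["Bar chart of revenue by quarter"],
    ["https://example.blob.core.windows.net/images/chart1.png"],
)
```

The key design point is that the description lives in two places: inside the searchable text (for keyword and text-vector search) and alongside the image reference (so the image itself can be sent to the model later).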

During the RAG flow, the Chat and Ask approaches will:

  • [Optionally] perform a multivector search on both text and image embeddings
  • [Optionally] send images associated with search results to the multimodal LLM
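As a rough local illustration of the multivector idea, the sketch below scores a document by its best match across its text vector and any image vectors. The field names are assumptions, and the real feature runs the vector search inside Azure AI Search, not client-side.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def multivector_score(query_vec: list[float], doc: dict) -> float:
    # Score a document by its best match across the text vector and any
    # image vectors (hypothetical field names, for illustration only).
    vectors = [doc["text_vector"], *doc.get("image_vectors", [])]
    return max(cosine(query_vec, v) for v in vectors)

docs = [
    {"id": "a", "text_vector": [1.0, 0.0], "image_vectors": [[0.0, 1.0]]},
    {"id": "b", "text_vector": [0.7, 0.7]},
]
# A query that is semantically "image-like" ranks doc "a" first, because
# its image vector is a perfect match even though its text vector is not.
ranked = sorted(docs, key=lambda d: multivector_score([0.0, 1.0], d), reverse=True)
```

This is why multivector search helps multimodal RAG: a chunk can be retrieved on the strength of its images even when its text alone would not match.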

Here's what it looks like in the UI:

Screenshot of app with Developer Settings open, showing multimodal settings highlighted

For more information:

This is a significant change, so it will be difficult to merge into existing forks of the repository. We know developers appreciate easy merges, but this refactoring was necessary to achieve our goals. If you're merging into an existing branch, the best approach is to review the PR to understand the scope of the changes. Notably, we removed the separate *Vision approaches and integrated multimodal support directly into the main Chat/Ask approaches.

For those of you using integrated vectorization or the new agentic retrieval feature, multimodal support is not yet fully compatible with those features, but we will prioritize compatibility for a release in the near future.

What's Changed

  • Add GPT-5 evals and "minimal" to reasoning dropdown by @pamelafox in #2671
  • Adding custom debug chat mode for GitHub Copilot Agent mode development by @pamelafox in #2672
  • Use lowest reasoning effort appropriate for a model by @pamelafox in #2673
  • Improved custom chat mode and Copilot instructions file by @pamelafox in #2681
  • Update error message to be platform agnostic by @pamelafox in #2682
  • Bump Azure/setup-azd from 2.1.0 to 2.2.0 in the github-actions group by @dependabot[bot] in #2669
  • Add additional eval results for gpt-5-mini by @pamelafox in #2683
  • Bump actions/checkout from 4 to 5 in the github-actions group by @dependabot[bot] in #2684
  • Bump pypdf from 4.3.1 to 6.0.0 in /app/backend by @dependabot[bot] in #2674
  • Add markdownlint extension by @pamelafox in #2689
  • Hyperlink leads to random Korean betting website by @Daimler-Garay in #2691
  • Fix a11y landmark issue and add Axe Playwright test by @pamelafox in #2687
  • Add test coverage to CI workflow by @pamelafox in #2690
  • Bump azure-monitor-opentelemetry from 1.6.1 to 1.6.13 by @dependabot[bot] in #2663
  • Fix useAgenticRetrieval missing Japanese translations by @Copilot in #2694
  • Initialize MSAL before use to fix auth regression by @pamelafox in #2685
  • New approach to multimodal document ingestion by @pamelafox in #2558

Full Changelog: 2025-08-07...2025-08-29

2025-08-07: Support for GPT-5 model family

07 Aug 19:56
570e530

The repo now has support for the GPT-5 model family, just announced by OpenAI. The docs are updated to show how to use the reasoning models (gpt-5, gpt-5-mini, gpt-5-nano) and chat model (gpt-5-chat):

https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/reasoning.md

https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/deploy_features.md#using-different-chat-completion-models

Note that regional availability is limited for the new models, and you must fill out a registration form for access to the full gpt-5 model, per the Azure OpenAI docs.

Full Changelog: 2024-08-04b...2025-08-07

2025-08-04a: Support for o3 and o4-mini

04 Aug 17:39
a57cf7a

The only significant feature change in this release is support for the o3 and o4-mini models when using reasoning.
The other changes were dependency upgrades and a new architecture documentation page.

What's Changed

  • Bump urllib3 from 2.2.2 to 2.5.0 in /app/backend by @dependabot[bot] in #2581
  • Update README.md with Foundry buttons, RAG Deep Dive link by @pamelafox in #2587
  • Bump attrs from 24.2.0 to 25.3.0 by @dependabot[bot] in #2563
  • Bump python-dotenv from 1.0.1 to 1.1.1 by @dependabot[bot] in #2588
  • Bump requests from 2.32.3 to 2.32.4 in /app/backend by @dependabot[bot] in #2566
  • Bump std-uritemplate from 2.0.3 to 2.0.5 by @dependabot[bot] in #2536
  • Upgrade h11 dependency by @pamelafox in #2596
  • Updates to add latest omni models, upgrade package lock by @pamelafox in #2597
  • Bump soupsieve from 2.6 to 2.7 by @dependabot[bot] in #2601
  • Bump aiohttp from 3.10.11 to 3.12.14 in /app/backend by @dependabot[bot] in #2606
  • Remove conditional Azure login steps and simplify authentication in deployment workflows by @Copilot in #2625
  • Fix Dependabot MSAL package upgrade by updating compatible versions by @Copilot in #2632
  • Upgrade rapidfuzz from 3.12.1 to 3.13.0 to fix failed Dependabot PR #2504 by @Copilot in #2646
  • Fix Vite 7.0.6 upgrade by updating @vitejs/plugin-react to v4.7.0 by @Copilot in #2630
  • Add comprehensive Mermaid architecture diagrams for application documentation by @Copilot in #2653
  • Revert vite to earlier version for node v20.14 compatibility by @pamelafox in #2657

New Contributors

  • @Copilot made their first contribution in #2625

Full Changelog: 2025-06-03...2025-08-04

2025-08-04b: Private networking for Container Apps + P2S VPN Gateway

04 Aug 18:39
b96b186

This release updates the private networking feature to add support for Azure Container Apps, the default deployment host.
It also adds an optional P2S VPN Gateway (secured with Entra ID) with an Azure Private DNS resolver, so that developers can test and deploy from their own machines.

Please open an issue if you try out the feature and encounter any problems. Also make sure you use additional security auditing mechanisms, such as Microsoft Defender for Cloud, to confirm the deployment meets the needs of your organization.

What's Changed

  • Private endpoint support for container apps by @pamelafox in #2322
  • Add Bicep description for infra/private-endpoints.bicep by @Copilot in #2665
  • Update deploy_private.md docs to reflect new feature by @pamelafox in #2666

Full Changelog: 2025-08-04...2024-08-04b

2025-06-03: Default chat completion model is gpt-4.1-mini

03 Jun 18:35
10904b6

After careful consideration and evaluation, the new default chat completion model for the RAG flow is gpt-4.1-mini. This model is slightly more expensive than gpt-4o-mini, but performs significantly better on industry benchmarks and our RAG evaluations. You can still point your application at any GPT model by following the steps in the documentation.

There's also new documentation available about the HTTP protocol used between the backend and frontend.

Full Changelog: 2025-05-23...2025-06-03

2025-05-23: Optional feature for agentic retrieval from Azure AI Search

23 May 17:06
1b9885c

This release includes an exciting new option to turn on agentic retrieval, a new API from Azure AI Search (currently in public preview).
Read the docs about it here:
https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/agentic_retrieval.md

You can also watch this talk from @mattgotteiner and @pamelafox at Microsoft Build 2025 about agentic retrieval:
https://build.microsoft.com/en-US/sessions/BRK142

Please share your feedback in either the issue tracker or the discussions here. Since the retrieval API is in public preview, this is a great time to give feedback to the AI Search team.

Full Changelog: 2025-05-08...2025-05-23

2025-05-08: Default to text-embedding-3-large with compression, GlobalStandard SKU

09 May 06:44
faf0d46

This release upgrades the infrastructure and code to default to the text-embedding-3-large model from OpenAI. The model supports a maximum of 3072 dimensions, but we are using BinaryQuantizationCompression and truncating the dimensions to 1024, with oversampling and rescoring enabled. That means the embeddings are stored efficiently while search quality remains high.
Learn more about compression from this RAG Time episode or the Azure AI Search documentation.
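To illustrate why quality holds up, here is a self-contained toy sketch of the two-phase idea behind binary quantization with oversampling and rescoring. This is a local stand-in, not the Azure AI Search implementation: candidates are found cheaply via Hamming distance over 1-bit codes, then an oversampled candidate set is rescored with full-precision (truncated) vectors.

```python
import math
import random

DIMS_FULL, DIMS_TRUNC = 3072, 1024

def truncate(vec: list[float]) -> list[float]:
    # Matryoshka-style truncation: keep the first 1024 dimensions.
    return vec[:DIMS_TRUNC]

def binarize(vec: list[float]) -> list[int]:
    # Binary quantization: 1 bit per dimension, keeping only the sign.
    return [1 if x >= 0 else 0 for x in vec]

def hamming(a: list[int], b: list[int]) -> int:
    return sum(x != y for x, y in zip(a, b))

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(query: list[float], docs: list[dict], k: int = 1, oversample: int = 4) -> list[dict]:
    q_trunc = truncate(query)
    q_bits = binarize(q_trunc)
    # Phase 1: cheap Hamming distance over the binary codes, oversampled.
    candidates = sorted(docs, key=lambda d: hamming(q_bits, d["bits"]))[: k * oversample]
    # Phase 2: rescore the candidates with full-precision truncated vectors.
    return sorted(candidates, key=lambda d: cosine(q_trunc, d["vec"]), reverse=True)[:k]

random.seed(0)
corpus = [[random.uniform(-1, 1) for _ in range(DIMS_FULL)] for _ in range(20)]
docs = [{"id": i, "vec": truncate(v), "bits": binarize(truncate(v))}
        for i, v in enumerate(corpus)]
top = search(corpus[7], docs, k=1)
```

The binary codes are 32x smaller than float32 vectors, and because the final ranking uses full-precision rescoring over an oversampled candidate set, the cheap first phase rarely costs you the right answer.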

If you are already using the repository and don't wish to use the new embedding model, you can continue to use text-embedding-ada-002. You may need to set azd environment variables if they aren't already set; see the embedding models customization guide. If you want to switch to the new embedding model, you will need to either re-ingest your data from scratch into a new index, or add a new field for the new model and re-generate embeddings for just that field. The code now has a variable for the embedding field, so it should be possible to have a search index with fields for two different embedding models.
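For the two-field migration path, an index could carry one vector field per model, along these lines. The field names and the lookup helper here are hypothetical, not the repo's actual schema; the dimensions follow the models mentioned above (1536 for ada-002, 1024 for the truncated text-embedding-3-large).

```python
# Hypothetical sketch: one vector field per embedding model, so old and
# new embeddings can coexist in the same search index during migration.

fields = [
    {"name": "embedding_ada002", "type": "Collection(Edm.Single)", "dimensions": 1536},
    {"name": "embedding_3_large", "type": "Collection(Edm.Single)", "dimensions": 1024},
]

def embedding_field_for(model: str) -> dict:
    # Pick which index field to query/populate based on the active model.
    by_model = {
        "text-embedding-ada-002": fields[0],
        "text-embedding-3-large": fields[1],
    }
    return by_model[model]
```

Queries would then target only the field matching the model used to embed the query, so documents with embeddings from either model remain searchable.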

As part of this change, all model deployments now default to the GlobalStandard SKU. We made that change because, with the GlobalStandard SKU, it is easier to find regions supported by all the models used by this repository. However, if you can't use that SKU for whatever reason, you can still customize the SKU using the parameters described in the documentation.

Please let us know in the issue tracker if you encounter any issues with the new default embedding model configuration.

Full Changelog: 2025-04-02...2025-05-08

2025-04-02: Support for reasoning models and token usage display

03 Apr 02:40
56294c9

You can now optionally use a reasoning model (o1 or o3-mini) for all chat completion requests, following the reasoning guide.

When using a reasoning model, you can select the reasoning effort (low/medium/high):

Screenshot of developer settings with reasoning model
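Conceptually, the request parameters differ between reasoning and non-reasoning models: reasoning models take a reasoning effort setting and do not accept a sampling temperature. A simplified sketch follows; the model set and the default temperature value here are illustrative assumptions, not the repo's actual request-building code.

```python
# Hypothetical sketch of choosing chat completion parameters per model type.
REASONING_MODELS = {"o1", "o3-mini"}

def build_chat_params(model: str, messages: list[dict], reasoning_effort: str = "medium") -> dict:
    params = {"model": model, "messages": messages}
    if model in REASONING_MODELS:
        # Reasoning models: select low/medium/high effort, no temperature.
        params["reasoning_effort"] = reasoning_effort
    else:
        # Non-reasoning models: a sampling temperature applies instead.
        params["temperature"] = 0.3
    return params

params = build_chat_params("o3-mini", [], reasoning_effort="low")
```

The resulting dictionary could then be passed as keyword arguments to a chat completions call.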

For all models, you can now see token usage in the "Thought process" tab:

Display of token usage counts

Reasoning models incur more latency due to the thinking process, so they are an option for developers to try, but not necessarily the right choice for most RAG domains.

This release also includes several fixes for performance, Windows support, and deployment.

What's Changed

Full Changelog: 2025-03-26...2025-04-02

2025-03-26: Removal of conversation truncation logic

26 Mar 22:43
cb5149d

Previously, we had logic that truncated conversation history by counting tokens (with tiktoken) and keeping only the messages that fit inside the context window. Now that the default model has a large context window (128K tokens) and most models have similarly high limits, we have removed that truncation logic, so all conversations are sent in full to the model.
See the pull request for more reasoning behind the decision.
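For reference, the kind of logic that was removed looks roughly like this. It is a simplified stand-in using an approximate ~4 characters per token instead of tiktoken's exact counts.

```python
# Simplified sketch of history truncation: walk backwards from the newest
# message, keeping as many messages as fit within the token budget.

def truncate_history(messages: list[dict], max_tokens: int,
                     count_tokens=lambda m: len(m["content"]) // 4) -> list[dict]:
    kept, used = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break  # older messages are dropped once the budget is spent
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = [{"role": "user", "content": "x" * 40} for _ in range(5)]
recent = truncate_history(history, max_tokens=25)
```

With 128K-token context windows, this bookkeeping (and the tiktoken dependency it required) is no longer worth the complexity for typical conversation lengths.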

What's Changed

  • Remove token-counting library for conversation history truncation by @pamelafox in #2449

Full Changelog: 2025-03-25...2025-03-26