Releases: Azure-Samples/azure-search-openai-demo
2025-09-11: Upgrade to latest AI Search Agentic Retrieval API
The Azure AI Search team recently announced improvements to the agentic retrieval API. We've upgraded the repo to the latest API, so the retrieved data now includes all searchable fields, which improves the compatibility of the agentic retrieval feature with other repo features (multimodal, ACLs). Our repo does not currently use multi-index query planning or answer synthesis, but you may want to consider those in your forks.
This release also includes a bug fix for user login on Container Apps and fixes for the recent multimodal feature.
What's Changed
- feat: Add extra search index fields to Knowledge Agent response by @taylorn-ai in #2696
- Bump tenacity from 9.0.0 to 9.1.2 by @dependabot[bot] in #2700
- Bump regex from 2024.11.6 to 2025.7.34 by @dependabot[bot] in #2701
- Bump click from 8.1.7 to 8.1.8 by @dependabot[bot] in #2698
- Bump azure-core from 1.30.2 to 1.35.0 by @dependabot[bot] in #2697
- Bump h2 from 4.1.0 to 4.3.0 in /app/backend by @dependabot[bot] in #2705
- Add comparison with microsoft/azurechat to other samples documentation by @Copilot in #2703
- Update chat/ask prompts for improved consistency, run multimodal evals by @pamelafox in #2709
- Adjust defaults for multimodal-related parameters to work for default deployments by @pamelafox in #2717
- Fix ingestion for case when no images field exists by @pamelafox in #2719
- Remove pipeline section from azure.yaml by @pamelafox in #2720
- Pin devcontainer to bookworm by @pamelafox in #2722
- Upgrade minimum node version on pipelines by @pamelafox in #2721
- Bump the github-actions group with 3 updates by @dependabot[bot] in #2716
- Bump rich from 13.9.4 to 14.1.0 by @dependabot[bot] in #2714
- Upgrade to latest version of azure-search-documents and agentic retrieval API by @pamelafox in #2723
- Add missing RBAC role for token storage container when using container apps by @pamelafox in #2724
New Contributors
- @taylorn-ai made their first contribution in #2696
Full Changelog: 2025-08-29...2025-09-11
2025-08-29: New approach to multimodal RAG
This release introduces a large change: a new approach to multimodal RAG, based on learnings from our original vision feature.
The new multimodal approach affects both data ingestion and the RAG flow.
During data ingestion, the prepdocs script will (see the sketch after this list):
- extract images (using Azure Document Intelligence) and store them separately in Azure Blob Storage
- compute embeddings of the extracted images using Azure AI Vision
- use an LLM to describe each image, and embed that description inside the text chunk
- associate each text chunk in the Azure AI Search index with any nearby images
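For illustration, here's a rough sketch of that ingestion flow. The helper functions (`extract_pages_and_figures`, `describe_image`, and so on) and the chunk schema are hypothetical stand-ins, not the repo's exact prepdocs code:

```python
# Sketch of the multimodal ingestion flow; all helpers and field names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ImageOnPage:
    url: str                 # Blob Storage URL of the extracted figure
    description: str         # LLM-generated description of the figure
    embedding: list[float]   # Azure AI Vision image embedding

@dataclass
class Chunk:
    content: str                     # text chunk, with image descriptions embedded inline
    embedding: list[float]           # text embedding of the chunk content
    images: list[ImageOnPage] = field(default_factory=list)

def ingest(document_bytes: bytes) -> list[Chunk]:
    # Hypothetical helpers: extract_pages_and_figures (Azure Document Intelligence),
    # upload_to_blob_storage, describe_image (multimodal LLM), compute_image_embedding
    # (Azure AI Vision), compute_text_embedding (Azure OpenAI embeddings).
    pages, figures = extract_pages_and_figures(document_bytes)
    chunks: list[Chunk] = []
    for page in pages:
        images = []
        for figure in (f for f in figures if f.page_number == page.page_number):
            images.append(ImageOnPage(
                url=upload_to_blob_storage(figure),          # store the image separately
                description=describe_image(figure),          # LLM description of the image
                embedding=compute_image_embedding(figure),   # image embedding
            ))
        # Embed each description inside the text chunk so text search can find it too
        text = page.text + "".join(f"\n<figure>{img.description}</figure>" for img in images)
        chunks.append(Chunk(content=text, embedding=compute_text_embedding(text), images=images))
    return chunks  # each chunk is uploaded to the Azure AI Search index with its nearby images
```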
During the RAG flow, the Chat and Ask approaches will (see the sketch after this list):
- [Optionally] perform a multivector search on both text and image embeddings
- [Optionally] send images associated with the search results to the multimodal LLM
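As a rough illustration of the multivector search, here's a minimal sketch using the azure-search-documents SDK. The field names (`embedding`, `imageEmbedding`, `sourcepage`, `content`), the k values, and the embedding helpers are assumptions for illustration, not necessarily the repo's exact schema or code:

```python
# Minimal sketch of a multivector (text + image) query; field names are illustrative assumptions.
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search_client = SearchClient(
    endpoint="https://<your-search-service>.search.windows.net",
    index_name="gptkbindex",
    credential=DefaultAzureCredential(),
)

question = "What does the financial chart on page 3 show?"
text_vector = compute_text_embedding(question)    # hypothetical helper: query text embedding
image_vector = compute_image_embedding(question)  # hypothetical helper: Azure AI Vision text-to-image embedding

results = search_client.search(
    search_text=question,  # keyword search, fused with the vector queries by hybrid ranking
    vector_queries=[
        VectorizedQuery(vector=text_vector, k_nearest_neighbors=50, fields="embedding"),
        VectorizedQuery(vector=image_vector, k_nearest_neighbors=50, fields="imageEmbedding"),
    ],
    top=5,
)
for result in results:
    print(result["sourcepage"], result["content"][:80])
```

Images associated with the returned chunks can then be attached to the chat completion request so the multimodal LLM can reason over them alongside the text.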
Here's what it looks like in the UI:
For more information:
- Read the multimodal guide in the docs
- See the Pull Request
- Watch a demo video
This is a significant change, so it will be difficult to merge into existing forks of the repository. We know developers like it when new features are easy to merge in, but we needed to do significant refactoring to achieve the goals of this feature. If you're merging into an existing branch, the best approach is probably to review the PR first to understand the scope of the changes. Notably, we removed the separate *Vision approaches and integrated multimodal support directly into the main Chat/Ask approaches.
For those of you using integrated vectorization or the new agentic retrieval feature: multimodal support is not yet fully compatible with those features, but we will prioritize compatibility for a release in the near future.
What's Changed
- Add GPT-5 evals and "minimal" to reasoning dropdown by @pamelafox in #2671
- Adding custom debug chat mode for GitHub Copilot Agent mode development by @pamelafox in #2672
- Use lowest reasoning effort appropriate for a model by @pamelafox in #2673
- Improved custom chat mode and Copilot instructions file by @pamelafox in #2681
- Update error message to be platform agnostic by @pamelafox in #2682
- Bump Azure/setup-azd from 2.1.0 to 2.2.0 in the github-actions group by @dependabot[bot] in #2669
- Add additional eval results for gpt-5-mini by @pamelafox in #2683
- Bump actions/checkout from 4 to 5 in the github-actions group by @dependabot[bot] in #2684
- Bump pypdf from 4.3.1 to 6.0.0 in /app/backend by @dependabot[bot] in #2674
- Add markdownlint extension by @pamelafox in #2689
- Hyperlink leads to random Korean betting website by @Daimler-Garay in #2691
- Fix a11y landmark issue and add Axe Playwright test by @pamelafox in #2687
- Add test coverage to CI workflow by @pamelafox in #2690
- Bump azure-monitor-opentelemetry from 1.6.1 to 1.6.13 by @dependabot[bot] in #2663
- Fix useAgenticRetrieval missing Japanese translations by @Copilot in #2694
- Initialize MSAL before use to fix auth regression by @pamelafox in #2685
- New approach to multimodal document ingestion by @pamelafox in #2558
New Contributors
- @Daimler-Garay made their first contribution in #2691
Full Changelog: 2025-08-07...2025-08-29
2025-08-07: Support for GPT-5 model family
The repo now has support for the GPT-5 model family, just announced by OpenAI. The docs are updated to show how to use the reasoning models (gpt-5, gpt-5-mini, gpt-5-nano) and chat model (gpt-5-chat):
https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/reasoning.md
Note that region availability is limited for the new models, and you must fill out a registration form to access the full gpt-5 model, per the Azure OpenAI docs.
What's Changed
- Add support for GPT-5 model family by @pamelafox in #2667
Full Changelog: 2025-08-04b...2025-08-07
2025-08-04a: Support for o3 and o4-mini
The only significant feature change in this release is support for the o3 and o4-mini models when using the reasoning feature.
The other changes were dependency upgrades and a new architecture documentation page.
What's Changed
- Bump urllib3 from 2.2.2 to 2.5.0 in /app/backend by @dependabot[bot] in #2581
- Update README.md with Foundry buttons, RAG Deep Dive link by @pamelafox in #2587
- Bump attrs from 24.2.0 to 25.3.0 by @dependabot[bot] in #2563
- Bump python-dotenv from 1.0.1 to 1.1.1 by @dependabot[bot] in #2588
- Bump requests from 2.32.3 to 2.32.4 in /app/backend by @dependabot[bot] in #2566
- Bump std-uritemplate from 2.0.3 to 2.0.5 by @dependabot[bot] in #2536
- Upgrade h11 dependency by @pamelafox in #2596
- Updates to add latest omni models, upgrade package lock by @pamelafox in #2597
- Bump soupsieve from 2.6 to 2.7 by @dependabot[bot] in #2601
- Bump aiohttp from 3.10.11 to 3.12.14 in /app/backend by @dependabot[bot] in #2606
- Remove conditional Azure login steps and simplify authentication in deployment workflows by @Copilot in #2625
- Fix Dependabot MSAL package upgrade by updating compatible versions by @Copilot in #2632
- Upgrade rapidfuzz from 3.12.1 to 3.13.0 to fix failed Dependabot PR #2504 by @Copilot in #2646
- Fix Vite 7.0.6 upgrade by updating @vitejs/plugin-react to v4.7.0 by @Copilot in #2630
- Add comprehensive Mermaid architecture diagrams for application documentation by @Copilot in #2653
- Revert vite to earlier version for node v20.14 compatibility by @pamelafox in #2657
New Contributors
- @Copilot made their first contribution in #2625
Full Changelog: 2025-06-03...2025-08-04
2025-08-04b: Private networking for Container Apps + P2S VPN Gateway
This release updates the private networking feature to add support for Azure Container Apps, the default deployment host.
It also adds an optional P2S VPN Gateway (secured with Entra ID) with an Azure Private DNS resolver, so that developers can test and deploy from their own machines.
Please open an issue if you try out the feature and encounter any problems. Also make sure to use additional security auditing mechanisms, such as Microsoft Defender for Cloud, to confirm the deployment meets the needs of your organization.
What's Changed
- Private endpoint support for container apps by @pamelafox in #2322
- Add Bicep description for infra/private-endpoints.bicep by @Copilot in #2665
- Update deploy_private.md docs to reflect new feature by @pamelafox in #2666
Full Changelog: 2025-08-04...2025-08-04b
2025-06-03: Default chat completion model is gpt-4.1-mini
After careful consideration and evaluation, the new default chat completion model for the RAG flow is gpt-4.1-mini. This model is slightly more expensive than gpt-4o-mini, but it performs significantly better on industry benchmarks and our RAG evaluations. You can still point your application at any GPT model by following the steps in the documentation.
There's also new documentation available about the HTTP protocol used between the backend and frontend.
What's Changed
- Fix: UploadFile button incorrectly disabled when user is logged in by @AstroMC98 in #2545
- Remove earth_at_night_508.pdf from data folder by @pamelafox in #2547
- Disable ThoughtProcess/SupportingContent buttons during streaming by @pamelafox in #2548
- Bump typing-extensions from 4.12.2 to 4.13.2 by @dependabot in #2505
- Bump pymupdf from 1.25.1 to 1.26.0 by @dependabot in #2546
- Add HTTP protocol doc by @pamelafox in #2549
- Switch to gpt-41-mini as default chat model by @pamelafox in #2557
New Contributors
- @AstroMC98 made their first contribution in #2545
Full Changelog: 2025-05-23...2025-06-03
2025-05-23: Optional feature for agentic retrieval from Azure AI Search
This release includes an exciting new option to turn on an agentic retrieval API from Azure AI Search (currently in public preview).
Read the docs about it here:
https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/agentic_retrieval.md
You can also watch this talk from @mattgotteiner and @pamelafox at Microsoft Build 2025 about agentic retrieval:
https://build.microsoft.com/en-US/sessions/BRK142
Please share your feedback in either the issue tracker or discussions here. Since the retrieval API is in public preview, this is a great time to give feedback to the AI Search team.
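If you'd like a feel for the underlying API outside of this repo, here's a minimal sketch based on the preview azure-search-documents SDK available at the time of this release. Because the API is in public preview, the class and parameter names shown here are assumptions that may differ in later SDK versions; the repo's own integration lives in its retrieval approaches:

```python
# Sketch based on the preview azure-search-documents agentic retrieval API at the time of this
# release; names are assumptions and may differ in later preview versions.
from azure.identity import DefaultAzureCredential
from azure.search.documents.agent import KnowledgeAgentRetrievalClient
from azure.search.documents.agent.models import (
    KnowledgeAgentIndexParams,
    KnowledgeAgentMessage,
    KnowledgeAgentMessageTextContent,
    KnowledgeAgentRetrievalRequest,
)

agent_client = KnowledgeAgentRetrievalClient(
    endpoint="https://<your-search-service>.search.windows.net",
    agent_name="gptkb-agent",          # assumed agent name, created ahead of time
    credential=DefaultAzureCredential(),
)

result = agent_client.retrieve(
    retrieval_request=KnowledgeAgentRetrievalRequest(
        messages=[
            KnowledgeAgentMessage(
                role="user",
                content=[KnowledgeAgentMessageTextContent(text="What is included in my health plan?")],
            )
        ],
        target_index_params=[
            KnowledgeAgentIndexParams(index_name="gptkbindex", reranker_threshold=2.5)
        ],
    )
)
# The response contains grounding data (retrieved chunks) that the app passes to the chat model.
print(result.response[0].content[0].text)
```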
What's Changed
- Explicitly activate the uv environment in CI by @pamelafox in #2534
- Updates the baseline evals with embedding 3 large, renames other folders for clarity by @pamelafox in #2533
- Add support for agentic retrieval by @mattgotteiner in #2537
- Remove locust from requirements-dev.txt by @pamelafox in #2539
- Fix UI and answer gen issues by @mattgotteiner in #2541
Full Changelog: 2025-05-08...2025-05-23
2025-05-08: Default to text-embedding-3-large with compression, GlobalStandard SKU
This release upgrades the infrastructure and code to default to the text-embedding-3-large model from OpenAI. The model has a maximum of 3072 dimensions, but we use BinaryQuantizationCompression and truncate the dimensions to 1024, with oversampling and rescoring enabled. That means the embeddings are stored efficiently while search quality remains high.
Learn more about compression from this RAG time episode or Azure AI Search documentation.
If you are already using the repository and don't wish to use the new embedding model, you can continue to use text-embedding-ada-002. You may need to set azd environment variables if they aren't already set; see the embedding models customization guide. If you want to switch over to the new embedding model, you will either need to re-ingest your data from scratch into a new index, or add a new field for the new model and re-generate embeddings for just that field. The code now has a variable for the embedding column field, so it should be possible to have a search index with fields for two different embedding models.
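For example, a search index that carries fields for two embedding models could look roughly like this sketch (field names, profile names, and dimensions here are illustrative assumptions, not the repo's exact schema):

```python
# Sketch of a search index with one vector field per embedding model; names are illustrative.
from azure.search.documents.indexes.models import SearchField, SearchFieldDataType

embedding_fields = [
    SearchField(
        name="embedding",            # existing text-embedding-ada-002 vectors
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=1536,
        vector_search_profile_name="embedding-ada002-profile",
    ),
    SearchField(
        name="embedding3",           # new text-embedding-3-large vectors
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        searchable=True,
        vector_search_dimensions=3072,  # compression and truncation are configured on the profile's compression settings
        vector_search_profile_name="embedding-3large-profile",
    ),
]
```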
As part of this change, all model deployments now default to the GlobalStandard SKU, since that makes it easier to find regions that support all of the models used by this repository. If you can't use that SKU for any reason, you can still customize it using the parameters described in the documentation.
Please let us know in the issue tracker if you encounter any issues with the new default embedding model configuration.
What's Changed
- Upgrade syntax to Python 3.9 by @tonybaloney in #2484
- Remove outdated docs by @pamelafox in #2492
- Use ENFORCE_ACCESS_CONTROL to decide whether to make acls by @pamelafox in #2494
- Bump idna from 3.8 to 3.10 by @dependabot in #2464
- Bump vite from 5.4.14 to 5.4.18 in /app/frontend by @dependabot in #2486
- Bump types-html5lib from 1.1.11.20240806 to 1.1.11.20241018 by @dependabot in #2462
- Bump msal-extensions from 1.2.0 to 1.3.1 by @dependabot in #2463
- Update reasoning docs to include API version by @pamelafox in #2499
- Bump @babel/runtime from 7.25.6 to 7.27.0 in /app/frontend by @dependabot in #2497
- Upgrade Bicep versions of resources by @pamelafox in #2500
- Add missing output for reasoning effort, updated evals including o3-mini by @pamelafox in #2501
- Resolve datetime deprecation warnings by @emmanuel-ferdman in #2502
- Upgrade to text-embedding-3-large model as default, with vector storage optimizations by @pamelafox in #2470
- Update evals requirements by @pamelafox in #2528
- Raise minimum node version by @pamelafox in #2519
- Add migration script for Azure Cosmos DB, old container to new container by @pamelafox in #2442
- Bump astral-sh/setup-uv from 5 to 6 in the github-actions group by @dependabot in #2512
New Contributors
- @emmanuel-ferdman made their first contribution in #2502
Full Changelog: 2025-04-02...2025-05-08
2025-04-02: Support for reasoning models and token usage display
You can now optionally use a reasoning model (o1 or o3-mini) for all chat completion requests, following the reasoning guide.
When using a reasoning model, you can select the reasoning effort (low/medium/high):
For all models, you can now see token usage in the "Thought process" tab:
Reasoning models incur more latency due to the thinking process, so they are an option for developers to try, but not necessarily what you want to use for most RAG domains.
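For reference, here's a minimal sketch of a reasoning-model call with an effort setting and token usage reporting, using the openai Python package. The deployment name and API version are assumptions, and the repo wires this up through its approaches and environment variables rather than a standalone call like this:

```python
# Minimal sketch: reasoning effort + token usage with the openai package (not the repo's exact code).
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-12-01-preview",   # assumption: any API version that supports reasoning models
)

response = client.chat.completions.create(
    model="o3-mini",                    # assumption: your reasoning model deployment name
    reasoning_effort="low",             # low / medium / high
    messages=[
        {"role": "user", "content": "Summarize the key points of the employee handbook sources."},
    ],
)

usage = response.usage
print("Prompt tokens:", usage.prompt_tokens)
print("Completion tokens:", usage.completion_tokens)
print("Reasoning tokens:", usage.completion_tokens_details.reasoning_tokens)
```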
This release also includes several fixes for performance, Windows support, and deployment.
What's Changed
- Add quotes to azd env set by @mattgotteiner in #2413
- Upgrade ms graph SDK packages to remove pendulum dependency by @pamelafox in #2454
- Reduce list to only the available ones for gpt-4o-mini/Standard by @pamelafox in #2459
- Add support for reasoning models and token usage display by @mattgotteiner in #2448
- Upgrade prompty by @pamelafox in #2475
Full Changelog: 2025-03-26...2025-04-02
2025-03-26: Removal of conversation truncation logic
Previously, we had logic that truncated the conversation history by counting tokens (with tiktoken) and keeping only the messages that fit inside the model's context window. Now that the default model has a large context window (128K tokens) and most current models have similarly high limits, we have removed that truncation logic, so conversation history is sent to the model in full.
See the pull request for more reasoning behind the decision.
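For context, the removed logic looked roughly like the following simplified sketch (an illustration of the technique, not the exact code that was deleted): count tokens per message with tiktoken and keep only the newest messages that fit within the limit.

```python
# Simplified illustration of the kind of truncation logic that was removed.
import tiktoken

def truncate_history(messages: list[dict], model_context_limit: int = 128_000) -> list[dict]:
    """Keep the newest messages that fit within the token limit (system message always kept)."""
    encoding = tiktoken.get_encoding("o200k_base")  # encoding used by recent GPT models
    system, history = messages[0], messages[1:]
    budget = model_context_limit - len(encoding.encode(system["content"]))

    kept: list[dict] = []
    for message in reversed(history):               # walk from newest to oldest
        tokens = len(encoding.encode(message["content"]))
        if tokens > budget:
            break
        kept.insert(0, message)
        budget -= tokens
    return [system] + kept
```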
What's Changed
- Remove token-counting library for conversation history truncation by @pamelafox in #2449
Full Changelog: 2025-03-25...2025-03-26