feat: Add extra search index fields to Knowledge Agent response #2696

taylorn-ai · 2025-08-22T01:26:07Z

Purpose

As discussed in #2569, adds an optional “agentic reference hydration” feature to agentic retrieval so the app can fetch the full documents for the agent’s references (all index fields, not just the semantic fields). This enables richer source metadata (e.g., sourcefile, category, scores) and more accurate citations, while keeping the legacy behaviour as default.

Key points:

New env flag: ENABLE_AGENTIC_REF_HYDRATION (default false).
When enabled, references returned by the Knowledge Agent are hydrated via a follow-up Azure AI Search query using a composed search.in filter over unique doc_keys.
Preserves existing behaviour when disabled: builds Document objects from source_data embedded in references.
Injects the agent’s per-activity search query into each Document (via search_agent_query) for better observability.
Deterministic ordering:
- Default: preserves agent order.
- results_merge_strategy == "interleaved": sorts by reference.id (ascending) to interleave across activities.
Infra wiring: Bicep + parameters propagate the new flag to App Service env vars.
Documentation updated to describe the optional hydration behaviour and how to enable it.
Comprehensive tests added covering:
- Hydrated vs non-hydrated paths
- Interleaved sorting
- Deduplication of duplicate doc_keys
- Respecting top
- Missing/empty doc_keys
- Empty hydration results
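
A minimal sketch of the filter composition and ordering rules listed above. It assumes each reference exposes `id` and `doc_key`, and that reference ids are numeric strings. The `Reference` dataclass and both helper names are hypothetical stand-ins, not the PR's actual code; only the `search.in(field, 'values', 'delimiter')` filter form comes from Azure AI Search's OData syntax.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reference:
    # Hypothetical stand-in for an agent reference; not the SDK type.
    id: str
    doc_key: Optional[str]


def build_search_in_filter(
    refs: List[Reference], key_field: str = "id", top: Optional[int] = None
) -> Optional[str]:
    """Compose a search.in filter over unique, non-empty doc_keys."""
    seen = set()
    keys = []
    for ref in refs:
        if not ref.doc_key or ref.doc_key in seen:
            continue  # skip missing/empty keys and duplicates
        seen.add(ref.doc_key)
        keys.append(ref.doc_key)
    if top is not None:
        keys = keys[:top]  # respect the caller's result limit
    if not keys:
        return None  # nothing to hydrate
    # Assumption: keys contain neither commas nor single quotes.
    joined = ",".join(keys)
    return f"search.in({key_field}, '{joined}', ',')"


def order_references(
    refs: List[Reference], results_merge_strategy: Optional[str] = None
) -> List[Reference]:
    """Keep agent order by default; sort by id when interleaving."""
    if results_merge_strategy == "interleaved":
        # Assumption: ids are numeric strings such as "0", "1", "2".
        return sorted(refs, key=lambda r: int(r.id))
    return list(refs)
```

With references whose `doc_key` values are `a`, `b`, `a`, the composed filter names `a` and `b` once each, which is what keeps the follow-up query small.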

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[x] No

Why:

The new behaviour is gated behind ENABLE_AGENTIC_REF_HYDRATION which defaults to false.
Constructors were extended with a defaulted hydrate_references: bool = False and all call sites updated accordingly.
Infra adds a new parameter with a default of false; existing environments deploy without change.
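
To illustrate why this is non-breaking, here is a rough sketch of an opt-in flag feeding a defaulted constructor argument. The `env_flag` helper and the trimmed `Approach` class are hypothetical; only the variable name and the `hydrate_references: bool = False` parameter come from this PR.

```python
import os
from typing import Mapping, Optional


def env_flag(
    name: str, default: bool = False, environ: Optional[Mapping[str, str]] = None
) -> bool:
    """Return True only when the variable is set to 'true' (case-insensitive)."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default  # unset variable keeps the legacy behaviour
    return raw.strip().lower() == "true"


class Approach:
    # Hypothetical, heavily trimmed stand-in for the repo's approach classes.
    def __init__(self, hydrate_references: bool = False):
        self.hydrate_references = hydrate_references


# Existing environments never set the variable, so hydration stays off.
approach = Approach(hydrate_references=env_flag("ENABLE_AGENTIC_REF_HYDRATION"))
```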

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial
which includes deployment, settings and usage instructions. If text or screenshot need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[x] No

Type of change

[ ] Bugfix
[x] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[x] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

The current tests all pass (python -m pytest).
I added tests that prove my fix is effective or that my feature works
I ran python -m pytest --cov to verify 100% coverage of added lines
I ran python -m mypy to check for type errors
I either used the pre-commit hooks or ran ruff and black manually on my code.

- Fix command to generate HTML report for coverage using `diff-cover` 🛠️

- Introduce ENABLE_AGENTIC_REF_HYDRATION environment variable to control reference hydration behaviour 🌱
- Update Approach classes to accept hydrate_references parameter for managing reference hydration logic 🔧
- Modify document retrieval logic to hydrate references when enabled, improving data completeness 📄

- ✨ Add support for enabling extra field hydration in agentic retrieval
- 🔧 Update infrastructure to include new parameter for hydration
- 📝 Modify documentation to reflect changes in usage instructions

- 🎉 Introduce ENABLE_AGENTIC_REF_HYDRATION environment variable for configuration
- 🧪 Implement mock search results for hydration testing in agentic retrieval
- 🔍 Create tests for agentic retrieval with and without hydration enabled
- 📜 Ensure hydrated results include additional fields from search results

taylorn-ai · 2025-08-22T01:27:05Z

@microsoft-github-policy-service agree

pamelafox · 2025-08-23T15:01:22Z

Thank you @taylorn-ai for the great contribution! I would like @mattgotteiner from the AI Search team to review this, given his expertise with agentic retrieval.

taylorn-ai · 2025-08-24T05:11:32Z

I ran black on the files I changed, do you want me to re-run on everything? Or do you want to do that?

taylorn-ai · 2025-08-24T21:31:46Z

My bad, seems I didn't run black on the tests directory.

taylorn-ai · 2025-08-25T20:29:26Z

Apologies, I thought I had updated the test snapshots. Also @pamelafox, FYI, the pre-commit hooks aren't working. I ran them manually and they changed some other files that I hadn't edited, so I left those files as they were.

pamelafox · 2025-08-25T20:49:08Z

@taylorn-ai Interesting, sometimes that happens when the formatter changes its rules. I do run them locally, but I might need to explicitly re-install the pre-commit hooks to see if the formatter rules changed.

taylorn-ai · 2025-08-25T21:04:01Z

Two of the tests failed due to a network issue, is that common? Seems odd. At least it's not my fault 😀

taylorn-ai · 2025-09-01T04:04:45Z

@pamelafox - I noticed some issues with my original tests, and once I changed them, things went downhill from there. I am pulling my hair out right now, are you able to assist with the tests please? Not my forte...

Edit: This is where I am at:

app/backend/app.py (100%)
app/backend/approaches/approach.py (97.6%): Missing lines 183
app/backend/approaches/chatreadretrieveread.py (100%)
app/backend/approaches/retrievethenread.py (100%)
tests/conftest.py (100%)
tests/mocks.py (94.7%): Missing lines 281
tests/test_agentic_retrieval.py (100%)
tests/test_chatapproach.py (100%)
-------------
Total:   181 lines
Missing: 2 lines
Coverage: 98%
-------------

Line 183 in approach.py is

self.hydrate_references = hydrate_references

And I don't understand how this is not being covered by tests.

And line 281 in mocks.py is completely unrelated to my changes, so not sure what's going on there...

- 🎨 Introduce `create_mock_retrieve` to parameterise mock retrieval responses.
- 🔄 Remove redundant mock search functions to streamline code.
- 🧪 Update tests to use the new mock retrieval function for various scenarios.
- 🧹 Clean up unused mock functions to enhance maintainability.

pamelafox · 2025-09-01T22:30:54Z

@taylorn-ai I'm happy to take a look when I'm back tomorrow - I imagine my recent merge may have added complexity. But I'm excited about your change, as I think I can get multimodal working with agentic retrieval once we have these re-hydrated references!

taylorn-ai · 2025-09-01T22:32:39Z

> @taylorn-ai I'm happy to take a look when I'm back tomorrow - I imagine my recent merge may have added complexity. But I'm excited about your change, as I think I can get multimodal working with agentic retrieval once we have these re-hydrated references!

I had a few conflicts when merging your changes, but I think I sorted them. Just a few strange issues.

pamelafox · 2025-09-03T06:41:24Z

@taylorn-ai I checked out the branch locally and gave it a try.

I realized it doesn't work with the default indexes, as they do not have the "id" field set as filterable=True, and that's required for the in operator to work. I'm assuming your search index uses integrated vectorization, or that you set it up in some other custom way, such that the "id" field is filterable. So I added a note to the guide about the compatibility.
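
As a rough illustration of that compatibility check, the sketch below inspects an index definition for a filterable key field. It assumes the definition is a plain dict in which every field attribute is spelled out explicitly (as when an index definition is read back from the service); the helper name and the sample definitions are hypothetical.

```python
from typing import Any, Dict


def key_field_is_filterable(index_definition: Dict[str, Any]) -> bool:
    """Return True if the index's key field is explicitly marked filterable."""
    for field in index_definition.get("fields", []):
        if field.get("key"):
            return bool(field.get("filterable"))
    return False  # no key field found in the definition


# Hypothetical definitions, for illustration only.
default_style_index = {
    "fields": [
        {"name": "id", "key": True, "filterable": False},
        {"name": "content", "key": False, "filterable": False},
    ]
}
hydration_ready_index = {
    "fields": [
        {"name": "id", "key": True, "filterable": True},
        {"name": "content", "key": False, "filterable": False},
    ]
}

print(key_field_is_filterable(default_style_index))
print(key_field_is_filterable(hydration_ready_index))
```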

I also renamed the environment variable to ENABLE_AGENTIC_RETRIEVAL_SOURCE_DATA, to match an upcoming change to the Search API that will enable this hydration via an API parameter. Once that feature is available in the Search API, then the code path introduced here can use that parameter instead of "in()" and this will become compatible with all indexes.

We can merge this now, however, and make that change once it's available in the API.

github-actions · 2025-09-03T06:42:41Z

Check Broken URLs

We have automatically detected the following broken URLs in your files. Review and fix the paths to resolve this issue.

Check the file paths and associated broken URLs inside them.
For more details, check our Contributing Guide.

File Full Path Issues

README.md

#	Link	Line Number
1	`https://techcommunity.microsoft.com/blog/azurearchitectureblog/azure-openai-landing-zone-reference-architecture/3882102`	`31`
2	`https://techcommunity.microsoft.com/blog/azure-ai-services-blog/revolutionize-your-enterprise-data-with-chatgpt-next-gen-apps-w-azure-openai-and/3762087`	`277`
3	`https://techcommunity.microsoft.com/blog/azure-ai-services-blog/access-control-in-generative-ai-applications-with-azure-ai-search/3956408`	`281`
4	`https://techcommunity.microsoft.com/blog/azuredevcommunityblog/rag-deep-dive-watch-all-the-recordings/4383171`	`283`

docs/deploy_features.md

#	Link	Line Number
1	`https://techcommunity.microsoft.com/blog/azure-ai-services-blog/announcing-the-public-preview-of-integrated-vectorization-in-azure-ai-search/3960809`	`333`

docs/deploy_lowcost.md

#	Link	Line Number
1	`https://techcommunity.microsoft.com/blog/azure-ai-services-blog/azure-ai-search-outperforming-vector-search-with-hybrid-retrieval-and-ranking-ca/3929167`	`56`

docs/customization.md

#	Link	Line Number
1	`https://techcommunity.microsoft.com/blog/azure-ai-services-blog/azure-ai-search-outperforming-vector-search-with-hybrid-retrieval-and-ranking-ca/3929167`	`131`

docs/data_ingestion.md

#	Link	Line Number
1	`https://techcommunity.microsoft.com/blog/azure-ai-services-blog/announcing-the-public-preview-of-integrated-vectorization-in-azure-ai-search/3960809`	`78`

docs/productionizing.md

#	Link	Line Number
1	`https://techcommunity.microsoft.com/blog/azurearchitectureblog/azure-openai-landing-zone-reference-architecture/3882102`	`95`

pamelafox · 2025-09-03T06:44:46Z

techcommunity is down right now, sadly, so those link errors can be ignored.

taylorn-ai · 2025-09-03T06:53:20Z

FYI the change to CONTRIBUTING.md happened automatically on commit. So I assumed it was a pre-commit hook?

mattgotteiner

Thank you for your contributions!

pamelafox · 2025-09-03T23:54:13Z

Merged! Thank you @taylorn-ai. We'll follow up soon when the Azure AI Search SDK adds a boolean to include the full references.

taylorn-ai · 2025-09-10T03:16:18Z

FYI @pamelafox / @mattgotteiner the latest version (11.7.0b1 (2025-09-05)) of azure-search-documents has a plethora of changes, many of which break this feature, as well as most of how the Knowledge Agent works right now...

pamelafox · 2025-09-10T04:59:45Z

@taylorn-ai Yes, I've implemented the changes here:
#2723

The search API now supports hydration natively, with a negligible performance cost, so we can just default to always hydrating (include_reference_source_data).

I figured that we should bring in your change first, as I didn't know how long it'd take us to upgrade to the new version. Thanks for being on the lookout!

taylorn-ai · 2025-09-10T05:01:19Z

@pamelafox

Dang, I put in all that effort for nothing! 😀

pamelafox · 2025-09-10T05:06:42Z

@taylorn-ai It was a very well done PR! Now you can easily send more PRs for other fixes... 😉 And if anyone needs to stay on the old package/API for some reason, they can refer to your code. The new API is largely an improvement, but there are a few quirks currently, like no support for max_subqueries customization at query-time, only at agent creation time. I also had to re-implement the reranker threshold in the Python code, since that can only be set at agent creation time now.
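
A client-side reranker threshold of the kind mentioned above could look roughly like this. It is a sketch only: the function name, the dict-shaped documents, and the `reranker_score` key are assumptions for illustration, not the code from #2723.

```python
from typing import Any, Dict, List


def filter_by_reranker_score(
    documents: List[Dict[str, Any]], minimum_reranker_score: float
) -> List[Dict[str, Any]]:
    """Drop documents whose reranker score falls below the threshold."""
    kept = []
    for doc in documents:
        # Assumption: a missing score is treated as 0.0.
        score = doc.get("reranker_score") or 0.0
        if score >= minimum_reranker_score:
            kept.append(doc)
    return kept


# Hypothetical documents, for illustration only.
docs = [
    {"id": "a", "reranker_score": 3.1},
    {"id": "b", "reranker_score": 1.2},
    {"id": "c"},
]
print([d["id"] for d in filter_by_reranker_score(docs, 2.0)])
```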

taylorn-ai · 2025-09-10T05:20:14Z

That's a little annoying. The Lord giveth and the Lord taketh away 😃

Taylor added 4 commits August 22, 2025 11:12

Update coverage report generation command

fdeb1c5

- Fix command to generate HTML report for coverage using `diff-cover` 🛠️

Enhance agentic retrieval with optional field hydration

226b478

- ✨ Add support for enabling extra field hydration in agentic retrieval
- 🔧 Update infrastructure to include new parameter for hydration
- 📝 Modify documentation to reflect changes in usage instructions

taylorn-ai mentioned this pull request Aug 22, 2025

Azure AI Search Agentic Retrieval Returning More than Just Selected Fields #2569

Open

pamelafox requested a review from mattgotteiner August 23, 2025 15:00

Ran ruff and black on new tests

5dc2b4f

Ran black on changed files

422e5c3

Update test snapshots

8faa419

Taylor added 4 commits September 1, 2025 10:02

Merge branch 'main' into feature/hydrate-agent-results

f60f79f

Working on tests

b0a20d8

Merge branch 'main' into HEAD

9a74995

Merge commit '9a74995' into feature/hydrate-agent-results

381a141

Rename env var to match API parameter

a9b183f

Revert CONTRIBUTING.md TOC change as unneeded

4f87498

mattgotteiner approved these changes Sep 3, 2025

pamelafox merged commit e68c7e5 into Azure-Samples:main Sep 3, 2025
30 of 31 checks passed
