taylorn-ai
Contributor

Purpose

As discussed in #2569, adds an optional “agentic reference hydration” feature to agentic retrieval so the app can fetch the full documents for the agent’s references (all index fields, not just the semantic fields). This enables richer source metadata (e.g., sourcefile, category, scores) and more accurate citations, while keeping the legacy behaviour as default.

Key points:

  • New env flag: ENABLE_AGENTIC_REF_HYDRATION (default false).
  • When enabled, references returned by the Knowledge Agent are hydrated via a follow-up Azure AI Search query using a composed search.in filter over unique doc_keys.
  • Preserves existing behaviour when disabled: builds Document objects from source_data embedded in references.
  • Injects the agent’s per-activity search query into each Document (via search_agent_query) for better observability.
  • Deterministic ordering:
    • Default: preserves agent order.
    • results_merge_strategy == "interleaved": sorts by reference.id (ascending) to interleave across activities.
  • Infra wiring: Bicep + parameters propagate the new flag to App Service env vars.
  • Documentation updated to describe the optional hydration behaviour and how to enable it.
  • Comprehensive tests added covering:
    • Hydrated vs non-hydrated paths
    • Interleaved sorting
    • Deduplication of duplicate doc_keys
    • Respecting top
    • Missing/empty doc_keys
    • Empty hydration results
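
As a rough illustration of the hydration and ordering points above, the sketch below deduplicates doc_keys, composes a search.in filter, and applies the two ordering rules. The `Reference` dataclass, the helper names, the `id` key field name, and the assumption that reference ids are numeric strings are all illustrative and not taken from the PR's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reference:
    """Hypothetical stand-in for an agent reference (not the SDK type)."""

    id: str
    doc_key: Optional[str]


def unique_doc_keys(references, top=None):
    """Collect doc_keys in first-seen order, skipping missing or empty ones."""
    seen = set()
    keys = []
    for ref in references:
        if ref.doc_key and ref.doc_key not in seen:
            seen.add(ref.doc_key)
            keys.append(ref.doc_key)
    # Respect `top` by truncating the list of keys to hydrate.
    return keys[:top] if top is not None else keys


def build_search_in_filter(keys, key_field="id"):
    """Compose an OData search.in filter over the given keys.

    Assumes the key field is named "id" and that keys contain no commas.
    """
    return f"search.in({key_field}, '{','.join(keys)}', ',')"


def order_references(references, results_merge_strategy=None):
    """Preserve agent order by default; sort by id when interleaving."""
    if results_merge_strategy == "interleaved":
        # Assumes reference ids are numeric strings such as "0", "1", "2".
        return sorted(references, key=lambda ref: int(ref.id))
    return list(references)
```

The resulting filter string would then be passed to a follow-up search query; if `unique_doc_keys` returns an empty list, the caller would presumably skip that query altogether.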

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[x] No

Why:

  • The new behaviour is gated behind ENABLE_AGENTIC_REF_HYDRATION which defaults to false.
  • Constructors were extended with a defaulted hydrate_references: bool = False and all call sites updated accordingly.
  • Infra adds a new parameter with a default of false; existing environments deploy without change.
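
The gating described above can be sketched as follows. This is a minimal illustration, assuming the flag is read as a case-insensitive "true" string; the `env_flag` helper and the simplified `Approach` class are hypothetical stand-ins, not the repository's actual code.

```python
import os


def env_flag(name, default=False, environ=os.environ):
    """Interpret an environment variable as a boolean; unset means `default`."""
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() == "true"


class Approach:
    """Hypothetical, heavily simplified stand-in for the app's Approach classes."""

    def __init__(self, hydrate_references: bool = False):
        # Defaulting to False keeps existing call sites on the legacy path.
        self.hydrate_references = hydrate_references


# With the variable unset, this resolves to False and nothing changes.
approach = Approach(hydrate_references=env_flag("ENABLE_AGENTIC_REF_HYDRATION"))
```

Because both the flag and the constructor argument default to false, an environment that never sets the variable keeps the legacy behaviour.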

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial
which includes deployment, settings and usage instructions. If text or screenshot need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[x] No

Type of change

[ ] Bugfix
[x] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[x] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

  • The current tests all pass (python -m pytest).
  • I added tests that prove my fix is effective or that my feature works
  • I ran python -m pytest --cov to verify 100% coverage of added lines
  • I ran python -m mypy to check for type errors
  • I either used the pre-commit hooks or ran ruff and black manually on my code.

Taylor added 4 commits August 22, 2025 11:12
- Fix command to generate HTML report for coverage using `diff-cover` 🛠️
- Introduce ENABLE_AGENTIC_REF_HYDRATION environment variable to control reference hydration behaviour 🌱
- Update Approach classes to accept hydrate_references parameter for managing reference hydration logic 🔧
- Modify document retrieval logic to hydrate references when enabled, improving data completeness 📄
- ✨ Add support for enabling extra field hydration in agentic retrieval
- 🔧 Update infrastructure to include new parameter for hydration
- 📝 Modify documentation to reflect changes in usage instructions
- 🎉 Introduce ENABLE_AGENTIC_REF_HYDRATION environment variable for configuration
- 🧪 Implement mock search results for hydration testing in agentic retrieval
- 🔍 Create tests for agentic retrieval with and without hydration enabled
- 📜 Ensure hydrated results include additional fields from search results
@taylorn-ai
Contributor Author

@microsoft-github-policy-service agree

@pamelafox
Collaborator

Thank you @taylorn-ai for the great contribution! I would like @mattgotteiner from the AI Search team to review this, given his expertise with agentic retrieval.

@taylorn-ai
Contributor Author

I ran black on the files I changed, do you want me to re-run on everything? Or do you want to do that?

@taylorn-ai
Contributor Author

My bad, seems I didn't run black on the tests directory.

@taylorn-ai
Contributor Author

Apologies, I thought I had updated the test snapshots. Also @pamelafox, FYI, the pre-commit hooks aren't working. I ran it manually and it changed some other files that I hadn't edited, so I left them as they were.

@pamelafox
Collaborator

@taylorn-ai Interesting, sometimes that happens when the formatter changes its rules. I do run them locally, but I might need to explicitly re-install the pre-commit hooks to see if the formatter rules changed.

@taylorn-ai
Contributor Author

Two of the tests failed due to a network issue, is that common? Seems odd. At least it's not my fault 😀

@taylorn-ai
Contributor Author

taylorn-ai commented Sep 1, 2025

@pamelafox - I noticed some issues with my original tests, and once I changed them, things went downhill from there. I am pulling my hair out right now, are you able to assist with the tests please? Not my forte...

Edit: This is where I am at:

app/backend/app.py (100%)
app/backend/approaches/approach.py (97.6%): Missing lines 183
app/backend/approaches/chatreadretrieveread.py (100%)
app/backend/approaches/retrievethenread.py (100%)
tests/conftest.py (100%)
tests/mocks.py (94.7%): Missing lines 281
tests/test_agentic_retrieval.py (100%)
tests/test_chatapproach.py (100%)
-------------
Total:   181 lines
Missing: 2 lines
Coverage: 98%
-------------

Line 183 in approach.py is

self.hydrate_references = hydrate_references

And I don't understand how this is not being covered by tests.

And line 281 in mocks.py is completely unrelated to my changes, so not sure what's going on there...

- 🎨 Introduce `create_mock_retrieve` to parameterise mock retrieval responses.
- 🔄 Remove redundant mock search functions to streamline code.
- 🧪 Update tests to use the new mock retrieval function for various scenarios.
- 🧹 Clean up unused mock functions to enhance maintainability.
@pamelafox
Collaborator

@taylorn-ai I'm happy to take a look when I'm back tomorrow - I imagine my recent merge may have added complexity. But I'm excited about your change, as I think I can get multimodal working with agentic retrieval once we have these re-hydrated references!

@taylorn-ai
Contributor Author

> @taylorn-ai I'm happy to take a look when I'm back tomorrow - I imagine my recent merge may have added complexity. But I'm excited about your change, as I think I can get multimodal working with agentic retrieval once we have these re-hydrated references!

I had a few conflicts when merging your changes, but I think I sorted them. Just a few strange issues.

@pamelafox
Collaborator

@taylorn-ai I checked out the branch locally and gave it a try.

I realized it doesn't work with the default indexes, as their "id" field is not set as filterable=True, and that's required for the search.in filter to work. I'm assuming your search index uses integrated vectorization, or that you customized it in some other way, such that the "id" field is filterable. So I added a note to the guide about the compatibility.

I also renamed the environment variable to ENABLE_AGENTIC_RETRIEVAL_SOURCE_DATA, to match an upcoming change to the Search API that will enable this hydration via an API parameter. Once that feature is available in the Search API, then the code path introduced here can use that parameter instead of "in()" and this will become compatible with all indexes.

We can merge this now, however, and make that change once it's available in the API.
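
To make the compatibility constraint concrete, here is a small sketch of a check for whether an index's key field is filterable. It assumes a plain dict shaped like the JSON of a fetched index definition, with attributes spelled out explicitly; the function name and both sample indexes are hypothetical.

```python
def key_field_is_filterable(index_definition):
    """Return True if the index's key field is marked filterable.

    Assumes a dict shaped like a fetched index definition, where each
    field lists its attributes explicitly.
    """
    for field in index_definition.get("fields", []):
        if field.get("key"):
            return bool(field.get("filterable", False))
    # No key field found: treat the index as incompatible.
    return False


# Hypothetical index whose key field cannot be used in a filter.
incompatible_index = {
    "name": "example-index",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": False},
        {"name": "content", "type": "Edm.String", "key": False, "filterable": False},
    ],
}

# Hypothetical index that would work with a search.in filter on "id".
compatible_index = {
    "name": "example-index-filterable",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "filterable": True},
    ],
}
```

A check along these lines could run at startup to warn when hydration is enabled against an index that cannot support it.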


github-actions bot commented Sep 3, 2025

Check Broken URLs

We have automatically detected the following broken URLs in your files. Review and fix the paths to resolve this issue.

Check the file paths and associated broken URLs inside them.
For more details, check our Contributing Guide.

| File Full Path | # | Link | Line Number |
|---|---|---|---|
| README.md | 1 | https://techcommunity.microsoft.com/blog/azurearchitectureblog/azure-openai-landing-zone-reference-architecture/3882102 | 31 |
| README.md | 2 | https://techcommunity.microsoft.com/blog/azure-ai-services-blog/revolutionize-your-enterprise-data-with-chatgpt-next-gen-apps-w-azure-openai-and/3762087 | 277 |
| README.md | 3 | https://techcommunity.microsoft.com/blog/azure-ai-services-blog/access-control-in-generative-ai-applications-with-azure-ai-search/3956408 | 281 |
| README.md | 4 | https://techcommunity.microsoft.com/blog/azuredevcommunityblog/rag-deep-dive-watch-all-the-recordings/4383171 | 283 |
| docs/deploy_features.md | 1 | https://techcommunity.microsoft.com/blog/azure-ai-services-blog/announcing-the-public-preview-of-integrated-vectorization-in-azure-ai-search/3960809 | 333 |
| docs/deploy_lowcost.md | 1 | https://techcommunity.microsoft.com/blog/azure-ai-services-blog/azure-ai-search-outperforming-vector-search-with-hybrid-retrieval-and-ranking-ca/3929167 | 56 |
| docs/customization.md | 1 | https://techcommunity.microsoft.com/blog/azure-ai-services-blog/azure-ai-search-outperforming-vector-search-with-hybrid-retrieval-and-ranking-ca/3929167 | 131 |
| docs/data_ingestion.md | 1 | https://techcommunity.microsoft.com/blog/azure-ai-services-blog/announcing-the-public-preview-of-integrated-vectorization-in-azure-ai-search/3960809 | 78 |
| docs/productionizing.md | 1 | https://techcommunity.microsoft.com/blog/azurearchitectureblog/azure-openai-landing-zone-reference-architecture/3882102 | 95 |

@pamelafox
Collaborator

techcommunity is down right now, sadly, so those link errors can be ignored.

@taylorn-ai
Contributor Author

FYI the change to CONTRIBUTING.md happened automatically on commit. So I assumed it was a pre-commit hook?

Collaborator

@mattgotteiner mattgotteiner left a comment

Thank you for your contributions!

@pamelafox pamelafox merged commit e68c7e5 into Azure-Samples:main Sep 3, 2025
30 of 31 checks passed
@pamelafox
Collaborator

Merged! Thank you @taylorn-ai. We'll follow up soon when the Azure AI Search SDK adds a boolean to include the full references.

@taylorn-ai
Contributor Author

FYI @pamelafox / @mattgotteiner the latest version (11.7.0b1 (2025-09-05)) of azure-search-documents has a plethora of changes, many of which break this feature, as well as most of how the Knowledge Agent works right now...

@pamelafox
Collaborator

@taylorn-ai Yes, I've implemented the changes here:
#2723

The search API now supports hydration natively, with a negligible performance cost, so we can just default to always hydrating (include_reference_source_data).

I figured that we should bring in your change first, as I didn't know how long it'd take us to upgrade to the new version. Thanks for being on the lookout!

@taylorn-ai
Contributor Author

@pamelafox

Dang, I put in all that effort for nothing! 😀

@pamelafox
Collaborator

@taylorn-ai It was a very well done PR! Now you can easily send more PRs for other fixes... 😉 And if anyone needs to stay on the old package/API for some reason, they can refer to your code. The new API is largely an improvement, but there are a few quirks currently, like no support for max_subqueries customization at query-time, only at agent creation time. I also had to re-implement the reranker threshold in the Python code, since that can only be set at agent creation time now.
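
For readers wondering what re-implementing the reranker threshold in Python might look like, here is a minimal sketch of client-side filtering. It assumes each result is a dict carrying a `reranker_score` value; that field name and the function name are illustrative assumptions, not the code from #2723.

```python
def filter_by_reranker_score(documents, minimum_reranker_score=None):
    """Keep documents whose reranker score meets the threshold.

    Documents are plain dicts here; the "reranker_score" key is an
    assumed name. A missing or None score is treated as 0.0.
    """
    if minimum_reranker_score is None:
        # No threshold configured: return everything unchanged.
        return list(documents)
    return [
        doc
        for doc in documents
        if (doc.get("reranker_score") or 0.0) >= minimum_reranker_score
    ]
```

Filtering after retrieval means the agent still does the work of returning low-scoring results, so this trades some efficiency for the ability to vary the threshold per request.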

@taylorn-ai
Contributor Author

That's a little annoying. The Lord giveth and the Lord taketh away 😃
