feat: Add extra search index fields to Knowledge Agent response #2696
- Fix command to generate HTML report for coverage using `diff-cover` 🛠️
- Introduce ENABLE_AGENTIC_REF_HYDRATION environment variable to control reference hydration behaviour 🌱
- Update Approach classes to accept hydrate_references parameter for managing reference hydration logic 🔧
- Modify document retrieval logic to hydrate references when enabled, improving data completeness 📄
- ✨ Add support for enabling extra field hydration in agentic retrieval
- 🔧 Update infrastructure to include new parameter for hydration
- 📝 Modify documentation to reflect changes in usage instructions
- 🎉 Introduce ENABLE_AGENTIC_REF_HYDRATION environment variable for configuration
- 🧪 Implement mock search results for hydration testing in agentic retrieval
- 🔍 Create tests for agentic retrieval with and without hydration enabled
- 📜 Ensure hydrated results include additional fields from search results
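For readers following along, here is a minimal sketch of how an opt-in flag like ENABLE_AGENTIC_REF_HYDRATION can be read from the environment and passed to an Approach class through a `hydrate_references: bool = False` keyword. The `env_flag` helper, the set of accepted truthy strings, and this simplified `Approach` class are illustrative assumptions, not the code in this PR (the variable was also renamed later in review).

```python
import os


def env_flag(name: str, default: bool = False) -> bool:
    """Read a boolean feature flag from the environment (hypothetical helper)."""
    raw = os.getenv(name)
    if raw is None:
        return default
    # Assumption: only these spellings count as "on"; anything else is "off".
    return raw.strip().lower() in {"true", "1", "yes"}


class Approach:
    """Simplified stand-in for the app's Approach base class (illustrative only)."""

    def __init__(self, *, hydrate_references: bool = False) -> None:
        # A keyword default of False keeps existing call sites working unchanged.
        self.hydrate_references = hydrate_references


approach = Approach(hydrate_references=env_flag("ENABLE_AGENTIC_REF_HYDRATION"))
```

Because the flag defaults to off, environments that never set the variable keep the legacy behaviour.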
@microsoft-github-policy-service agree
Thank you @taylorn-ai for the great contribution! I would like @mattgotteiner from the AI Search team to review this, given his expertise with agentic retrieval.
I ran black on the files I changed. Do you want me to re-run it on everything, or do you want to do that?
My bad, seems I didn't run
Apologies, I thought I had updated the test snapshots. Also @pamelafox, FYI, the
@taylorn-ai Interesting, sometimes that happens when the formatter changes its rules. I do run them locally, but I might need to explicitly re-install the pre-commit hooks to see whether the formatter rules changed.
Two of the tests failed due to a network issue; is that common? Seems odd. At least it's not my fault 😀
@pamelafox - I noticed some issues with my original tests, and once I changed them, things went downhill from there. I am pulling my hair out right now; are you able to assist with the tests, please? Not my forte...
Edit: This is where I am at:
app/backend/app.py (100%)
app/backend/approaches/approach.py (97.6%): Missing lines 183
app/backend/approaches/chatreadretrieveread.py (100%)
app/backend/approaches/retrievethenread.py (100%)
tests/conftest.py (100%)
tests/mocks.py (94.7%): Missing lines 281
tests/test_agentic_retrieval.py (100%)
tests/test_chatapproach.py (100%)
-------------
Total: 181 lines
Missing: 2 lines
Coverage: 98%
-------------
Line 183 in
And I don't understand how this is not being covered by tests. And line 281 in
- 🎨 Introduce `create_mock_retrieve` to parameterise mock retrieval responses.
- 🔄 Remove redundant mock search functions to streamline code.
- 🧪 Update tests to use the new mock retrieval function for various scenarios.
- 🧹 Clean up unused mock functions to enhance maintainability.
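A rough sketch of what a factory like `create_mock_retrieve` could look like, assuming the mocked retrieve call is async and only needs to hand back a canned payload. The signature and the payload shape below are hypothetical and may not match what is in tests/mocks.py.

```python
import asyncio
from typing import Any, Awaitable, Callable


def create_mock_retrieve(
    response: dict[str, Any],
) -> Callable[..., Awaitable[dict[str, Any]]]:
    """Return an async fake for a retrieve call that always yields `response`."""

    async def mock_retrieve(*args: Any, **kwargs: Any) -> dict[str, Any]:
        # The request is ignored; each test scenario supplies its own payload.
        return response

    return mock_retrieve


if __name__ == "__main__":
    # Hypothetical payload shape, for illustration only.
    fake = create_mock_retrieve({"references": [{"id": "0", "doc_key": "doc-1"}]})
    print(asyncio.run(fake(retrieval_request=None)))
```

In a pytest test, the returned coroutine function could be patched onto the retrieval client with `monkeypatch.setattr`, one canned payload per scenario, which is what makes a single factory replace several hand-written mock functions.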
@taylorn-ai I'm happy to take a look when I'm back tomorrow - I imagine my recent merge may have added complexity. But I'm excited about your change, as I think I can get multimodal working with agentic retrieval once we have these re-hydrated references!
I had a few conflicts when merging your changes, but I think I sorted them. Just a few strange issues.
@taylorn-ai I checked out the branch locally and gave it a try. I realized it doesn't work with the default indexes, as they do not have the "id" field set as filterable=True, and that's required for the in operator to work. I'm assuming your search index uses integrated vectorization, or that you set it up in some other custom way such that the "id" field is filterable. So I added a note to the guide about the compatibility. I also renamed the environment variable to ENABLE_AGENTIC_RETRIEVAL_SOURCE_DATA, to match an upcoming change to the Search API that will enable this hydration via an API parameter. Once that feature is available in the Search API, the code path introduced here can use that parameter instead of "in()", and this will become compatible with all indexes. We can merge this now, however, and make that change once it's available in the API.
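To illustrate why the "id" field must be filterable: hydrating references means filtering on a list of keys with the OData `search.in` function. Below is a minimal sketch of building such a filter string, assuming the key field is named `id` and that no key contains the chosen delimiter; the helper name is hypothetical.

```python
def build_search_in_filter(field: str, keys: list[str], delimiter: str = "|") -> str:
    """Build an OData filter that matches documents whose `field` is any of `keys`."""
    if not keys:
        raise ValueError("keys must not be empty")
    for key in keys:
        if delimiter in key:
            raise ValueError(f"key {key!r} contains the delimiter {delimiter!r}")
    # Single quotes inside an OData string literal are escaped by doubling them.
    value_list = delimiter.join(keys).replace("'", "''")
    return f"search.in({field}, '{value_list}', '{delimiter}')"
```

The resulting string would be passed as the `filter` argument to `SearchClient.search`. A `|` delimiter is used here so that keys containing commas still split correctly, since `search.in` lets the caller choose the delimiter.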
Check Broken URLs: We have automatically detected the following broken URLs in your files. Review and fix the paths to resolve this issue. Check the file paths and associated broken URLs inside them.
techcommunity is down right now, sadly, so those link errors can be ignored. |
FYI, the change to CONTRIBUTING.md happened automatically on commit, so I assumed it was a pre-commit hook?
Thank you for your contributions!
Merged! Thank you @taylorn-ai. We'll follow up soon when the Azure AI Search SDK adds a boolean to include the full references.
FYI @pamelafox / @mattgotteiner, the latest version of azure-search-documents (11.7.0b1, released 2025-09-05) has a plethora of changes, many of which break this feature as well as most of how the Knowledge Agent currently works...
@taylorn-ai Yes, I've implemented the changes here: The search API now supports hydration natively with a negligible performance cost, so we can just default to always hydrating (include_reference_source_data). I figured that we should bring in your change first, as I didn't know how long it'd take us to upgrade to the new version. Thanks for being on the lookout!
Dang, I put in all that effort for nothing! 😀
@taylorn-ai It was a very well done PR! Now you can easily send more PRs for other fixes... 😉 And if anyone needs to stay on the old package/API for some reason, they can refer to your code. The new API is largely an improvement, but there are a few quirks currently: max_subqueries can no longer be customized at query time, only at agent creation time. I also had to re-implement the reranker threshold in the Python code, since that can only be set at agent creation time now.
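Since the reranker threshold can now only be set at agent creation time, applying it client-side amounts to a filter over scored results. A minimal sketch, assuming each result is a dict carrying its semantic reranker score under `@search.reranker_score` (as plain search results do); the function name and the choice to drop unscored results are assumptions, not the code that was merged.

```python
from typing import Any


def apply_reranker_threshold(
    results: list[dict[str, Any]],
    threshold: float,
    score_field: str = "@search.reranker_score",
) -> list[dict[str, Any]]:
    """Keep only results whose reranker score meets the threshold."""
    kept: list[dict[str, Any]] = []
    for result in results:
        score = result.get(score_field)
        # Assumption: a result without a reranker score is dropped.
        if score is not None and score >= threshold:
            kept.append(result)
    return kept
```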
That's a little annoying. The Lord giveth and the Lord taketh away 😃
Purpose
As discussed in #2569, adds an optional “agentic reference hydration” feature to agentic retrieval so the app can fetch the full documents for the agent’s references (all index fields, not just the semantic fields). This enables richer source metadata (e.g., sourcefile, category, scores) and more accurate citations, while keeping the legacy behaviour as default.
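A minimal sketch of the hydration flow: collect the unique `doc_key`s from the agent's references in order (optionally sorting by `reference.id` first for an interleaved merge strategy), fetch the full documents in one query, and fall back to the reference's own `source_data` when a document is not returned. The `Reference` dataclass, the function names, and the assumption that reference ids are numeric strings are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Reference:
    """Hypothetical stand-in for an agent reference."""

    id: str
    doc_key: str
    source_data: dict[str, Any] = field(default_factory=dict)


def unique_doc_keys(references: list[Reference], interleave: bool = False) -> list[str]:
    """Return each doc_key once, in the order the references are visited."""
    if interleave:
        # Assumption: reference ids are numeric strings such as "0", "1", "2".
        references = sorted(references, key=lambda ref: int(ref.id))
    seen: set[str] = set()
    keys: list[str] = []
    for ref in references:
        if ref.doc_key not in seen:
            seen.add(ref.doc_key)
            keys.append(ref.doc_key)
    return keys


def hydrate(
    references: list[Reference],
    fetched: dict[str, dict[str, Any]],
    interleave: bool = False,
) -> list[dict[str, Any]]:
    """Map references to full documents, falling back to embedded source_data.

    `fetched` maps doc_key -> full index document, e.g. the results of a single
    query filtered with search.in over the unique doc_keys.
    """
    first_ref: dict[str, Reference] = {}
    for ref in references:
        first_ref.setdefault(ref.doc_key, ref)
    return [
        fetched.get(key, first_ref[key].source_data)
        for key in unique_doc_keys(references, interleave)
    ]
```

Fetching all keys in one filtered query keeps hydration to a single extra round trip, regardless of how many references the agent returns.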
Key points:
- `ENABLE_AGENTIC_REF_HYDRATION` (default `false`).
- `search.in` filter over unique `doc_key`s.
- `Document` objects from `source_data` embedded in references.
- `Document` (via `search_agent_query`) for better observability.
- `results_merge_strategy == "interleaved"`: sorts by `reference.id` (ascending) to interleave across activities.
- … `doc_key` … stop … `doc_key`s

Does this introduce a breaking change?
When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.
Why:
- `ENABLE_AGENTIC_REF_HYDRATION`, which defaults to `false`.
- `hydrate_references: bool = False`, and all call sites updated accordingly.
- … `false`; existing environments deploy without change.

Does this require changes to learn.microsoft.com docs?
This repository is referenced by this tutorial
which includes deployment, settings and usage instructions. If text or screenshots need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.
Type of change
Code quality checklist
See CONTRIBUTING.md for more details.
- … (`python -m pytest`).
- … `python -m pytest --cov` to verify 100% coverage of added lines
- … `python -m mypy` to check for type errors
- … `ruff` and `black` manually on my code.