WIP: New approach to multimodal document ingestion #2558

Draft: wants to merge 62 commits into base: main

Conversation

pamelafox
Collaborator

Purpose

As I've discussed in various issues and live streams, our current "GPT vision approach" has some drawbacks, specifically:

  • requires vector embeddings for images, which increases both ingestion time and RAG answering time
  • creates images of the entire document, which is unnecessary if the document is mostly text

The new multimodal approach:

  • extracts images (using Document Intelligence) and stores them separately in Blob storage
  • optionally computes embeddings of extracted images
  • associates each text chunk with any nearby images
  • during the RAG flow, optionally does a multivector search; even if it doesn't, it sends any images associated with the resulting chunks to the model
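
To make the chunk-to-image association and the RAG-time image lookup concrete, here is a minimal sketch. It is not the code in this PR: the `TextChunk` and `ExtractedImage` classes, the `associate_images` and `images_for_model` helpers, and the "same page" definition of "nearby" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractedImage:
    url: str       # hypothetical: location of the extracted image in Blob storage
    page_num: int  # page the image was found on


@dataclass
class TextChunk:
    text: str
    page_num: int
    images: list[ExtractedImage] = field(default_factory=list)


def associate_images(chunks, images):
    """Attach each image to every chunk on the same page.

    'Same page' is an assumed stand-in for 'nearby'; the real heuristic may differ.
    """
    by_page = {}
    for image in images:
        by_page.setdefault(image.page_num, []).append(image)
    for chunk in chunks:
        chunk.images = list(by_page.get(chunk.page_num, []))
    return chunks


def images_for_model(retrieved_chunks):
    """Collect unique image URLs from retrieved chunks, preserving first-seen order."""
    seen = set()
    urls = []
    for chunk in retrieved_chunks:
        for image in chunk.images:
            if image.url not in seen:
                seen.add(image.url)
                urls.append(image.url)
    return urls


# Usage: two chunks share a page with one image, so it is sent to the model once.
chunks = [TextChunk("intro", 1), TextChunk("chart discussion", 2), TextChunk("more", 2)]
images = [ExtractedImage("figures/fig1.png", 2)]
associate_images(chunks, images)
print(images_for_model(chunks))
```

Because the images travel with the chunks, this lookup works whether or not image embeddings were computed, which matches the "even if it doesn't" case in the last bullet.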

This is not yet complete, but I'm sharing the PR in early WIP form so that developers can see the direction and provide feedback.

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[X] Yes - old approach will no longer be supported
[ ] No

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial, which includes deployment, settings, and usage instructions. If text or screenshots need to change in the tutorial, check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[X] No

Type of change

[ ] Bugfix
[X] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

  • The current tests all pass (python -m pytest).
  • I added tests that prove my fix is effective or that my feature works
  • I ran python -m pytest --cov to verify 100% coverage of added lines
  • I ran python -m mypy to check for type errors
  • I either used the pre-commit hooks or ran ruff and black manually on my code.

github-actions bot commented Jun 4, 2025

Check Broken Paths

We have automatically detected the following broken relative paths in your files.
Review and fix the paths to resolve this issue.

Check the file paths and associated broken paths inside them.
For more details, check our Contributing Guide.

File Full Path: README.md
Issues:

| # | Link | Line Number |
|---|------|-------------|
| 1 | /docs/multimodal.md | 64 |
| 2 | docs/multimodal.md | 95 |
| 3 | docs/multimodal.md | 258 |

Check Broken Paths

We have automatically detected the following broken relative paths in your files.
Review and fix the paths to resolve this issue.

Check the file paths and associated broken paths inside them.
For more details, check our Contributing Guide.

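File Full Path: docs/deploy_features.md
Issues:

| # | Link | Line Number |
|---|------|-------------|
| 1 | ./gpt4v.md | 138 |
| 2 | ./gpt4v.md | 145 |
| 3 | ./gpt4v.md | 262 |
| 4 | ./gpt4v.md | 350 |

File Full Path: docs/productionizing.md
Issues:

| # | Link | Line Number |
|---|------|-------------|
| 1 | /docs/gpt4v.md | 109 |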

Check Broken Paths

We have automatically detected the following broken relative paths in your files.
Review and fix the paths to resolve this issue.

Check the file paths and associated broken paths inside them.
For more details, check our Contributing Guide.

File Full Path: docs/multimodal.md
Issues:

| # | Link | Line Number |
|---|------|-------------|
| 1 | ./integrated_vectorization.md | 105 |

@pamelafox
Collaborator Author

@mattgotteiner is still working on integrated vectorization, but I'm going to ask Copilot for a review on the rest of it.

@pamelafox requested a review from Copilot on July 17, 2025, 18:18
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR updates the test suite to align with the new multimodal document ingestion approach, adds tests for the new MultimodalModelDescriber, renames configuration flags, and expands blob manager coverage including ADLS support.

  • Added tests for MultimodalModelDescriber.describe_image and behavior on empty responses
  • Removed obsolete tests for fetch_image and updated blob manager tests for resource_group/subscription_id and image uploads
  • Updated app config tests to use the new showMultimodalOptions flag and refreshed many snapshot files to include image citations

Reviewed Changes

Copilot reviewed 160 out of 170 changed files in this pull request and generated 1 comment.

| File | Description |
|------|-------------|
| tests/test_mediadescriber.py | New tests for multimodal image describer, including mock OpenAI client |
| tests/test_fetch_image.py | Removed old fetch_image tests |
| tests/test_blob_manager.py | Updated constructor params, added ADLS upload/download tests |
| tests/test_app_config.py | Renamed feature flag from showGPT4VOptions to showMultimodalOptions |
| tests/snapshots/**/* | Snapshots updated to include new image citations and metadata |

Comments suppressed due to low confidence (3)

tests/test_app_config.py:125

  • The configuration API has been updated to use showMultimodalOptions; ensure the backend implementation and any related documentation are updated accordingly to expose this new flag and remove references to the old showGPT4VOptions.
        assert result["showMultimodalOptions"] is False

tests/test_blob_manager.py:24

  • The BlobManager constructor parameters have been renamed to resource_group and subscription_id; ensure that the production code signature matches these updated names, otherwise this test will fail.
        resource_group=os.environ["AZURE_STORAGE_RESOURCE_GROUP"],

@@ -133,3 +139,115 @@ def mock_put(self, *args, **kwargs):
    )
    with pytest.raises(Exception):
        await describer_bad_analyze.describe_image(b"imagebytes")


class MockAsyncOpenAI:

Copilot AI Jul 17, 2025

[nitpick] MockAsyncOpenAI and MockChatCompletions helper classes are defined inline; extracting them into pytest fixtures or shared utilities would improve test maintainability and reduce duplication.
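
One way the suggested extraction could look is sketched below. The class names `MockAsyncOpenAI` and `MockChatCompletions` come from the review comment; their bodies, the `calls` list, and the `make_mock_openai_client` factory are assumptions for illustration, not the code in this PR. The mock mirrors only the `client.chat.completions.create(...)` call shape and the `choices[0].message.content` response shape.

```python
import asyncio
from types import SimpleNamespace


class MockChatCompletions:
    """Stands in for `client.chat.completions`: records calls, returns canned text."""

    def __init__(self, content):
        self.content = content
        self.calls = []  # keyword arguments of every create() call, for assertions

    async def create(self, **kwargs):
        self.calls.append(kwargs)
        message = SimpleNamespace(content=self.content)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


class MockAsyncOpenAI:
    """Minimal stand-in for an async OpenAI client exposing `chat.completions`."""

    def __init__(self, content):
        self.chat = SimpleNamespace(completions=MockChatCompletions(content))


def make_mock_openai_client(content="A chart of quarterly sales."):
    """Hypothetical shared factory that individual tests (or a fixture) can call."""
    return MockAsyncOpenAI(content)


# Usage: the mock is awaited exactly like the real client would be.
client = make_mock_openai_client("A bar chart.")
response = asyncio.run(
    client.chat.completions.create(model="hypothetical-model", messages=[])
)
print(response.choices[0].message.content)
```

Placed in a shared test module, the factory could be wrapped in a pytest fixture that simply returns `make_mock_openai_client()`, so each test requests the client by name instead of redefining the classes.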
