WIP: New approach to multimodal document ingestion #2558

Draft
wants to merge 62 commits into
base: main
62 commits
b55ca88
Prepare change for multimodal, rm old vision approach stuff
pamelafox May 28, 2025
74fdf48
Add LLM-based media describer
pamelafox May 29, 2025
001c86f
Prepdocs progress
pamelafox Jun 1, 2025
7c8f825
Fix media description with OpenAI
pamelafox Jun 2, 2025
ea3ee28
More prepdocs improvements for image handling
pamelafox Jun 3, 2025
16a0ec6
Merge branch 'main' into visionv2
pamelafox Jun 3, 2025
e85f8c5
Store bbox as list of pixel floats, add storage container just for ex…
pamelafox Jun 3, 2025
2a73065
Getting image citations almost working
pamelafox Jun 4, 2025
751abd1
More progress on multimodal approach
pamelafox Jun 27, 2025
ebfcfc5
Update more tests
pamelafox Jun 27, 2025
f177e5c
Fix up more app tests
pamelafox Jun 27, 2025
154f284
Add test for upload_document_image
pamelafox Jun 27, 2025
41aeac4
Add media describer and embeddings tests
pamelafox Jun 28, 2025
a2fa105
Fix tests for vision, work on vectorizer
pamelafox Jun 30, 2025
dae363f
Add font, rename multimodal doc
pamelafox Jun 30, 2025
7d576f0
Update links to multimodal
pamelafox Jun 30, 2025
0dfacbf
Fix import
pamelafox Jun 30, 2025
51fd298
Doc fixes
pamelafox Jun 30, 2025
d76e949
Fix f-string syntax
pamelafox Jun 30, 2025
9223611
Markdown lint issues
pamelafox Jun 30, 2025
c4086c2
mypy fixes and reasoning fixes
pamelafox Jul 1, 2025
e0a8843
Rename vision variables, fix mypy
pamelafox Jul 1, 2025
b470901
Mypy fixes
pamelafox Jul 1, 2025
b1e6225
Fix all mypy issues
pamelafox Jul 1, 2025
e21c264
Fixes to sidebar so that it all fits
pamelafox Jul 1, 2025
29c44c8
Fixes to sidebar so that it all fits
pamelafox Jul 1, 2025
806828e
Integrated vectorization and user upload work
pamelafox Jul 2, 2025
0d6e1ad
Progress on user upload support
pamelafox Jul 2, 2025
8a58ddf
changes needed for user upload
pamelafox Jul 3, 2025
8ed8a63
Update tests
pamelafox Jul 3, 2025
493ece4
Integrated vectorization progress
pamelafox Jul 3, 2025
7c37e40
Fix tests
pamelafox Jul 3, 2025
78383ec
Use ImageEmbeddings client directly
pamelafox Jul 7, 2025
8c17ca5
Change frontend for vector fields
pamelafox Jul 7, 2025
cd065c1
Use boolean parameters in the backend as well, for vector fields
pamelafox Jul 7, 2025
75c3a0f
Updated translations
pamelafox Jul 7, 2025
916278e
Change frontend for LLM inputs
pamelafox Jul 7, 2025
5b17932
Change from LLM inputs to booleans
pamelafox Jul 7, 2025
43c9eac
Working on tests
pamelafox Jul 8, 2025
2ee850f
Blob manager improvements/tests
pamelafox Jul 8, 2025
13e85ee
Change to a global client that we close in lifespan
pamelafox Jul 8, 2025
e074113
Add latest int vect changes
pamelafox Jul 9, 2025
61f061a
Update the tests
pamelafox Jul 14, 2025
47d7308
Add as_bytes option
pamelafox Jul 15, 2025
c803bfa
Mypy fixes
pamelafox Jul 15, 2025
bab4350
Mypy fixes
pamelafox Jul 15, 2025
783f61e
More mypy fixes
pamelafox Jul 15, 2025
f34b09e
More mypy fixes
pamelafox Jul 15, 2025
4592837
Merge branch 'main' into visionv2
pamelafox Jul 15, 2025
659d401
Address more TODOs
pamelafox Jul 15, 2025
0a097df
Fix E2E tests
pamelafox Jul 15, 2025
952fd44
Add more tests for blobmanger
pamelafox Jul 15, 2025
74c3421
Markdown fix, more coverage
pamelafox Jul 15, 2025
da270e5
Fix broken MD link
pamelafox Jul 15, 2025
874082b
Increase coverage
pamelafox Jul 15, 2025
6acc94a
Increase test coverage
pamelafox Jul 16, 2025
8dc042e
Add diff-cover step to python test
pamelafox Jul 16, 2025
f5afae7
Fix diff-cover action
pamelafox Jul 16, 2025
10124eb
Fetch origin main for diff-cover
pamelafox Jul 16, 2025
9c11ee6
Increase test coverage
pamelafox Jul 16, 2025
723be32
More tests, Windows check
pamelafox Jul 16, 2025
ed5cf0a
Better copilot instructions
pamelafox Jul 22, 2025
19 changes: 9 additions & 10 deletions .azdo/pipelines/azure-dev.yml
@@ -77,11 +77,6 @@ steps:
AZURE_OPENAI_EMB_DEPLOYMENT_VERSION: $(AZURE_OPENAI_EMB_DEPLOYMENT_VERSION)
AZURE_OPENAI_EMB_DEPLOYMENT_SKU: $(AZURE_OPENAI_EMB_DEPLOYMENT_SKU)
AZURE_OPENAI_EMB_DIMENSIONS: $(AZURE_OPENAI_EMB_DIMENSIONS)
AZURE_OPENAI_GPT4V_MODEL: $(AZURE_OPENAI_GPT4V_MODEL)
AZURE_OPENAI_GPT4V_DEPLOYMENT: $(AZURE_OPENAI_GPT4V_DEPLOYMENT)
AZURE_OPENAI_GPT4V_DEPLOYMENT_CAPACITY: $(AZURE_OPENAI_GPT4V_DEPLOYMENT_CAPACITY)
AZURE_OPENAI_GPT4V_DEPLOYMENT_VERSION: $(AZURE_OPENAI_GPT4V_DEPLOYMENT_VERSION)
AZURE_OPENAI_GPT4V_DEPLOYMENT_SKU: $(AZURE_OPENAI_GPT4V_DEPLOYMENT_SKU)
AZURE_OPENAI_DISABLE_KEYS: $(AZURE_OPENAI_DISABLE_KEYS)
OPENAI_HOST: $(OPENAI_HOST)
OPENAI_API_KEY: $(OPENAI_API_KEY)
@@ -91,13 +86,13 @@ steps:
AZURE_APPLICATION_INSIGHTS_DASHBOARD: $(AZURE_APPLICATION_INSIGHTS_DASHBOARD)
AZURE_LOG_ANALYTICS: $(AZURE_LOG_ANALYTICS)
USE_VECTORS: $(USE_VECTORS)
USE_GPT4V: $(USE_GPT4V)
USE_MULTIMODAL: $(USE_MULTIMODAL)
AZURE_VISION_ENDPOINT: $(AZURE_VISION_ENDPOINT)
VISION_SECRET_NAME: $(VISION_SECRET_NAME)
AZURE_COMPUTER_VISION_SERVICE: $(AZURE_COMPUTER_VISION_SERVICE)
AZURE_COMPUTER_VISION_RESOURCE_GROUP: $(AZURE_COMPUTER_VISION_RESOURCE_GROUP)
AZURE_COMPUTER_VISION_LOCATION: $(AZURE_COMPUTER_VISION_LOCATION)
AZURE_COMPUTER_VISION_SKU: $(AZURE_COMPUTER_VISION_SKU)
AZURE_VISION_SERVICE: $(AZURE_VISION_SERVICE)
AZURE_VISION_RESOURCE_GROUP: $(AZURE_VISION_RESOURCE_GROUP)
AZURE_VISION_LOCATION: $(AZURE_VISION_LOCATION)
AZURE_VISION_SKU: $(AZURE_VISION_SKU)
ENABLE_LANGUAGE_PICKER: $(ENABLE_LANGUAGE_PICKER)
USE_SPEECH_INPUT_BROWSER: $(USE_SPEECH_INPUT_BROWSER)
USE_SPEECH_OUTPUT_BROWSER: $(USE_SPEECH_OUTPUT_BROWSER)
@@ -126,6 +121,10 @@ steps:
AZURE_CONTAINER_APPS_WORKLOAD_PROFILE: $(AZURE_CONTAINER_APPS_WORKLOAD_PROFILE)
USE_CHAT_HISTORY_BROWSER: $(USE_CHAT_HISTORY_BROWSER)
USE_MEDIA_DESCRIBER_AZURE_CU: $(USE_MEDIA_DESCRIBER_AZURE_CU)
RAG_SEARCH_TEXT_EMBEDDINGS: $(RAG_SEARCH_TEXT_EMBEDDINGS)
RAG_SEARCH_IMAGE_EMBEDDINGS: $(RAG_SEARCH_IMAGE_EMBEDDINGS)
RAG_SEND_TEXT_SOURCES: $(RAG_SEND_TEXT_SOURCES)
RAG_SEND_IMAGE_SOURCES: $(RAG_SEND_IMAGE_SOURCES)
- task: AzureCLI@2
displayName: Deploy Application
inputs:
3 changes: 2 additions & 1 deletion .devcontainer/devcontainer.json
@@ -17,7 +17,8 @@
"ms-azuretools.azure-dev",
"ms-azuretools.vscode-bicep",
"ms-python.python",
"esbenp.prettier-vscode"
"esbenp.prettier-vscode",
"DavidAnson.vscode-markdownlint"
]
}
},
81 changes: 81 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,81 @@
# Adding new data

New files should be added to the `data` folder. Then run either scripts/prepdocs.sh or scripts/prepdocs.ps1 to ingest the data.

# Overall code layout

* app: Contains the main application code, including frontend and backend.
* app/backend: Contains the Python backend code, written with Quart framework.
* app/backend/approaches: Contains the different approaches
* app/backend/approaches/approach.py: Base class for all approaches
* app/backend/approaches/retrievethenread.py: Ask approach, just searches and answers
* app/backend/approaches/chatreadretrieveread.py: Chat approach, includes query rewriting step first
* app/backend/approaches/prompts/ask_answer_question.prompty: Prompt used by the Ask approach to answer the question based off sources
* app/backend/approaches/prompts/chat_query_rewrite.prompty: Prompt used to rewrite the query based off search history into a better search query
* app/backend/approaches/prompts/chat_query_rewrite_tools.json: Tools used by the query rewriting prompt
* app/backend/approaches/prompts/chat_answer_question.prompty: Prompt used by the Chat approach to actually answer the question based off sources
* app/backend/app.py: The main entry point for the backend application.
* app/frontend: Contains the React frontend code, written in TypeScript and built with Vite.
* app/frontend/src/api: Contains the API client code for communicating with the backend.
* app/frontend/src/components: Contains the React components for the frontend.
* app/frontend/src/locales: Contains the translation files for internationalization.
* app/frontend/src/locales/da/translation.json: Danish translations
* app/frontend/src/locales/en/translation.json: English translations
* app/frontend/src/locales/es/translation.json: Spanish translations
* app/frontend/src/locales/fr/translation.json: French translations
* app/frontend/src/locales/it/translation.json: Italian translations
* app/frontend/src/locales/ja/translation.json: Japanese translations
* app/frontend/src/locales/nl/translation.json: Dutch translations
* app/frontend/src/locales/ptBR/translation.json: Portuguese translations
* app/frontend/src/locales/tr/translation.json: Turkish translations
* app/frontend/src/pages: Contains the main pages of the application
* infra: Contains the Bicep templates for provisioning Azure resources.
* tests: Contains the test code, including e2e tests, app integration tests, and unit tests.

# Adding a new azd environment variable

An azd environment variable is stored by the azd CLI for each environment. It is passed to the "azd up" command and can configure both provisioning options and application settings.
When adding new azd environment variables, update:

1. infra/main.parameters.json : Add the new parameter with a Bicep-friendly variable name and map to the new environment variable
1. infra/main.bicep: Add the new Bicep parameter at the top, and add it to the `appEnvVariables` object
1. azure.yaml: Add the new environment variable under pipeline config section
1. .azdo/pipelines/azure-dev.yml: Add the new environment variable under `env` section
1. .github/workflows/azure-dev.yml: Add the new environment variable under `env` section
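
The checklist above can be spot-checked with a small script. The following is a minimal sketch, not part of the project: the helper names `files_missing_variable` and `load_contents` are hypothetical, and it assumes the environment variable name appears literally in every file, which may not hold for `infra/main.bicep` if only the Bicep-friendly parameter name is used there.

```python
from pathlib import Path

# Files named in the checklist above; paths are relative to the repository root.
CHECKLIST_FILES = [
    "infra/main.parameters.json",
    "infra/main.bicep",
    "azure.yaml",
    ".azdo/pipelines/azure-dev.yml",
    ".github/workflows/azure-dev.yml",
]


def files_missing_variable(name: str, contents: dict[str, str]) -> list[str]:
    """Return the checklist files whose text never mentions `name`."""
    return [path for path in CHECKLIST_FILES if name not in contents.get(path, "")]


def load_contents(repo_root: Path) -> dict[str, str]:
    """Read each checklist file that exists under `repo_root`."""
    return {
        path: (repo_root / path).read_text(encoding="utf-8")
        for path in CHECKLIST_FILES
        if (repo_root / path).exists()
    }
```

For example, `files_missing_variable("USE_MULTIMODAL", load_contents(Path(".")))` would list any checklist file that still lacks the variable. A plain substring match is deliberately simple; it can produce false positives when one variable name is a prefix of another.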

# Adding a new setting to "Developer Settings" in RAG app

When adding a new developer setting, update:

* frontend:
* app/frontend/src/api/models.ts : Add to ChatAppRequestOverrides
* app/frontend/src/components/Settings.tsx : Add a UI element for the setting
* app/frontend/src/locales/*/translation.json: Add a translation for the setting label/tooltip for all languages
* app/frontend/src/pages/chat/Chat.tsx: Add the setting to the component, pass it to Settings
* app/frontend/src/pages/ask/Ask.tsx: Add the setting to the component, pass it to Settings

* backend:
* app/backend/approaches/chatreadretrieveread.py : Retrieve from overrides parameter
* app/backend/approaches/retrievethenread.py : Retrieve from overrides parameter
* app/backend/app.py: Some settings may need to be sent down in the /config route.
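
As an illustration of the "retrieve from overrides parameter" step, here is a minimal sketch of reading a boolean setting with a default. The helper `get_bool_override` and the keys `send_text_sources` and `send_image_sources` are assumptions made for this example and may not match the names the approaches actually use.

```python
from typing import Any


def get_bool_override(overrides: dict[str, Any], key: str, default: bool) -> bool:
    """Read a boolean developer setting from the request overrides, falling back to a default."""
    value = overrides.get(key)
    if value is None:
        # The frontend did not send this setting, so use the backend default.
        return default
    return bool(value)


# Hypothetical overrides payload, as it might arrive from the frontend.
overrides = {"send_image_sources": False}
send_text_sources = get_bool_override(overrides, "send_text_sources", default=True)
send_image_sources = get_bool_override(overrides, "send_image_sources", default=True)
```

Treating a missing key and an explicit `None` the same way keeps older clients working when a new setting is introduced.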

# Adding tests for a new feature

All tests are in the `tests` folder and use the pytest framework.
There are three styles of tests:

* e2e tests: These use playwright to run the app in a browser and test the UI end-to-end. They are in e2e.py and they mock the backend using the snapshots from the app tests.
* app integration tests: Mostly in test_app.py, these test the app's API endpoints and use mocks for services like Azure OpenAI and Azure Search.
* unit tests: The rest of the tests are unit tests that test individual functions and methods. They are in test_*.py files.

When adding a new feature, add tests for it in the appropriate file.
If the feature is a UI element, add an e2e test for it.
If it is an API endpoint, add an app integration test for it.
If it is a function or method, add a unit test for it.
Use mocks from conftest.py to mock external services.
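
A unit test that mocks an external service might look like the sketch below. It is illustrative only: `describe_image` and its client interface are hypothetical, and the mock is built inline with `unittest.mock` rather than taken from conftest.py so that the example is self-contained.

```python
from unittest.mock import Mock


def describe_image(client, image_bytes: bytes) -> str:
    """Hypothetical function under test: asks an external service to describe an image."""
    response = client.describe(image_bytes)
    return response["description"].strip()


def test_describe_image_strips_whitespace():
    # Stand-in for an external service; in this project, shared mocks live in conftest.py.
    fake_client = Mock()
    fake_client.describe.return_value = {"description": "  A bar chart.  "}

    assert describe_image(fake_client, b"fake-bytes") == "A bar chart."
    fake_client.describe.assert_called_once_with(b"fake-bytes")
```

pytest collects the `test_` function by name, so no extra registration is needed.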

When you're running tests, make sure you activate the .venv virtual environment first:

```bash
source .venv/bin/activate
```
19 changes: 9 additions & 10 deletions .github/workflows/azure-dev.yml
@@ -37,10 +37,10 @@ jobs:
AZURE_DOCUMENTINTELLIGENCE_RESOURCE_GROUP: ${{ vars.AZURE_DOCUMENTINTELLIGENCE_RESOURCE_GROUP }}
AZURE_DOCUMENTINTELLIGENCE_SKU: ${{ vars.AZURE_DOCUMENTINTELLIGENCE_SKU }}
AZURE_DOCUMENTINTELLIGENCE_LOCATION: ${{ vars.AZURE_DOCUMENTINTELLIGENCE_LOCATION }}
AZURE_COMPUTER_VISION_SERVICE: ${{ vars.AZURE_COMPUTER_VISION_SERVICE }}
AZURE_COMPUTER_VISION_RESOURCE_GROUP: ${{ vars.AZURE_COMPUTER_VISION_RESOURCE_GROUP }}
AZURE_COMPUTER_VISION_LOCATION: ${{ vars.AZURE_COMPUTER_VISION_LOCATION }}
AZURE_COMPUTER_VISION_SKU: ${{ vars.AZURE_COMPUTER_VISION_SKU }}
AZURE_VISION_SERVICE: ${{ vars.AZURE_VISION_SERVICE }}
AZURE_VISION_RESOURCE_GROUP: ${{ vars.AZURE_VISION_RESOURCE_GROUP }}
AZURE_VISION_LOCATION: ${{ vars.AZURE_VISION_LOCATION }}
AZURE_VISION_SKU: ${{ vars.AZURE_VISION_SKU }}
AZURE_SEARCH_INDEX: ${{ vars.AZURE_SEARCH_INDEX }}
AZURE_SEARCH_SERVICE: ${{ vars.AZURE_SEARCH_SERVICE }}
AZURE_SEARCH_SERVICE_RESOURCE_GROUP: ${{ vars.AZURE_SEARCH_SERVICE_RESOURCE_GROUP }}
@@ -67,11 +67,6 @@ jobs:
AZURE_OPENAI_EMB_DEPLOYMENT_CAPACITY: ${{ vars.AZURE_OPENAI_EMB_DEPLOYMENT_CAPACITY }}
AZURE_OPENAI_EMB_DEPLOYMENT_VERSION: ${{ vars.AZURE_OPENAI_EMB_DEPLOYMENT_VERSION }}
AZURE_OPENAI_EMB_DIMENSIONS: ${{ vars.AZURE_OPENAI_EMB_DIMENSIONS }}
AZURE_OPENAI_GPT4V_MODEL: ${{ vars.AZURE_OPENAI_GPT4V_MODEL }}
AZURE_OPENAI_GPT4V_DEPLOYMENT: ${{ vars.AZURE_OPENAI_GPT4V_DEPLOYMENT }}
AZURE_OPENAI_GPT4V_DEPLOYMENT_CAPACITY: ${{ vars.AZURE_OPENAI_GPT4V_DEPLOYMENT_CAPACITY }}
AZURE_OPENAI_GPT4V_DEPLOYMENT_VERSION: ${{ vars.AZURE_OPENAI_GPT4V_DEPLOYMENT_VERSION }}
AZURE_OPENAI_GPT4V_DEPLOYMENT_SKU: ${{ vars.AZURE_OPENAI_GPT4V_DEPLOYMENT_SKU }}
USE_EVAL: ${{ vars.USE_EVAL }}
AZURE_OPENAI_EVAL_MODEL: ${{ vars.AZURE_OPENAI_EVAL_MODEL }}
AZURE_OPENAI_EVAL_MODEL_VERSION: ${{ vars.AZURE_OPENAI_EVAL_MODEL_VERSION }}
@@ -87,7 +82,7 @@ jobs:
AZURE_APPLICATION_INSIGHTS_DASHBOARD: ${{ vars.AZURE_APPLICATION_INSIGHTS_DASHBOARD }}
AZURE_LOG_ANALYTICS: ${{ vars.AZURE_LOG_ANALYTICS }}
USE_VECTORS: ${{ vars.USE_VECTORS }}
USE_GPT4V: ${{ vars.USE_GPT4V }}
USE_MULTIMODAL: ${{ vars.USE_MULTIMODAL }}
AZURE_VISION_ENDPOINT: ${{ vars.AZURE_VISION_ENDPOINT }}
VISION_SECRET_NAME: ${{ vars.VISION_SECRET_NAME }}
ENABLE_LANGUAGE_PICKER: ${{ vars.ENABLE_LANGUAGE_PICKER }}
@@ -116,6 +111,10 @@ jobs:
USE_CHAT_HISTORY_BROWSER: ${{ vars.USE_CHAT_HISTORY_BROWSER }}
USE_MEDIA_DESCRIBER_AZURE_CU: ${{ vars.USE_MEDIA_DESCRIBER_AZURE_CU }}
USE_AI_PROJECT: ${{ vars.USE_AI_PROJECT }}
RAG_SEARCH_TEXT_EMBEDDINGS: ${{ vars.RAG_SEARCH_TEXT_EMBEDDINGS }}
RAG_SEARCH_IMAGE_EMBEDDINGS: ${{ vars.RAG_SEARCH_IMAGE_EMBEDDINGS }}
RAG_SEND_TEXT_SOURCES: ${{ vars.RAG_SEND_TEXT_SOURCES }}
RAG_SEND_IMAGE_SOURCES: ${{ vars.RAG_SEND_IMAGE_SOURCES }}
steps:
- name: Checkout
uses: actions/checkout@v4
15 changes: 5 additions & 10 deletions .github/workflows/evaluate.yaml
@@ -35,10 +35,10 @@ jobs:
AZURE_DOCUMENTINTELLIGENCE_RESOURCE_GROUP: ${{ vars.AZURE_DOCUMENTINTELLIGENCE_RESOURCE_GROUP }}
AZURE_DOCUMENTINTELLIGENCE_SKU: ${{ vars.AZURE_DOCUMENTINTELLIGENCE_SKU }}
AZURE_DOCUMENTINTELLIGENCE_LOCATION: ${{ vars.AZURE_DOCUMENTINTELLIGENCE_LOCATION }}
AZURE_COMPUTER_VISION_SERVICE: ${{ vars.AZURE_COMPUTER_VISION_SERVICE }}
AZURE_COMPUTER_VISION_RESOURCE_GROUP: ${{ vars.AZURE_COMPUTER_VISION_RESOURCE_GROUP }}
AZURE_COMPUTER_VISION_LOCATION: ${{ vars.AZURE_COMPUTER_VISION_LOCATION }}
AZURE_COMPUTER_VISION_SKU: ${{ vars.AZURE_COMPUTER_VISION_SKU }}
AZURE_VISION_SERVICE: ${{ vars.AZURE_VISION_SERVICE }}
AZURE_VISION_RESOURCE_GROUP: ${{ vars.AZURE_VISION_RESOURCE_GROUP }}
AZURE_VISION_LOCATION: ${{ vars.AZURE_VISION_LOCATION }}
AZURE_VISION_SKU: ${{ vars.AZURE_VISION_SKU }}
AZURE_SEARCH_INDEX: ${{ vars.AZURE_SEARCH_INDEX }}
AZURE_SEARCH_SERVICE: ${{ vars.AZURE_SEARCH_SERVICE }}
AZURE_SEARCH_SERVICE_RESOURCE_GROUP: ${{ vars.AZURE_SEARCH_SERVICE_RESOURCE_GROUP }}
@@ -62,11 +62,6 @@ jobs:
AZURE_OPENAI_EMB_DEPLOYMENT_CAPACITY: ${{ vars.AZURE_OPENAI_EMB_DEPLOYMENT_CAPACITY }}
AZURE_OPENAI_EMB_DEPLOYMENT_VERSION: ${{ vars.AZURE_OPENAI_EMB_DEPLOYMENT_VERSION }}
AZURE_OPENAI_EMB_DIMENSIONS: ${{ vars.AZURE_OPENAI_EMB_DIMENSIONS }}
AZURE_OPENAI_GPT4V_MODEL: ${{ vars.AZURE_OPENAI_GPT4V_MODEL }}
AZURE_OPENAI_GPT4V_DEPLOYMENT: ${{ vars.AZURE_OPENAI_GPT4V_DEPLOYMENT }}
AZURE_OPENAI_GPT4V_DEPLOYMENT_CAPACITY: ${{ vars.AZURE_OPENAI_GPT4V_DEPLOYMENT_CAPACITY }}
AZURE_OPENAI_GPT4V_DEPLOYMENT_VERSION: ${{ vars.AZURE_OPENAI_GPT4V_DEPLOYMENT_VERSION }}
AZURE_OPENAI_GPT4V_DEPLOYMENT_SKU: ${{ vars.AZURE_OPENAI_GPT4V_DEPLOYMENT_SKU }}
USE_EVAL: ${{ vars.USE_EVAL }}
AZURE_OPENAI_EVAL_MODEL: ${{ vars.AZURE_OPENAI_EVAL_MODEL }}
AZURE_OPENAI_EVAL_MODEL_VERSION: ${{ vars.AZURE_OPENAI_EVAL_MODEL_VERSION }}
@@ -82,7 +77,7 @@ jobs:
AZURE_APPLICATION_INSIGHTS_DASHBOARD: ${{ vars.AZURE_APPLICATION_INSIGHTS_DASHBOARD }}
AZURE_LOG_ANALYTICS: ${{ vars.AZURE_LOG_ANALYTICS }}
USE_VECTORS: ${{ vars.USE_VECTORS }}
USE_GPT4V: ${{ vars.USE_GPT4V }}
USE_MULTIMODAL: ${{ vars.USE_MULTIMODAL }}
AZURE_VISION_ENDPOINT: ${{ vars.AZURE_VISION_ENDPOINT }}
VISION_SECRET_NAME: ${{ vars.VISION_SECRET_NAME }}
ENABLE_LANGUAGE_PICKER: ${{ vars.ENABLE_LANGUAGE_PICKER }}
9 changes: 8 additions & 1 deletion .github/workflows/python-test.yaml
@@ -29,6 +29,8 @@ jobs:
node_version: ["20", "22"]
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # Fetch all history for diff-cover
- name: Install uv
uses: astral-sh/setup-uv@v6
with:
@@ -61,7 +63,12 @@ jobs:
run: black . --check --verbose
- name: Run Python tests
if: runner.os != 'Windows'
run: pytest -s -vv --cov --cov-fail-under=89
run: pytest -s -vv --cov --cov-report=xml --cov-fail-under=90
- name: Check diff coverage
if: runner.os != 'Windows'
run: |
git fetch origin main:refs/remotes/origin/main
diff-cover coverage.xml --compare-branch=origin/main --fail-under=90
- name: Run E2E tests with Playwright
id: e2e
if: runner.os != 'Windows'
1 change: 1 addition & 0 deletions .gitignore
@@ -54,6 +54,7 @@ coverage.xml
.hypothesis/
.pytest_cache/
cover/
coverage_report.html

# Translations
*.mo
3 changes: 2 additions & 1 deletion .vscode/extensions.json
@@ -3,6 +3,7 @@
"ms-azuretools.azure-dev",
"ms-azuretools.vscode-bicep",
"ms-python.python",
"esbenp.prettier-vscode"
"esbenp.prettier-vscode",
"DavidAnson.vscode-markdownlint"
]
}
29 changes: 23 additions & 6 deletions CONTRIBUTING.md
@@ -17,8 +17,9 @@ contact [[email protected]](mailto:[email protected]) with any additio
- [Running unit tests](#running-unit-tests)
- [Running E2E tests](#running-e2e-tests)
- [Code style](#code-style)
- [Adding new azd environment variables](#adding-new-azd-environment-variables)
- [Adding new UI strings](#adding-new-ui-strings)
- [Adding new features](#adding-new-features)
- [Adding new azd environment variables](#adding-new-azd-environment-variables)
- [Adding new UI strings](#adding-new-ui-strings)

## Submitting a Pull Request (PR)

@@ -62,10 +63,18 @@ Run the tests:
python -m pytest
```

Check the coverage report to make sure your changes are covered.
If test snapshots need updating (and the changes are expected), you can update them by running:

```shell
python -m pytest --cov
python -m pytest --snapshot-update
```

Once tests are passing, generate a coverage report to make sure your changes are covered:

```shell
pytest --cov --cov-report=xml && \
diff-cover coverage.xml --format html:coverage_report.html && \
open coverage_report.html
```
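
To read the overall figure from `coverage.xml` programmatically, a short script can parse the report. This is a rough sketch that assumes the Cobertura-style format produced by `--cov-report=xml`, where the root `<coverage>` element carries a `line-rate` attribute; the function name `line_coverage_percent` and the sample report are made up for illustration. Note that it reports total line coverage, whereas diff-cover measures only the lines changed relative to the comparison branch.

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(coverage_xml: str) -> float:
    """Return overall line coverage (0-100) from a Cobertura-style coverage report."""
    root = ET.fromstring(coverage_xml)
    # line-rate is a fraction between 0 and 1 on the root <coverage> element.
    return float(root.attrib["line-rate"]) * 100


# Made-up sample report; a real one would be read from coverage.xml.
sample = '<coverage line-rate="0.915" branch-rate="0"><packages/></coverage>'
percent = line_coverage_percent(sample)
print(f"{percent:.1f}% line coverage")
```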

## Running E2E tests
@@ -118,7 +127,15 @@ python -m black <path-to-file>

If you followed the steps above to install the pre-commit hooks, then you can just wait for those hooks to run `ruff` and `black` for you.

## Adding new azd environment variables
## Adding new features

We recommend using GitHub Copilot Agent mode when adding new features,
as this project includes a [.github/copilot-instructions.md](.github/copilot-instructions.md) file
that instructs Copilot on how to generate code for common code changes.

If you are not using Copilot Agent mode, consult both that file and the suggestions below.

### Adding new azd environment variables

When adding new azd environment variables, please remember to update:

@@ -128,7 +145,7 @@ When adding new azd environment variables, please remember to update:
1. [ADO pipeline](.azdo/pipelines/azure-dev.yml).
1. [GitHub workflows](.github/workflows/azure-dev.yml)

## Adding new UI strings
### Adding new UI strings

When adding new UI strings, please remember to update all translations.
For any translations that you generate with an AI tool,
6 changes: 3 additions & 3 deletions README.md
@@ -61,7 +61,7 @@ The repo includes sample data so it's ready to try end to end. In this sample ap
- Renders citations and thought process for each answer
- Includes settings directly in the UI to tweak the behavior and experiment with options
- Integrates Azure AI Search for indexing and retrieval of documents, with support for [many document formats](/docs/data_ingestion.md#supported-document-formats) as well as [integrated vectorization](/docs/data_ingestion.md#overview-of-integrated-vectorization)
- Optional usage of [GPT-4 with vision](/docs/gpt4v.md) to reason over image-heavy documents
- Optional usage of [multimodal models](/docs/multimodal.md) to reason over image-heavy documents
- Optional addition of [speech input/output](/docs/deploy_features.md#enabling-speech-inputoutput) for accessibility
- Optional automation of [user login and data access](/docs/login_and_acl.md) via Microsoft Entra
- Performance tracing and monitoring with Application Insights
@@ -92,7 +92,7 @@ However, you can try the [Azure pricing calculator](https://azure.com/e/e3490de2
- Azure AI Search: Basic tier, 1 replica, free level of semantic search. Pricing per hour. [Pricing](https://azure.microsoft.com/pricing/details/search/)
- Azure Blob Storage: Standard tier with ZRS (Zone-redundant storage). Pricing per storage and read operations. [Pricing](https://azure.microsoft.com/pricing/details/storage/blobs/)
- Azure Cosmos DB: Only provisioned if you enabled [chat history with Cosmos DB](docs/deploy_features.md#enabling-persistent-chat-history-with-azure-cosmos-db). Serverless tier. Pricing per request unit and storage. [Pricing](https://azure.microsoft.com/pricing/details/cosmos-db/)
- Azure AI Vision: Only provisioned if you enabled [GPT-4 with vision](docs/gpt4v.md). Pricing per 1K transactions. [Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/computer-vision/)
- Azure AI Vision: Only provisioned if you enabled the [multimodal approach](docs/multimodal.md). Pricing per 1K transactions. [Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/computer-vision/)
- Azure AI Content Understanding: Only provisioned if you enabled [media description](docs/deploy_features.md#enabling-media-description-with-azure-content-understanding). Pricing per 1K images. [Pricing](https://azure.microsoft.com/pricing/details/content-understanding/)
- Azure Monitor: Pay-as-you-go tier. Costs based on data ingested. [Pricing](https://azure.microsoft.com/pricing/details/monitor/)

@@ -255,7 +255,7 @@ You can find extensive documentation in the [docs](docs/README.md) folder:
- [Enabling optional features](docs/deploy_features.md)
- [All features](docs/deploy_features.md)
- [Login and access control](docs/login_and_acl.md)
- [GPT-4 Turbo with Vision](docs/gpt4v.md)
- [Multimodal](docs/multimodal.md)
- [Reasoning](docs/reasoning.md)
- [Private endpoints](docs/deploy_private.md)
- [Agentic retrieval](docs/agentic_retrieval.md)