Commit 74c3421
Markdown fix, more coverage
1 parent 952fd44 commit 74c3421

File tree: 5 files changed, +97 -50 lines

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions

@@ -72,8 +72,8 @@ python -m pytest --snapshot-update
 Once tests are passing, generate a coverage report to make sure your changes are covered:
 
 ```shell
-pytest --cov --cov-report=xml
-diff-cover coverage.xml --format html:coverage_report.html
+pytest --cov --cov-report=xml && \
+diff-cover coverage.xml --format html:coverage_report.html && \
 open coverage_report.html
 ```

docs/customization.md

Lines changed: 4 additions & 6 deletions

@@ -9,11 +9,9 @@ This guide provides more details for customizing the RAG chat app.
 - [Using your own data](#using-your-own-data)
 - [Customizing the UI](#customizing-the-ui)
 - [Customizing the backend](#customizing-the-backend)
-  - [Chat/Ask tabs](#chatask-tabs)
+  - [Chat/Ask approaches](#chatask-approaches)
   - [Chat approach](#chat-approach)
-  - [Chat with vision](#chat-with-vision)
-  - [Ask tab](#ask-tab)
-  - [Ask with vision](#ask-with-vision)
+  - [Ask approach](#ask-approach)
 - [Improving answer quality](#improving-answer-quality)
 - [Identify the problem point](#identify-the-problem-point)
 - [Improving OpenAI ChatCompletion results](#improving-openai-chatcompletion-results)

@@ -32,7 +30,7 @@ The frontend is built using [React](https://reactjs.org/) and [Fluent UI compone
 
 The backend is built using [Quart](https://quart.palletsprojects.com/), a Python framework for asynchronous web applications. The backend code is stored in the `app/backend` folder. The frontend and backend communicate over HTTP using JSON or streamed NDJSON responses. Learn more in the [HTTP Protocol guide](http_protocol.md).
 
-### Chat/Ask tabs
+### Chat/Ask approaches
 
 Typically, the primary backend code you'll want to customize is the `app/backend/approaches` folder, which contains the classes powering the Chat and Ask tabs. Each class uses a different RAG (Retrieval Augmented Generation) approach, which include system messages that should be changed to match your data
 

@@ -55,7 +53,7 @@ there are several differences in the chat approach:
 2. **Search**: For this step, it also calculates a vector embedding for the user question using [the Azure AI Vision vectorize text API](https://learn.microsoft.com/azure/ai-services/computer-vision/how-to/image-retrieval#call-the-vectorize-text-api), and passes that to the Azure AI Search to compare against the image embedding fields in the indexed documents. For each matching document, it downloads each associated image from Azure Blob Storage and converts it to a base 64 encoding.
 3. **Answering**: When it combines the search results and user question, it includes the base 64 encoded images, and sends along both the text and images to the multimodal LLM. The model generates a response that includes citations to the images, and the UI renders the images when a citation is clicked.
 
-#### Ask tab
+#### Ask approach
 
 The ask tab uses the approach programmed in [retrievethenread.py](https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/app/backend/approaches/retrievethenread.py).
 
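The diff's context mentions that the backend streams NDJSON (newline-delimited JSON) responses to the frontend. A minimal sketch of that framing, independent of Quart (the function names below are illustrative, not the app's actual code):

```python
import json

def to_ndjson(chunks):
    """Serialize each chunk as one standalone JSON object per line,
    so a client can parse results incrementally as they arrive."""
    for chunk in chunks:
        yield json.dumps(chunk) + "\n"

def parse_ndjson(stream: str):
    """Decode an NDJSON payload back into a list of objects."""
    return [json.loads(line) for line in stream.splitlines() if line]
```

Because each line is independently parseable, the frontend does not need the full response body before it can render the first chunk.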
tests/conftest.py

Lines changed: 0 additions & 22 deletions

@@ -1083,28 +1083,6 @@ def mock_readinto(self, stream: IO[bytes]):
     monkeypatch.setattr(azure.storage.filedatalake.aio.StorageStreamDownloader, "readinto", mock_readinto)
 
 
-# Add a mock token_provider for tests
-@pytest.fixture
-def mock_token_provider():
-    async def dummy_token_provider():
-        return "dummy_token"
-
-    return dummy_token_provider
-
-
-@pytest.fixture(autouse=True)
-def patch_get_bearer_token_provider(monkeypatch, mock_token_provider):
-    """
-    Patch the get_bearer_token_provider function used in app.py to return our mock_token_provider.
-    This is automatically applied to all tests.
-    """
-
-    def mock_get_bearer_token(*args, **kwargs):
-        return mock_token_provider
-
-    monkeypatch.setattr("azure.identity.aio.get_bearer_token_provider", mock_get_bearer_token)
-
-
 @pytest.fixture
 def chat_approach():
     return ChatReadRetrieveReadApproach(
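The removed fixtures relied on pytest's `monkeypatch.setattr`, which swaps an attribute for the duration of one test and restores it afterwards. A rough stdlib-only sketch of that restore-on-exit pattern (this is an illustration, not pytest's actual implementation):

```python
from contextlib import contextmanager

@contextmanager
def patched(obj, name, value):
    """Temporarily replace an attribute, restoring the original on
    exit -- the same guarantee monkeypatch.setattr gives per test."""
    original = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield
    finally:
        setattr(obj, name, original)
```

Wrapping the restore in `finally` means the original attribute comes back even if the body raises, which is why one test's patch cannot leak into the next.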

tests/mocks.py

Lines changed: 12 additions & 19 deletions

@@ -7,7 +7,6 @@
 import openai.types
 from azure.cognitiveservices.speech import ResultReason
 from azure.core.credentials_async import AsyncTokenCredential
-from azure.core.exceptions import ResourceNotFoundError
 from azure.core.pipeline.transport import (
     AioHttpTransportResponse,
     AsyncHttpTransport,

@@ -92,24 +91,18 @@ def __init__(self, url, body_bytes, headers=None):
 
 class MockTransport(AsyncHttpTransport):
     async def send(self, request: HttpRequest, **kwargs) -> AioHttpTransportResponse:
-        if request.url.endswith("notfound.png"):
-            raise ResourceNotFoundError(MockAiohttpClientResponse404(request.url, b""))
-        else:
-            return AioHttpTransportResponse(
-                request,
-                MockAiohttpClientResponse(
-                    request.url,
-                    b"test content",
-                    {
-                        "Content-Type": "application/octet-stream",
-                        "Content-Range": "bytes 0-27/28",
-                        "Content-Length": "28",
-                    },
-                ),
-            )
-
-    async def __aenter__(self):
-        return self
+        return AioHttpTransportResponse(
+            request,
+            MockAiohttpClientResponse(
+                request.url,
+                b"test content",
+                {
+                    "Content-Type": "application/octet-stream",
+                    "Content-Range": "bytes 0-27/28",
+                    "Content-Length": "28",
+                },
+            ),
+        )
 
     async def __aexit__(self, *args):
         pass
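With the 404 branch gone, `MockTransport.send` unconditionally returns one canned response. The general pattern, sketched here without the Azure SDK types (the class and response shape below are illustrative):

```python
import asyncio

class CannedTransport:
    """A transport stub that answers every request with the same
    pre-built response, like the simplified MockTransport above."""

    def __init__(self, body: bytes, headers: dict):
        self.body = body
        self.headers = headers

    async def send(self, request):
        # No per-URL branching: every request gets the canned payload.
        return {"url": request, "body": self.body, "headers": self.headers}

    async def __aexit__(self, *args):
        pass
```

Dropping the error branch keeps the mock honest about what the remaining tests exercise: they only need a successful download, so the transport only models success.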

tests/test_pdfparser.py

Lines changed: 79 additions & 1 deletion

@@ -21,7 +21,10 @@
 from azure.core.exceptions import HttpResponseError
 from PIL import Image, ImageChops
 
-from prepdocslib.mediadescriber import ContentUnderstandingDescriber
+from prepdocslib.mediadescriber import (
+    ContentUnderstandingDescriber,
+    MultimodalModelDescriber,
+)
 from prepdocslib.page import ImageOnPage
 from prepdocslib.pdfparser import DocumentAnalysisParser, MediaDescriptionStrategy

@@ -386,3 +389,78 @@ async def mock_poller_result():
     assert pages[0].page_num == 0
     assert pages[0].offset == 0
     assert pages[0].text == "Page content"
+
+
+@pytest.mark.asyncio
+async def test_parse_doc_with_openai(monkeypatch):
+    mock_poller = MagicMock()
+
+    async def mock_begin_analyze_document(self, model_id, analyze_request, **kwargs):
+        return mock_poller
+
+    async def mock_poller_result():
+        content = open(TEST_DATA_DIR / "Simple Figure_content.txt").read()
+        return AnalyzeResult(
+            content=content,
+            pages=[DocumentPage(page_number=1, spans=[DocumentSpan(offset=0, length=148)])],
+            figures=[
+                DocumentFigure(
+                    id="1.1",
+                    caption=DocumentCaption(content="Figure 1"),
+                    bounding_regions=[
+                        BoundingRegion(
+                            page_number=1, polygon=[0.4295, 1.3072, 1.7071, 1.3076, 1.7067, 2.6088, 0.4291, 2.6085]
+                        )
+                    ],
+                    spans=[DocumentSpan(offset=70, length=22)],
+                )
+            ],
+        )
+
+    monkeypatch.setattr(DocumentIntelligenceClient, "begin_analyze_document", mock_begin_analyze_document)
+    monkeypatch.setattr(mock_poller, "result", mock_poller_result)
+
+    async def mock_describe_image(self, image_bytes):
+        return "Pie chart"
+
+    monkeypatch.setattr(MultimodalModelDescriber, "describe_image", mock_describe_image)
+
+    parser = DocumentAnalysisParser(
+        endpoint="https://example.com",
+        credential=MockAzureCredential(),
+        media_description_strategy=MediaDescriptionStrategy.OPENAI,
+        openai_client=Mock(),
+        openai_model="gpt-4o",
+        openai_deployment="gpt-4o",
+    )
+
+    with open(TEST_DATA_DIR / "Simple Figure.pdf", "rb") as f:
+        content = io.BytesIO(f.read())
+        content.name = "Simple Figure.pdf"
+
+    pages = [page async for page in parser.parse(content)]
+
+    assert len(pages) == 1
+    assert pages[0].page_num == 0
+    assert pages[0].offset == 0
+    assert (
+        pages[0].text
+        == "# Simple Figure\n\nThis text is before the figure and NOT part of it.\n\n\n<figure><figcaption>1.1 Figure 1<br>Pie chart</figcaption></figure>\n\n\nThis is text after the figure that's not part of it."
+    )
+
+
+@pytest.mark.asyncio
+async def test_parse_doc_with_openai_missing_parameters():
+    parser = DocumentAnalysisParser(
+        endpoint="https://example.com",
+        credential=MockAzureCredential(),
+        media_description_strategy=MediaDescriptionStrategy.OPENAI,
+        # Intentionally not providing openai_client and openai_model
+    )
+
+    content = io.BytesIO(b"pdf content bytes")
+    content.name = "test.pdf"
+
+    with pytest.raises(ValueError, match="OpenAI client must be provided when using OpenAI media description strategy"):
+        # Call the first iteration of the generator without using async for
+        await parser.parse(content).__anext__()
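The second new test advances the async generator manually with `__anext__` so the parser's parameter validation runs without consuming the whole stream. A small self-contained illustration of that pattern (the generator and helper below are hypothetical, not the parser's code):

```python
import asyncio

async def items():
    """A tiny async generator standing in for parser.parse()."""
    yield "first"
    yield "second"

async def first_item(agen):
    # Awaiting __anext__ runs the generator body up to its first
    # yield -- enough to trigger any eager validation at the top of
    # the function, without an `async for` loop over every item.
    return await agen.__anext__()
```

This is why `pytest.raises` can catch the `ValueError` here: the check fires on the first step of iteration, not at generator creation time.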
