Commit 0b9370f

Update image processing work

1 parent 8e6b61d commit 0b9370f

File tree

4 files changed: +145, -47 lines changed


image_processing/.env.example

Lines changed: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 OpenAI__Endpoint=<openAIEndpoint>
-OpenAI__CompletionDeployment=<openAIEmbeddingDeploymentId>
+OpenAI__MiniCompletionDeployment=<openAIEmbeddingDeploymentId>
 OpenAI__ApiVersion=<openAIApiVersion>
 AIService__DocumentIntelligence__Endpoint=<documentIntelligenceEndpoint>
 StorageAccount__Name=<Name of storage account if using identity based connections>

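Since the deployment setting was renamed, code reading the old `OpenAI__CompletionDeployment` variable will no longer find it. A minimal sketch of reading the new setting fail-fast with `os.environ[...]` (all values below are placeholders, not real endpoints or deployments):

```python
import os

# Placeholder values standing in for a real .env / app settings file.
os.environ["OpenAI__Endpoint"] = "https://example.openai.azure.com/"
os.environ["OpenAI__MiniCompletionDeployment"] = "my-gpt-4o-mini-deployment"

# Indexing with os.environ[...] raises KeyError if the setting is missing,
# while os.environ.get(...) would silently return None and fail much later.
deployment_id = os.environ["OpenAI__MiniCompletionDeployment"]
print(deployment_id)

# The old setting name is no longer part of the template after this commit:
print(os.environ.get("OpenAI__CompletionDeployment"))  # None unless still set
```

Failing fast on a missing setting surfaces configuration drift at startup rather than mid-request.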
image_processing/README.md

Lines changed: 3 additions & 3 deletions

@@ -1,6 +1,6 @@
 # AI Search Indexing with Azure Document Intelligence
 
-This portion of the repo contains code for linking Azure Document Intelligence with AI Search to process complex documents with charts and images, and uses multi-modal models (gpt4o) to interpret and understand these.
+This portion of the repo contains code for linking Azure Document Intelligence with AI Search to process complex documents with charts and images, and uses multi-modal models (gpt-4o-mini) to interpret and understand these.
 
 The implementation is in Python, although it can easily be adapted for C# or another language. The code is designed to run in an Azure Function App inside the tenant.
 
@@ -22,7 +22,7 @@ Instead of using OCR to extract the contents of the document, ADIv4 is used to a
 
 Once the Markdown is obtained, several steps are carried out:
 
-1. **Extraction of images / charts**. The figures identified are extracted from the original document and passed to a multi-modal model (gpt4o in this case) for analysis. We obtain a description and summary of the chart / image to infer the meaning of the figure. This allows us to index and perform RAG analysis on the information that is visually obtainable from a chart, without it being explicitly mentioned in the surrounding text. The information is added back into the original chart.
+1. **Extraction of images / charts**. The figures identified are extracted from the original document and passed to a multi-modal model (gpt-4o-mini in this case) for analysis. We obtain a description and summary of the chart / image to infer the meaning of the figure. This allows us to index and perform RAG analysis on the information that is visually obtainable from a chart, without it being explicitly mentioned in the surrounding text. The information is added back into the original chart.
 
 2. **Chunking**. The obtained content is chunked accordingly depending on the chunking strategy. This function app supports two chunking methods, **page wise** and **semantic chunking**. The page wise chunking is performed natively by Azure Document Intelligence. For semantic chunking, we include a custom chunker that splits the text with the following strategy:
 
@@ -82,7 +82,7 @@ You can then test the chunking by sending an AI Search JSON format to the `/seman
 ### Deployment Steps
 
 1. Update `.env` file with the associated values. Not all values are required, depending on whether you are using System / User Assigned Identities or key-based authentication. Use this template to update the environment variables in the function app.
-2. Make sure the infra and required identities are setup. This setup requires Azure Document Intelligence and GPT4o.
+2. Make sure the infra and required identities are set up. This setup requires Azure Document Intelligence and gpt-4o-mini.
 3. [Deploy your function app](https://learn.microsoft.com/en-us/azure/azure-functions/functions-deployment-technologies?tabs=windows) and test with an HTTP request.
 
 ### Code Files

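The README's testing step sends an AI Search-style JSON body to the function app. The outer `values` / `recordId` / `data` envelope below follows the Azure AI Search custom-skill contract; the `content` key inside `data` is an illustrative assumption, since the skillset definition decides the exact input names:

```python
import json

# Hypothetical custom-skill request body. The envelope
# {"values": [{"recordId": ..., "data": {...}}]} is the AI Search custom
# skill shape; "content" is an assumed input field name.
request_body = {
    "values": [
        {
            "recordId": "0",
            # Markdown as produced by Azure Document Intelligence:
            "data": {"content": "# Report\n\nSome extracted text..."},
        }
    ]
}

payload = json.dumps(request_body)

# A skill response echoes each recordId back with data / errors / warnings.
parsed = json.loads(payload)
print(parsed["values"][0]["recordId"])
```

Each record is processed independently, which is why the error responses later in this commit are keyed by `recordId`.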
image_processing/src/image_processing/figure_analysis.py

Lines changed: 116 additions & 28 deletions

@@ -10,12 +10,35 @@
     APIError,
     APIStatusError,
     BadRequestError,
+    RateLimitError,
 )
-from tenacity import retry, stop_after_attempt, wait_exponential
+from tenacity import retry, stop_after_attempt, wait_exponential, RetryError
 from layout_holders import FigureHolder
+from PIL import Image
+import io
+import base64
 
 
 class FigureAnalysis:
+    def get_image_size(self, figure: FigureHolder) -> tuple[int, int]:
+        """Get the size of the image from the binary data.
+
+        Parameters:
+        - figure (FigureHolder): The figure object containing the image data.
+
+        Returns:
+        - width (int): The width of the image.
+        - height (int): The height of the image."""
+        # Create a BytesIO object from the binary data
+        image_data = base64.b64decode(figure.data)
+        image_stream = io.BytesIO(image_data)
+
+        # Open the image using PIL
+        with Image.open(image_stream) as img:
+            # Get the size of the image
+            width, height = img.size
+            return width, height
+
     @retry(
         stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10)
     )
@@ -31,45 +54,86 @@ async def understand_image_with_gptv(self, figure: FigureHolder) -> dict:
         - img_description (str): The generated description for the image.
         """
 
+        # Open figure and check if below minimum size
+        width, height = self.get_image_size(figure)
+
+        if width < 75 and height < 75:
+            logging.info(
+                "Image is too small to be analysed. Width: %i, Height: %i",
+                width,
+                height,
+            )
+            figure.description = "Irrelevant Image"
+
+            return figure
+
         MAX_TOKENS = 2000
         api_version = os.environ["OpenAI__ApiVersion"]
-        model = os.environ["OpenAI__CompletionDeployment"]
+        model_name = "gpt-4o-mini"
+        deployment_id = os.environ["OpenAI__MiniCompletionDeployment"]
+        azure_endpoint = os.environ["OpenAI__Endpoint"]
 
         token_provider = get_bearer_token_provider(
             DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
         )
 
-        system_prompt = """You are an expert in technical image analysis. Your task is to provided analysis of images. You should FOCUS on what info can be inferred from the image and the meaning of the data inside the image. Draw actionable insights and conclusions from the image. Do not describe the image in a general way or describe the image in a way that is not useful for decision-making.
+        system_prompt = """You are an expert in technical image description and analysis for search and retrieval. Your task is to describe the key details, themes, and practical applications of the image, focusing on how the image could be used and what it helps the user achieve. Additionally, provide a brief explanation of what can be inferred from the image, such as trends, relationships, or insights.
+
+        It is essential to include all visible labels, data points, and annotations in your description. Use natural terms and phrases that users might search for to locate the image.
+
+        Charts and Graphs:
+        - Identify the type of chart and describe the data points, trends, and labels present.
+        - Explain how the chart can be used (e.g., for analyzing trends, tracking performance, or comparing metrics).
+        - Describe what can be inferred, such as patterns over time, correlations, or key insights from the data.
+
+        Maps:
+        - Highlight geographical features, landmarks, and any text labels or annotations, such as street names or distances.
+        - Explain how the map can be used (e.g., for navigation, travel planning, or understanding a region).
+        - Describe what can be inferred, such as proximity between locations, accessibility of areas, or regional layouts.
 
-        If the image is a chart for instance, you should describe the data trends, patterns, and insights that can be drawn from the chart. For example, you could describe the increase or decrease in sales over time, the peak sales period, or the sales performance of a particular product.
+        Diagrams:
+        - Describe the components, relationships, and purpose of the diagram.
+        - Explain how the diagram can be used (e.g., for understanding a process, visualizing a system, or explaining a concept).
+        - Describe what can be inferred, such as how components interact, dependencies, or the overall system structure.
 
-        If the image is a map, you should describe the geographical features, landmarks, and any other relevant information that can be inferred from the map.
+        Photographs or Logos:
+        - Return 'Irrelevant Image' if the image is not suitable for actionable purposes like analysis or decision-making e.g. a logo, a personal photo, or a generic landscape.
 
-        If the image is a diagram, you should describe the components, relationships, and any other relevant information that can be inferred from the diagram.
 
-        Include any data points, labels, and other relevant information that can be inferred from the image.
+        Guidelines:
+        - Include all labels, text, and annotations to ensure a complete and accurate description.
+        - Clearly state both the potential use of the image and what insights or information can be inferred from it.
+        - Think about what the user might need from the image and describe it accordingly.
+        - Make sure to consider if the image will be useful for analysis later on. If nothing valuable for analysis, decision making or information retrieval, would be able to be inferred from the image, return 'Irrelevant Image'.
 
-        Provide a well-structured, detailed, and actionable analysis of the image. Focus on extracting data and information that can be inferred from the image.
+        Example:
+        Input:
+        - A bar chart showing monthly sales for 2024, with the x-axis labeled "Month" (January to December) and the y-axis labeled "Revenue in USD." The chart shows a steady increase from January to December, with a sharp spike in November.
+        Output:
+        - This bar chart shows monthly sales revenue for 2024, with the x-axis labeled 'Month' (January to December) and the y-axis labeled 'Revenue in USD.' It can be used to track sales performance over the year and identify periods of high or low revenue. From the chart, it can be inferred that sales steadily increased throughout the year, with a notable spike in November, possibly due to seasonal promotions or events.
 
-        IMPORTANT: If the provided image is a logo or photograph, simply return 'Irrelevant Image'."""
+        Input:
+        - A photograph of a mountain landscape with snow-capped peaks, a winding river, and a dense forest in the foreground. The image captures the natural beauty of the region and the diverse ecosystems present.
+        Output:
+        - Irrelevant Image"""
 
-        user_input = "Perform technical analysis on this image. Provide a well-structured, description."
+        user_input = "Generate a description for the image provided that can be used for search purposes."
 
         if figure.caption is not None and len(figure.caption) > 0:
-            user_input += " (note: it has the following caption: {})".format(
-                figure.caption
-            )
+            user_input += f""" (note: it has the following caption: {
+                figure.caption})"""
 
         try:
            async with AsyncAzureOpenAI(
                api_key=None,
                api_version=api_version,
                azure_ad_token_provider=token_provider,
-                azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
+                azure_endpoint=azure_endpoint,
+                azure_deployment=deployment_id,
            ) as client:
                # We send both image caption and the image body to GPTv for better understanding
                response = await client.chat.completions.create(
-                    model=model,
+                    model=model_name,
                    messages=[
                        {
                            "role": "system",
@@ -93,7 +157,13 @@ async def understand_image_with_gptv(self, figure: FigureHolder) -> dict:
                    ],
                    max_tokens=MAX_TOKENS,
                )
-        except (OpenAIError, APIError, APIStatusError, BadRequestError) as e:
+        except (
+            OpenAIError,
+            APIError,
+            APIStatusError,
+            BadRequestError,
+            RateLimitError,
+        ) as e:
            logging.error(f"Failed to analyse image. Error: {e}")
 
            if "ResponsibleAIPolicyViolation" in e.message:
@@ -108,6 +178,10 @@ async def understand_image_with_gptv(self, figure: FigureHolder) -> dict:
 
        figure.description = response.choices[0].message.content
 
+        if len(figure.description) == 0:
+            logging.info("No description generated for image.")
+            figure.description = "Irrelevant Image"
+
        logging.info(f"Image Description: {figure.description}")
 
        return figure
@@ -128,20 +202,34 @@ async def analyse(self, record: dict) -> dict:
 
        try:
            updated_data = await self.understand_image_with_gptv(figure)
-            logging.info(f"Updated Data: {updated_data}")
-        except Exception as e:
+            logging.info(f"Updated Figure Data: {updated_data}")
+        except RetryError as e:
            logging.error(f"Failed to analyse image. Error: {e}")
            logging.error(f"Failed input: {record}")
-            return {
-                "recordId": record["recordId"],
-                "data": {},
-                "errors": [
-                    {
-                        "message": "Failed to analyse image. Pass a valid source in the request body.",
-                    }
-                ],
-                "warnings": None,
-            }
+            root_cause = e.last_attempt.exception()
+
+            if isinstance(root_cause, RateLimitError):
+                return {
+                    "recordId": record["recordId"],
+                    "data": None,
+                    "errors": [
+                        {
+                            "message": "Failed to analyse image due to rate limit error. Please try again later.",
+                        }
+                    ],
+                    "warnings": None,
+                }
+            else:
+                return {
+                    "recordId": record["recordId"],
+                    "data": None,
+                    "errors": [
+                        {
+                            "message": "Failed to analyse image. Check the logs for more details.",
+                        }
+                    ],
+                    "warnings": None,
+                }
        else:
            return {
                "recordId": record["recordId"],
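The `analyse` method above now catches tenacity's `RetryError` and calls `last_attempt.exception()` to branch on the root cause (rate limiting vs. everything else). A stdlib-only sketch of that pattern, assuming tenacity is unavailable (the tiny `retry` decorator and both exception classes here are stand-ins, not the tenacity API):

```python
# Minimal stand-in for tenacity's RetryError: after the final attempt fails,
# the wrapper raises an exception that carries the root cause so the caller
# can branch on its type.
class RetryError(Exception):
    def __init__(self, last_exception):
        super().__init__(f"retries exhausted: {last_exception!r}")
        self.last_exception = last_exception


class RateLimitError(Exception):
    pass


def retry(attempts=3):
    def decorator(fn):
        def wrapper(*args, **kwargs):
            last = None
            for _ in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception as e:  # real code would catch narrower types
                    last = e
            raise RetryError(last)
        return wrapper
    return decorator


@retry(attempts=3)
def always_throttled():
    raise RateLimitError("429 Too Many Requests")


try:
    always_throttled()
except RetryError as e:
    root_cause = e.last_exception
    message = (
        "rate limited, try later"
        if isinstance(root_cause, RateLimitError)
        else "failed, check logs"
    )
    print(message)
```

Branching on the root cause lets the skill return an actionable "try again later" message for throttling while keeping a generic error for everything else.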
image_processing/src/image_processing/layout_analysis.py

Lines changed: 25 additions & 15 deletions

@@ -1,6 +1,5 @@
 # Copyright (c) Microsoft Corporation. All rights reserved.
 # Licensed under the MIT License.
-# This code originates from: https://github.com/microsoft/dstoolkit-text2sql-and-imageprocessing
 
 import logging
 import os
@@ -32,7 +31,7 @@ class StorageAccountHelper:
     @property
     def account_url(self) -> str:
         """Get the account URL of the Azure Blob Storage."""
-        storage_account_name = os.environ.get("StorageAccount__Name")
+        storage_account_name = os.environ["StorageAccount__Name"]
         return f"https://{storage_account_name}.blob.core.windows.net"
 
     async def get_client(self):
@@ -42,7 +41,7 @@ async def get_client(self):
         return BlobServiceClient(account_url=self.account_url, credential=credential)
 
     async def add_metadata_to_blob(
-        self, source: str, container: str, metadata: dict
+        self, source: str, container: str, metadata: dict, upsert: bool = False
     ) -> None:
         """Add metadata to the blob.
 
@@ -51,14 +50,24 @@ async def add_metadata_to_blob(
            container (str): The container of the blob.
            metadata (dict): The metadata to add to the blob."""
 
-        blob = urllib.parse.unquote_plus(source)
+        logging.info("Adding Metadata")
+
+        blob = urllib.parse.unquote(source, encoding="utf-8")
 
        blob_service_client = await self.get_client()
        async with blob_service_client:
            async with blob_service_client.get_blob_client(
                container=container, blob=blob
            ) as blob_client:
-                await blob_client.set_blob_metadata(metadata)
+                blob_properties = await blob_client.get_blob_properties()
+
+                if upsert:
+                    updated_metadata = blob_properties.metadata
+                    updated_metadata.update(metadata)
+                else:
+                    updated_metadata = metadata
+
+                await blob_client.set_blob_metadata(updated_metadata)
 
        logging.info("Metadata Added")
 
@@ -103,7 +112,7 @@ async def download_blob_to_temp_dir(
            container (str): The container of the blob.
            target_file_name (str): The target file name."""
 
-        blob = urllib.parse.unquote_plus(source)
+        blob = urllib.parse.unquote(source)
 
        blob_service_client = await self.get_client()
        async with blob_service_client:
@@ -254,11 +263,9 @@ async def process_figures_from_extracted_content(
            )
 
            logging.info(f"Figure Caption: {caption}")
-            uri = "{}/{}/{}".format(
-                storage_account_helper.account_url,
-                self.images_container,
-                blob,
-            )
+
+            uri = f"""{
+                storage_account_helper.account_url}/{self.images_container}/{blob}"""
 
            offset = figure.spans[0].offset - text_holder.page_offsets
 
@@ -414,7 +421,7 @@ async def analyse(self):
            logging.error(f"Failed to download the blob: {e}")
            return {
                "recordId": self.record_id,
-                "data": {},
+                "data": None,
                "errors": [
                    {
                        "message": f"Failed to download the blob. Check the source and try again. {e}",
@@ -430,9 +437,12 @@ async def analyse(self):
            logging.error(
                "Failed to analyse %s with Azure Document Intelligence.", self.blob
            )
+            await storage_account_helper.add_metadata_to_blob(
+                self.blob, self.container, {"AzureSearch_Skip": "true"}, upsert=True
+            )
            return {
                "recordId": self.record_id,
-                "data": {},
+                "data": None,
                "errors": [
                    {
                        "message": f"Failed to analyze the document with Azure Document Intelligence. Check the logs and try again. {e}",
@@ -484,7 +494,7 @@ async def analyse(self):
            logging.error(f"Failed to process the extracted content: {e}")
            return {
                "recordId": self.record_id,
-                "data": {},
+                "data": None,
                "errors": [
                    {
                        "message": f"Failed to process the extracted content. Check the logs and try again. {e}",
@@ -536,7 +546,7 @@ async def process_layout_analysis(
    except KeyError:
        return {
            "recordId": record["recordId"],
-            "data": {},
+            "data": None,
            "errors": [
                {
                    "message": "Failed to extract data with ADI. Pass a valid source in the request body.",
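One subtle change in this file is the switch from `urllib.parse.unquote_plus` to `urllib.parse.unquote` when decoding blob names: `unquote_plus` additionally turns `+` into a space (form-encoding semantics), which would resolve the wrong blob for names containing a literal `+`. A quick illustration with a hypothetical blob name:

```python
from urllib.parse import unquote, unquote_plus

# A blob whose real name contains a literal '+' character (hypothetical).
encoded = "reports/q1+q2%20summary.pdf"

# unquote_plus also decodes '+' as a space, corrupting the name:
print(unquote_plus(encoded))  # reports/q1 q2 summary.pdf

# unquote only decodes percent-escapes, preserving the '+':
print(unquote(encoded, encoding="utf-8"))  # reports/q1+q2 summary.pdf
```

`unquote_plus` is meant for `application/x-www-form-urlencoded` query strings, not URL path segments, so `unquote` is the safer choice here.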

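The new `upsert` flag on `add_metadata_to_blob` merges incoming metadata into the blob's existing metadata instead of replacing it wholesale (`set_blob_metadata` overwrites all metadata on the blob). The merge itself is plain dict semantics, sketched here with a hypothetical helper:

```python
# Sketch of the upsert semantics: with upsert=True the new metadata is merged
# into the existing metadata; with upsert=False only the new keys survive,
# mirroring set_blob_metadata's replace-all behaviour.
def merge_metadata(existing: dict, new: dict, upsert: bool) -> dict:
    if upsert:
        updated = dict(existing)  # copy so the caller's dict is untouched
        updated.update(new)
        return updated
    return new


existing = {"Author": "alice", "AzureSearch_Skip": "false"}
new = {"AzureSearch_Skip": "true"}

print(merge_metadata(existing, new, upsert=True))
print(merge_metadata(existing, new, upsert=False))
```

Passing `upsert=True` when tagging failed documents with `AzureSearch_Skip` keeps any metadata other skills have already written.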