You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: image_processing/README.md
+3-3Lines changed: 3 additions & 3 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,6 +1,6 @@
1
1
# AI Search Indexing with Azure Document Intelligence
2
2
3
-
This portion of the repo contains code for linking Azure Document Intelligence with AI Search to process complex documents with charts and images, and uses multi-modal models (gpt4o) to interpret and understand these.
3
+
This portion of the repo contains code for linking Azure Document Intelligence with AI Search to process complex documents with charts and images, and uses multi-modal models (gpt-4o-mini) to interpret and understand these.
4
4
5
5
The implementation in Python, although it can easily be adapted for C# or another language. The code is designed to run in an Azure Function App inside the tenant.
6
6
@@ -22,7 +22,7 @@ Instead of using OCR to extract the contents of the document, ADIv4 is used to a
22
22
23
23
Once the Markdown is obtained, several steps are carried out:
24
24
25
-
1.**Extraction of images / charts**. The figures identified are extracted from the original document and passed to a multi-modal model (gpt4o in this case) for analysis. We obtain a description and summary of the chart / image to infer the meaning of the figure. This allows us to index and perform RAG analysis the information that is visually obtainable from a chart, without it being explicitly mentioned in the text surrounding. The information is added back into the original chart.
25
+
1.**Extraction of images / charts**. The figures identified are extracted from the original document and passed to a multi-modal model (gpt-4o-mini in this case) for analysis. We obtain a description and summary of the chart / image to infer the meaning of the figure. This allows us to index and perform RAG analysis the information that is visually obtainable from a chart, without it being explicitly mentioned in the text surrounding. The information is added back into the original chart.
26
26
27
27
2.**Chunking**. The obtained content is chunked accordingly depending on the chunking strategy. This function app supports two chunking methods, **page wise** and **semantic chunking**. The page wise chunking is performed natively by Azure Document Intelligence. For a Semantic Chunking, we include a customer chunker that splits the text with the following strategy:
28
28
@@ -82,7 +82,7 @@ You can then test the chunking by sending a AI Search JSON format to the `/seman
82
82
### Deployment Steps
83
83
84
84
1. Update `.env` file with the associated values. Not all values are required dependent on whether you are using System / User Assigned Identities or a Key based authentication. Use this template to update the environment variables in the function app.
85
-
2. Make sure the infra and required identities are setup. This setup requires Azure Document Intelligence and GPT4o.
85
+
2. Make sure the infra and required identities are setup. This setup requires Azure Document Intelligence and gpt-4o-mini.
86
86
3.[Deploy your function app](https://learn.microsoft.com/en-us/azure/azure-functions/functions-deployment-technologies?tabs=windows) and test with a HTTP request.
system_prompt="""You are an expert in technical image analysis. Your task is to provided analysis of images. You should FOCUS on what info can be inferred from the image and the meaning of the data inside the image. Draw actionable insights and conclusions from the image. Do not describe the image in a general way or describe the image in a way that is not useful for decision-making.
80
+
system_prompt="""You are an expert in technical image description and analysis for search and retrieval. Your task is to describe the key details, themes, and practical applications of the image, focusing on how the image could be used and what it helps the user achieve. Additionally, provide a brief explanation of what can be inferred from the image, such as trends, relationships, or insights.
81
+
82
+
It is essential to include all visible labels, data points, and annotations in your description. Use natural terms and phrases that users might search for to locate the image.
83
+
84
+
Charts and Graphs:
85
+
- Identify the type of chart and describe the data points, trends, and labels present.
86
+
- Explain how the chart can be used (e.g., for analyzing trends, tracking performance, or comparing metrics).
87
+
- Describe what can be inferred, such as patterns over time, correlations, or key insights from the data.
88
+
89
+
Maps:
90
+
- Highlight geographical features, landmarks, and any text labels or annotations, such as street names or distances.
91
+
- Explain how the map can be used (e.g., for navigation, travel planning, or understanding a region).
92
+
- Describe what can be inferred, such as proximity between locations, accessibility of areas, or regional layouts.
43
93
44
-
If the image is a chart for instance, you should describe the data trends, patterns, and insights that can be drawn from the chart. For example, you could describe the increase or decrease in sales over time, the peak sales period, or the sales performance of a particular product.
94
+
Diagrams:
95
+
- Describe the components, relationships, and purpose of the diagram.
96
+
- Explain how the diagram can be used (e.g., for understanding a process, visualizing a system, or explaining a concept).
97
+
- Describe what can be inferred, such as how components interact, dependencies, or the overall system structure.
45
98
46
-
If the image is a map, you should describe the geographical features, landmarks, and any other relevant information that can be inferred from the map.
99
+
Photographs or Logos:
100
+
- Return 'Irrelevant Image' if the image is not suitable for actionable purposes like analysis or decision-making e.g. a logo, a personal photo, or a generic landscape.
47
101
48
-
If the image is a diagram, you should describe the components, relationships, and any other relevant information that can be inferred from the diagram.
49
102
50
-
Include any data points, labels, and other relevant information that can be inferred from the image.
103
+
Guidelines:
104
+
- Include all labels, text, and annotations to ensure a complete and accurate description.
105
+
- Clearly state both the potential use of the image and what insights or information can be inferred from it.
106
+
- Think about what the user might need from the image and describe it accordingly.
107
+
- Make sure to consider if the image will be useful for analysis later on. If nothing valuable for analysis, decision making or information retrieval, would be able to be inferred from the image, return 'Irrelevant Image'.
51
108
52
-
Provide a well-structured, detailed, and actionable analysis of the image. Focus on extracting data and information that can be inferred from the image.
109
+
Example:
110
+
Input:
111
+
- A bar chart showing monthly sales for 2024, with the x-axis labeled "Month" (January to December) and the y-axis labeled "Revenue in USD." The chart shows a steady increase from January to December, with a sharp spike in November.
112
+
Output:
113
+
- This bar chart shows monthly sales revenue for 2024, with the x-axis labeled 'Month' (January to December) and the y-axis labeled 'Revenue in USD.' It can be used to track sales performance over the year and identify periods of high or low revenue. From the chart, it can be inferred that sales steadily increased throughout the year, with a notable spike in November, possibly due to seasonal promotions or events.
53
114
54
-
IMPORTANT: If the provided image is a logo or photograph, simply return 'Irrelevant Image'."""
115
+
Input:
116
+
- A photograph of a mountain landscape with snow-capped peaks, a winding river, and a dense forest in the foreground. The image captures the natural beauty of the region and the diverse ecosystems present.
117
+
Output:
118
+
- Irrelevant Image"""
55
119
56
-
user_input="Perform technical analysis on this image. Provide a well-structured, description."
120
+
user_input="Generate a description for the image provided that can be used for search purposes."
0 commit comments