Commit 6a3edd0

authored
Merge pull request #4 from MicrosoftCloudEssentials-LearningHub/IaC-aivision-booster
adding ai vision to the IaC - booster for filled capab
2 parents e49a562 + 635283a commit 6a3edd0

10 files changed, +174 -36 lines changed

.github/workflows/validate_and_fix_markdown.yml

Lines changed: 17 additions & 4 deletions
```diff
@@ -18,6 +18,7 @@ jobs:
         uses: actions/checkout@v4
         with:
           fetch-depth: 0
+          ref: ${{ github.head_ref || github.ref_name }}
 
       - name: Set up Node.js
         uses: actions/setup-node@v3
```
```diff
@@ -35,11 +36,23 @@ jobs:
           git config --global user.email "github-actions[bot]@users.noreply.github.com"
           git config --global user.name "github-actions[bot]"
 
-      - name: Commit and rebase changes
+      - name: Commit and merge changes
         env:
           PR_BRANCH: ${{ github.head_ref || github.ref_name }}
+          GIT_AUTHOR_NAME: github-actions[bot]
+          GIT_AUTHOR_EMAIL: github-actions[bot]@users.noreply.github.com
+          GIT_COMMITTER_NAME: github-actions[bot]
+          GIT_COMMITTER_EMAIL: github-actions[bot]@users.noreply.github.com
         run: |
+          # Ensure we're on the correct branch
+          git switch -c "$PR_BRANCH" || git switch "$PR_BRANCH"
+
+          # Stage and commit changes if any
           git add -A
-          git commit -m "Fix Markdown syntax issues" || echo "No changes to commit"
-          git pull --rebase origin "$PR_BRANCH" || echo "No rebase needed"
-          git push origin HEAD:"$PR_BRANCH"
+          git diff --staged --quiet || git commit -m "Fix Markdown syntax issues"
+
+          # Pull and merge existing changes
+          git pull origin "$PR_BRANCH" --no-rebase
+
+          # Push all changes
+          git push origin "$PR_BRANCH"
```

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -5,6 +5,7 @@
 *.tfstate
 *.tfstate.*
 .terraform.lock.hcl
+terraform.tfstate.backup
 
 # Crash log files
 crash.log
```
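The added `terraform.tfstate.backup` line looks redundant: the existing `*.tfstate.*` pattern should already match that file name. A quick way to check simple, slash-free globs like these is Python's `fnmatch.fnmatchcase`, with the caveat that it only approximates Git's own matching rules (for example, Git's `*` never crosses a `/`, which does not matter for bare file names). `matching_patterns` is a hypothetical helper:

```python
from fnmatch import fnmatchcase

# Patterns copied from the .gitignore hunk above. None contains a slash,
# so Git compares them against the file name at any directory level.
PATTERNS = ["*.tfstate", "*.tfstate.*", ".terraform.lock.hcl", "crash.log"]


def matching_patterns(filename, patterns=PATTERNS):
    """Return the patterns that match a bare file name.

    fnmatchcase approximates Git's glob matching; for simple '*' patterns
    applied to a name without slashes it gives the same answer.
    """
    return [p for p in patterns if fnmatchcase(filename, p)]


print(matching_patterns("terraform.tfstate.backup"))  # ['*.tfstate.*']
print(matching_patterns("terraform.tfstate"))         # ['*.tfstate']
```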

README.md

Lines changed: 12 additions & 5 deletions
```diff
@@ -1,4 +1,4 @@
-# Demo: PDF Layout Extraction with Doc Intelligence (full-code approach)
+# Demo: PDF Layout Extraction with Doc Intelligence <br/> Supporting Multiple Document Versions with Visual Selection Cues (full-code approach)
 
 `Azure Storage + Document Intelligence + Function App + Cosmos DB`
 
```
```diff
@@ -8,9 +8,16 @@ Costa Rica
 [![GitHub](https://img.shields.io/badge/--181717?logo=github&logoColor=ffffff)](https://github.com/)
 [brown9804](https://github.com/brown9804)
 
-Last updated: 2025-07-16
+Last updated: 2025-07-21
 
-----------
+-----------------------------
+
+> This solution is designed to be flexible and robust, supporting multiple versions of PDF documents with varying layouts—including those that use visual selection cues such as gray fills, hand-drawn Xs, checkmarks, or circles. By building on the [PDFs-Layouts-Processing-Fapp-DocIntelligence](https://github.com/MicrosoftCloudEssentials-LearningHub/PDFs-Layouts-Processing-Fapp-DocIntelligence) repository, we ensure that:
+
+- Table structure and text are extracted using Azure Document Intelligence (Layout model).
+- Visual selection cues are detected using Azure AI Vision or image preprocessing.
+- Visual indicators are mapped to structured data, returning only the selected values in a clean JSON format.
+- The logic is abstracted to support multiple layout variations, so the system adapts easily to new document formats and selection styles.
 
 > [!IMPORTANT]
 > This example is based on a `public network site and is intended for demonstration purposes only`. It showcases how several Azure resources can work together to achieve the desired result. Consider the section below about [Important Considerations for Production Environment](#important-considerations-for-production-environment). Please note that `these demos are intended as a guide and are based on my personal experiences. For official guidance, support, or more detailed information, please refer to Microsoft's official documentation or contact Microsoft directly`: [Microsoft Sales and Support](https://support.microsoft.com/contactus?ContactUsExperienceEntryPointAssetId=S.HP.SMC-HOME)
```
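The new README bullets say the solution returns only the selected values as clean JSON. In the `src/function_app.py` hunks of this commit, every cell carries a `visual_cue` (`None` when no cue was found), but no filtering step is shown. A minimal sketch of that step, assuming the `layout_data` shape built by `extract_layout_data` (`pages` containing `tables` containing `cells`); the `page_number` key on each page dict and the `select_marked_cells` name are assumptions, not part of the diff:

```python
import json


def select_marked_cells(layout_data):
    """Return only the cells that carry a visual cue (hypothetical helper).

    Assumes the layout_data shape from extract_layout_data:
    {"id": ..., "pages": [{"tables": [{"cells": [...]}]}]}.
    A "page_number" key on each page dict is an assumption; when it is
    missing, the 1-based position of the page is used instead.
    """
    selected = []
    for page_pos, page in enumerate(layout_data.get("pages", []), start=1):
        page_number = page.get("page_number", page_pos)
        for table_idx, table in enumerate(page.get("tables", [])):
            for cell in table.get("cells", []):
                if cell.get("visual_cue"):  # skips None and empty values
                    selected.append({
                        "page_number": page_number,
                        "table_index": table_idx,
                        "row_index": cell["row_index"],
                        "column_index": cell["column_index"],
                        "value": cell["content"],
                        "visual_cue": cell["visual_cue"],
                    })
    return {"id": layout_data.get("id"), "selected": selected}


example = {
    "id": "demo",
    "pages": [{"tables": [{"cells": [
        {"row_index": 0, "column_index": 0, "content": "Option A", "visual_cue": None},
        {"row_index": 0, "column_index": 1, "content": "Option B", "visual_cue": "checkmark"},
    ]}]}],
}
print(json.dumps(select_marked_cells(example), indent=2))
```

Keeping the position and cue type next to each value makes it possible to tell which option was marked, and how, without re-reading the full layout document.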
```diff
@@ -440,7 +447,7 @@ Last updated: 2025-07-16
 
 <!-- START BADGE -->
 <div align="center">
-  <img src="https://img.shields.io/badge/Total%20views-55-limegreen" alt="Total views">
-  <p>Refresh Date: 2025-07-16</p>
+  <img src="https://img.shields.io/badge/Total%20views-164-limegreen" alt="Total views">
+  <p>Refresh Date: 2025-07-21</p>
 </div>
 <!-- END BADGE -->
```

docs/automatedPDFLayoutprocessingFunctionAppDocIntellig.drawio renamed to docs/automated-PDFLayoutprocessing-FunctionApp-DocIntellig-AI-Vision.drawio

Lines changed: 17 additions & 7 deletions
```diff
@@ -1,11 +1,11 @@
-<mxfile host="Electron" agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/27.0.9 Chrome/134.0.6998.205 Electron/35.4.0 Safari/537.36" version="27.0.9">
+<mxfile host="Electron" agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/28.0.4 Chrome/138.0.7204.97 Electron/37.2.1 Safari/537.36" version="28.0.4">
   <diagram name="Page-1" id="_ZzkEdzZPlF0T37kGrCl">
-    <mxGraphModel dx="1281" dy="1822" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="0" shadow="0">
+    <mxGraphModel dx="732" dy="1532" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="0" shadow="0">
       <root>
         <mxCell id="0" />
         <mxCell id="1" parent="0" />
         <mxCell id="SBEox3NDaokPfLYJbtWu-15" value="" style="rounded=0;whiteSpace=wrap;html=1;" parent="1" vertex="1">
-          <mxGeometry x="20" width="920" height="620" as="geometry" />
+          <mxGeometry x="20" y="-90" width="920" height="710" as="geometry" />
         </mxCell>
         <mxCell id="SBEox3NDaokPfLYJbtWu-2" value="Storage Account" style="image;aspect=fixed;html=1;points=[];align=center;fontSize=12;image=img/lib/azure2/storage/Storage_Accounts.svg;" parent="1" vertex="1">
           <mxGeometry x="240" y="136" width="75" height="60" as="geometry" />
```
```diff
@@ -27,10 +27,10 @@
         <mxCell id="SBEox3NDaokPfLYJbtWu-3" value="Employee" style="shape=umlActor;verticalLabelPosition=bottom;verticalAlign=top;html=1;outlineConnect=0;" parent="SBEox3NDaokPfLYJbtWu-10" vertex="1">
           <mxGeometry y="30" width="30" height="60" as="geometry" />
         </mxCell>
-        <mxCell id="_wiV1sLz3M6k8l1JJ68s-4" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.017;entryY=0.605;entryDx=0;entryDy=0;entryPerimeter=0;" parent="1" source="SBEox3NDaokPfLYJbtWu-12" target="_wiV1sLz3M6k8l1JJ68s-1" edge="1">
+        <mxCell id="_wiV1sLz3M6k8l1JJ68s-4" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.126;entryY=0.408;entryDx=0;entryDy=0;entryPerimeter=0;" parent="1" source="SBEox3NDaokPfLYJbtWu-12" target="qB0o09IW0mbKmVrXtbLM-1" edge="1">
           <mxGeometry relative="1" as="geometry">
             <Array as="points">
-              <mxPoint x="540" y="131" />
+              <mxPoint x="510" y="18" />
             </Array>
           </mxGeometry>
         </mxCell>
```
```diff
@@ -43,10 +43,10 @@
           <mxGeometry x="510" y="300" width="68" height="60" as="geometry" />
         </mxCell>
         <mxCell id="SBEox3NDaokPfLYJbtWu-13" value="Resource Group" style="image;sketch=0;aspect=fixed;html=1;points=[];align=center;fontSize=12;image=img/lib/mscae/ResourceGroup.svg;" parent="1" vertex="1">
-          <mxGeometry x="20" width="50" height="40" as="geometry" />
+          <mxGeometry x="20" y="-90" width="50" height="40" as="geometry" />
         </mxCell>
         <mxCell id="SBEox3NDaokPfLYJbtWu-14" value="Subscription" style="image;aspect=fixed;html=1;points=[];align=center;fontSize=12;image=img/lib/azure2/general/Subscriptions.svg;" parent="1" vertex="1">
-          <mxGeometry x="890" y="-20" width="44" height="71" as="geometry" />
+          <mxGeometry x="890" y="-90" width="44" height="71" as="geometry" />
         </mxCell>
         <mxCell id="SBEox3NDaokPfLYJbtWu-16" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=1.004;entryY=0.433;entryDx=0;entryDy=0;entryPerimeter=0;" parent="1" source="SBEox3NDaokPfLYJbtWu-12" target="SBEox3NDaokPfLYJbtWu-2" edge="1">
           <mxGeometry relative="1" as="geometry" />
```
```diff
@@ -89,6 +89,16 @@
             <mxPoint as="offset" />
           </mxGeometry>
         </mxCell>
+        <mxCell id="qB0o09IW0mbKmVrXtbLM-1" value="Azure &lt;br&gt;AI Vision&amp;nbsp;" style="image;aspect=fixed;html=1;points=[];align=center;fontSize=12;image=img/lib/azure2/ai_machine_learning/Computer_Vision.svg;" vertex="1" parent="1">
+          <mxGeometry x="550" y="-10" width="68" height="68" as="geometry" />
+        </mxCell>
+        <mxCell id="qB0o09IW0mbKmVrXtbLM-3" style="rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=1;entryY=0.609;entryDx=0;entryDy=0;entryPerimeter=0;edgeStyle=orthogonalEdgeStyle;elbow=vertical;shape=link;" edge="1" parent="1" source="_wiV1sLz3M6k8l1JJ68s-1" target="qB0o09IW0mbKmVrXtbLM-1">
+          <mxGeometry relative="1" as="geometry">
+            <Array as="points">
+              <mxPoint x="710" y="31" />
+            </Array>
+          </mxGeometry>
+        </mxCell>
       </root>
     </mxGraphModel>
   </diagram>
```

metrics.json

Lines changed: 6 additions & 1 deletion
```diff
@@ -26,7 +26,12 @@
   },
   {
     "date": "2025-07-14",
-    "count": 4,
+    "count": 130,
+    "uniques": 2
+  },
+  {
+    "date": "2025-07-15",
+    "count": 2,
     "uniques": 1
   }
 ]
```

src/function_app.py

Lines changed: 60 additions & 7 deletions
```diff
@@ -8,6 +8,12 @@
 import uuid
 import json
 
+# For image conversion and vision API
+from typing import List
+from io import BytesIO
+import requests # For REST API to Vision
+from pdf2image import convert_from_bytes # For PDF to image conversion
+
 app = func.FunctionApp(http_auth_level=func.AuthLevel.FUNCTION)
 
 ## DEFINITIONS
```
```diff
@@ -35,13 +41,14 @@ def analyze_pdf(form_recognizer_client, pdf_bytes):
     logging.info(f"Document has {len(result.pages)} page(s), {len(result.tables)} table(s), and {len(result.styles)} style(s).")
     return result
 
-def extract_layout_data(result):
+def extract_layout_data(result, visual_cues: List[dict] = None):
     logging.info("Extracting layout data from analysis result.")
 
     layout_data = {
         "id": str(uuid.uuid4()),
         "pages": []
     }
+    visual_cues = visual_cues or [] # List of dicts with visual cue info per cell
 
     # Log styles
     for idx, style in enumerate(result.styles):
```
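With this change, `extract_layout_data` expects each visual cue as a dict with `page_number`, `row_index`, `column_index`, and `cue_type`. The `extract_visual_cues_from_vision` function added in this commit is still a placeholder, and Azure AI Vision reports detections as pixel rectangles with no notion of table cells, so something has to translate one into the other. One approach is to convert each detection's center from pixels into the page coordinates Document Intelligence uses and pick the cell whose bounding box contains it. This sketch assumes the page image was rendered at 200 DPI (the `pdf2image` default), that Document Intelligence polygons for PDF input are in inches, that detections use the `rectangle` shape (`x`, `y`, `w`, `h`) of the v3.2 `objects` response, and that cell polygons have already been flattened to `[x1, y1, x2, y2, ...]` (the polygon type varies by SDK version). The helper names and the `cells` input shape are hypothetical, and where the `cue_type` labels come from (Vision objects, color analysis, or image preprocessing) is left open:

```python
def polygon_bbox(polygon):
    """Axis-aligned box (min_x, min_y, max_x, max_y) of a flat [x1, y1, x2, y2, ...] polygon."""
    xs, ys = polygon[0::2], polygon[1::2]
    return min(xs), min(ys), max(xs), max(ys)


def assign_cues_to_cells(detections, cells, page_number, dpi=200):
    """Map pixel-space detections onto table cells (hypothetical helper).

    detections: [{"rectangle": {"x", "y", "w", "h"}, "cue_type": str}], in image pixels
    cells:      [{"row_index", "column_index", "polygon": [x1, y1, ...]}], in inches
    dpi:        resolution the page image was rendered at (assumed 200 here)
    Returns cue dicts in the shape extract_layout_data looks up.
    """
    cues = []
    for det in detections:
        rect = det["rectangle"]
        # Center of the detection, converted from pixels to inches.
        cx = (rect["x"] + rect["w"] / 2) / dpi
        cy = (rect["y"] + rect["h"] / 2) / dpi
        for cell in cells:
            min_x, min_y, max_x, max_y = polygon_bbox(cell["polygon"])
            if min_x <= cx <= max_x and min_y <= cy <= max_y:
                cues.append({
                    "page_number": page_number,
                    "row_index": cell["row_index"],
                    "column_index": cell["column_index"],
                    "cue_type": det["cue_type"],
                })
                break  # first containing cell wins; detections outside every cell are dropped
    return cues


# Example: two cells side by side, one detection centered at (2.6 in, 1.2 in).
cells = [
    {"row_index": 0, "column_index": 0, "polygon": [0, 1, 2, 1, 2, 1.5, 0, 1.5]},
    {"row_index": 0, "column_index": 1, "polygon": [2, 1, 4, 1, 4, 1.5, 2, 1.5]},
]
detections = [{"rectangle": {"x": 500, "y": 220, "w": 40, "h": 40}, "cue_type": "checkmark"}]
print(assign_cues_to_cells(detections, cells, page_number=1))
# [{'page_number': 1, 'row_index': 0, 'column_index': 1, 'cue_type': 'checkmark'}]
```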
```diff
@@ -88,12 +95,16 @@ def extract_layout_data(result):
 
             for cell in table.cells:
                 content = cell.content.strip()
-                table_data["cells"].append({
+                # Find matching visual cue for this cell (if any)
+                cue = next((vc for vc in visual_cues if vc.get("page_number") == page.page_number and vc.get("row_index") == cell.row_index and vc.get("column_index") == cell.column_index), None)
+                cell_info = {
                     "row_index": cell.row_index,
                     "column_index": cell.column_index,
-                    "content": content
-                })
-                logging.info(f"Cell[{cell.row_index}][{cell.column_index}]: '{content}'")
+                    "content": content,
+                    "visual_cue": cue["cue_type"] if cue else None
+                }
+                table_data["cells"].append(cell_info)
+                logging.info(f"Cell[{cell.row_index}][{cell.column_index}]: '{content}', visual_cue: {cell_info['visual_cue']}")
 
             page_data["tables"].append(table_data)
 
```
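The `next(...)` expression rescans the whole `visual_cues` list for every cell, so merging costs roughly (cells × cues) comparisons. Indexing the cues once by `(page_number, row_index, column_index)` turns each lookup into a dictionary access. A minimal sketch that keeps the first-match-wins behavior of `next(...)`; `index_visual_cues` is a hypothetical helper name:

```python
def index_visual_cues(visual_cues):
    """Index cue dicts by (page_number, row_index, column_index).

    setdefault keeps the first cue seen for a cell, matching the
    first-match behavior of the next(...) generator in the diff.
    """
    index = {}
    for vc in visual_cues or []:
        key = (vc.get("page_number"), vc.get("row_index"), vc.get("column_index"))
        index.setdefault(key, vc)
    return index


cues = [
    {"page_number": 1, "row_index": 2, "column_index": 0, "cue_type": "gray_fill"},
    {"page_number": 1, "row_index": 2, "column_index": 0, "cue_type": "checkmark"},
]
cue_index = index_visual_cues(cues)  # build once, before looping over pages

# Inside the cell loop the lookup would become
# cue_index.get((page.page_number, cell.row_index, cell.column_index)).
cue = cue_index.get((1, 2, 0))
print(cue["cue_type"] if cue else None)  # gray_fill
```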
```diff
@@ -156,6 +167,31 @@ def save_layout_data_to_cosmos(layout_data):
 ## MAIN
 @app.blob_trigger(arg_name="myblob", path="pdfinvoices/{name}",
                   connection="invoicecontosostorage_STORAGE")
+def call_vision_api(image_bytes, subscription_key, endpoint):
+    vision_url = endpoint + "/vision/v3.2/analyze"
+    headers = {
+        'Ocp-Apim-Subscription-Key': subscription_key,
+        'Content-Type': 'application/octet-stream'
+    }
+    params = {
+        'visualFeatures': 'Objects,Color', # Add more features if needed
+    }
+    response = requests.post(vision_url, headers=headers, params=params, data=image_bytes)
+    response.raise_for_status()
+    return response.json()
+
+def extract_visual_cues_from_vision(vision_result, page_number):
+    # Example: Detect gray fills, checkmarks, hand-drawn marks
+    cues = []
+    # This is a placeholder. You need to parse vision_result for your cues.
+    # For example, if vision_result['objects'] contains a 'checkmark' or color info for gray fill
+    # cues.append({"page_number": page_number, "row_index": ..., "column_index": ..., "cue_type": "gray_fill"})
+    return cues
+
+def convert_pdf_to_images(pdf_bytes):
+    images = convert_from_bytes(pdf_bytes)
+    return images
+
 def BlobTriggerContosoPDFLayoutsDocIntelligence(myblob: func.InputStream):
     logging.info(f"Python blob trigger function processed blob\n"
                  f"Name: {myblob.name}\n"
```
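In this hunk the three new helpers are inserted between the `@app.blob_trigger(...)` decorator and `def BlobTriggerContosoPDFLayoutsDocIntelligence`. A Python decorator applies to the `def` that immediately follows it, so as committed the blob trigger wraps `call_vision_api`, the intended handler is no longer registered, and the Functions host will likely reject the binding because `call_vision_api` has no `myblob` parameter. Moving the helpers above the decorator keeps it attached to the handler. The registry below is a hypothetical stand-in used only to show which function a decorator attaches to; it is not the `azure.functions` API:

```python
# Hypothetical stand-in for a trigger decorator: it only records which
# function it was applied to, to show how Python binds decorators.
registered = {}


def blob_trigger(arg_name):
    def decorator(fn):
        registered[fn.__name__] = arg_name
        return fn
    return decorator


# Layout as committed: a helper sits between the decorator and the handler.
@blob_trigger(arg_name="myblob")
def call_vision_api(image_bytes, subscription_key, endpoint):
    return None


def handler_as_committed(myblob):
    return None


# Suggested layout: helpers first, decorator directly above the handler.
def convert_pdf_to_images(pdf_bytes):
    return []


@blob_trigger(arg_name="myblob")
def handler_fixed(myblob):
    return None


print(sorted(registered))  # ['call_vision_api', 'handler_fixed']
```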
```diff
@@ -176,9 +212,26 @@ def BlobTriggerContosoPDFLayoutsDocIntelligence(myblob: func.InputStream):
         logging.error(f"Error analyzing PDF: {e}")
         return
 
+    # --- Step: Convert PDF to image and call Azure AI Vision ---
+    visual_cues = []
+    try:
+        images = convert_pdf_to_images(pdf_bytes)
+        vision_key = os.getenv("VISION_API_KEY")
+        vision_endpoint = os.getenv("VISION_API_ENDPOINT")
+        for page_num, image in enumerate(images, start=1):
+            img_bytes_io = BytesIO()
+            image.save(img_bytes_io, format='JPEG')
+            img_bytes = img_bytes_io.getvalue()
+            vision_result = call_vision_api(img_bytes, vision_key, vision_endpoint)
+            cues = extract_visual_cues_from_vision(vision_result, page_num)
+            visual_cues.extend(cues)
+        logging.info(f"Visual cues extracted: {visual_cues}")
+    except Exception as e:
+        logging.error(f"Error processing visual cues with AI Vision: {e}")
+
     try:
-        layout_data = extract_layout_data(result)
-        logging.info("Successfully extracted layout data.")
+        layout_data = extract_layout_data(result, visual_cues)
+        logging.info("Successfully extracted and merged layout data.")
     except Exception as e:
         logging.error(f"Error extracting layout data: {e}")
         return
```
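Two details in this step can fail quietly. `os.getenv` returns `None` when `VISION_API_ENDPOINT` or `VISION_API_KEY` is unset, so `endpoint + "/vision/v3.2/analyze"` raises a `TypeError` on the first page; the broad `except` logs it and processing continues with no cues at all. Also, Azure endpoint values, including the `endpoint` attribute that `main.tf` copies into the app settings, commonly end with a trailing slash, which yields a `//vision/...` path. A minimal standard-library sketch that validates the settings once and normalizes the URL before the page loop; `build_vision_analyze_request` is a hypothetical helper, and its headers and `visualFeatures` default mirror `call_vision_api`:

```python
from urllib.parse import urlencode


def build_vision_analyze_request(endpoint, subscription_key, features=("Objects", "Color")):
    """Return (url, headers) for the v3.2 analyze call (hypothetical helper).

    Raises ValueError up front instead of failing later with a TypeError
    when the app settings are missing.
    """
    if not endpoint or not subscription_key:
        raise ValueError("VISION_API_ENDPOINT and VISION_API_KEY must both be set")
    # Strip any trailing slash so the path is joined with exactly one '/'.
    url = endpoint.rstrip("/") + "/vision/v3.2/analyze?" + urlencode(
        {"visualFeatures": ",".join(features)}
    )
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/octet-stream",
    }
    return url, headers


url, headers = build_vision_analyze_request(
    "https://example.cognitiveservices.azure.com/", "placeholder-key"
)
print(url)
# https://example.cognitiveservices.azure.com/vision/v3.2/analyze?visualFeatures=Objects%2CColor
```

In the function, the two arguments would be the `os.getenv(...)` values, read once before the page loop. If this URL is used, the separate `params` argument to `requests.post` should be dropped so the query string is not duplicated; passing `timeout=` as well keeps a stalled Vision call from blocking until the function's own timeout.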

terraform-infrastructure/README.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -5,7 +5,7 @@ Costa Rica
 [![GitHub](https://img.shields.io/badge/--181717?logo=github&logoColor=ffffff)](https://github.com/)
 [brown9804](https://github.com/brown9804)
 
-Last updated: 2025-07-16
+Last updated: 2025-07-21
 
 ----------
 
```
```diff
@@ -109,7 +109,7 @@ graph TD;
 
 <!-- START BADGE -->
 <div align="center">
-  <img src="https://img.shields.io/badge/Total%20views-55-limegreen" alt="Total views">
-  <p>Refresh Date: 2025-07-16</p>
+  <img src="https://img.shields.io/badge/Total%20views-164-limegreen" alt="Total views">
+  <p>Refresh Date: 2025-07-21</p>
 </div>
 <!-- END BADGE -->
```

terraform-infrastructure/main.tf

Lines changed: 21 additions & 1 deletion
```diff
@@ -289,6 +289,7 @@ resource "azurerm_role_assignment" "contributor" {
   ]
 }
 
+
 # Azure Form Recognizer (Document Intelligence)
 resource "azurerm_cognitive_account" "form_recognizer" {
   name = var.form_recognizer_name
```
```diff
@@ -299,12 +300,27 @@ resource "azurerm_cognitive_account" "form_recognizer" {
 
   depends_on = [azurerm_resource_group.rg]
 
-  # Output the Form Recognizer name
   provisioner "local-exec" {
     command = "echo Form Recognizer: ${self.name}"
   }
 }
 
+# Azure AI Vision (Cognitive Services)
+resource "azurerm_cognitive_account" "ai_vision" {
+  name = var.ai_vision_name
+  location = azurerm_resource_group.rg.location
+  resource_group_name = azurerm_resource_group.rg.name
+  kind = "CognitiveServices"
+  sku_name = var.ai_vision_sku
+  tags = var.ai_vision_tags
+
+  depends_on = [azurerm_resource_group.rg]
+
+  provisioner "local-exec" {
+    command = "echo AI Vision: ${self.name}"
+  }
+}
+
 # We need to assign custom or built-in Cosmos DB SQL roles
 # (like Cosmos DB Built-in Data Reader, etc.) at the data plane level,
 # which is not currently supported directly in Terraform as of now.
```
```diff
@@ -373,6 +389,10 @@ resource "azurerm_linux_function_app" "function_app" {
 
     "APPINSIGHTS_INSTRUMENTATIONKEY" = azurerm_application_insights.appinsights.instrumentation_key
     "APPLICATIONINSIGHTS_CONNECTION_STRING" = azurerm_application_insights.appinsights.connection_string
+
+    # Azure AI Vision settings
+    "VISION_API_ENDPOINT" = azurerm_cognitive_account.ai_vision.endpoint
+    "VISION_API_KEY" = azurerm_cognitive_account.ai_vision.primary_access_key
   }
 
   depends_on = [
```

Lines changed: 17 additions & 8 deletions
```diff
@@ -1,21 +1,30 @@
 # Sample values
-subscription_id = "" # "your-subscription_id"
-resource_group_name = "RG-PDFLayout-Processing-DocIntelligence" # "your-resource-group-name"
-location = "West US" # "your-location"
+subscription_id = "407f4106-0fd3-42e0-9348-3686dd1e7347" # "your-subscription_id"
+resource_group_name = "RG-PDFLayout-Processing-DocIntelligence" # "your-resource-group-name"
+location = "West US" # "your-location"
 # Storage Account
-storage_account_name = "storageaccbrownpdfix2" # "your-storage-account-name"
+storage_account_name = "storageaccbrownpdfix2" # "your-storage-account-name"
 storage_account_name_runtime = "runtimestorebrownix2" # "your-runtime-storage-account-name"
 # Function App
-function_app_name = "fapdfbrownix2" # "your-function-app-name"
+function_app_name = "fapdfbrownix2" # "your-function-app-name"
 # App Service Plan
 app_service_plan_name = "asppdfbrownix2" # "your-app-service-plan-name"
 # Application Insights
-app_insights_name = "apppdfbrownix2" # "your-app-insights-name"
+app_insights_name = "apppdfbrownix2" # "your-app-insights-name"
 # Log Analytics Workspace
 log_analytics_workspace_name = "logwspdfbrownix2" # "your-log-analytics-workspace-name"
 # Key Vault
-key_vault_name = "kvpdfrbrownix2" # "your-key-vault-name"
+key_vault_name = "kvpdfrbrownrix2" # "your-key-vault-name"
 # CosmosDB
 cosmosdb_account_name = "cosmospdfbrownix2" # "your-cosmosdb-account-name"
 # Form Recognizer -> Document Intelligence
-form_recognizer_name = "docintelligt01ix2" # "your-document-intelligence-name"
+form_recognizer_name = "docintelligt01ix2" # "your-document-intelligence-name"
+
+# AI Vision Service
+ai_vision_name = "aivisionpdfrbrownix2" # "your-ai-vision-name"
+ai_vision_sku = "S0"
+ai_vision_tags = {
+  Environment = "Development"
+  Project = "PDF Processing"
+  Service = "AI Vision"
+}
```

terraform-infrastructure/variables.tf

Lines changed: 20 additions & 0 deletions
```diff
@@ -48,6 +48,26 @@ variable "key_vault_name" {
   description = "The name of the Key Vault"
   type = string
 }
+
+variable "ai_vision_name" {
+  description = "The name of the AI Vision Cognitive Services account"
+  type = string
+}
+
+variable "ai_vision_sku" {
+  description = "The SKU of the AI Vision Cognitive Services account"
+  type = string
+  default = "S0"
+}
+
+variable "ai_vision_tags" {
+  description = "Tags to be applied to the AI Vision resource"
+  type = map(string)
+  default = {
+    Environment = "Development"
+    Service = "AI Vision"
+  }
+}
 variable "cosmosdb_account_name" {
   description = "The name of the CosmosDB account."
   type = string
```

0 commit comments
