Add media description feature using Azure Content Understanding #2195
Merged
Changes from 6 commits (the PR has 21 commits in total, all by pamelafox):

c19a9f3 First pass
7b52dac CU kinda working
65e5616 CU integration
7130a24 Better splitting
9ba6e3a Add Bicep
c621a43 Rm unneeded figures
0fef108 Remove en-us from URLs
93e774d Fix URLs
3b104fb Remote figures output JSON
9973a77 Update matrix comments
109d7d4 Merge branch 'main' into contentunderstanding
0681755 Make mypy happy
400d313 Merge branch 'contentunderstanding' of https://github.com/pamelafox/a…
ec66c52 Add same errors to file strategy
5a3040a Add pymupdf to skip modules for mypy
2a6e604 Output the endpoint from Bicep
b8c4d94 100 percent coverage for mediadescriber.py
8ec9514 Tests added for PDFParser
6d4e756 Fix that tuple type
75b159d Add pricing link
c88b5d5 Fix content read issue
New file (hunk `@@ -0,0 +1,109 @@`):

```python
import logging
from typing import Union

import aiohttp
from azure.core.credentials_async import AsyncTokenCredential
from azure.identity.aio import get_bearer_token_provider
from rich.progress import Progress
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_fixed

logger = logging.getLogger("scripts")

CU_API_VERSION = "2024-12-01-preview"

PATH_ANALYZER_MANAGEMENT = "/analyzers/{analyzerId}"
PATH_ANALYZER_MANAGEMENT_OPERATION = "/analyzers/{analyzerId}/operations/{operationId}"

# Define Analyzer inference paths
PATH_ANALYZER_INFERENCE = "/analyzers/{analyzerId}:analyze"
PATH_ANALYZER_INFERENCE_GET_IMAGE = "/analyzers/{analyzerId}/results/{operationId}/images/{imageId}"
```
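The path constants above are `{analyzerId}`-style templates, while the methods further down build their URLs with f-strings. A minimal sketch of expanding the templates into full request URLs, assuming the `/contentunderstanding` prefix used by those f-strings; `build_cu_url` and the example host are hypothetical and not part of this PR:

```python
from urllib.parse import urlencode

CU_API_VERSION = "2024-12-01-preview"
PATH_ANALYZER_MANAGEMENT = "/analyzers/{analyzerId}"
PATH_ANALYZER_INFERENCE = "/analyzers/{analyzerId}:analyze"


def build_cu_url(endpoint: str, path_template: str, **path_params: str) -> str:
    """Hypothetical helper: expand a path template and append the api-version query."""
    # Strip a trailing slash so "https://host/" and "https://host" give the same URL.
    base = endpoint.rstrip("/") + "/contentunderstanding"
    # Fill placeholders such as {analyzerId} from keyword arguments.
    path = path_template.format(**path_params)
    return f"{base}{path}?{urlencode({'api-version': CU_API_VERSION})}"


url = build_cu_url(
    "https://example.cognitiveservices.azure.com/",  # hypothetical endpoint
    PATH_ANALYZER_INFERENCE,
    analyzerId="image_analyzer",
)
print(url)
```

Passing `api-version` as a query parameter mirrors the `params={"api-version": CU_API_VERSION}` that the module hands to aiohttp.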
```python
analyzer_name = "image_analyzer"
image_schema = {
    "analyzerId": analyzer_name,
    "name": "Image understanding",
    "description": "Extract detailed structured information from images extracted from documents.",
    "baseAnalyzerId": "prebuilt-image",
    "scenario": "image",
    "config": {"returnDetails": False},
    "fieldSchema": {
        "name": "ImageInformation",
        "descriptions": "Description of image.",
        "fields": {
            "Description": {
                "type": "string",
                "description": "Description of the image. If the image has a title, start with the title. Include a 2-sentence summary. If the image is a chart, diagram, or table, include the underlying data in an HTML table tag, with accurate numbers. If the image is a chart, describe any axis or legends. The only allowed HTML tags are the table/thead/tr/td/tbody tags.",
            },
        },
    },
}
```
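The analyzer definition is plain JSON-serializable data, so it can be generated for other field names or instructions. A minimal sketch, assuming the same shape as `image_schema`; `make_image_analyzer_schema` is a hypothetical helper, and the comment about result keys is an assumption based on how `describe_image` indexes `fields` by name:

```python
from __future__ import annotations

import json
from typing import Any


def make_image_analyzer_schema(analyzer_id: str, field_name: str, field_instructions: str) -> dict[str, Any]:
    """Hypothetical helper: build an analyzer definition shaped like image_schema."""
    return {
        "analyzerId": analyzer_id,
        "name": "Image understanding",
        "description": "Extract detailed structured information from images extracted from documents.",
        "baseAnalyzerId": "prebuilt-image",
        "scenario": "image",
        "config": {"returnDetails": False},
        "fieldSchema": {
            "name": "ImageInformation",
            "descriptions": "Description of image.",
            "fields": {
                # Assumed: the analyze result reports this field under the same key.
                field_name: {"type": "string", "description": field_instructions},
            },
        },
    }


schema = make_image_analyzer_schema("image_analyzer", "Description", "Describe the image in two sentences.")
body = json.dumps(schema)  # what aiohttp's json= argument would serialize for the PUT
```

Keeping the field name in one variable makes it harder for the creation code and the result-reading code to drift apart.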
```python
class ContentUnderstandingManager:

    def __init__(self, endpoint: str, credential: Union[AsyncTokenCredential, str]):
        self.endpoint = endpoint
        self.credential = credential

    async def poll_api(self, session, poll_url, headers):

        # Poll every 2 seconds, up to 60 attempts. Only ValueError triggers a retry,
        # so it is raised to signal that the operation is still running.
        @retry(stop=stop_after_attempt(60), wait=wait_fixed(2), retry=retry_if_exception_type(ValueError))
        async def poll():
            async with session.get(poll_url, headers=headers) as response:
                response.raise_for_status()
                response_json = await response.json()
                if response_json["status"] == "Failed":
                    # Not a ValueError, so tenacity stops and propagates it.
                    raise Exception("Failed")
                if response_json["status"] == "Running":
                    raise ValueError("Running")
                return response_json

        return await poll()
```
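`poll_api` drives a long-running operation with tenacity, using `ValueError` as the "still running" signal. The same policy can be sketched with only `asyncio`, which makes the stop and failure conditions explicit. This is an illustrative equivalent, not the PR's code: `poll_until_done`, `OperationFailed`, and the fake status source are hypothetical:

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable


class OperationFailed(Exception):
    """Raised when the polled operation reports status "Failed"."""


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[dict[str, Any]]],
    max_attempts: int = 60,
    wait_seconds: float = 2.0,
) -> dict[str, Any]:
    """Poll fetch_status until the status is no longer "Running".

    Mirrors poll_api's policy: up to max_attempts requests, a fixed wait between
    them, an immediate error on "Failed", and any other status treated as done.
    """
    for attempt in range(1, max_attempts + 1):
        response_json = await fetch_status()
        status = response_json["status"]
        if status == "Failed":
            raise OperationFailed(response_json)
        if status != "Running":
            return response_json
        # Still running: wait before the next request, except after the last attempt.
        if attempt < max_attempts:
            await asyncio.sleep(wait_seconds)
    raise TimeoutError(f"Operation still running after {max_attempts} attempts")


# Usage with a fake status source standing in for the GET on Operation-Location.
_fake_statuses = iter([{"status": "Running"}, {"status": "Running"}, {"status": "Succeeded"}])


async def _fake_fetch() -> dict[str, Any]:
    return next(_fake_statuses)


final = asyncio.run(poll_until_done(_fake_fetch, wait_seconds=0))
print(final["status"])  # Succeeded
```

One behavioral difference: when tenacity runs out of attempts it raises its own `RetryError` by default, whereas this sketch raises `TimeoutError`.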
```python
    async def create_analyzer(self):
        logger.info("Creating analyzer '%s'...", image_schema["analyzerId"])

        token_provider = get_bearer_token_provider(self.credential, "https://cognitiveservices.azure.com/.default")
        token = await token_provider()
        headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
        params = {"api-version": CU_API_VERSION}
        analyzer_id = image_schema["analyzerId"]
        cu_endpoint = f"{self.endpoint}/contentunderstanding/analyzers/{analyzer_id}"
        async with aiohttp.ClientSession() as session:
            async with session.put(url=cu_endpoint, params=params, headers=headers, json=image_schema) as response:
                if response.status == 409:
                    logger.info("Analyzer '%s' already exists.", analyzer_id)
                    return
                elif response.status != 201:
                    data = await response.text()
                    logger.error("Error creating analyzer: %s", data)
                    response.raise_for_status()
                else:
                    poll_url = response.headers.get("Operation-Location")

            with Progress() as progress:
                progress.add_task("Creating analyzer...", total=None, start=False)
                await self.poll_api(session, poll_url, headers)
```
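`create_analyzer` treats HTTP 409 as "analyzer already exists" and HTTP 201 as "creation accepted, poll `Operation-Location`". A minimal sketch that isolates that decision as a pure function; `next_step_after_create` is hypothetical, and raising on every other status is a deliberate tightening of the original logic:

```python
from __future__ import annotations

from typing import Mapping, Optional


def next_step_after_create(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Hypothetical helper that makes create_analyzer's branching explicit.

    Returns the Operation-Location URL to poll, or None when the analyzer
    already exists (HTTP 409). Any other status is treated as an error.
    """
    if status == 409:
        # The analyzer ID is already registered, so there is nothing to create.
        return None
    if status == 201:
        # Creation accepted: progress is reported at the Operation-Location URL.
        poll_url = headers.get("Operation-Location")
        if not poll_url:
            raise RuntimeError("201 response is missing the Operation-Location header")
        return poll_url
    raise RuntimeError(f"Unexpected HTTP status {status} while creating analyzer")


print(next_step_after_create(409, {}))  # None
```

Why raise explicitly: in the original, a success status other than 201 enters the `elif` branch, where `raise_for_status()` does not raise, so execution reaches `poll_api` with `poll_url` never assigned (an `UnboundLocalError`).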
```python
    async def describe_image(self, image_bytes) -> str:
        logger.info("Sending image to Azure Content Understanding service...")
        async with aiohttp.ClientSession() as session:
            token = await self.credential.get_token("https://cognitiveservices.azure.com/.default")
            headers = {"Authorization": "Bearer " + token.token}
            params = {"api-version": CU_API_VERSION}

            async with session.post(
                url=f"{self.endpoint}/contentunderstanding/analyzers/{analyzer_name}:analyze",
                params=params,
                headers=headers,
                data=image_bytes,
            ) as response:
                response.raise_for_status()
                poll_url = response.headers["Operation-Location"]

            with Progress() as progress:
                progress.add_task("Processing...", total=None, start=False)
                results = await self.poll_api(session, poll_url, headers)

            fields = results["result"]["contents"][0]["fields"]
            # image_schema defines this field as "Description", so read that key.
            return fields["Description"]["valueString"]
```
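Reading the description is a chain of dictionary lookups, and a field-name mismatch between the analyzer schema and the reading code surfaces only as a bare `KeyError`. A minimal sketch of a helper that reports which fields actually came back; `extract_string_field` is hypothetical, and `sample_results` is a hand-written illustration whose shape is inferred from the indexing in `describe_image`, not a captured service response:

```python
from __future__ import annotations

from typing import Any


def extract_string_field(results: dict[str, Any], field_name: str) -> str:
    """Hypothetical helper: read a string field from an analyze result.

    Follows the same path describe_image indexes: result -> contents[0] -> fields.
    """
    fields = results["result"]["contents"][0]["fields"]
    if field_name not in fields:
        # Name the fields that do exist, so a schema/code mismatch is obvious.
        raise KeyError(f"Field {field_name!r} not found; available fields: {sorted(fields)}")
    return fields[field_name]["valueString"]


# Illustrative, hand-written result shape (not a real service response).
sample_results = {
    "result": {
        "contents": [
            {"fields": {"Description": {"type": "string", "valueString": "A bar chart of monthly sales."}}}
        ]
    }
}

print(extract_string_field(sample_results, "Description"))  # A bar chart of monthly sales.
```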