Commit 9ba6e3a

Add Bicep
1 parent 7130a24 commit 9ba6e3a

20 files changed, with 333 additions and 58 deletions.

.azdo/pipelines/azure-dev.yml

Lines changed: 1 addition & 0 deletions
@@ -120,6 +120,7 @@ steps:
       DEPLOYMENT_TARGET: $(DEPLOYMENT_TARGET)
       AZURE_CONTAINER_APPS_WORKLOAD_PROFILE: $(AZURE_CONTAINER_APPS_WORKLOAD_PROFILE)
       USE_CHAT_HISTORY_BROWSER: $(USE_CHAT_HISTORY_BROWSER)
+      USE_MEDIA_DESCRIBER_AZURE_CU: $(USE_MEDIA_DESCRIBER_AZURE_CU)
   - task: AzureCLI@2
     displayName: Deploy Application
     inputs:

.github/workflows/azure-dev.yml

Lines changed: 1 addition & 0 deletions
@@ -103,6 +103,7 @@ jobs:
       DEPLOYMENT_TARGET: ${{ vars.DEPLOYMENT_TARGET }}
       AZURE_CONTAINER_APPS_WORKLOAD_PROFILE: ${{ vars.AZURE_CONTAINER_APPS_WORKLOAD_PROFILE }}
       USE_CHAT_HISTORY_BROWSER: ${{ vars.USE_CHAT_HISTORY_BROWSER }}
+      USE_MEDIA_DESCRIBER_AZURE_CU: ${{ vars.USE_MEDIA_DESCRIBER_AZURE_CU }}
     steps:
       - name: Checkout
         uses: actions/checkout@v4

CONTRIBUTING.md

Lines changed: 3 additions & 1 deletion
@@ -22,7 +22,7 @@ contact [[email protected]](mailto:[email protected]) with any additio
 - [Running unit tests](#running-unit-tests)
 - [Running E2E tests](#running-e2e-tests)
 - [Code Style](#code-style)
-- [Adding new azd environment variables](#add-new-azd-environment-variables)
+- [Adding new azd environment variables](#adding-new-azd-environment-variables)
 
 ## Code of Conduct
 
@@ -166,6 +166,8 @@ If you followed the steps above to install the pre-commit hooks, then you can ju
 
 When adding new azd environment variables, please remember to update:
 
+1. [main.parameters.json](./main.parameters.json)
+1. [appEnvVariables in main.bicep](./main.bicep)
 1. App Service's [azure.yaml](./azure.yaml)
 1. [ADO pipeline](.azdo/pipelines/azure-dev.yml).
 1. [Github workflows](.github/workflows/azure-dev.yml)
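
Reviewer note: the checklist in this hunk now spans five files, so a missed entry is easy to overlook. A minimal sketch of an automated check follows, assuming the caller has already read each file into a string. The helper name `find_missing_references` and the sample contents are hypothetical and are not part of this commit.

```python
def find_missing_references(var_name: str, files: dict[str, str]) -> list[str]:
    """Return the paths (sorted) whose text never mentions var_name.

    This is a plain substring check, so a name that is a prefix of another
    variable (USE_X vs. USE_X_Y) would also count as present.
    """
    return sorted(path for path, text in files.items() if var_name not in text)


# Hypothetical file contents standing in for the real files on disk.
sample_files = {
    "azure.yaml": "USE_MEDIA_DESCRIBER_AZURE_CU",
    ".azdo/pipelines/azure-dev.yml": "USE_MEDIA_DESCRIBER_AZURE_CU: $(USE_MEDIA_DESCRIBER_AZURE_CU)",
    ".github/workflows/azure-dev.yml": "USE_CHAT_HISTORY_BROWSER: ${{ vars.USE_CHAT_HISTORY_BROWSER }}",
}

missing = find_missing_references("USE_MEDIA_DESCRIBER_AZURE_CU", sample_files)
print(missing)
```

In this sample only the GitHub workflow entry lacks the variable, so it is the only path reported. A real check would read the five files named in the checklist from disk and fail the build when the list is non-empty.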

README.md

Lines changed: 3 additions & 1 deletion
@@ -92,7 +92,9 @@ However, you can try the [Azure pricing calculator](https://azure.com/e/e3490de2
 - Azure AI Document Intelligence: SO (Standard) tier using pre-built layout. Pricing per document page, sample documents have 261 pages total. [Pricing](https://azure.microsoft.com/pricing/details/form-recognizer/)
 - Azure AI Search: Basic tier, 1 replica, free level of semantic search. Pricing per hour. [Pricing](https://azure.microsoft.com/pricing/details/search/)
 - Azure Blob Storage: Standard tier with ZRS (Zone-redundant storage). Pricing per storage and read operations. [Pricing](https://azure.microsoft.com/pricing/details/storage/blobs/)
-- Azure Cosmos DB: Serverless tier. Pricing per request unit and storage. [Pricing](https://azure.microsoft.com/pricing/details/cosmos-db/)
+- Azure Cosmos DB: Only provisioned if you enabled [chat history with Cosmos DB](docs/deploy_features.md#enabling-persistent-chat-history-with-azure-cosmos-db). Serverless tier. Pricing per request unit and storage. [Pricing](https://azure.microsoft.com/pricing/details/cosmos-db/)
+- Azure AI Vision: Only provisioned if you enabled [GPT-4 with vision](docs/gpt4v.md). Pricing per 1K transactions. [Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/computer-vision/)
+- Azure AI Content Understanding: Only provisioned if you enabled [media description](docs/deploy_features.md#enabling-media-description-with-azure-content-understanding). Pricing per TODO. [Pricing](TODO)
 - Azure Monitor: Pay-as-you-go tier. Costs based on data ingested. [Pricing](https://azure.microsoft.com/pricing/details/monitor/)
 
 To reduce costs, you can switch to free SKUs for various services, but those SKUs have limitations.

app/backend/prepdocs.py

Lines changed: 5 additions & 5 deletions
@@ -7,6 +7,7 @@
 from azure.core.credentials import AzureKeyCredential
 from azure.core.credentials_async import AsyncTokenCredential
 from azure.identity.aio import AzureDeveloperCliCredential, get_bearer_token_provider
+from rich.logging import RichHandler
 
 from load_azd_env import load_azd_env
 from prepdocslib.blobmanager import BlobManager
@@ -161,7 +162,7 @@ def setup_file_processors(
     use_content_understanding: bool = False,
     content_understanding_endpoint: Union[str, None] = None,
 ):
-    sentence_text_splitter = SentenceTextSplitter(has_image_embeddings=search_images)
+    sentence_text_splitter = SentenceTextSplitter()
 
     doc_int_parser: Optional[DocumentAnalysisParser] = None
     # check if Azure Document Intelligence credentials are provided
@@ -245,8 +246,7 @@ async def main(strategy: Strategy, setup_index: bool = True):
 
 if __name__ == "__main__":
     parser = argparse.ArgumentParser(
-        description="Prepare documents by extracting content from PDFs, splitting content into sections, uploading to blob storage, and indexing in a search index.",
-        epilog="Example: prepdocs.py '.\\data\*' -v",
+        description="Prepare documents by extracting content from PDFs, splitting content into sections, uploading to blob storage, and indexing in a search index."
     )
     parser.add_argument("files", nargs="?", help="Files to be processed")
 
@@ -299,7 +299,7 @@ async def main(strategy: Strategy, setup_index: bool = True):
     args = parser.parse_args()
 
     if args.verbose:
-        logging.basicConfig(format="%(message)s")
+        logging.basicConfig(format="%(message)s", datefmt="[%X]", handlers=[RichHandler(rich_tracebacks=True)])
         # We only set the level to INFO for our logger,
         # to avoid seeing the noisy INFO level logs from the Azure SDKs
         logger.setLevel(logging.DEBUG)
@@ -310,7 +310,7 @@ async def main(strategy: Strategy, setup_index: bool = True):
     use_gptvision = os.getenv("USE_GPT4V", "").lower() == "true"
     use_acls = os.getenv("AZURE_ADLS_GEN2_STORAGE_ACCOUNT") is not None
     dont_use_vectors = os.getenv("USE_VECTORS", "").lower() == "false"
-    use_content_understanding = os.getenv("USE_CONTENT_UNDERSTANDING", "").lower() == "true"
+    use_content_understanding = os.getenv("USE_MEDIA_DESCRIBER_AZURE_CU", "").lower() == "true"
 
     # Use the current user identity to connect to Azure services. See infra/main.bicep for role assignments.
     if tenant_id := os.getenv("AZURE_TENANT_ID"):
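
Reviewer note: the renamed flag is read with the same `os.getenv(NAME, "").lower() == "true"` pattern as the neighbouring flags. The sketch below isolates that pattern to show which values enable the feature. The helper name `env_flag` is hypothetical and does not exist in `prepdocs.py`, which inlines the expression.

```python
import os


def env_flag(name: str) -> bool:
    """Hypothetical helper mirroring the inline pattern in the hunk above.

    Only the string "true" (in any letter case) enables the flag. An unset
    variable, an empty string, or values such as "1" or "yes" all read as False.
    """
    return os.getenv(name, "").lower() == "true"


os.environ["USE_MEDIA_DESCRIBER_AZURE_CU"] = "True"
enabled = env_flag("USE_MEDIA_DESCRIBER_AZURE_CU")

os.environ["USE_MEDIA_DESCRIBER_AZURE_CU"] = "1"
enabled_with_one = env_flag("USE_MEDIA_DESCRIBER_AZURE_CU")

del os.environ["USE_MEDIA_DESCRIBER_AZURE_CU"]
enabled_when_unset = env_flag("USE_MEDIA_DESCRIBER_AZURE_CU")

print(enabled, enabled_with_one, enabled_when_unset)
```

One consequence of the rename is that an environment that still sets only the old `USE_CONTENT_UNDERSTANDING` name now reads as disabled, because the new name is unset there.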

app/backend/prepdocslib/cu_image.py

Lines changed: 28 additions & 33 deletions
@@ -4,6 +4,7 @@
 import aiohttp
 from azure.core.credentials_async import AsyncTokenCredential
 from azure.identity.aio import get_bearer_token_provider
+from rich.progress import Progress
 from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_fixed
 
 logger = logging.getLogger("scripts")
@@ -44,7 +45,23 @@ def __init__(self, endpoint: str, credential: Union[AsyncTokenCredential, str]):
         self.endpoint = endpoint
         self.credential = credential
 
+    async def poll_api(self, session, poll_url, headers):
+
+        @retry(stop=stop_after_attempt(60), wait=wait_fixed(2), retry=retry_if_exception_type(ValueError))
+        async def poll():
+            async with session.get(poll_url, headers=headers) as response:
+                response.raise_for_status()
+                response_json = await response.json()
+                if response_json["status"] == "Failed":
+                    raise Exception("Failed")
+                if response_json["status"] == "Running":
+                    raise ValueError("Running")
+                return response_json
+
+        return await poll()
+
     async def create_analyzer(self):
+        logger.info("Creating analyzer '%s'...", image_schema["analyzerId"])
 
         token_provider = get_bearer_token_provider(self.credential, "https://cognitiveservices.azure.com/.default")
         token = await token_provider()
@@ -55,33 +72,21 @@ async def create_analyzer(self):
         async with aiohttp.ClientSession() as session:
             async with session.put(url=cu_endpoint, params=params, headers=headers, json=image_schema) as response:
                 if response.status == 409:
-                    print(f"Analyzer '{analyzer_id}' already exists.")
+                    logger.info("Analyzer '%s' already exists.", analyzer_id)
                     return
                 elif response.status != 201:
                     data = await response.text()
-                    # TODO: log it
-                    print(data)
+                    logger.error("Error creating analyzer: %s", data)
                     response.raise_for_status()
                 else:
                     poll_url = response.headers.get("Operation-Location")
 
-            @retry(stop=stop_after_attempt(60), wait=wait_fixed(2))
-            async def poll():
-                async with session.get(poll_url, headers=headers) as response:
-                    response.raise_for_status()
-                    response_json = await response.json()
-                    if response_json["status"] != "Succeeded":
-                        raise ValueError("Retry")
-
-            await poll()
+            with Progress() as progress:
+                progress.add_task("Creating analyzer...", total=None, start=False)
+                await self.poll_api(session, poll_url, headers)
 
-    def run_cu_image(self, analyzer_name, image):
-        result = self.run_inference(analyzer_name, image)
-        model_output = result["result"]["contents"][0]["fields"]
-        model_output_raw = str(model_output)
-        return model_output, model_output_raw
-
-    async def verbalize_figure(self, image_bytes) -> str:
+    async def describe_image(self, image_bytes) -> str:
+        logger.info("Sending image to Azure Content Understanding service...")
         async with aiohttp.ClientSession() as session:
             token = await self.credential.get_token("https://cognitiveservices.azure.com/.default")
             headers = {"Authorization": "Bearer " + token.token}
@@ -96,19 +101,9 @@ async def verbalize_figure(self, image_bytes) -> str:
                 response.raise_for_status()
                 poll_url = response.headers["Operation-Location"]
 
-                @retry(stop=stop_after_attempt(60), wait=wait_fixed(2), retry=retry_if_exception_type(ValueError))
-                async def poll():
-                    async with session.get(poll_url, headers=headers) as response:
-                        response.raise_for_status()
-                        response_json = await response.json()
-                        print(response_json)
-                        # rich.print it all pretty progress-y
-                        if response_json["status"] == "Failed":
-                            raise Exception("Failed")
-                        if response_json["status"] == "Running":
-                            raise ValueError("Running")
-                        return response_json
-
-                results = await poll()
+                with Progress() as progress:
+                    progress.add_task("Processing...", total=None, start=False)
+                    results = await self.poll_api(session, poll_url, headers)
+
                 fields = results["result"]["contents"][0]["fields"]
                 return fields["DescriptionHTML"]["valueString"]
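
Reviewer note: the new `poll_api` helper retries only while the service reports "Running" and stops at once on "Failed". The sketch below shows the same control flow with a plain `asyncio` loop swapped in for the `tenacity` decorator, so it runs without third-party packages. The names `poll_until_done` and `fetch_status`, and the fake responses, are hypothetical. The exception types also differ from the commit, which raises a bare `Exception` on failure and lets `tenacity` raise its own error when attempts run out.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[dict[str, Any]]],
    max_attempts: int = 60,
    delay_seconds: float = 2.0,
) -> dict[str, Any]:
    """Call fetch_status until the operation stops reporting "Running"."""
    for attempt in range(max_attempts):
        result = await fetch_status()
        if result["status"] == "Failed":
            # Terminal state: fail immediately instead of retrying.
            raise RuntimeError("Operation failed")
        if result["status"] != "Running":
            # Any other status (for example "Succeeded") ends the polling.
            return result
        if attempt < max_attempts - 1:
            await asyncio.sleep(delay_seconds)
    raise TimeoutError(f"Still running after {max_attempts} attempts")


async def demo() -> dict[str, Any]:
    # Fake service: reports "Running" twice, then "Succeeded".
    responses = iter([{"status": "Running"}, {"status": "Running"}, {"status": "Succeeded"}])

    async def fetch_status() -> dict[str, Any]:
        return next(responses)

    return await poll_until_done(fetch_status, delay_seconds=0)


final = asyncio.run(demo())
print(final["status"])
```

Separating the "keep waiting" state from the "give up" state is the behavioural change for `create_analyzer` in this commit: its old inline `poll` retried on every status other than "Succeeded", so a failed operation was retried until the attempts ran out.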
