Commit 1f7de66

Dybe and pamelafox authored
replace form recognizer with doc intelligence (#686)
* Adding anchors
* replace form recognizer with doc intelligence
* Port to doc intelligence
* Revert changes
* Reword printed text

---------

Co-authored-by: Pamela Fox <[email protected]>
Co-authored-by: Pamela Fox <[email protected]>
1 parent ab1e603 commit 1f7de66

3 files changed: 9 additions and 9 deletions

README.md

Lines changed: 3 additions & 3 deletions
@@ -85,12 +85,12 @@ However, you can try the [Azure pricing calculator](https://azure.com/e/8ffbe5b1
 
 - Azure App Service: Basic Tier with 1 CPU core, 1.75 GB RAM. Pricing per hour. [Pricing](https://azure.microsoft.com/pricing/details/app-service/linux/)
 - Azure OpenAI: Standard tier, ChatGPT and Ada models. Pricing per 1K tokens used, and at least 1K tokens are used per question. [Pricing](https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/)
-- Form Recognizer: SO (Standard) tier using pre-built layout. Pricing per document page, sample documents have 261 pages total. [Pricing](https://azure.microsoft.com/pricing/details/form-recognizer/)
+- Azure AI Document Intelligence: SO (Standard) tier using pre-built layout. Pricing per document page, sample documents have 261 pages total. [Pricing](https://azure.microsoft.com/pricing/details/form-recognizer/)
 - Azure AI Search: Standard tier, 1 replica, free level of semantic search. Pricing per hour.[Pricing](https://azure.microsoft.com/pricing/details/search/)
 - Azure Blob Storage: Standard tier with ZRS (Zone-redundant storage). Pricing per storage and read operations. [Pricing](https://azure.microsoft.com/pricing/details/storage/blobs/)
 - Azure Monitor: Pay-as-you-go tier. Costs based on data ingested. [Pricing](https://azure.microsoft.com/pricing/details/monitor/)
 
-To reduce costs, you can switch to free SKUs for Azure App Service and Form Recognizer by changing the parameters file under the `infra` folder. There are some limits to consider; for example, the free Form Recognizer resource only analyzes the first 2 pages of each document. You can also reduce costs associated with the Form Recognizer by reducing the number of documents in the `data` folder, or by removing the postprovision hook in `azure.yaml` that runs the `prepdocs.py` script.
+To reduce costs, you can switch to free SKUs for Azure App Service and Azure AI Document Intelligence by changing the parameters file under the `infra` folder. There are some limits to consider; for example, the free Azure AI Document Intelligence resource only analyzes the first 2 pages of each document. You can also reduce costs associated with the Azure AI Document Intelligence by reducing the number of documents in the `data` folder, or by removing the postprovision hook in `azure.yaml` that runs the `prepdocs.py` script.
 
 ⚠️ To avoid unnecessary costs, remember to take down your app if it's no longer in use,
 either by deleting the resource group in the Portal or running `azd down`.
@@ -203,7 +203,7 @@ You can also customize the search service (new or existing) for non-English sear
 
 #### Other existing Azure resources
 
-You can also use existing Form Recognizer and Storage Accounts. See `./infra/main.parameters.json` for list of environment variables to pass to `azd env set` to configure those existing resources.
+You can also use existing Azure AI Document Intelligence and Storage Accounts. See `./infra/main.parameters.json` for list of environment variables to pass to `azd env set` to configure those existing resources.
 
 #### Provision remaining resources
 
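The README change above points readers at `./infra/main.parameters.json` for the environment variables to pass to `azd env set`. As a rough sketch, the snippet below lists the variable names referenced in such a file, assuming it uses `${NAME}` or `${NAME=default}` placeholders. The sample JSON and every name in it are hypothetical stand-ins, not the repository's actual parameters.

```python
import json
import re

# Hypothetical sample; the real file lives at ./infra/main.parameters.json
# and its parameter and variable names will differ.
SAMPLE = """
{
  "parameters": {
    "exampleServiceName": {"value": "${EXAMPLE_SERVICE_NAME}"},
    "exampleSkuName": {"value": "${EXAMPLE_SKU=S0}"},
    "fixedValue": {"value": "eastus"}
  }
}
"""

# Matches ${NAME} and ${NAME=default}, capturing only NAME.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?:=[^}]*)?\}")


def list_env_vars(parameters_json: str) -> list[str]:
    """Return the sorted, de-duplicated variable names referenced in the file."""
    doc = json.loads(parameters_json)
    names = set()
    for param in doc.get("parameters", {}).values():
        value = param.get("value")
        if isinstance(value, str):  # skip booleans, numbers, and objects
            names.update(PLACEHOLDER.findall(value))
    return sorted(names)


if __name__ == "__main__":
    for name in list_env_vars(SAMPLE):
        print(f"azd env set {name} <value>")
```

To run it against the real file, read `./infra/main.parameters.json` from disk and pass its text to `list_env_vars`.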
scripts/prepdocs.py

Lines changed: 5 additions & 5 deletions
@@ -40,10 +40,10 @@ def setup_file_strategy(credential: AsyncTokenCredential, args: Any) -> FileStra
     if args.localpdfparser:
         pdf_parser = LocalPdfParser()
     else:
-        # check if Azure Form Recognizer credentials are provided
+        # check if Azure Document Intelligence credentials are provided
         if args.formrecognizerservice is None:
             print(
-                "Error: Azure Form Recognizer service is not provided. Please provide formrecognizerservice or use --localpdfparser for local pypdf parser."
+                "Error: Azure Document Intelligence service is not provided. Please provide --formrecognizerservice or use --localpdfparser for local pypdf parser."
             )
             exit(1)
         formrecognizer_creds: Union[AsyncTokenCredential, AzureKeyCredential] = (
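The hunk above stops partway through the credential assignment, so the rest of that expression is not shown. The sketch below restates only the visible decision logic as a small, testable function. `choose_pdf_parser` is a hypothetical helper, it raises `ValueError` in place of the script's `print` and `exit(1)`, and the key-versus-identity rule is an assumption based on the `--formrecognizerkey` help text.

```python
from argparse import Namespace
from typing import Optional, Tuple


def choose_pdf_parser(args: Namespace) -> Tuple[str, Optional[str]]:
    """Return (parser_kind, auth_mode) for the given command-line arguments."""
    if args.localpdfparser:
        # The local pypdf parser needs no Azure credentials.
        return ("local", None)
    if args.formrecognizerservice is None:
        # The script prints this error and exits; raising keeps the sketch testable.
        raise ValueError(
            "Azure Document Intelligence service is not provided. "
            "Please provide --formrecognizerservice or use --localpdfparser."
        )
    # Assumption: an account key, when given, replaces the current user identity.
    auth_mode = "key" if args.formrecognizerkey else "identity"
    return ("document-intelligence", auth_mode)
```

Returning a plain tuple keeps the sketch free of Azure SDK imports; the actual script builds parser objects and credentials at this point.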
@@ -228,17 +228,17 @@ async def main(strategy: Strategy, credential: AsyncTokenCredential, args: Any):
     parser.add_argument(
         "--localpdfparser",
         action="store_true",
-        help="Use PyPdf local PDF parser (supports only digital PDFs) instead of Azure Form Recognizer service to extract text, tables and layout from the documents",
+        help="Use PyPdf local PDF parser (supports only digital PDFs) instead of Azure Document Intelligence service to extract text, tables and layout from the documents",
     )
     parser.add_argument(
         "--formrecognizerservice",
         required=False,
-        help="Optional. Name of the Azure Form Recognizer service which will be used to extract text, tables and layout from the documents (must exist already)",
+        help="Optional. Name of the Azure Document Intelligence service which will be used to extract text, tables and layout from the documents (must exist already)",
     )
     parser.add_argument(
         "--formrecognizerkey",
         required=False,
-        help="Optional. Use this Azure Form Recognizer account key instead of the current user identity to login (use az login to set current user for Azure)",
+        help="Optional. Use this Azure Document Intelligence account key instead of the current user identity to login (use az login to set current user for Azure)",
     )
 
     parser.add_argument("--verbose", "-v", action="store_true", help="Verbose output")
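Note that this hunk changes only the help strings: the flag names still carry the `formrecognizer` prefix, so existing invocations should keep working. A minimal sketch of just these four arguments, using the new help text, shows that. `build_parser` and the service name in the usage line are hypothetical; the real script defines more arguments than these.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the four flags visible in the hunk above."""
    parser = argparse.ArgumentParser(description="Sketch of the PDF-parsing flags only")
    parser.add_argument(
        "--localpdfparser",
        action="store_true",
        help="Use PyPdf local PDF parser (supports only digital PDFs) instead of "
        "Azure Document Intelligence service to extract text, tables and layout from the documents",
    )
    parser.add_argument(
        "--formrecognizerservice",
        required=False,
        help="Optional. Name of the Azure Document Intelligence service which will be used "
        "to extract text, tables and layout from the documents (must exist already)",
    )
    parser.add_argument(
        "--formrecognizerkey",
        required=False,
        help="Optional. Use this Azure Document Intelligence account key instead of the "
        "current user identity to login (use az login to set current user for Azure)",
    )
    parser.add_argument("--verbose", "-v", action="store_true", help="Verbose output")
    return parser


if __name__ == "__main__":
    # "my-docintel-service" is a made-up service name for illustration.
    args = build_parser().parse_args(["--formrecognizerservice", "my-docintel-service", "-v"])
    print(args)
```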

scripts/prepdocslib/pdfparser.py

Lines changed: 1 addition & 1 deletion
@@ -73,7 +73,7 @@ def __init__(
 
     async def parse(self, content: IO) -> AsyncGenerator[Page, None]:
         if self.verbose:
-            print(f"Extracting text from '{content.name}' using Azure Form Recognizer")
+            print(f"Extracting text from '{content.name}' using Azure Document Intelligence")
 
         async with DocumentAnalysisClient(
             endpoint=self.endpoint, credential=self.credential, headers={"x-ms-useragent": USER_AGENT}
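The `parse` method in this hunk is an async generator that yields `Page` objects and logs the file name when `verbose` is set. As an illustration of how a caller might consume that shape, here is a sketch with stand-ins throughout: `StubParser` splits text on form feeds instead of calling Azure, and this `Page` dataclass is hypothetical, since the real class is not shown in the diff.

```python
import asyncio
import io
from dataclasses import dataclass
from typing import IO, AsyncGenerator, List


@dataclass
class Page:
    # Hypothetical stand-in; the real Page class is not part of this diff.
    page_num: int
    text: str


class StubParser:
    """Stand-in parser: splits on form feeds instead of calling any service."""

    def __init__(self, verbose: bool = False):
        self.verbose = verbose

    async def parse(self, content: IO) -> AsyncGenerator[Page, None]:
        if self.verbose:
            print(f"Extracting text from '{content.name}' using a stub parser")
        raw = content.read().decode("utf-8")
        for page_num, text in enumerate(raw.split("\f")):
            yield Page(page_num=page_num, text=text)


async def collect_pages(parser: StubParser, content: IO) -> List[Page]:
    # Async generators are consumed with `async for`, here via a comprehension.
    return [page async for page in parser.parse(content)]


def demo() -> List[Page]:
    content = io.BytesIO(b"first page\fsecond page")
    content.name = "example.pdf"  # BytesIO has no name attribute by default
    return asyncio.run(collect_pages(StubParser(verbose=True), content))


if __name__ == "__main__":
    for page in demo():
        print(page)
```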
