Azure AI Document Intelligence Code Samples

Note

Form Recognizer is now Azure AI Document Intelligence!

Code samples for each language's SDK are available at the links below. Start by choosing a language (Python is the default).

Python	.NET	Java	JavaScript

By default, this repo contains samples for the latest version, v4.0 (Preview). Click v3.1 (GA) to view samples for earlier versions.

Features

Azure AI Document Intelligence is a cloud-based Azure AI service that enables you to build intelligent document processing solutions. Massive amounts of data, spanning a wide variety of data types, are stored in forms and documents. Document Intelligence enables you to effectively manage the velocity at which data is collected and processed and is key to improved operations, informed data-driven decisions, and enlightened innovation.

Prerequisites

Python 3.8 or later is required to use this package.
You must have an Azure subscription and an Azure Document Intelligence account to run these samples.
All of these samples need the endpoint to your Document Intelligence resource (instructions on how to get endpoint), and your Document Intelligence API key (instructions on how to get key).

Setup

Install the Azure Document Intelligence client library for Python with pip:

pip install azure-ai-documentintelligence --pre

Clone or download this sample repository.
Open the sample folder in Visual Studio Code or your IDE of choice.

Running the samples

Open a terminal window and cd to the directory that the samples are saved in.
Set the environment variables specified in the sample file you wish to run.
If necessary, click Example Document to get your document URL.
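The following is a minimal sketch of how a sample might read its settings and create a client. It assumes the environment variables are named DOCUMENTINTELLIGENCE_ENDPOINT and DOCUMENTINTELLIGENCE_API_KEY; check the sample file you are running for the exact names it expects. The SDK is imported only inside make_client, so the settings check works before the package is installed.

```python
import os
from typing import List, Mapping, Tuple

# Assumed variable names; confirm them against the sample file you are running.
ENDPOINT_VAR = "DOCUMENTINTELLIGENCE_ENDPOINT"
KEY_VAR = "DOCUMENTINTELLIGENCE_API_KEY"


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in (ENDPOINT_VAR, KEY_VAR) if not env.get(name)]


def load_settings(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (endpoint, key), failing early with a readable message."""
    missing = missing_settings(env)
    if missing:
        raise RuntimeError("Set these environment variables first: " + ", ".join(missing))
    return env[ENDPOINT_VAR], env[KEY_VAR]


def make_client(env: Mapping[str, str]):
    """Create a DocumentIntelligenceClient from endpoint and key settings."""
    # Imported here so the helpers above work even without the SDK installed.
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.core.credentials import AzureKeyCredential

    endpoint, key = load_settings(env)
    return DocumentIntelligenceClient(endpoint=endpoint, credential=AzureKeyCredential(key))


if __name__ == "__main__":
    # Offline check: report which variables still need to be set.
    print(missing_settings(os.environ) or "All required variables are set.")
```

Failing fast on missing settings gives a clearer error than an authentication failure from the service.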
The tables below describe each sample so you can choose the one that fits your needs.

Common samples

Read model

File Name	Usage scenarios
sample_analyze_read.py and sample_analyze_read_async.py	Read document elements, such as pages and detected languages
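As a rough illustration of what the Read samples work with, this sketch calls the prebuilt-read model and summarizes pages and detected languages. It assumes the preview azure-ai-documentintelligence package, where begin_analyze_document takes a model ID and an AnalyzeDocumentRequest; summarize_read is a hypothetical helper, and the offline demo uses a stand-in object rather than a real service response.

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize_read(result: Any) -> Dict[str, Any]:
    """Summarize pages and detected languages from a prebuilt-read result."""
    pages = result.pages or []
    languages = result.languages or []  # may be None if no languages were returned
    return {
        "page_count": len(pages),
        "lines_per_page": [len(page.lines or []) for page in pages],
        "locales": sorted({language.locale for language in languages}),
    }


def analyze_read(client: Any, url: str) -> Any:
    """Run prebuilt-read on a publicly reachable document URL (network call)."""
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

    poller = client.begin_analyze_document("prebuilt-read", AnalyzeDocumentRequest(url_source=url))
    return poller.result()


if __name__ == "__main__":
    # Offline demo with a stand-in object shaped like an AnalyzeResult.
    fake = SimpleNamespace(
        pages=[SimpleNamespace(lines=["a", "b"]), SimpleNamespace(lines=None)],
        languages=[SimpleNamespace(locale="en"), SimpleNamespace(locale="fr")],
    )
    print(summarize_read(fake))
```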

Layout model

File Name	Usage scenarios
sample_analyze_layout.py and sample_analyze_layout_async.py	Extract text, selection marks, and table structures in a document
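Layout returns each table as a flat list of cells with row and column indices. The sketch below, a hypothetical helper rather than code from the samples, rebuilds those cells into a row-major grid. It assumes cells expose row_index, column_index, row_span, column_span, and content, and treats a missing span as 1.

```python
from types import SimpleNamespace
from typing import Any, List


def table_to_grid(table: Any) -> List[List[str]]:
    """Rebuild a row-major grid of cell text from a Layout table.

    A cell that spans several rows or columns is copied into every position it covers.
    """
    grid = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
    for cell in table.cells:
        row_span = cell.row_span or 1  # a missing span is treated as 1
        column_span = cell.column_span or 1
        for row in range(cell.row_index, cell.row_index + row_span):
            for column in range(cell.column_index, cell.column_index + column_span):
                grid[row][column] = cell.content
    return grid


def layout_tables(client: Any, url: str) -> List[List[List[str]]]:
    """Analyze a document with prebuilt-layout and return every table as a grid (network call)."""
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

    result = client.begin_analyze_document("prebuilt-layout", AnalyzeDocumentRequest(url_source=url)).result()
    return [table_to_grid(table) for table in (result.tables or [])]


if __name__ == "__main__":
    # Offline demo: a 2x2 table whose header cell spans both columns.
    demo = SimpleNamespace(row_count=2, column_count=2, cells=[
        SimpleNamespace(row_index=0, column_index=0, row_span=None, column_span=2, content="Header"),
        SimpleNamespace(row_index=1, column_index=0, row_span=None, column_span=None, content="a"),
        SimpleNamespace(row_index=1, column_index=1, row_span=None, column_span=None, content="b"),
    ])
    print(table_to_grid(demo))
```

Copying spanning cells into every covered position keeps the grid rectangular, which makes it easy to hand to CSV writers or dataframe constructors.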

Prebuilt model

File Name	Usage scenarios
sample_analyze_invoices.py and sample_analyze_invoices_async.py	Analyze document text, selection marks, tables, and pre-trained fields and values pertaining to English invoices using a prebuilt model
sample_analyze_business_cards.py and sample_analyze_business_cards_async.py	Analyze document text and pre-trained fields and values pertaining to English business cards using a prebuilt model
sample_analyze_identity_documents.py and sample_analyze_identity_documents_async.py	Analyze document text and pre-trained fields and values pertaining to US driver licenses and international passports using a prebuilt model
sample_analyze_receipts.py and sample_analyze_receipts_async.py	Analyze document text and pre-trained fields and values pertaining to English sales receipts using a prebuilt model
sample_analyze_tax_us_w2.py and sample_analyze_tax_us_w2_async.py	Analyze document text and pre-trained fields and values pertaining to US tax W-2 forms using a prebuilt model
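Prebuilt models return analyzed documents whose fields are keyed by name. This sketch pulls a few invoice fields and skips those below a confidence threshold. It assumes each field exposes content and confidence; the field names VendorName, InvoiceDate, and InvoiceTotal, the 0.5 threshold, and the helper names are illustrative choices, not taken from the samples.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def extract_fields(document: Any, names: Iterable[str], min_confidence: float = 0.0) -> Dict[str, str]:
    """Return {field name: extracted text} for fields found at or above min_confidence."""
    fields = document.fields or {}
    values: Dict[str, str] = {}
    for name in names:
        field = fields.get(name)
        if field is None or field.content is None:
            continue  # field not found in this document
        if (field.confidence or 0.0) >= min_confidence:
            values[name] = field.content
    return values


def analyze_invoices(client: Any, url: str) -> List[Dict[str, str]]:
    """Run prebuilt-invoice and extract a few fields from each invoice (network call)."""
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

    result = client.begin_analyze_document("prebuilt-invoice", AnalyzeDocumentRequest(url_source=url)).result()
    names = ["VendorName", "InvoiceDate", "InvoiceTotal"]
    return [extract_fields(doc, names, min_confidence=0.5) for doc in (result.documents or [])]


if __name__ == "__main__":
    # Offline demo: the low-confidence total is dropped at the 0.5 threshold.
    doc = SimpleNamespace(fields={
        "VendorName": SimpleNamespace(content="Contoso", confidence=0.95),
        "InvoiceTotal": SimpleNamespace(content="$110.00", confidence=0.4),
    })
    print(extract_fields(doc, ["VendorName", "InvoiceTotal"], min_confidence=0.5))
```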

Add-on capabilities

File Name	Usage scenarios
sample_analyze_addon_barcodes.py and sample_analyze_addon_barcodes_async.py	Extract barcodes
sample_analyze_addon_fonts.py and sample_analyze_addon_fonts_async.py	Extract font properties
sample_analyze_addon_formulas.py and sample_analyze_addon_formulas_async.py	Extract formulas
sample_analyze_addon_highres.py and sample_analyze_addon_highres_async.py	Extract text with high-resolution OCR
sample_analyze_addon_languages.py and sample_analyze_addon_languages_async.py	Detect languages
sample_analyze_addon_query_fields.py and sample_analyze_addon_query_fields_async.py	Extract additional fields with query fields
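Add-on capabilities are requested per analysis call. The sketch below maps the add-on sample names above to feature strings and passes them through the features keyword. It assumes the preview SDK's DocumentAnalysisFeature string values (barcodes, styleFont, formulas, ocrHighResolution, languages, queryFields) and its query_fields keyword; the mapping table and helper names are hypothetical, and not every model supports every add-on.

```python
from typing import Any, Dict, Iterable, List, Optional

# Hypothetical mapping from the add-on sample names to assumed service feature strings.
ADDON_FEATURES: Dict[str, str] = {
    "barcodes": "barcodes",
    "fonts": "styleFont",
    "formulas": "formulas",
    "highres": "ocrHighResolution",
    "languages": "languages",
    "query_fields": "queryFields",
}


def features_for(addons: Iterable[str]) -> List[str]:
    """Translate add-on names into a de-duplicated list of feature strings."""
    requested = list(addons)
    unknown = [name for name in requested if name not in ADDON_FEATURES]
    if unknown:
        raise ValueError("Unknown add-on(s): " + ", ".join(unknown))
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(ADDON_FEATURES[name] for name in requested))


def analyze_with_addons(
    client: Any,
    url: str,
    addons: Iterable[str],
    query_fields: Optional[List[str]] = None,
    model_id: str = "prebuilt-layout",
) -> Any:
    """Analyze a document with the requested add-ons enabled (network call)."""
    from azure.ai.documentintelligence.models import AnalyzeDocumentRequest

    kwargs: Dict[str, Any] = {"features": features_for(addons)}
    if query_fields:
        kwargs["query_fields"] = list(query_fields)  # only meaningful with "query_fields"
    return client.begin_analyze_document(model_id, AnalyzeDocumentRequest(url_source=url), **kwargs).result()


if __name__ == "__main__":
    print(features_for(["fonts", "barcodes", "fonts"]))
```

Rejecting unknown names locally gives a clearer error than a bad-request response from the service.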

Custom model

File Name	Usage scenarios
sample_custom_template.py and sample_custom_template_async.py	Extract data from static layouts.
sample_custom_neural.py and sample_custom_neural_async.py	Extract data from mixed-type documents.
sample_custom_composed.py and sample_custom_composed_async.py	Extract data using a collection of models.
sample_custom_classifier.py and sample_custom_classifier_async.py	Identify designated document types (classes) before invoking an extraction model.
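A custom classifier identifies document types before extraction. This sketch shows one way to act on classifier output: each classified document is paired with the extraction model to call next, or with None when the type is unknown or confidence is low. It assumes classified documents expose doc_type and confidence; the type names, model IDs, and 0.7 threshold are hypothetical placeholders.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Hypothetical classifier doc types mapped to hypothetical custom extraction model IDs.
MODEL_FOR_TYPE: Dict[str, str] = {
    "invoice": "my-invoice-neural",
    "purchase-order": "my-po-template",
}


def plan_extraction(
    documents: Iterable[Any],
    model_for_type: Dict[str, str],
    min_confidence: float = 0.7,
) -> List[Tuple[str, Optional[str]]]:
    """Pair each classified document with the extraction model to call next.

    None means no model should run automatically (unknown type or low confidence),
    for example so the document can be sent to manual review instead.
    """
    plan: List[Tuple[str, Optional[str]]] = []
    for doc in documents:
        model_id = model_for_type.get(doc.doc_type)
        if (doc.confidence or 0.0) < min_confidence:
            model_id = None
        plan.append((doc.doc_type, model_id))
    return plan


if __name__ == "__main__":
    classified = [
        SimpleNamespace(doc_type="invoice", confidence=0.93),
        SimpleNamespace(doc_type="receipt", confidence=0.99),
    ]
    print(plan_extraction(classified, MODEL_FOR_TYPE))
```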

Click a model name to open its topic page for more details.

Click v3.1 (GA) to view earlier versions.

Retrieval Augmented Generation (RAG) samples

The Layout model extracts building blocks such as tables, paragraphs, and section headings, which enable different strategies for semantically chunking a document. In Retrieval Augmented Generation (RAG), semantic chunking makes storage and retrieval more efficient and improves the relevance and interpretability of results. The following samples show how to use the Layout model to chunk a document semantically and use the chunks for RAG.

File Name	Usage scenarios
sample_rag_langchain.ipynb	Sample RAG notebook in LangChain using Azure AI Document Intelligence as the document loader, MarkdownHeaderTextSplitter as the splitter, and Azure AI Search as the retriever

Only available for v4.0 (Preview).
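To illustrate the idea behind header-based chunking, here is a plain-Python sketch that splits Markdown (for example, Layout output requested in Markdown format) into chunks tagged with their enclosing headings. It is a simplified stand-in for LangChain's MarkdownHeaderTextSplitter, not its implementation; the function name and max_level parameter are hypothetical, and it does not skip '#' lines inside fenced code.

```python
import re
from typing import Dict, List

# A Markdown ATX heading: 1-6 '#' characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def split_by_headers(markdown: str, max_level: int = 2) -> List[Dict[str, object]]:
    """Split Markdown into chunks, each tagged with its enclosing headings by level."""
    chunks: List[Dict[str, object]] = []
    headers: Dict[int, str] = {}
    lines: List[str] = []

    def flush() -> None:
        text = "\n".join(lines).strip()
        if text:
            chunks.append({"headers": dict(headers), "content": text})
        lines.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match and len(match.group(1)) <= max_level:
            flush()  # close the chunk collected under the previous heading
            level = len(match.group(1))
            # A new heading replaces any open heading at the same or a deeper level.
            for deeper in [k for k in headers if k >= level]:
                del headers[deeper]
            headers[level] = match.group(2)
        else:
            lines.append(line)  # body text, or a heading deeper than max_level
    flush()
    return chunks


if __name__ == "__main__":
    sample = "# Guide\nintro\n## Setup\nstep one\n## Usage\nrun it\n### Detail\nmore"
    for chunk in split_by_headers(sample):
        print(chunk)
```

Keeping the heading path as metadata lets a retriever return a chunk together with the section it came from.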

Pre/post processing samples

Getting the best results from Document Intelligence models often requires pre- or post-processing steps. These steps are not part of the Document Intelligence service, but they are common in practice. The following samples show how to implement them.

File Name	Usage scenarios
sample_disambiguate_similar_characters.ipynb and sample_disambiguate_similar_characters.py	Sample postprocessing script to disambiguate similar characters based on business rules.
sample_identify_cross_page_tables.ipynb and sample_identify_cross_page_tables.py	Sample postprocessing script to identify cross-page tables based on business rules.

Applies to all versions.
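As a small example of the kind of business-rule post-processing described above, this sketch replaces letters that OCR commonly confuses with digits, but only in fields known to be numeric. The look-alike table, field names, and helper names are hypothetical and are not taken from sample_disambiguate_similar_characters; adjust the rules to your own documents.

```python
from typing import Dict, Set

# Hypothetical rule: in numeric fields, map look-alike letters to the digits they resemble.
LOOKALIKES = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def disambiguate_numeric(text: str) -> str:
    """Replace look-alike letters with digits in a value expected to be numeric."""
    return text.translate(LOOKALIKES)


def clean_fields(values: Dict[str, str], numeric_fields: Set[str]) -> Dict[str, str]:
    """Apply the rule only to fields listed as numeric; leave all others unchanged."""
    return {
        name: disambiguate_numeric(value) if name in numeric_fields else value
        for name, value in values.items()
    }


if __name__ == "__main__":
    extracted = {"InvoiceTotal": "$1,2S0.OO", "VendorName": "Olsen"}
    print(clean_fields(extracted, {"InvoiceTotal"}))
```

Restricting the rule to known numeric fields avoids corrupting names and other free text, such as "Olsen" above.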

Next steps

Check out the API reference documentation to learn more about what you can do with the Azure Document Intelligence client library.
