Commit 802194b

Merge branch 'main' of https://github.com/MicrosoftDocs/azure-docs-pr into bklink
2 parents 66e41fe + 7b79044 commit 802194b

653 files changed

+11091
-9058
lines changed

.openpublishing.redirection.json

Lines changed: 5 additions & 0 deletions
@@ -4805,6 +4805,11 @@
48054805
"redirect_url": "/azure/vpn-gateway/about-site-to-site-tunneling",
48064806
"redirect_document_id": false
48074807
},
4808+
{
4809+
"source_path_from_root": "/articles/vpn-gateway/openvpn-azure-ad-tenant-multi-app.md",
4810+
"redirect_url": "/azure/vpn-gateway/point-to-site-entra-users-access",
4811+
"redirect_document_id": false
4812+
},
48084813
{
48094814
"source_path_from_root": "/articles/vpn-gateway/vpn-gateway-howto-multi-site-to-site-resource-manager-portal.md",
48104815
"redirect_url": "/azure/vpn-gateway/add-remove-site-to-site-connections",
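
As a rough illustration of the entry shape added in this hunk, the following Python sketch checks that a redirection entry carries the three keys visible in the diff. The helper name `check_redirect_entry` and the rule that `redirect_url` should start with `/` are assumptions made for illustration; they are not part of the Open Publishing tooling.

```python
import json

# Keys seen on every entry in this hunk of .openpublishing.redirection.json.
REQUIRED_KEYS = {"source_path_from_root", "redirect_url", "redirect_document_id"}


def check_redirect_entry(entry):
    """Return a list of problems found in one redirection entry (hypothetical helper)."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(entry))]
    url = entry.get("redirect_url", "")
    # Assumption: site-relative redirect targets start with "/".
    if url and not url.startswith("/"):
        problems.append(f"redirect_url is not site-relative: {url}")
    if not isinstance(entry.get("redirect_document_id", False), bool):
        problems.append("redirect_document_id is not a boolean")
    return problems


# The entry added by this commit, copied from the diff above.
added_entry = json.loads("""
{
  "source_path_from_root": "/articles/vpn-gateway/openvpn-azure-ad-tenant-multi-app.md",
  "redirect_url": "/azure/vpn-gateway/point-to-site-entra-users-access",
  "redirect_document_id": false
}
""")

print(check_redirect_entry(added_entry))
```

A check like this could run before merge to catch entries with a missing key; the real publishing pipeline may enforce different or additional rules.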

.openpublishing.redirection.sentinel.json

Lines changed: 0 additions & 5 deletions
@@ -1,10 +1,5 @@
11
{
22
"redirections": [
3-
{
4-
"source_path": "articles/sentinel/cef-name-mapping.md",
5-
"redirect_url": "/azure/sentinel/cef-syslog-ama-overview",
6-
"redirect_document_id": false
7-
},
83
{
94
"source_path": "articles/sentinel/detect-threats-built-in.md#use-analytics-rule-templates",
105
"redirect_url": "/azure/sentinel/create-analytics-rule-from-template",

CODEOWNERS

Lines changed: 6 additions & 0 deletions
@@ -9,3 +9,9 @@
99
/articles/dedicated-hsm @tynevi @thomps23
1010
/articles/key-vault @tynevi @thomps23
1111
/articles/payment-hsm @tynevi @thomps23
12+
/articles/postgresql @tynevi @thomps23
13+
/articles/cosmos-db @tynevi @thomps23
14+
/articles/dms @tynevi @thomps23
15+
/articles/mariadb @tynevi @thomps23
16+
/articles/mysql @tynevi @thomps23
17+
/articles/managed-instance-apache-cassandra @tynevi @thomps23

articles/ai-services/document-intelligence/concept-custom-generative.md

Lines changed: 54 additions & 25 deletions
@@ -1,87 +1,116 @@
11
---
2-
title: Azure AI Document Intelligence (formerly Form Recognizer) custom generative field extraction
2+
title: Azure AI Document Intelligence (formerly Form Recognizer) custom generative document field extraction
33
titleSuffix: Azure AI services
44
description: Custom generative AI model extracts user-specified fields from documents across a wide variety of visual templates.
55
author: laujan
66
manager: nitinme
77
ms.service: azure-ai-document-intelligence
8-
ms.custom:
9-
- ignite-2023
108
ms.topic: overview
11-
ms.date: 08/07/2024
9+
ms.date: 08/09/2024
1210
ms.author: lajanuar
1311
monikerRange: '>=doc-intel-4.0.0'
1412
---
1513

16-
# Document Intelligence custom generative model
14+
# Document Field extraction - custom generative AI model
1715

18-
[!INCLUDE [preview-version-notice](includes/preview-notice.md)]
16+
> [!IMPORTANT]
17+
>
18+
> * Document Intelligence public preview releases provide early access to features that are in active development. Features, approaches, and processes may change, prior to General Availability (GA), based on user feedback.
19+
> * The public preview version of Document Intelligence client libraries default to REST API version [**2024-07-31-preview**](/rest/api/aiservices/operation-groups?view=rest-aiservices-2024-07-31-preview&preserve-view=true) and is currently only available in the following Azure regions.
20+
> * **East US**
21+
> * **West US2**
22+
> * **West Europe**
23+
> * **North Central US**
24+
>
25+
> * **The new custom generative model in AI Studio is only available in the North Central US region**:
1926
20-
The custom generative model combines the power of document understanding with Large Language Models (LLMs) and the rigor and schema from custom extraction capabilities. Custom generative extraction enables you to easily automate data extraction workflows for any type of document, with minimal labeling and greater accuracy and speed.
27+
The document field extraction (custom generative AI) model utilizes generative AI to extract user-specified fields from documents across a wide variety of visual templates. The custom generative AI model combines the power of document understanding with Large Language Models (LLMs) and the rigor and schema from custom extraction capabilities to create a model with high accuracy in minutes. With this generative model type, you can start with a single document and go through the schema addition and model creation process with minimal labeling. The custom generative model allows developers and enterprises to easily automate data extraction workflows with greater accuracy and speed for any type of document. The custom generative AI model excels in extracting simple fields from documents without labeled samples. However, providing a few labeled samples improves the extraction accuracy for complex fields and user-defined fields like tables. You can use the [REST API](/rest/api/aiservices/operation-groups?view=rest-aiservices-2024-07-31-preview&preserve-view=true) or client libraries to submit a document for analysis with a model build and use the custom generative process.
2128

22-
## Custom generative model key features
29+
## Custom generative AI model benefits
2330

2431
* **Automatic labeling**. Utilize large language models (LLM) and extract user-specified fields for various document types and visual templates.
32+
2533
* **Improved Generalization**. Extract data from unstructured data and varying document templates with higher accuracy.
26-
* **Grounded results**. Localize the data extracted in documents and ensure the response is generated from the content and enables human review workflows.
27-
* **High confidence scores**. Use confidence scores and quickly filter high quality extracted data for downstream processing and lower manual review time.
34+
35+
* **Grounded results**. Localize the data extracted in the documents. Custom generative models ground the results where applicable, ensuring the response is generated from the content and enable human review workflows.
36+
37+
* **Confidence scores**. Use confidence scores for each extracted field to filter high-quality extracted data, maximize straight-through processing of documents, and minimize human review costs.
2838

2939
### Common use cases
3040

31-
* **Contract Lifecycle Management**. Build a generative model and extract the fields, clauses, and obligations from a wide array of contract types.  
32-
* **Loan & Mortgage Applications**. Automation of loan and mortgage application process enables banks, lenders, and government entities to quickly process loan and mortgage application.  
33-
* **Financial Services**. Analyze complex documents like financial reports and asset management reports.
34-
* **Expense management**. The custom generative model can extract expenses, receipts, and invoices with varying formats and templates.  
41+
* **Contract Lifecycle Management**. Build a generative model and extract the fields, clauses, and obligations from a wide array of contract types.
42+
43+
* **Loan & Mortgage Applications**. Automation of loan and mortgage application process enables banks, lenders, and government entities to quickly process loan and mortgage application.
44+
45+
* **Financial Services**. With the custom generative AI model, analyze complex documents like financial reports and asset management reports.
46+
47+
* **Expense management**. Receipts and invoices from various retailers and businesses need to be parsed to validate the expenses. The custom generative AI model can extract expenses across different formats and documents with varying templates.
48+
49+
### Managing the training dataset
50+
51+
With our other custom models, you need to maintain the dataset, add new samples, and train the model for accuracy improvements. With the custom generative AI model, the labeled documents are transformed, encrypted, and stored as part of the model. This process ensures that the model can continually use the labeled samples to improve the extraction quality. As with other custom models, models are stored in Microsoft storage, and you can delete them anytime.
52+
53+
The Document Intelligence service does manage your datasets, but your documents are stored encrypted and only used to improve the model results for your specific model. A service-managed key can be used to encrypt your data, or it can be optionally encrypted with a customer managed key. The change in management and lifecycle of the dataset only applies to custom generative models.
3554

3655
## Model capabilities  
3756

38-
The Custom generative model currently supports dynamic table with the `2024-07-31-preview` and the following fields:
57+
Field extraction custom generative model currently supports dynamic table with the `2024-07-31-preview` and the following fields:
3958

4059
| Form fields | Selection marks | Tabular fields | Signature | Region labeling | Overlapping fields |
4160
|:--:|:--:|:--:|:--:|:--:|:--:|
4261
|Supported| Supported |Supported| Unsupported |Unsupported |Supported|
4362

4463
## Build mode  
4564

46-
The `build custom model` operation supports custom _template_, _neural_ and _generative_ models, _see_[Custom model build mode](concept-custom.md#build-mode):
65+
The `build custom model` operation supports custom **template**, **neural**, and **generative** models, _see_[Custom model build mode](concept-custom.md#build-mode). Here are the differences in the model types:
66+
67+
* **Custom generative AI models** can process complex documents with various formats, varied templates, and unstructured data.
68+
69+
* **Custom neural models** support complex document processing and also support more variance in pages for structured and semi-structured documents.
4770

48-
* **Custom generative models** can process complex documents in various formats, templates, and unstructured data.
49-
* **Custom neural models** support complex document processing and also support more variance in page for structured and semi-structured documents.
5071
* **Custom template models** rely on consistent visual templates, such as questionnaires or applications, to extract the labeled data.
5172

5273
## Languages and locale support
5374

54-
The custom generative model `2024-07-31-preview` version supports the **en-us** locale. For more information on language support, *see* [Language support - custom models](language-support-custom.md).
75+
Field extraction custom generative model `2024-07-31-preview` version supports the **en-us** locale. For more information on language support, _see_ [Language support - custom models](language-support-custom.md).
5576

5677
## Region support
5778

58-
The custom generative model `2024-07-31-preview` version is only available in `North Central US`.  
79+
Field extraction custom generative model `2024-07-31-preview` version is only available in `North Central US`.  
5980

60-
## Input requirements 
81+
## Input requirements
6182

6283
[!INCLUDE [input requirements](./includes/input-requirements.md)]
6384

6485
## Best practices  
6586

6687
* **Representative data**. Use representative documents that target actual data distribution, and train a high-quality custom generative model. For example, if the target document includes partially filled tabular fields, add training documents that consist of partially filled tables. Or if field is named date, values for this field should be a date as random strings can affect model performance.
88+
6789
* **Field naming**. Choose a precise field name that represents the field values. For example, for a field value containing the Transaction Date, consider naming the field _TransactionDate_ instead of `Date1`.
68-
* **Field Description**. Provide more contextual information in description to help clarify the field that needs to be extracted. Examples include location in the document, potential field labels it may be associated with, ways to differentiate with other terms that could be ambiguous.  
69-
* **Variation**. Custom generative models can generalize across different document templates of the same document type. As a best practice, create a single model for all variations of a document type. Ideally, include a visual template for each type, especially for ones that 
90+
91+
* **Field Description**. Provide more contextual information in description to help clarify the field that needs to be extracted. Examples include location in the document, potential field labels it can be associated with, and ways to differentiate with other terms that could be ambiguous.
92+
93+
* **Variation**. Custom generative models can generalize across different document templates of the same document type. As a best practice, create a single model for all variations of a document type. Ideally, include a visual template for each type, especially for ones that
7094

7195
## Service guidance
7296

7397
* The Custom Generative preview model doesn't currently support fixed table and signature extraction.
98+
7499
* Inference on the same document could yield slightly different results across calls and is a known limitation of current `GPT` models.
100+
75101
* Confidence scores for each field might vary. We recommend testing with your representative data to establish the confidence thresholds for your scenario.
76-
* Grounding, especially for tabular fields, is challenging and might not be perfect in some cases.  
102+
103+
* Grounding, especially for tabular fields, is challenging and might not be perfect in some cases.
104+
77105
* Latency for large documents is high and a known limitation in preview.
106+
78107
* Composed models don't support custom generative extraction.
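
The guidance above recommends establishing confidence thresholds on representative data. A minimal Python sketch of that idea follows; the field dictionary shape (field name mapped to `content` and `confidence`), the `0.8` threshold, and the sample values are all assumptions for illustration and are not taken from the service response schema.

```python
def split_by_confidence(fields, threshold):
    """Split extracted fields into auto-accepted and needs-review groups."""
    accepted, review = {}, {}
    for name, field in fields.items():
        # Treat a missing confidence as 0.0 so the field is routed to review.
        target = accepted if field.get("confidence", 0.0) >= threshold else review
        target[name] = field
    return accepted, review


# Made-up sample values for illustration only.
sample_fields = {
    "TransactionDate": {"content": "2024-08-09", "confidence": 0.97},
    "Total": {"content": "118.40", "confidence": 0.62},
}

accepted, review = split_by_confidence(sample_fields, threshold=0.8)
print(sorted(accepted), sorted(review))
```

Because confidence scores can vary between calls, the threshold is best tuned against your own labeled documents rather than fixed in advance.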
79108

80109
## Training a model  
81110

82111
Custom generative models are available with the `2024-07-31-preview` version and later models.
83112

84-
The `build operation` to train model supports the ```buildMode``` property, to train a custom generative model, set the ```buildMode``` to ```generative```.
113+
The `build operation` to train model supports the `buildMode` property, to train a custom generative model, set the `buildMode` to `generative`.
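
For illustration, here is a small Python sketch that assembles a JSON request body with `buildMode` set to `generative`. Only the `buildMode` property and its value come from the text above; the other field names (`modelId`, `description`, `azureBlobSource`, `containerUrl`) and the placeholder values are assumptions, so check them against the `2024-07-31-preview` REST reference before use.

```python
import json


def build_generative_request(model_id, container_url, description=""):
    """Assemble a JSON body for the build operation (hypothetical helper)."""
    body = {
        "modelId": model_id,  # assumed field name
        "buildMode": "generative",  # value named in the text above
        "azureBlobSource": {"containerUrl": container_url},  # assumed field names
    }
    if description:
        body["description"] = description
    return json.dumps(body, indent=2)


# Placeholder values; replace with your own model ID and container URL.
print(build_generative_request("my-generative-model", "https://example.blob.core.windows.net/training"))
```

The sketch only builds the payload and sends nothing over the network.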
85114

86115
```bash
87116

articles/ai-services/document-intelligence/includes/preview-notice.md

Lines changed: 1 addition & 2 deletions
@@ -13,8 +13,7 @@ ms.date: 07/31/2024
1313

1414
> [!IMPORTANT]
1515
>
16-
> * Document Intelligence public preview releases provide early access to features that are in active development.
17-
> * Features, approaches, and processes may change, prior to General Availability (GA), based on user feedback.
16+
> * Document Intelligence public preview releases provide early access to features that are in active development. Features, approaches, and processes may change, prior to General Availability (GA), based on user feedback.
1817
> * The public preview version of Document Intelligence client libraries default to REST API version [**2024-07-31-preview**](/rest/api/aiservices/operation-groups?view=rest-aiservices-2024-07-31-preview&preserve-view=true).
1918
> * Public preview version [**2024-07-31-preview**](/rest/api/aiservices/operation-groups?view=rest-aiservices-2024-07-31-preview&preserve-view=true) is currently only available in the following Azure regions. Note that the custom generative (document field extraction) model in AI Studio is only available in North Central US region:
2019
> * **East US**
